Model Selection & Evaluation

Cross-validation, metrics, and model comparison.

When a model reports 80% confidence, it should be right about 80% of the time – reliability diagrams, Platt scaling, and isotonic regression.
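
As a rough illustration of what a reliability diagram measures, the sketch below groups predictions into equal-width confidence bins and compares each bin's mean confidence with the fraction of positives actually observed. The function names, the default of five bins, and the summary score (often called expected calibration error) are assumptions chosen for the example, not something this page prescribes.

```python
def reliability_bins(probs, labels, n_bins=5):
    """Group predictions into equal-width confidence bins.

    probs  -- predicted probabilities of the positive class, each in [0, 1]
    labels -- true labels, 1 for positive and 0 for negative
    Returns one (mean_confidence, observed_frequency, count) row per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((mean_conf, observed, len(members)))
    return rows


def expected_calibration_error(probs, labels, n_bins=5):
    """Weighted average gap between confidence and observed frequency."""
    n = len(probs)
    return sum(
        count / n * abs(conf - obs)
        for conf, obs, count in reliability_bins(probs, labels, n_bins)
    )
```

A model that says 0.8 on ten examples and is right on eight of them scores close to zero here; one that says 0.9 and is right on only five scores about 0.4. Plotting the first two columns of `reliability_bins` against each other gives the reliability diagram itself.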

Accuracy, precision, recall, F1, AUC-ROC, and AUC-PR – choosing the right metric depends on what errors cost.
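
To make the trade-off concrete, here is a minimal sketch that computes precision, recall, and F1 from binary labels. It assumes labels are encoded as 1 (positive) and 0 (negative), and it returns 0.0 when a ratio is undefined; both choices are conventions picked for this example.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

On a dataset with 95 negatives and 5 positives, a classifier that always predicts the negative class reaches 0.95 accuracy but 0.0 recall, which is why accuracy alone can mislead when missed positives are costly.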

K-fold, stratified, and leave-one-out validation – maximizing use of limited data for both training and evaluation.
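
The splitting logic behind K-fold can be sketched in a few lines. The helper below is a simplified illustration: it uses contiguous, unshuffled folds and does not stratify by class, so in practice the data would usually be shuffled (or stratified) first. The name `k_fold_indices` is made up for this example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds.

    Fold sizes differ by at most one. Setting k == n_samples gives
    leave-one-out validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        stop = start + size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, val
        start = stop
```

Every sample is used for validation exactly once and for training in the other k - 1 folds, which is how the scheme gets both a training set and an evaluation out of the same limited data.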

Grid search, random search, and Bayesian optimization – finding good hyperparameter settings without overfitting to the validation set.
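
Random search is the simplest of the three to sketch. The example below samples each hyperparameter uniformly from a range and keeps the best-scoring configuration. The `score_fn` callback, the `space` format, and the toy scoring function are all hypothetical stand-ins; in real use `score_fn` would train a model and return its validation score.

```python
import random


def random_search(score_fn, space, n_trials=20, seed=0):
    """Sample hyperparameters uniformly and return (best_params, best_score).

    score_fn -- callable taking a dict of hyperparameters, returning a
                validation score where higher is better (assumed interface)
    space    -- dict mapping each name to a (low, high) range
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    """Stand-in for a real validation score; peaks at lr == 0.1."""
    return -(params["lr"] - 0.1) ** 2


best_params, best_score = random_search(
    toy_validation_score, {"lr": (0.0, 1.0)}, n_trials=50
)
```

Because the winning configuration was picked using the validation score, that score is optimistically biased. A separate test set, untouched during the search, is the usual way to get an honest final estimate.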

Plotting performance vs. training set size or training iterations – diagnosing whether you need more data, more capacity, or more regularization.

Paired t-tests, McNemar’s test, and Wilcoxon signed-rank – determining if performance differences are real or noise.
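
Of the tests listed, McNemar's is compact enough to sketch directly. It compares two classifiers on the same test set by looking only at the examples where they disagree. The version below is the exact (binomial) two-sided form; the function name and return format are choices made for this example.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same examples.

    Returns (b, c, p_value) where
      b -- examples model A got right and model B got wrong
      c -- examples model A got wrong and model B got right
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in triples if a == t and p != t)
    c = sum(1 for t, a, p in triples if a != t and p == t)

    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree

    # Under the null hypothesis each disagreement is a fair coin flip,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

A small p-value suggests the difference in error rates is unlikely to be noise. Examples both models get right, or both get wrong, do not enter the test at all.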

MSE, RMSE, MAE, MAPE, and R-squared – each captures different aspects of prediction quality.
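
The five regression metrics can be computed side by side from the same errors, which makes their differences easier to see. The sketch below follows the standard definitions; returning NaN when MAPE or R-squared is undefined is a convention assumed for this example.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return a dict with MSE, RMSE, MAE, MAPE (in percent), and R-squared."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n   # penalizes large errors heavily
    rmse = sqrt(mse)                       # same units as the target
    mae = sum(abs(e) for e in errors) / n  # less sensitive to outliers

    # MAPE is undefined when any true value is zero (assumed: report NaN).
    if any(t == 0 for t in y_true):
        mape = float("nan")
    else:
        mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R-squared compares the model with always predicting the mean;
    # it is undefined when the targets have no variance (assumed: NaN).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2}
```

MSE and RMSE emphasize occasional large misses, MAE treats all errors linearly, MAPE expresses error relative to the size of the target, and R-squared reports the share of variance explained.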