Regularization

Regularization is a set of techniques that prevent overfitting by discouraging models from becoming too complex. Most commonly it works by adding a penalty term to the loss function during training, though other mechanisms (dropout, early stopping, structural constraints) serve the same purpose. Regularization helps models generalize to new data, reduces variance, and often improves interpretability.
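
In penalized form, training minimizes the original loss plus a weighted penalty \(\Omega(w)\) on the parameters:

\[ \min_w \; L(w) + \lambda\,\Omega(w) \]

where \(\lambda \ge 0\) controls the penalty strength and \(\lambda = 0\) recovers the unregularized model. The choice of \(\Omega\) gives the variants below.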

L1 Regularization (Lasso):

Adds the L1 norm of the weights to the loss: \(L(w) + \lambda \|w\|_1\). Because the absolute-value penalty is non-differentiable at zero, it drives some coefficients exactly to zero, performing implicit feature selection.

L2 Regularization (Ridge):

Adds the squared L2 norm: \(L(w) + \lambda \|w\|_2^2\). This shrinks all coefficients toward zero without eliminating any, which stabilizes estimates when features are correlated.

Elastic Net (Combined):

Mixes both penalties: \(L(w) + \lambda_1 \|w\|_1 + \lambda_2 \|w\|_2^2\), trading the sparsity of L1 against the stability of L2.
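
A minimal sketch of all three penalties using scikit-learn's linear models. The alpha values are arbitrary illustrations (alpha plays the role of \(\lambda\) in the formulas above), not tuned recommendations:

```python
# Minimal sketch: the three penalties via scikit-learn's linear models.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

lasso = Lasso(alpha=0.1).fit(X, y)                    # L1: zeroes some coefficients
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks, never zeroes
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # blend of both

print("Lasso coefficients at zero:", int((lasso.coef_ == 0).sum()))
print("Ridge coefficients at zero:", int((ridge.coef_ == 0).sum()))
```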

Regularization in Specific Model Types

Neural Networks

  • Dropout: randomly disables neurons during training (see the sketch after this list)
  • Weight decay: adds an L2 penalty on network weights
  • Batch normalization: stabilizes training and can reduce overfitting indirectly
  • Early stopping: monitors validation performance and stops training when improvement stalls
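
A minimal sketch of the first two items, assuming PyTorch; layer sizes and hyperparameters are arbitrary illustrations, and early stopping would wrap this step in a loop that tracks validation loss:

```python
# Minimal sketch (assumed PyTorch): dropout plus L2 weight decay.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes half the activations during training
    nn.Linear(256, 10),
)

# weight_decay adds an L2 penalty on every parameter given to the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

model.train()  # dropout active in training mode
x = torch.randn(32, 784)
targets = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()
optimizer.step()

model.eval()   # dropout disabled for evaluation/inference
```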

Tree-Based Models

  • XGBoost supports L1 (alpha) and L2 (lambda) penalties on leaf weights, plus shrinkage (learning rate), tree pruning, and early stopping (see the sketch after this list).

  • Random Forest uses structural constraints instead:

    • Bagging and feature subsampling
    • Max depth / min leaf size as built-in regularizers
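
A minimal sketch of the XGBoost knobs above, assuming the scikit-learn wrapper API (passing early_stopping_rounds to the constructor requires xgboost >= 1.6); all values are arbitrary illustrations, not tuned settings:

```python
# Minimal sketch (assumed xgboost >= 1.6, scikit-learn wrapper API).
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=5.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(
    reg_alpha=0.1,             # L1 penalty on leaf weights
    reg_lambda=1.0,            # L2 penalty on leaf weights
    learning_rate=0.1,         # shrinkage
    max_depth=4,               # structural constraint, as in random forests
    n_estimators=500,
    early_stopping_rounds=20,  # stop once validation error stops improving
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
print("best iteration:", model.best_iteration)
```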

Notes on Terminology

  • L1 norm: \(\|w\|_1 = \sum_i |w_i|\)

  • L2 norm: \(\|w\|_2 = \sqrt{\sum_i w_i^2}\); in practice the penalty usually uses the squared norm \(\|w\|_2^2\), whose gradient is linear in \(w\) (see the quick check after this list)

  • “Lasso” stands for Least Absolute Shrinkage and Selection Operator (Tibshirani, 1996).

  • “Ridge” comes from ridge traces: plots of coefficient values against penalty strength (Hoerl & Kennard, 1970).
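
A quick numeric check of the two norm definitions above (assuming numpy; the vector is an arbitrary illustration):

```python
# Quick numeric check of the norm definitions with an illustrative vector.
import numpy as np

w = np.array([3.0, -4.0, 0.0])
l1 = np.abs(w).sum()          # ||w||_1 = 7.0
l2 = np.sqrt((w**2).sum())    # ||w||_2 = 5.0
l2_sq = (w**2).sum()          # ||w||_2^2 = 25.0, the form used in ridge penalties
print(l1, l2, l2_sq)
```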