Penalty Regulariser — Ridge & Lasso Regression

Sarvesh Khetan
5 min read · Jun 10, 2024


This is a continuation of this Medium article; please read that first to understand the context!

Table of Contents:

  1. L2 / Ridge Regularization
    a. Cost Function Derivation
    b. Optimization Using Normal Equations
    c. Optimization Using Gradient Descent
    d. Optimization Using Stochastic Gradient Descent
    e. Optimization Using Mini-Batch Gradient Descent
  2. L1 / Lasso Regularization
    a. Optimization Using Normal Equations
    b. Optimization Using Gradient Descent
    c. Optimization Using Stochastic Gradient Descent
    d. Optimization Using Mini-Batch Gradient Descent
  3. Elastic Net Regularization
  4. Code Implementation

L2 or Ridge Regulariser

Researchers observed that when a model overfits, the learned weights tend to become very large compared to when it does not overfit. Hence, to handle the overfitting problem, they decided to stop the weights from growing too large, which can be done by imposing a restriction on the weights W0, W1, …, Wn.

In the case of ridge regularization, the following constraint is put on the weights:

Constrained Optimization Formulation
Constrained Optimization => Unconstrained Optimization Formulation
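
In case the images above do not render, here is a sketch of the two formulations they describe, written for a linear model with weights W and training pairs (x_i, y_i) (my assumed notation):

Constrained form:

\min_{W} \sum_{i=1}^{m} \left( y_i - W^T x_i \right)^2 \quad \text{subject to} \quad \sum_{j=0}^{n} W_j^2 \le t

Unconstrained (penalised) form, obtained by moving the constraint into the objective with a multiplier \lambda:

J(W) = \sum_{i=1}^{m} \left( y_i - W^T x_i \right)^2 + \lambda \sum_{j=0}^{n} W_j^2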

Now, what we saw above was a very crude and intuitive way of deriving the cost function for ridge regularization; let's see how to derive the same cost function using Bayesian mathematics. (Refer to this post, where I have introduced Bayesian mathematics, before reading this.)
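
As a quick sketch of that Bayesian route (the symbols \sigma^2 for the noise variance and \tau^2 for the prior variance are my assumptions): assume a Gaussian likelihood y_i ~ N(W^T x_i, \sigma^2) and a zero-mean Gaussian prior W_j ~ N(0, \tau^2). The MAP estimate, obtained by minimising the negative log-posterior, is

W_{MAP} = \arg\max_W P(W \mid D) = \arg\max_W P(D \mid W) \, P(W) = \arg\min_W \left[ \sum_{i=1}^{m} \left( y_i - W^T x_i \right)^2 + \frac{\sigma^2}{\tau^2} \sum_{j=0}^{n} W_j^2 \right]

which is exactly the ridge cost above with \lambda = \sigma^2 / \tau^2.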

Hence, to solve this convex, unconstrained, non-linear optimization problem, we can use methods we are already familiar with:

  1. Calculus methods like the Normal Equation, and
  2. Indirect search methods like GD / SGD / MBGD / ….. (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop / ….. discussed here)

We will now derive the mathematical equations for these methods, as seen in earlier posts! (Kindly make sure you understand these derivations properly, because in the code implementation we will directly be using these formulas!)
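
As a rough companion to those derivations, here is a minimal NumPy sketch of the two routes for ridge. This is my own illustrative code, not the implementation from the article's code section, and the names X, y, lam, lr and epochs are assumptions:

import numpy as np

def ridge_normal_equation(X, y, lam):
    # Closed-form solution of sum((X @ W - y)^2) + lam * ||W||^2:
    # W = (X^T X + lam * I)^(-1) X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def ridge_gradient_descent(X, y, lam, lr=0.01, epochs=1000):
    # Batch gradient descent on J(W) = (1/m) * sum((X @ W - y)^2) + lam * ||W||^2
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(epochs):
        error = X @ W - y
        grad = (2.0 / m) * (X.T @ error) + 2.0 * lam * W  # gradient of the penalised cost
        W -= lr * grad
        # For SGD / mini-batch GD, compute the same gradient on one sample or a small batch per step.
    return W

In practice the bias weight W0 is usually left out of the penalty term; the sketch above penalises all weights for brevity.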

L1 or Lasso (Least Absolute Shrinkage and Selection Operator) Regulariser

Hence, to solve this convex, unconstrained, non-linear optimization problem, we can use methods we are already familiar with:

  1. Calculus methods like the Normal Equation, and
  2. Indirect search methods like GD / SGD / MBGD / ….. (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop / ….. discussed here)

We will now derive the mathematical equations for these methods, as seen in earlier posts! (Kindly make sure you understand these derivations properly, because in the code implementation we will directly be using these formulas!)
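
Since the L1 penalty |W_j| is not differentiable at zero, a common gradient-descent route uses the sign of the weights as a subgradient. Below is a minimal NumPy sketch in that spirit (again my own illustrative code, with assumed names mirroring the ridge sketch above):

import numpy as np

def lasso_subgradient_descent(X, y, lam, lr=0.01, epochs=1000):
    # Subgradient descent on J(W) = (1/m) * sum((X @ W - y)^2) + lam * ||W||_1
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(epochs):
        error = X @ W - y
        grad = (2.0 / m) * (X.T @ error) + lam * np.sign(W)  # np.sign(W) is a subgradient of |W|
        W -= lr * grad
    return W

Note that, unlike ridge, lasso has no closed-form normal-equation solution in general, which is why coordinate descent with soft-thresholding is the more common solver in libraries.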

Elastic Net Regulariser

Hence, to solve this convex, unconstrained, non-linear optimization problem, we can use methods we are already familiar with:

  1. Calculus methods like the Normal Equation, and
  2. Indirect search methods like GD / SGD / MBGD / ….. (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop / ….. discussed here)

I urge you to derive the mathematical equations for these methods on your own; they follow along similar lines to those discussed above!
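
As a starting point for that exercise, the elastic net cost simply combines both penalties, J(W) = (1/m) * sum((X @ W - y)^2) + lam1 * ||W||_1 + lam2 * ||W||^2, and a subgradient step combines the two sketches above (lam1 and lam2 are my assumed names for the L1 and L2 penalty strengths):

import numpy as np

def elastic_net_subgradient_descent(X, y, lam1, lam2, lr=0.01, epochs=1000):
    # Subgradient descent on the combined L1 + L2 penalised cost
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(epochs):
        error = X @ W - y
        grad = (2.0 / m) * (X.T @ error) + lam1 * np.sign(W) + 2.0 * lam2 * W
        W -= lr * grad
    return W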


Sarvesh Khetan

A deep learning enthusiast and a Master's student at the University of Maryland, College Park.