Penalty Regulariser — Ridge & Lasso Regression
This is in continuation of this Medium article; please read it first to understand the context!
Table of Contents:
- L2 / Ridge Regularization
a. Cost Function Derivation
b. Optimization Using Normal Equations
c. Optimization Using Gradient Descent
d. Optimization Using Stochastic Gradient Descent
e. Optimization Using Mini-Batch Gradient Descent
- L1 / Lasso Regularization
a. Optimization Using Normal Equations
b. Optimization Using Gradient Descent
c. Optimization Using Stochastic Gradient Descent
d. Optimization Using Mini-Batch Gradient Descent
- Elastic Net Regularization
- Code Implementation
L2 or Ridge Regulariser
Researchers observed that when overfitting occurs, the weights become very large compared to when it does not. Hence, to handle the overfitting problem, they decided to stop the weights from growing too much, which can be done by imposing some restriction on the weights W0, W1, …, Wn.
In the case of ridge regularization, the following constraint is put on the weights:
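Written out as a sketch in standard notation (the article's figure may use different symbols, and in practice the bias W0 is often excluded from the penalty), the constraint and the penalized cost it leads to via a Lagrange multiplier are:

```latex
% Constrained form: keep the total squared weight below a budget c
\min_{W} \; \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
\quad \text{subject to} \quad \sum_{j=0}^{n} W_j^2 \le c

% Equivalent penalized (Lagrangian) form, with regularization strength \lambda \ge 0
J(W) \;=\; \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 \;+\; \lambda \sum_{j=0}^{n} W_j^2
```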
Now, what we saw above was a very crude and intuitive way of deriving the cost function for ridge regularization; let's see how to derive the same cost function using Bayesian mathematics. (Refer to this post, where I have introduced Bayesian mathematics, before reading this.)
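As a quick sketch of that Bayesian route (the notation here is assumed, not taken from the article's figures): with a Gaussian likelihood on the targets and a zero-mean Gaussian prior on the weights, MAP estimation recovers exactly the L2-penalized cost above.

```latex
% Assumptions: y_i \sim \mathcal{N}(W^{T} x_i, \sigma^2), prior W_j \sim \mathcal{N}(0, \tau^2)
W_{MAP} = \arg\max_{W} P(W \mid D) = \arg\max_{W} P(D \mid W)\, P(W)

% Taking the negative log and dropping constants:
W_{MAP} = \arg\min_{W} \; \sum_{i=1}^{m} \left( y_i - W^{T} x_i \right)^2
          \;+\; \underbrace{\frac{\sigma^2}{\tau^2}}_{\lambda} \sum_{j} W_j^2
```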
Hence, to solve this convex, unconstrained, non-linear optimization problem, we are already familiar with:
- Calculus methods like the Normal Equation, and
- Indirect search methods like GD / SGD / MBGD (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop, discussed here)
We will now derive the mathematical equations for these methods, as seen in earlier posts! (Kindly make sure you understand these derivations properly, because in the code implementation we will directly be using these formulas!)
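For reference, here is a sketch of the key results those derivations arrive at, in standard notation (the symbols X, Y, λ, η are my choices and may differ from the article's figures): the ridge normal equation, and the gradient-descent update (SGD and MBGD use the same update, computed on a single sample or a mini-batch respectively).

```latex
% Normal equation with L2 penalty (I is the identity matrix):
W = \left( X^{T} X + \lambda I \right)^{-1} X^{T} Y

% Gradient of the ridge cost and the GD update with learning rate \eta:
\nabla_{W} J = -2\, X^{T} (Y - X W) + 2 \lambda W
\qquad
W \leftarrow W - \eta \, \nabla_{W} J
```

And a minimal from-scratch sketch of both approaches (this is my illustration, not the article's code implementation; function and variable names are hypothetical):

```python
import numpy as np

def ridge_gd(X, y, lam=0.1, lr=0.01, n_iters=1000):
    """Ridge regression via batch gradient descent (a sketch).

    Uses the averaged cost J = (1/m) * ||y - X W||^2 + lam * ||W||^2, which only
    rescales lambda and the learning rate relative to the sum-of-squares form above.
    """
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(n_iters):
        grad = (-2.0 / m) * (X.T @ (y - X @ W)) + 2.0 * lam * W
        W -= lr * grad
    return W

def ridge_normal_eq(X, y, lam=0.1):
    """Closed-form ridge solution: W = (X^T X + lam*I)^{-1} X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```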
L1 or Lasso (Least Absolute Shrinkage and Selection Operator) Regulariser
As with ridge, to solve this convex, unconstrained, non-linear optimization problem, we are already familiar with:
- Calculus methods like the Normal Equation, and
- Indirect search methods like GD / SGD / MBGD (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop, discussed here)
We will now derive the mathematical equations for these methods, as seen in earlier posts! (Kindly make sure you understand these derivations properly, because in the code implementation we will directly be using these formulas!)
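One point worth keeping in mind while following those derivations (my wording, not the article's): the L1 penalty λ·Σ|Wj| is not differentiable at Wj = 0, so unlike ridge there is no clean closed-form solution, and the gradient-based updates use the subgradient sign(Wj). A sketch in standard notation:

```latex
% Lasso cost and the subgradient-descent update
% (SGD / MBGD use the same rule on a single sample or a mini-batch):
J(W) = \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j} \lvert W_j \rvert

\nabla_{W} J \approx -2\, X^{T} (Y - X W) + \lambda \,\operatorname{sign}(W)
\qquad
W \leftarrow W - \eta \, \nabla_{W} J
```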
Elastic Net Regulariser
As before, to solve this convex, unconstrained, non-linear optimization problem, we are already familiar with:
- Calculus methods like the Normal Equation, and
- Indirect search methods like GD / SGD / MBGD (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop, discussed here)
I urge you to derive the mathematical equations for these methods on your own; they follow along similar lines to what we discussed above!
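For completeness, the elastic net cost simply combines both penalties (a sketch in standard notation; libraries often reparameterize the two λ's as a single strength plus a mixing ratio):

```latex
J(W) = \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
       + \lambda_1 \sum_{j} \lvert W_j \rvert
       + \lambda_2 \sum_{j} W_j^2
```

If you want a quick sanity check against a library before the from-scratch code implementation, scikit-learn exposes all three regularisers directly (the toy data and parameter values below are placeholders, not the article's):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Toy data just to make the snippet runnable; replace with your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                     # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)                     # L1 penalty
enet  = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2

print(ridge.coef_)
print(lasso.coef_)   # lasso tends to drive some coefficients exactly to zero
print(enet.coef_)
```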