Penalty Regulariser — Ridge & Lasso Regression
This is in continuation of this Medium article; please read it first to understand the context!
Table of Contents:
- L2 / Ridge Regularization
a. Cost Function Derivation
b. Optimization Using Normal Equations
c. Optimization Using Gradient Descent
d. Optimization Using Stochastic Gradient Descent
e. Optimization Using Mini-Batch Gradient Descent
- L1 / Lasso Regularization
a. Optimization Using Normal Equations
b. Optimization Using Gradient Descent
c. Optimization Using Stochastic Gradient Descent
d. Optimization Using Mini-Batch Gradient Descent
- Elastic Net Regularization
- Code Implementation
L2 or Ridge Regulariser
Researchers observed that when overfitting occurs, the weights become very large compared to when it does not. Hence, to handle the overfitting problem, they decided to stop the weights from growing too much, which can be done by imposing some restriction on the weights W0, W1, …, Wn.
In the case of ridge regularization, the following constraint is put on the weights:
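Written out as a sketch in standard notation (the article's figure may use different symbols, and in practice the bias W0 is often excluded from the penalty), the constraint and the penalized cost it leads to via a Lagrange multiplier are:

```latex
% Constrained form: keep the total squared weight below a budget c
\min_{W} \; \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
\quad \text{subject to} \quad \sum_{j=0}^{n} W_j^2 \le c

% Equivalent penalized (Lagrangian) form, with regularization strength \lambda \ge 0
J(W) \;=\; \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 \;+\; \lambda \sum_{j=0}^{n} W_j^2
```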
Now, what we saw above was a very crude and intuitive way of deriving the cost function for ridge regularization; let's see how to derive the same cost function using Bayesian mathematics. (Refer to this post, where I have introduced Bayesian mathematics, before reading this.)
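As a quick sketch of that Bayesian route (the notation here is assumed, not taken from the article's figures): with a Gaussian likelihood on the targets and a zero-mean Gaussian prior on the weights, MAP estimation recovers exactly the L2-penalized cost above.

```latex
% Assumptions: y_i \sim \mathcal{N}(W^{T} x_i, \sigma^2), prior W_j \sim \mathcal{N}(0, \tau^2)
W_{MAP} = \arg\max_{W} P(W \mid D) = \arg\max_{W} P(D \mid W)\, P(W)

% Taking the negative log and dropping constants:
W_{MAP} = \arg\min_{W} \; \sum_{i=1}^{m} \left( y_i - W^{T} x_i \right)^2
          \;+\; \underbrace{\frac{\sigma^2}{\tau^2}}_{\lambda} \sum_{j} W_j^2
```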
Hence, to solve this convex, unconstrained, non-linear optimization problem, we are already familiar with:
- Calculus methods like the Normal Equation, and
- Indirect search methods like GD / SGD / MBGD (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop, discussed here)
We will now derive the mathematical equations for these methods, as seen in earlier posts! (Kindly make sure you understand these derivations properly, because in the code implementation we will directly be using these formulas!)
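For reference, here is a sketch of the key results those derivations arrive at, in standard notation (the symbols X, Y, λ, η are my choices and may differ from the article's figures): the ridge normal equation, and the gradient-descent update (SGD and MBGD use the same update, computed on a single sample or a mini-batch respectively).

```latex
% Normal equation with L2 penalty (I is the identity matrix):
W = \left( X^{T} X + \lambda I \right)^{-1} X^{T} Y

% Gradient of the ridge cost and the GD update with learning rate \eta:
\nabla_{W} J = -2\, X^{T} (Y - X W) + 2 \lambda W
\qquad
W \leftarrow W - \eta \, \nabla_{W} J
```

And a minimal from-scratch sketch of both approaches (this is my illustration, not the article's code implementation; function and variable names are hypothetical):

```python
import numpy as np

def ridge_gd(X, y, lam=0.1, lr=0.01, n_iters=1000):
    """Ridge regression via batch gradient descent (a sketch).

    Uses the averaged cost J = (1/m) * ||y - X W||^2 + lam * ||W||^2, which only
    rescales lambda and the learning rate relative to the sum-of-squares form above.
    """
    m, n = X.shape
    W = np.zeros(n)
    for _ in range(n_iters):
        grad = (-2.0 / m) * (X.T @ (y - X @ W)) + 2.0 * lam * W
        W -= lr * grad
    return W

def ridge_normal_eq(X, y, lam=0.1):
    """Closed-form ridge solution: W = (X^T X + lam*I)^{-1} X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```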
L1 or Lasso (Least Absolute Shrinkage and Selection Operator) Regulariser
As with ridge, to solve this convex, unconstrained, non-linear optimization problem, we are already familiar with:
- Calculus methods like the Normal Equation, and
- Indirect search methods like GD / SGD / MBGD (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop, discussed here)
We will now derive the mathematical equations for these methods, as seen in earlier posts! (Kindly make sure you understand these derivations properly, because in the code implementation we will directly be using these formulas!)
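One point worth keeping in mind while following those derivations (my wording, not the article's): the L1 penalty λ·Σ|Wj| is not differentiable at Wj = 0, so unlike ridge there is no clean closed-form solution, and the gradient-based updates use the subgradient sign(Wj). A sketch in standard notation:

```latex
% Lasso cost and the subgradient-descent update
% (SGD / MBGD use the same rule on a single sample or a mini-batch):
J(W) = \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j} \lvert W_j \rvert

\nabla_{W} J \approx -2\, X^{T} (Y - X W) + \lambda \,\operatorname{sign}(W)
\qquad
W \leftarrow W - \eta \, \nabla_{W} J
```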
Elastic Net Regulariser
As before, to solve this convex, unconstrained, non-linear optimization problem, we are already familiar with:
- Calculus methods like the Normal Equation, and
- Indirect search methods like GD / SGD / MBGD (similarly, you can have stochastic and mini-batch versions of other indirect search methods like Adam / RMSprop, discussed here)
I urge you to derive the mathematical equations for these methods on your own; they follow along similar lines to what we discussed above!
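For completeness, the elastic net cost simply combines both penalties (a sketch in standard notation; libraries often reparameterize the two λ's as a single strength plus a mixing ratio):

```latex
J(W) = \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
       + \lambda_1 \sum_{j} \lvert W_j \rvert
       + \lambda_2 \sum_{j} W_j^2
```

If you want a quick sanity check against a library before the from-scratch code implementation, scikit-learn exposes all three regularisers directly (the toy data and parameter values below are placeholders, not the article's):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Toy data just to make the snippet runnable; replace with your own X, y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                     # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)                     # L1 penalty
enet  = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2

print(ridge.coef_)
print(lasso.coef_)   # lasso tends to drive some coefficients exactly to zero
print(enet.coef_)
```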