Decision Tree — Classification Explained !!
Tree Based Learning
Intuition
We will work with the following dataset and try to understand the decision tree algorithm using it!!
In the decision tree algorithm, our aim is to first build a tree and then use this tree to make decisions. Hence the name decision tree!! Now let's see how to form this tree.
Now out of all these possible root nodes, we need to select one. How?? Let's see how to select the best node. Below I will show and compare two such possible cases; you can do the rest yourself.
Case1 is a better root node compared to Case2 because using Case1 we could split the dataset into two halves such that the +ve class and -ve class both got separated perfectly (hence it is called a pure node), while in Case2 they didn't.
In Case2 the left part has only +ve examples, while the right half contains both +ve and -ve examples. Hence it is not a perfect class separation, and such a node is called an impure node.
Now this exact analysis of whether a split breaks the dataset into separate classes or not can be formulated as a mathematical formula called SHANNON'S ENTROPY (or simply ENTROPY)!!
Like this, you compute such entropies for all possible root node candidates and select the one whose split has the lowest (weighted) entropy, i.e. the highest information gain; a perfectly pure split has entropy 0.
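To make this concrete, here is a minimal sketch (the toy labels, candidate splits, and helper names are my own illustrative assumptions, not the article's dataset) that computes Shannon's entropy for the two halves produced by a candidate root node and keeps the candidate with the lowest weighted entropy:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy: H = -sum(p_i * log2(p_i)) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    if len(counts) == 1:          # pure node -> entropy is exactly 0
        return 0.0
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def weighted_split_entropy(labels, mask):
    """Weighted average entropy of the two halves created by a boolean split mask."""
    left, right = labels[mask], labels[~mask]
    n = len(labels)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

# Hypothetical toy labels and two candidate root-node splits (Case1 vs Case2)
y = np.array(["+", "+", "+", "-", "-", "-"])
case1 = np.array([True, True, True, False, False, False])   # both halves pure
case2 = np.array([True, True, False, False, False, False])  # left pure, right mixed

for name, mask in [("Case1", case1), ("Case2", case2)]:
    print(name, "weighted entropy:", round(weighted_split_entropy(y, mask), 3))
# Case1 scores 0.0 (pure split), Case2 scores ~0.541, so Case1 wins as the root node.
```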
Instead of entropy you can use other alternative metrics like Gini impurity (also called Gini index or simply Gini), chi-square, gain ratio, and so on.
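For comparison, here is a minimal sketch of Gini impurity under the same assumptions as above; note that it involves no logarithm, which is one reason it is often a little cheaper to compute (more on this in the hyperparameter section below):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity: G = 1 - sum(p_i^2) over class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

print(gini_impurity(np.array(["+", "+", "+"])))   # 0.0    -> pure node
print(gini_impurity(np.array(["+", "+", "-"])))   # ~0.444 -> impure node
```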
Key Notes
- Huge Dataset Effect: Doesn't matter
- Imbalanced Dataset and Outlier Effect: Doesn't matter
- Feature Scaling Effect: Doesn't matter
- Dimensionality Issue: Doesn't matter
- Missing Value Impact: Doesn't matter
- Effect of Continuous Features: If you understood the above intuition well, you will acknowledge that continuous columns require a lot of computation (every distinct value is a candidate threshold), and hence the time complexity of the algorithm increases. To tackle this, we can discretize the continuous variables into discrete bins before applying the decision tree algorithm (see the sketch after this list)!!
- Overfitting: If you grow the tree till its very end (full depth), it will have an overfitting issue. Hence, to solve this we have the following techniques:
- Data augmentation techniques
- Early Stopping: The overfitting issue came from growing the tree to its full depth, so we simply don't go till the very depth of the tree. This idea is also called pruning, and there are two pruning strategies here: pre-pruning and post-pruning (see the sketch after this list)
- Pre-pruning: Stop growing a branch when information becomes unreliable
- Post-pruning: Take a fully-grown decision tree and discard unreliable parts (preferred because pre-pruning can “stop too early”)
- Limit the depth of the decision tree by specifying a threshold value
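As referenced in the list above, here is a minimal scikit-learn sketch of both ideas: discretizing continuous features before training (to cut down the number of candidate thresholds), and controlling overfitting with pre-pruning (max_depth, min_samples_leaf) and post-pruning (cost-complexity pruning via ccp_alpha). The dataset and parameter values are illustrative assumptions, not tuned recommendations:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Discretize continuous features into a few bins, then fit a pruned tree
model = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
    DecisionTreeClassifier(
        max_depth=4,           # pre-pruning: stop growing beyond this depth
        min_samples_leaf=10,   # pre-pruning: avoid tiny, unreliable leaves
        ccp_alpha=0.01,        # post-pruning: cost-complexity pruning strength
        random_state=42,
    ),
)
model.fit(X_train, y_train)
print("test accuracy:", round(model.score(X_test, y_test), 3))
```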
Decision Boundary
Hence in decision trees you get horizontal and vertical lines as decision boundaries; they cannot be slanted lines. A better way to put this is that the decision boundaries are always parallel to the axes (axis-aligned). Hence the overall decision boundary is piecewise linear, and therefore non-linear!!
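Here is a minimal sketch of how you could see this for yourself (the two-feature toy dataset and grid settings are assumptions): fit a tree on two features, predict on a dense grid, and the coloured regions come out as axis-parallel rectangles.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Small 2-D toy dataset so the boundary can be drawn directly
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Evaluate the tree on a dense grid covering the feature space
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 300),
    np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 300),
)
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)           # regions are axis-parallel rectangles
plt.scatter(X[:, 0], X[:, 1], c=y, s=15)
plt.title("Decision tree: piecewise, axis-aligned decision boundary")
plt.show()
```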
Hyperparameter Tuning
Whether you want to use the Gini index or entropy to build the tree is a hyperparameter. Most of the time people prefer Gini over entropy because Gini is faster, since it does not have any log computation!!
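As a rough sketch of how this tuning could look with scikit-learn's GridSearchCV (the parameter grid, dataset, and cross-validation settings are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "criterion": ["gini", "entropy"],   # the impurity measure is itself a hyperparameter
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```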
Model Explainability
Decision trees are among the most interpretable classifiers mankind has ever known of!! One of the reasons why the decision tree is so famous is the fact that it is highly interpretable. In a decision tree we get a final tree-like structure which helps us decide whether a test example belongs to the +ve class or the -ve class, and since this tree structure is easy to understand, it is easy to interpret how the decision tree arrived at its answer.
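For instance, scikit-learn can dump the learned tree as a set of plain-text if/else rules that you can read from top to bottom (a minimal sketch; the Iris dataset and depth limit are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Print the tree as human-readable decision rules
print(export_text(clf, feature_names=list(iris.feature_names)))
```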
Does the Decision Tree Follow the Local Constancy Assumption?
Yes, decision trees also follow the local constancy assumption. I am not proving this here; try to prove it on your own, just the way I did it for KNN and SVM. Hence, in datasets where the local constancy assumption doesn't hold good, decision trees will fail to give good results.
Code Implementation
Below you can find both a library-based implementation and a from-scratch implementation of the decision tree algorithm in Python.
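A minimal library-based sketch with scikit-learn is given below (the dataset, split ratio, and hyperparameters are illustrative assumptions); a from-scratch version would essentially apply the entropy/Gini split-selection logic sketched earlier inside a recursive tree-building function.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load data and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit a lightly pruned decision tree and evaluate it
clf = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```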