Parallel Classification Ensemblers
These can be of two types:
1. Majority / Hard Voting
This works exactly like the averaging ensembler discussed under parallel regression ensemblers, except that instead of averaging the outputs of the base learners we take a majority vote over their predicted classes to get the ensembler's result. The following diagram explains this clearly.
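To make the voting step itself concrete, here is a minimal sketch (my own illustration, not from any library) that combines the class predictions of three already-fitted classifiers by picking the most common label per sample. The names clf_a, clf_b, clf_c and X_test are placeholders, and the labels are assumed to be non-negative integers:

import numpy as np

# clf_a, clf_b, clf_c are assumed to be already-fitted classifiers (hypothetical names),
# X_test is the data we want ensemble predictions for
preds = np.stack([clf.predict(X_test) for clf in (clf_a, clf_b, clf_c)])  # shape: (n_learners, n_samples)

# hard voting: for each sample, pick the label predicted by the majority of base learners
# (np.bincount assumes non-negative integer class labels)
y_pred = np.array([np.bincount(votes).argmax() for votes in preds.T])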
We can further classify the majority voting ensembler into the following types, based on the base learners it uses:
- Heterogeneous Ensemble — here the base learners are of different types, e.g. {SVM, Logistic Regression, KNN} or {SVM, Polynomial SVM, Feed-Forward Neural Network}, and so on.
- Homogeneous Ensemble — here the base learners are of the same type but with different hyper-parameters, e.g. {Decision Tree with 5 splits, Decision Tree with 10 splits, Decision Tree with 100 splits}, or {KNN with K = 1, KNN with K = 4, KNN with K = 9, KNN with K = 14}, or {FFNN with 1 hidden layer, FFNN with 7 hidden layers, FFNN with 10 hidden layers, FFNN with 99 hidden layers, FFNN with 1000 hidden layers}, and so on. The issue with combining so many FFNN models is that training even a single FFNN is compute intensive, so training dozens of them is simply not feasible in practice; this compute constraint is what motivated researchers to invent DROPOUT, as discussed here!
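For example, a heterogeneous hard-voting ensemble can be put together with scikit-learn's VotingClassifier (here trained and scored on the iris dataset just to keep the snippet self-contained):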
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # any classification dataset works here

# three different base learners -> heterogeneous ensemble
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

# voting='hard' means every base learner casts one vote and the majority class wins
eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')
scores = cross_val_score(eclf, X, y, cv=5, scoring='accuracy')
2. Bagging (Bootstrap AGGregatING)
This is very similar to majority voting, except that instead of training every base learner on the entire training dataset, each base learner is trained on its own bootstrap sample, i.e. a random sample of the training data drawn with replacement. The figure below explains it clearly!
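To show what one bootstrap sample looks like, here is a minimal sketch (my own illustration): rows are drawn with replacement until the sample is as large as the original dataset, which leaves roughly a third of the rows out of any given sample. The dataset size of 1000 and the arrays X, y are assumed for illustration:

import numpy as np

rng = np.random.default_rng(seed=0)
n_samples = 1000                                  # assume X and y have 1000 rows
idx = rng.integers(0, n_samples, size=n_samples)  # sample row indices WITH replacement

# X_boot, y_boot = X[idx], y[idx]  -> one base learner would train on this sample
unique_fraction = len(np.unique(idx)) / n_samples
print(unique_fraction)  # ~0.63: only about 63% of the original rows appear in this sample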
We can further classify the bagging ensembler into the following types, based on the base learners it uses:
- Heterogeneous Ensemble — here the base learners are of different types.
- Homogeneous Ensemble — here the base learners are of the same type but with different hyper-parameters. One famous example is Random Forest classification, in which every base learner is a decision tree classifier (many trees make a forest, hence the name), and on top of bagging the rows each tree also considers only a random subset of features at every split; a Random Forest sketch follows the bagging example below. Similarly, here too DROPOUT can be seen as an efficient way of implementing an ensemble of many FFNNs.
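For example, bagging over decision tree classifiers with scikit-learn's BaggingClassifier (again on iris just to keep the snippet self-contained):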
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# bag 5 decision trees, each trained on its own bootstrap sample (any base classifier can be plugged in here)
clf1 = BaggingClassifier(DecisionTreeClassifier(), n_estimators=5)
scores = cross_val_score(clf1, X, y, cv=5, scoring='accuracy')
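And for the homogeneous case mentioned above, Random Forest is essentially this bagging recipe with decision trees as the only base learner, plus random feature subsets at each split. A minimal scikit-learn sketch (the dataset and hyper-parameter choices are just placeholders):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 bagged decision trees, each split considering a random subset of the features
rf = RandomForestClassifier(n_estimators=100, random_state=1)
scores = cross_val_score(rf, X, y, cv=5, scoring='accuracy')
print(scores.mean())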