# Bagging and Boosting

Bagging and Boosting are two widely used ensemble methods* for improving the accuracy of predictive models. When training a machine learning model we may run into several sources of error, such as noise, bias, and variance, and ensemble methods help us address bias and variance. When we apply a Decision Tree to a problem, only one tree produces the result. In Bagging and Boosting, however, we train N learners and then combine them into a single strong learner, which usually gives a more accurate result.

So, how does it happen?

The training data is resampled (or reweighted) to train N learners, and their outputs are then combined into a single prediction. Bagging and Boosting differ in how this is done. In Bagging (Bootstrap Aggregation), each of the N learners is trained independently on its own bootstrap sample, drawn from the training data with replacement, and the final prediction is the average of their outputs (or a majority vote, for classification). In Boosting, the learners are trained one after another: each learner is given a weight according to its performance, so a more accurate learner gets a higher weight, and the observations that earlier learners got wrong receive more weight in the next round. So we can also say that the Boosting technique keeps track of the net error at each step.
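A minimal bagging sketch in Python, assuming a toy 1-D binary classification problem with labels -1 and +1. The weak learner `train_stump` is a hypothetical decision stump (a one-split decision tree) written only for illustration; in practice a library decision tree would normally be used. Each learner is trained on its own bootstrap sample, and the ensemble predicts by majority vote, the classification counterpart of averaging.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Hypothetical weak learner: a 1-D threshold rule (decision stump).
    Tries every observed value as a threshold and keeps the rule with
    the fewest training mistakes."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            # Rule: predict +1 when sign * (x - t) >= 0, otherwise -1
            errors = sum(1 for x, y in zip(xs, ys)
                         if (1 if sign * (x - t) >= 0 else -1) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else -1


def bagging_fit(xs, ys, n_learners=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(n_learners):
        # Bootstrap sample: draw n row indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return learners


def bagging_predict(learners, x):
    # Every learner votes independently; the majority label wins
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]


# Usage on hypothetical toy data
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
model = bagging_fit(xs, ys)
print([bagging_predict(model, x) for x in (1.5, 7.5)])  # [-1, 1]
```

Because the learners are independent of one another, they could also be trained in parallel, which is one practical difference from Boosting.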

Let us look at the Pros and Cons of Bagging and Boosting techniques.

Bagging:

Pros:

• Bagging helps when the model suffers from high variance (overfitting). It reduces variance by training N learners with the same algorithm, each on a bootstrap sample of the same size as the training data.

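• Because the bootstrap samples are drawn with replacement from the same training data, many observations appear in several samples, yet each learner sees a slightly different sample. Combining these learners smooths out their individual fluctuations, which reduces the high variance.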

• Bagging uses the bootstrap sampling method (sampling with replacement).
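The overlap between bootstrap samples can be checked with a short Python simulation; the sample size and seed below are arbitrary choices for illustration. Each bootstrap sample contains only about 1 − 1/e ≈ 63.2% of the distinct training rows, and two independent samples share roughly 40% of the rows.

```python
import random


def bootstrap_indices(n, rng):
    # Sampling with replacement: some rows repeat, others are left out
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(42)
n = 10_000
a = set(bootstrap_indices(n, rng))
b = set(bootstrap_indices(n, rng))

print(f"distinct rows in sample A: {len(a) / n:.3f}")          # about 0.632
print(f"rows shared by samples A and B: {len(a & b) / n:.3f}")  # about 0.40
```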

Cons:

• Bagging does not help when the model suffers from high bias (underfitting).

• Because Bagging averages the learners' outputs, it smooths away the highest and lowest individual results, even when they differ widely, and provides only the averaged result.
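The effect of averaging can be made precise with a standard result. Assume, for illustration, that the predictions f_1(x), …, f_N(x) of the N learners at a point x are identically distributed, each with variance σ² and pairwise correlation ρ. Then:

$$
\operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} f_i(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2,
\qquad
\mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N} f_i(x)\right] = \mathbb{E}\left[f_1(x)\right]
$$

The second term of the variance shrinks as N grows, which is why averaging dampens extreme individual results and reduces variance. The expectation, and therefore the bias, is unchanged, which is consistent with Bagging not helping against underfitting. Since bootstrap samples overlap, ρ is usually positive, so the first term sets a floor on how much variance Bagging can remove.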

Boosting:

Pros:

• Boosting weights each learner by its accuracy, so more accurate learners carry more weight than less accurate ones when the results are combined.

• The net error is evaluated at every learning step. Boosting also works well with feature interactions.

• Boosting helps when we are dealing with bias or underfitting in the model.

• Several boosting algorithms are available, for example AdaBoost, LPBoost, XGBoost, Gradient Boosting, and BrownBoost.
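A minimal sketch of discrete AdaBoost, one of the boosting variants listed above, in Python. The stump learner `train_weighted_stump`, the toy data, and the number of rounds are assumptions chosen for illustration, not a reference implementation. It shows the two weighting mechanisms described in this article: each learner's weight `alpha = 0.5 * ln((1 - err) / err)` grows as its weighted error shrinks, and misclassified observations receive more weight in the next round. The `errors` list records the weighted error at each step.

```python
import math


def train_weighted_stump(xs, ys, w):
    """Hypothetical weak learner: a 1-D threshold rule chosen to minimise
    the weighted error under the current sample weights w."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (1 if sign * (x - t) >= 0 else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x: 1 if sign * (x - t) >= 0 else -1)


def adaboost_fit(xs, ys, n_rounds=10):
    n = len(xs)
    w = [1.0 / n] * n   # start with uniform sample weights
    ensemble = []       # list of (alpha, learner) pairs
    errors = []         # weighted error tracked at every step
    for _ in range(n_rounds):
        err, h = train_weighted_stump(xs, ys, w)
        err = max(err, 1e-10)   # avoid division by zero for a perfect stump
        if err >= 0.5:          # no better than chance: stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate -> larger weight
        ensemble.append((alpha, h))
        errors.append(err)
        # Misclassified samples gain weight, correctly classified ones lose it
        w = [wi * math.exp(-alpha * y * h(x)) for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, errors


def adaboost_predict(ensemble, x):
    # Weighted vote: sign of the alpha-weighted sum of learner outputs
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Usage on hypothetical toy data: positives on both sides, negatives in the middle
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble, errors = adaboost_fit(xs, ys, n_rounds=3)
print([adaboost_predict(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
print([round(e, 3) for e in errors])                # [0.333, 0.25, 0.167]
```

On this toy data no single stump can separate the classes (the best one misclassifies 2 of the 6 points), but three weighted stumps together classify every training point correctly.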

Cons:

• Boosting does not specifically address overfitting or variance, and it can overfit, especially on noisy data.

• It increases the complexity of the model.

• Training can be expensive in time and computation, since the learners are built one after another rather than independently.

What are the applications of ensemble methods in the real world?

Bagging and Boosting are used in many areas to improve accuracy, for example:

• Banking: loan defaulter prediction and fraudulent transactions

• Credit risk

• Kaggle competitions

• Fraud detection

• Recommender systems (for example, at Netflix)

• Malware detection

• Wildlife conservation

*Ensemble methods: several Decision Trees are combined to produce a more accurate model than a single Decision Tree.