Читать книгу Fundamentals and Methods of Machine and Deep Learning - Pradeep Singh - Страница 38
2.3 Bootstrap Aggregating (Bagging)
ОглавлениеBootstrap aggregating is popularly referred as bagging is a machine learning–based ensemble technique which improves the accuracy of the algorithm and is used mostly for classification or aggregation purposes. The main purpose of bagging is that it avoids overfitting problem by properly generalizing the existing data samples. Consider any standard input dataset from which new training datasets are generated by sampling the data samples uniformly with replacement. By considering the replacements, some of the observations are repeated in the form of the unique data samples using regression or voting mechanisms. The bagging technique is composed of artificial neural networks and regression tree, which are used to improve the unstable procedures. For any given application, the selection between bagging and boosting depends on the availability of the data. The variance incurred is reduced by combining bootstrap and bagging [15, 16].
Bagging and boosting operations are considered as two most powerful tools in ensemble machine learning. The bagging operation is used concurrently with the decision tree which increases the stability of the model by reducing the variance and also improves the accuracy of the model by minimizing the error rate. The aggregation of set of predictions made by the ensemble models happens to produce best prediction as the output. While doing bootstrapping the prominent sample is taken out using the replacement mechanism in which the selection of new variables is dependent on the previous random selections. The practical application of this technique is dependent on the base learning algorithm which is chosen first and on top of which the bagging of pool of decision trees happen. Some of the zonotic diseases which can be identified and treated well using bootstrap aggregating are zonotic influenza, salmonellosis, West Nile virus, plague, rabies, Lyme disease, brucellosis, and so on [17]. A high-level representation of bootstrap is shown in Figure 2.2. It begins with the training dataset, which is distributed among the multiple bootstrap sampling units. Each of the bootstrap sampling unit operates on the training subset of data upon which the learning algorithm performs the learning operation and generates the classification output. The aggregated sum of each of the classifier is generated as the output.
Figure 2.2 A high-level representation of Bootstrap aggregating.
Some of the advantages offered by bootstrap in diagnosing the zonotic diseases are as follows: the aggregated power of several weak learners over runs the performance of strong learner, the variance incurred gets reduced as it efficiently handles overfitting problem, no loss of precision during interoperability, computationally not expensive due to proper management of resources, computation of over confidence bias becomes easier, equal weights are assigned to models which increases the performance, misclassification of samples is less, very much robust to the effect of the outliers and noise, the models can be easily paralyzed, achieves high accuracy through incremental development of the model, stabilizes the unstable methods, easier from implementation point of view, provision for unbiased estimate of test errors, easily overcomes the pitfalls of individuals machine learning models, and so on.