Читать книгу Fundamentals and Methods of Machine and Deep Learning - Pradeep Singh - Страница 41
2.6 Bucket of Models
ОглавлениеThe bucket of models is one of the popular ensemble machine learning techniques used to choose the best algorithm for solving any computational intensive problems. The performance achieved by bucket of models is good compared to average of all ensemble machine learning models. One of the common strategies used to select the best model for prediction is through cross-validation. During cross-validation, all examples available in the training will be used to train the model and the best model which fits the problem will be chosen. One of the popular generalization approaches for cross-validation selection is gating. In order to implement the gating, the perceptron model will be used which assigns weight to the prediction product by each model available in the bucket. When the large number of models in the bucket is applied over a larger set of problems, the model for which the training time is more can be discarded. Landmark-based learning is a kind of bucket-based model which trains only fast algorithms present in the bucket and based on the prediction generated by fast algorithms will be used to determine the accuracy of slow algorithms in the bucket [24]. A high-level representation of bucket of models is shown in Figure 2.5. The data store maintains the repository of information, which is fed as input to each of the base learners. Each of the base learners generates their own prediction as output which is fed as input to the metalearner. Finally, the metalearner does summation of each of the predictions to generate final prediction as output.
Figure 2.5 A high-level representation of bucket of models.
One of the best suitable approaches for cross-validation among multiple models in ensemble learning is bake off contest, the pseudo-code of which is given below.
Pseudo-code: Bucket of models
For each of the ensemble model present in the bucket doRepeat constant number of timesDivide the training set into parts, i.e., training set and test set randomlyTrain the ensemble model with training setTest the ensemble model with test setChoose the ensemble model that yields maximum average score value
Some of the advantages offered by bucket of models in diagnosing the zonotic diseases are as follows: high quality prediction, provides unified view of the data, negotiation of local patterns, less sensitive to outliers, stability of the model is high, slower model gets benefited from faster models, parallelized automation of tasks, learning rate is good on large data samples, payload functionality will be hidden from end users, robustness of the model is high, error generation rate is less, able to handle the random fluctuations in the input data samples, length of the bucket is kept medium, easier extraction of features from large data samples, prediction happens by extracting the data from deep web, linear weighted average model is used, tendency of forming suboptimal solutions is blocked, and so on [25, 26].