Artificial Intelligence and Quantum Computing for Advanced Wireless Networks - Savo G. Glisic
2.1.6 Boosting: GBM and XGBoost
By definition, “boosting” refers to a family of algorithms that convert weak learners into strong learners. To do so, we combine the predictions of the weak learners using methods such as an average/weighted average, or by taking the prediction with the highest vote. So, boosting combines weak learners (base learners) to form a strong rule. An immediate question that arises is how boosting identifies weak rules.
To find a weak rule, we apply a base learning (ML) algorithm with a different distribution over the training observations. Each time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process: after many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.
For choosing the right distribution, here are the steps: (i) The base learner starts by assigning equal weight (attention) to each observation. (ii) If the first base learning algorithm makes prediction errors, we pay greater attention to the observations with prediction errors, i.e. increase their weights; then we apply the next base learning algorithm. (iii) Repeat Step (ii) until the limit of the base learning algorithm is reached or sufficient accuracy is achieved.
Finally, boosting combines the outputs from weak learners and creates a strong learner, which eventually improves the prediction power of the model. Boosting pays greater attention to examples that are misclassified or have higher errors generated by preceding weak learners.
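The reweighting-and-voting loop described above can be sketched as a minimal AdaBoost-style illustration. The toy dataset, the threshold-stump weak learner, and all names below are assumptions made for this sketch, not taken from the text:

```python
import math

# Toy 1-D dataset: feature value -> label in {-1, +1} (made up for illustration).
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [+1, +1, -1, -1, +1, -1]

def stump_predict(threshold, sign, x):
    # Weak learner: predicts `sign` if x < threshold, else -sign.
    return sign if x < threshold else -sign

def best_stump(weights):
    # Steps (i)/(ii): fit the weak learner on the current weight
    # distribution by minimizing the weighted classification error.
    best = None
    for threshold in (1.5, 2.5, 3.5, 4.5, 5.5):
        for sign in (+1, -1):
            err = sum(w for x, t, w in zip(X, y, weights)
                      if stump_predict(threshold, sign, x) != t)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

weights = [1.0 / len(X)] * len(X)    # (i) start with equal weights
ensemble = []                        # list of (alpha, threshold, sign)

for _ in range(3):                   # (iii) iterate
    err, threshold, sign = best_stump(weights)
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak rule
    ensemble.append((alpha, threshold, sign))
    # (ii) increase the weight of misclassified observations.
    weights = [w * math.exp(-alpha * t * stump_predict(threshold, sign, x))
               for x, t, w in zip(X, y, weights)]
    total = sum(weights)
    weights = [w / total for w in weights]

def strong_predict(x):
    # Final strong learner: weighted vote of the weak rules.
    score = sum(a * stump_predict(th, s, x) for a, th, s in ensemble)
    return +1 if score >= 0 else -1
```

After three rounds, the weighted vote of three stumps classifies this toy set correctly even though no single stump can, which is exactly the weak-to-strong conversion described above.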
There are many boosting algorithms that enhance a model’s accuracy. Next, we will present more details about the two most commonly used algorithms: Gradient Boosting (GBM) and XGBoost.
GBM versus XGBoost:
The standard GBM implementation has no regularization, whereas XGBoost does; this regularization helps reduce overfitting.
XGBoost is also known as a “regularized boosting” technique.
XGBoost implements parallel processing and is much faster than GBM.
XGBoost also supports implementation on Hadoop.
XGBoost allows users to define custom optimization objectives and evaluation criteria. This adds a whole new dimension to the model, and there is no limit to what we can do.
XGBoost has an in‐built routine to handle missing values.
The user is required to supply a value that is different from all other observations and pass it as a parameter. On each node, XGBoost tries both directions when it encounters a missing value and learns which path to take for missing values in the future.
A GBM would stop splitting a node when it encounters a negative loss in the split. Thus, it is more of a greedy algorithm.
XGBoost, on the other hand, makes splits up to the maximum depth specified and then prunes the tree backward, removing splits beyond which there is no positive gain. Another advantage is that sometimes a split of negative loss, say −2, may be followed by a split of positive loss, +10. GBM would stop as soon as it encounters −2. XGBoost, however, will go deeper, see the combined effect of +8 for the two splits, and keep both.
XGBoost allows the user to run a cross‐validation at each iteration of the boosting process, so it is easy to obtain the exact optimum number of boosting iterations in a single run. This is unlike GBM, where we have to run a grid search, and only a limited set of values can be tested.
The user can start training an XGBoost model from the last iteration of a previous run. This can be a significant advantage in certain specific applications. The sklearn implementation of GBM also has this feature, so the two are evenly matched in this respect.
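The missing-value routine described above can be sketched in miniature: at a single split node, route the missing rows left and then right, and keep the default direction that yields the lower training error. Real XGBoost scores the candidate directions with gradient statistics; comparing raw misclassification counts, like the made-up data below, is a simplifying assumption for illustration:

```python
# Toy sketch: learn a "default direction" for missing feature values at one
# split node (illustrative data; XGBoost's actual routine uses gradient
# statistics rather than error counts).
rows = [          # (feature value, or None if missing; binary label)
    (1.0, 0), (2.0, 0), (7.0, 1), (9.0, 1), (None, 1), (None, 1),
]
threshold = 5.0   # split rule: go left if value < threshold

def split_error(default_left):
    """Count errors when missing values follow the given default direction."""
    errors = 0
    for value, label in rows:
        goes_left = default_left if value is None else value < threshold
        predicted = 0 if goes_left else 1   # left leaf predicts 0, right 1
        errors += int(predicted != label)
    return errors

# Try both directions for the missing rows and keep the better one.
default_left = split_error(True) < split_error(False)
```

Here both missing rows carry label 1, so routing them right is error-free and the node learns "default right" as the path for future missing values.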
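The split-then-prune behavior described above, including the −2/+10 example, can be mimicked with a list of hypothetical per-split gains along one branch:

```python
# Hypothetical gains of successive splits down one branch of a tree.
gains = [+5, -2, +10, -1]

# Greedy GBM-style growth: stop at the first split with negative gain.
greedy = []
for g in gains:
    if g < 0:
        break
    greedy.append(g)

# XGBoost-style growth: split to the maximum depth first, then prune from
# the bottom, removing trailing splits that contribute no positive gain.
pruned = list(gains)
while pruned and pruned[-1] <= 0:
    pruned.pop()
```

GBM keeps only the first split (gain +5), while the prune-backward pass keeps the −2 split because the +10 split below it gives a combined gain of +8, for a branch total of +13.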
GBM in R and Python: Let us first start with the overall pseudocode of the GBM algorithm for two classes:
1 Initialize the outcome.
2 Iterate from 1 to the total number of trees:
2.1 Update the weights for targets based on the previous run (higher for the ones misclassified).
2.2 Fit the model on a selected subsample of the data.
2.3 Make predictions on the full set of observations.
2.4 Update the output with the current results, taking into account the learning rate.
3 Return the final output.
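The pseudocode above can be sketched end to end with one-split regression stumps playing the role of the trees. This is a minimal two-class illustration; the dataset, the squared-error stump, the subsample size, and all names are assumptions made for this sketch:

```python
import random

# Toy two-class data (labels 0/1), made up for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
learning_rate = 0.5
n_trees = 20

def fit_stump(xs, residuals):
    # One-split regression stump: pick the threshold minimizing squared error.
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    return best[1:]

random.seed(0)
F = [sum(y) / len(y)] * len(X)          # 1. initialize the outcome (base rate)
trees = []
for _ in range(n_trees):                # 2. iterate over trees
    residuals = [t - f for t, f in zip(y, F)]    # 2.1 new targets to fit
    idx = random.sample(range(len(X)), k=6)      # 2.2 subsample of the data
    threshold, lmean, rmean = fit_stump(
        [X[i] for i in idx], [residuals[i] for i in idx])
    trees.append((threshold, lmean, rmean))
    # 2.3-2.4 predict on all observations, update with the learning rate.
    F = [f + learning_rate * (lmean if x < threshold else rmean)
         for x, f in zip(X, F)]

preds = [1 if f >= 0.5 else 0 for f in F]        # 3. final output
```

Each round fits a stump to the current residuals on a random subsample and nudges the running prediction toward the targets; after 20 rounds the squared error is well below the initial base-rate error.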
GBM in R:
> library(caret)
> fitControl <- trainControl(method = "cv",
+                            number = 10)      # 10-fold cross-validation
> tune_Grid <- expand.grid(interaction.depth = 2,
+                          n.trees = 500,
+                          shrinkage = 0.1,
+                          n.minobsinnode = 10)
> set.seed(825)
> fit <- train(y_train ~ ., data = train,
+              method = "gbm",
+              trControl = fitControl,
+              verbose = FALSE,
+              tuneGrid = tune_Grid)           # use the grid defined above
> predicted <- predict(fit, test, type = "prob")[, 2]
For GBM and XGBoost in Python, see [2].