2.1.5 Bagging and Random Forest
Bagging is a technique used to reduce the variance of our predictions by combining the results of multiple classifiers modeled on different subsamples of the same dataset. The steps followed in bagging are as follows:
Form multiple datasets: Sampling is done with replacement on the original data, and new datasets are formed. These new datasets can contain a fraction of the columns as well as of the rows; these fractions are generally hyperparameters of a bagging model. Taking row and column fractions less than one helps in building robust models that are less prone to overfitting.
Develop multiple classifiers: A classifier is built on each dataset. In general, the same type of classifier is modeled on every dataset, and predictions are made.
Integrate classifiers: The predictions of all the classifiers are combined using the mean, median, or mode, depending on the problem at hand. The combined values are generally more robust than those of a single model. If the individual predictions are independent and each has variance σ², the variance of their average is σ²/n (n: number of classifiers); in practice the reduction is smaller because the predictions are correlated. A minimal sketch of the whole procedure is given after this list.
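The sketch below illustrates these three steps and is not part of the book's own example: it assumes hypothetical data frames train (with a factor label column y) and test, and uses decision trees from the rpart package as the base classifier, combining their predictions by majority vote.

library(rpart)

n_models <- 25          # number of classifiers in the ensemble
row_frac <- 0.8         # fraction of rows drawn for each bootstrap sample
models   <- vector("list", n_models)

for (i in seq_len(n_models)) {
  # Form a new dataset by sampling rows with replacement
  idx <- sample(nrow(train), size = floor(row_frac * nrow(train)), replace = TRUE)
  # Build the same type of classifier on each dataset
  models[[i]] <- rpart(y ~ ., data = train[idx, ])
}

# Collect the class predicted by every model (one column per model)
votes <- sapply(models, function(m) as.character(predict(m, test, type = "class")))

# Integrate the classifiers: majority vote (mode) across models for each test row
combined <- apply(votes, 1, function(v) names(which.max(table(v))))

For regression, the majority vote in the last step would be replaced by the mean of the numeric predictions of the individual models; column subsampling, if desired, can be added by also drawing a random subset of feature columns for each bootstrap sample.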
There are various implementations of bagging models. Random forest is one of them, and we will discuss it next.
In a random forest, we grow multiple trees as opposed to a single tree. To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the class having the most votes (over all the trees in the forest); in the case of regression, it takes the average of the outputs of the different trees.
Random forests have simple implementations in R packages. Here is an example:
> library(randomForest)
> x <- cbind(x_train, y_train)
> # Fitting the model
> fit <- randomForest(Species ~ ., x, ntree = 500)
> summary(fit)
> # Predict output
> predicted <- predict(fit, x_test)
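To see the voting described above at work, the individual trees of the fitted forest can be inspected. The following lines are a sketch, assuming the fit and x_test objects from the example and that the type = "vote" and predict.all options of predict.randomForest are available in the installed version of the package.

> # Per-class vote fractions of the forest (classification case)
> votes <- predict(fit, x_test, type = "vote")
> # Predictions of every individual tree, one column per tree
> per_tree <- predict(fit, x_test, predict.all = TRUE)$individual
> # Majority vote across trees corresponds to the forest's classification rule;
> # for regression the per-tree outputs would simply be averaged
> combined <- apply(per_tree, 1, function(v) names(which.max(table(v))))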