
1.3.4 Random Forest


The Random Forest, as its name implies, combines a large number of individual decision trees that operate in conjunction. The main idea behind a random forest is the wisdom of crowds: a large number of relatively uncorrelated trees functioning as a committee will outperform any of the individual models. Random Forest allows us to shape the result by tuning hyperparameters such as the split criterion, the depth of each tree, and the maximum and minimum leaf sizes. It is a supervised machine learning algorithm, used for both classification and regression. It makes use of bagging and feature randomness while building each individual tree, so as to produce an uncorrelated forest whose prediction is more accurate than that of any single tree (a brief illustrative example follows); the mathematical formulation of the model is then given in the numbered steps below.
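As a minimal sketch of these tuning knobs, assuming scikit-learn (the chapter itself names no library, so the setup below is our own), the hyperparameters correspond directly to the split criterion, tree depth, and leaf-size bounds mentioned above:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any labelled dataset works the same way.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of individual decision trees
    criterion="gini",     # split criterion (the "basis" in the text)
    max_depth=8,          # maximum depth of each tree
    min_samples_leaf=2,   # minimum number of samples allowed in a leaf
    max_features="sqrt",  # feature randomness at every split
    bootstrap=True,       # bagging: each tree sees a bootstrap sample
    random_state=0,
)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))

Here max_features="sqrt" injects the feature randomness that decorrelates the trees, while bootstrap=True supplies the bagging.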

1. Let D = (x1, y1), …, (xn, yn) be the dataset used for training.

2. Let w = {w1(x), w2(x), …, wk(x)} be an ensemble of weak classifiers.

3. If every wk is a decision tree, then the parameters of the tree are described as θk = (θk1, θk2, …, θkp), chosen independently at random for each tree.

4. The output of each decision tree is the classifier wk(x) = w(x | θk).

5. Hence, the final classification is f(x) = the majority vote over all wk(x) (see the sketch after this list).
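To make steps 1 through 5 concrete, here is a minimal from-scratch sketch in Python (the language choice is ours, and the helper names fit_forest and predict_majority are hypothetical). X and y are assumed to be NumPy arrays with integer-encoded class labels.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    # Steps 1-3: each weak classifier wk is a decision tree whose
    # parameters theta_k arise from its bootstrap sample of D and
    # its own feature-subsampling seed.
    rng = np.random.default_rng(seed)
    trees = []
    n = len(X)
    for k in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of D
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=k)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_majority(trees, X):
    # Steps 4-5: collect every wk(x), then return f(x), the label
    # receiving the majority vote for each sample.
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

Because each tree is trained on a different bootstrap sample with a random feature subset at every split, the individual trees are only weakly correlated, which is exactly why the majority vote in step 5 tends to beat any single wk.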

Figure 1.6 gives a pictorial representation of how a random forest works.

Some of the advantages of Random Forest algorithm are as follows:

 • Reduces the risk of overfitting.

 • Solves both classification and regression problems.

 • Handles missing values automatically.

 • Stable and robust to outliers.

Figure 1.6 Random forest algorithm.

Some of the disadvantages are as follows:

 • Greater model complexity than a single decision tree.

 • Longer training period.

