1.5 Conclusion
An ensemble of classifiers is a collection of classification models whose individual predictions are combined, by weighted or unweighted voting, to assign a class label to each new pattern. There is no single best way to build a successful ensemble, and the question is still being actively researched. Predicting heart disease has long been a topic of interest for researchers. We therefore evaluated the accuracy of heart disease prediction using an ensemble of classifiers. For our study, we chose the best-performing algorithms, those whose individual predictions qualified them as strong classifiers. We used a combination of the Decision Tree, Naive Bayes, Random Forest, and K-means algorithms. Since no single algorithm can guarantee maximum performance under all circumstances, we used majority voting to classify the records. The dataset used for this purpose was the Kaggle cardiovascular disease dataset, which has 70,000 records and on which we achieved an accuracy of 91.56%.
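The voting step described above can be illustrated with a minimal sketch. It assumes each classifier has already produced a label for a record; the function name `majority_vote`, the optional `weights` argument, and the tie-breaking rule are illustrative assumptions, not the implementation used in the study.

```python
from collections import Counter


def majority_vote(predictions, weights=None):
    """Combine one label per classifier into a single label for a record.

    predictions: list of labels, one per classifier (hypothetical input).
    weights: optional list of non-negative weights, one per classifier;
             None means every classifier gets an equal (unweighted) vote.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("weights and predictions must have the same length")

    # Sum the (weighted) votes received by each label.
    totals = Counter()
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Assumption: ties go to the label that was seen first, because
    # max() returns the first maximal key in insertion order.
    return max(totals, key=totals.get)


# Example: three of four classifiers predict class 1 for a record.
label = majority_vote([1, 0, 1, 1])
```

With unweighted voting every model counts equally; passing `weights` lets a more reliable model outvote several weaker ones, which is the weighted variant mentioned above.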
However, we saw potential to increase the accuracy further by analyzing the records that were wrongly classified by all or most of the algorithms. Such misclassification could stem from high bias, high variance, low precision, or low recall. We therefore identified the columns (attributes) that were causing records to be misclassified, by assigning a probability to each tuple in a column and combining those probabilities using conditional probability. We then focused only on the columns that lead to accurate prediction, increasing their weights and applying feature reduction. With this probabilistic approach, we could effectively remove the anomalies and increase the prediction accuracy.
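One possible reading of this column analysis is to estimate, for each attribute value, the conditional probability that a record with that value is misclassified by the ensemble. The sketch below is only an illustration under that assumption: the function name, the toy records, and the column names `attr_a` and `attr_b` are hypothetical, and the text does not specify the exact procedure that was used.

```python
from collections import defaultdict


def misclassification_rate_by_value(records, column, misclassified):
    """Estimate P(misclassified | column == value) for each observed value.

    records: list of dicts mapping column name to value (hypothetical data).
    misclassified: list of booleans, True where the ensemble was wrong.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [n_wrong, n_total]
    for record, wrong in zip(records, misclassified):
        entry = counts[record[column]]
        entry[0] += int(wrong)
        entry[1] += 1
    return {value: n_wrong / n_total
            for value, (n_wrong, n_total) in counts.items()}


# Toy data: errors line up with attr_a but are unrelated to attr_b.
records = [
    {"attr_a": "low", "attr_b": "x"},
    {"attr_a": "low", "attr_b": "y"},
    {"attr_a": "high", "attr_b": "x"},
    {"attr_a": "high", "attr_b": "y"},
]
misclassified = [False, False, True, True]

rates_a = misclassification_rate_by_value(records, "attr_a", misclassified)
rates_b = misclassification_rate_by_value(records, "attr_b", misclassified)
```

In this toy case the error rate differs sharply across the values of `attr_a` and not at all across the values of `attr_b`, so `attr_a` would be the column to examine more closely when reweighting or reducing features.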
