Читать книгу Data Analytics in Bioinformatics - Группа авторов - Страница 25
1.6 K-Nearest Neighbor
ОглавлениеK-Nearest Neighbor belongs to the category of supervised classification algorithm and hence, needs labeled data for training [77, 78]. In this approach, the value of K is suggested by the user. It can be used for both the classification and regression approaches but the attributes must be known. By performing the KNN algorithm, it will give new data points according to the k-number or the closest data points.
In the heart disease dataset also, The Area under the ROC Curve (AUC) has been used. It is the most basic tool for judging the classifier’s performance in a medical decision making concerns [79–81]. It is a graphical plot for judging the diagnostic ability with the help of a binary classifier. The generated ROC curve for KNN on the heart disease dataset [41] is presented below in Figure 1.14.
In the above figure, the true positive rate (probability of detection) is mentioned on the Y-axis, and on the x-axis, the false positive rate (probability of false alarm) is mentioned. The False Positive rate depicts the unit proportion with a known negative condition for which the predicted condition is positive.
The Area under the ROC Curve (AUC) of K-nearest neighbor is performed on the heart disease dataset [41] in python (Google Colab) and shown below in Table 1.5.
Figure 1.14 ROC curve for k-nearest neighbor.
Table 1.5 AUC: K-nearest neighbor.
Parameter | Data | Value | Result |
The area under the ROC Curve (AUC) | Training Data | 1.0000000 | Outstanding |
Test Data | 1.0000000 | Outstanding | |
Index: 0.5: No Discriminant, 0.6–0.8: Can be considered accepted, 0.8–0.9: Excellent, >0.9: Outstanding |
The obtained value of Training Data is 1.0000000 that attains an outstanding remark and the value of the testing data is 1.0000000 that attains an outstanding remark in the AUC score. The result shows that KNN performs outstandingly on the dataset.