Machine Learning Techniques and Analytics for Cloud Security (various authors)

3.4.2 Result Analysis


The algorithm is executed with r = 5, i.e., a group of five genes is selected at random at a time. For the lung dataset, each group therefore forms a matrix of 96 rows (samples) and 5 columns (genes/features), which is divided into training and test sets. For the colon dataset, the matrix has 36 rows and 5 columns and is divided in the same way. The test set contains 20% of the samples and the remaining 80% form the training set. The features are then standardized with a standard scaler, which rescales each feature to zero mean and unit variance. Next, PCA is applied to the selected 96 × 5 (samples × genes) matrix. For both the lung and colon datasets, the number-of-components parameter of PCA is set to α = 0.95, so that the retained principal components explain at least 95% of the variance.
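The preprocessing of one random gene group can be sketched in Python with scikit-learn. This is a minimal sketch, not the authors' code: the expression matrix and labels are random placeholders (a hypothetical 96 samples × 1000 genes with binary labels), the use of scikit-learn is an assumption suggested by the "standard scaler" and "sag" terminology, and fitting the scaler and PCA on the training split only is a choice made here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the lung expression matrix: 96 samples x 1000 genes
rng = np.random.default_rng(0)
X_full = rng.normal(size=(96, 1000))
y = rng.integers(0, 2, size=96)  # hypothetical binary labels (e.g. tumor vs normal)

r = 5  # genes per random group, as in the text
genes = rng.choice(X_full.shape[1], size=r, replace=False)
X = X_full[:, genes]  # 96 x 5 submatrix for this group

# 80% training / 20% test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize to zero mean and unit variance (statistics taken from the training split)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# A float n_components keeps enough components to explain more than 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full").fit(X_train_s)
X_train_p = pca.transform(X_train_s)
X_test_p = pca.transform(X_test_s)

print(X.shape, X_train_p.shape, X_test_p.shape)
```

Passing a float between 0 and 1 as `n_components` lets PCA choose the number of components from the variance target instead of fixing it in advance.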

After the dimensionality is reduced, logistic regression (LR) is applied using the “sag” (stochastic average gradient) solver for faster convergence. The model is trained on the training set, predictions are made for the test samples, and accuracy is computed by comparing these predictions with the actual test labels. Whenever the accuracy exceeds 85%, the genes in that group are selected as cancer-mediating genes and stored in a new list.
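The complete selection loop can likewise be sketched as follows. This is an illustrative assumption rather than the authors' implementation: the function `select_genes`, the number of sampled groups (`n_groups`), the seeds, and the synthetic data (in which the first 20 of 200 genes are made informative) are all hypothetical. A group passes only if its accuracy is strictly greater than 0.85, matching the "more than 85%" criterion.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_genes(X_full, y, r=5, n_groups=50, threshold=0.85, seed=0):
    """Hypothetical helper: sample random r-gene groups and keep the genes of
    every group whose test accuracy exceeds `threshold`."""
    rng = np.random.default_rng(seed)
    selected = []  # the "new list" of candidate cancer-mediating genes
    for _ in range(n_groups):
        genes = rng.choice(X_full.shape[1], size=r, replace=False)
        X_train, X_test, y_train, y_test = train_test_split(
            X_full[:, genes], y, test_size=0.2,
            random_state=int(rng.integers(1_000_000)))
        # Scaling, PCA (95% variance) and LR with the "sag" solver, fit on training data only
        model = make_pipeline(
            StandardScaler(),
            PCA(n_components=0.95, svd_solver="full"),
            LogisticRegression(solver="sag", max_iter=1000),
        )
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        if acc > threshold:
            for g in genes:
                if int(g) not in selected:
                    selected.append(int(g))
    return selected


# Hypothetical data: 96 samples x 200 genes; genes 0-19 are shifted for class 1
data_rng = np.random.default_rng(42)
y = data_rng.integers(0, 2, size=96)
X_full = data_rng.normal(size=(96, 200))
X_full[:, :20] += 4.0 * y[:, None]

selected = select_genes(X_full, y)
print(len(selected), sorted(g for g in selected if g < 20))
```

In this sketch every gene of a passing group is kept, so noise genes that share a group with an informative gene are selected as well.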

For the lung dataset, 886 genes were selected; when these genes were matched against the genes in the NCBI database, 102 were found to be true positives (TP). For the colon dataset, 207 genes were selected, of which 85 were found to be TP when matched against the NCBI database.
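Treating the NCBI matches as the reference set, the fraction of selected genes that are true positives (the precision of each selected list) follows directly from these counts:

```latex
\text{Precision}_{\text{lung}} = \frac{\mathrm{TP}}{\text{selected}} = \frac{102}{886} \approx 0.115,
\qquad
\text{Precision}_{\text{colon}} = \frac{85}{207} \approx 0.411
```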

