3.5 Critical Analysis
In this study we observed that the ANN classifier outperformed all other classifiers with a reasonable accuracy. Here we discuss our critical observations on the factors that improve the performance of the ANN model in achieving high accuracy. ANN learns to solve complex problems because of its massive parallel processing, adaptive learning, fault tolerance and self-organization capability, which together ensure high classification performance, and it has long been one of the most powerful tools for classification and prediction. The performance of an ANN algorithm depends on several factors: pre-processing of the data (the dimension of the dataset, the presence of incomplete or noisy data, and the selection of features), the activation function used, and the choice of the number of epochs and neurons. Selecting a huge number of features, or the wrong features, can degrade the performance of the model, so the performance of an ANN also depends on choosing the right combination of input variables and other parameters.
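The following is a minimal sketch, not the authors' exact pipeline, of how pre-processing, feature selection and the basic ANN hyper-parameters mentioned above fit together in scikit-learn. The breast-cancer dataset is used purely as a stand-in for a two-class biomedical dataset; the layer sizes, epoch limit and number of selected features are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier

# Stand-in two-class dataset; replace with your own feature matrix X and labels y.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

ann = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # handle incomplete data
    ("scale", StandardScaler()),                  # normalise noisy features
    ("select", SelectKBest(f_classif, k=15)),     # keep the most informative features
    ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16),
                          activation="relu",
                          max_iter=500,           # upper bound on training epochs
                          random_state=42)),
])
ann.fit(X_train, y_train)
print("ANN test accuracy:", ann.score(X_test, y_test))
```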
In the case of the SVM model, we saw that when the number of features exceeds the number of samples the model tends to become slow, so additional work on feature selection is required. When a substantial amount of DNA sequencing data is available for two-class disease classification, however, SVM remains an excellent classification model. We also observed that neural networks can still produce reasonable results when the inputs are noisy or incomplete, so the correct use of data pre-processing techniques can improve the performance of the classification model.
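The sketch below illustrates the high-dimensional situation just described, where the features far outnumber the samples, as is typical for sequencing-derived matrices. The synthetic data and the choice of keeping 100 features are assumptions used only to show how feature selection keeps an SVM tractable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

# 100 samples, 5,000 features -> features greatly exceed samples.
X, y = make_classification(n_samples=100, n_features=5000,
                           n_informative=50, random_state=0)

svm_all = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm_fs = make_pipeline(StandardScaler(),
                       SelectKBest(f_classif, k=100),  # keep the 100 best features
                       SVC(kernel="linear"))

print("SVM, all features     :", cross_val_score(svm_all, X, y, cv=5).mean())
print("SVM, selected features:", cross_val_score(svm_fs, X, y, cv=5).mean())
```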
Another factor that influences the performance of the model is the right choice of activation function for classifying linear and non-linear data. One of the fastest-learning activation functions is the ReLU function, which tends to give more accurate results because it is easy to optimize with gradient descent and helps the training procedure converge quickly toward a good solution.
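A small NumPy illustration, not taken from the study, of why ReLU trains quickly: its gradient stays at 1 for positive inputs, whereas the sigmoid gradient shrinks toward zero for large magnitudes and slows gradient descent.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, -1.0, 0.5, 5.0])
print("ReLU gradient   :", relu_grad(x))     # remains 1 for positive inputs
print("Sigmoid gradient:", sigmoid_grad(x))  # approaches 0 at large |x|
```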
It is observed that, as the number of hidden layers increases, the model gives relatively high accuracy, so the performance of the model also depends on the number of hidden layers and the number of neurons in each hidden layer. Many articles report that performance hardly improves when too few hidden layers are used, and that this also increases the risk of convergence to a local minimum. If too few neurons are used, the network fits the training data poorly: it becomes too simple and may not be able to model complex data. Conversely, if the model is trained with too many neurons, training may take an excessively long time and there is a high chance of overfitting; the network may begin to model random noise, so that it fits the training data extremely well but fails to generalize to new or future data. Integrating the ANN algorithm with different optimization algorithms minimizes the error rate produced by the classification model, which in turn improves its performance.
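The hedged sketch below illustrates the capacity trade-off described above: a very small hidden layer underfits, a very large one tends to memorize the training data, and an adaptive gradient-based optimizer (Adam) with early stopping helps rein in the error. The dataset and layer sizes are illustrative assumptions, not the study's configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for hidden in [(2,), (32,), (512, 512)]:      # too small / moderate / very large
    mlp = MLPClassifier(hidden_layer_sizes=hidden,
                        activation="relu",
                        solver="adam",         # gradient-based optimizer
                        early_stopping=True,   # guards against overfitting
                        max_iter=1000,
                        random_state=1)
    mlp.fit(X_train, y_train)
    print(hidden,
          "train acc %.3f" % mlp.score(X_train, y_train),
          "test acc %.3f" % mlp.score(X_test, y_test))
```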