
4.4.2 Processing of Average Fluxes with Neural Network

An ANN model was also constructed to process the average flux values. Conceptually, an ANN is a directed, weighted graph of neurons. Each neuron has multiple inputs and produces one output that can be connected to multiple other neurons. The inputs of the first layer are the input data, such as a series of numbers or image data. The last layer represents the output of the ANN that accomplishes the required task, such as prediction or classification. The connections between neurons correspond to the synapses of the biological brain. The output of a neuron is determined by an activation function, typically ReLU or sigmoid, that is applied to a weighted sum of the input values plus a bias term. As presented in the first section, the weights of the different inputs are set by the learning procedure so that the required task is performed better.
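The computation performed by a single neuron can be sketched as follows; the input values, weights, and bias below are hypothetical placeholders, not values from this study:

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid activation: squashes z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation=relu):
    # A neuron computes activation(w . x + b):
    # a weighted sum of its inputs plus a bias term,
    # passed through the activation function.
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # hypothetical input values
w = np.array([0.4, 0.1, -0.2])   # hypothetical learned weights
b = 0.05                         # hypothetical bias term
y_relu = neuron_output(x, w, b)
y_sigm = neuron_output(x, w, b, activation=sigmoid)
```

Here the weighted sum is 0.4·0.5 + 0.1·(−1.2) + (−0.2)·3.0 + 0.05 = −0.47, so ReLU clips the output to zero, while the sigmoid maps it to a value between 0 and 1.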

In this study, the ANN model was constructed as follows. The input layer was built up from seven neurons that were fed with the daily average flux values. The output was a single neuron corresponding to the probability of volcano eruption on the eighth day. Hidden layers were applied between the input and output layers. ReLU was applied as the activation function for the input and hidden neurons. The sigmoid function was utilized to determine the probability of eruption. Dropout was applied to the input and hidden layers to avoid overfitting (Srivastava et al., 2014). Batch normalization was also applied before the ReLU function (Ioffe & Szegedy, 2015). The weights of the neurons were optimized with the Adam method, a gradient‐based optimization algorithm that determines adaptive learning rates for each parameter via calculation of lower‐order moments of the gradients (Kingma & Ba, 2015).
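The forward pass of such a network can be sketched in plain numpy. This is not the authors' implementation: the weights below are random placeholders rather than trained values, batch normalization uses assumed running statistics, and dropout is omitted because it is only active during training:

```python
import numpy as np

def batch_norm(z, mean, var, gamma, beta, eps=1e-5):
    # Batch normalization at inference time: normalize with running
    # statistics, then scale and shift with learned gamma and beta.
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

def forward(x, layers, bn_params):
    # Hidden layers: linear -> batch norm -> ReLU
    # (dropout would act here during training only).
    h = x
    for (W, b), (mean, var, gamma, beta) in zip(layers[:-1], bn_params):
        z = h @ W + b
        z = batch_norm(z, mean, var, gamma, beta)
        h = np.maximum(0.0, z)               # ReLU
    W_out, b_out = layers[-1]
    z_out = h @ W_out + b_out
    return 1.0 / (1.0 + np.exp(-z_out))      # sigmoid -> eruption probability

rng = np.random.default_rng(0)
# 7 daily average fluxes -> hidden layers -> 1 output probability
sizes = [7, 64, 256, 256, 1]
layers = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
# (running mean, running variance, gamma, beta) per hidden layer
bn_params = [(np.zeros(n), np.ones(n), np.ones(n), np.zeros(n))
             for n in sizes[1:-1]]

p = forward(rng.normal(size=7), layers, bn_params)
```

With untrained random weights the output is not meaningful; the sketch only illustrates how the layers compose and that the sigmoid output always lies strictly between 0 and 1.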


Figure 4.5 (a) Cross‐validation scores of the support vector machine are plotted as a function of the C and γ parameters of the radial basis function kernel. (b) Receiver operating characteristic curve for the support vector machine with C = 925.83 and γ = 1.75. The circle shows the optimal cutoff point of the ROC curve.

Bayesian optimization was utilized for hyperparameter tuning of the ANN with 500 epochs. An early‐stopping callback with patience was applied to evaluate the performance of the ANN on the training data set after each epoch, and training was stopped after 50 epochs to avoid overfitting the data. The AUC of the ROC curve was calculated to extract the optimal hyperparameters. The optimal number of hidden layers was found to be 3, with 64, 265, and 256 neurons on the three hidden layers, respectively. The optimal dropout ratio was found to be 0.281, and the batch size was found to be 64. The learning rate and the exponential decay rate parameter of the Adam method were found to be 4.48 × 10−4 and 0.903, respectively. Figure 4.6 shows the ROC curve that was extracted for the test data sets. The AUC score of the fine‐tuned ANN only slightly exceeded 0.5. These results suggest that conventional models fed with the average muon flux values cannot accurately predict the impending eruptions of Sakurajima volcano. Although collecting further data is expected to reduce undertraining of the ANN, significantly higher scores in the ROC analysis are not expected, because the ANN cannot extract the features of the muographic images.
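The two quantities used to evaluate the classifiers above can be computed directly. The following sketch implements the AUC as the Mann–Whitney rank statistic (the probability that a random positive is scored above a random negative) and the optimal ROC cutoff as the threshold maximizing Youden's J = TPR − FPR; the labels and scores are toy values, not data from this study:

```python
import numpy as np

def roc_auc(y_true, scores):
    # AUC = P(score of random positive > score of random negative),
    # counting ties as half a win (Mann-Whitney U statistic).
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def youden_cutoff(y_true, scores):
    # Optimal ROC cutoff: the threshold maximizing TPR - FPR
    # (Youden's J statistic), i.e. the point farthest above the
    # chance diagonal of the ROC curve.
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tpr = pred[y_true == 1].mean()   # true positive rate
        fpr = pred[y_true == 0].mean()   # false positive rate
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t

y = np.array([0, 0, 1, 1, 0, 1])                 # toy eruption labels
s = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])    # toy predicted probabilities
auc = roc_auc(y, s)
cutoff = youden_cutoff(y, s)
```

An AUC of 0.5 corresponds to the chance diagonal, which is why a score only slightly above 0.5 indicates essentially no predictive power.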


Figure 4.6 Receiver operating characteristic curve for the neural network. The circle shows the optimal cutoff point of the ROC curve.
