4.4.1 Processing of Average Fluxes with Support Vector Machine
SVM is a versatile model that constructs a set of hyperplanes to separate multi‐dimensional input data for classification, regression, or outlier detection (Vapnik, 1995). In this analysis, the SVM was implemented with a radial basis function kernel, which has two hyperparameters: a regularization parameter C and a kernel parameter γ. The C parameter sets the cost of misclassifying training data and thereby regulates the balance between maximizing the margin, that is, the distance of the data points from the hyperplane, and maximizing the number of correctly classified training data points. With a smaller C, the margin is larger and the SVM classifies the training data less accurately. With a larger C, the margin is smaller and the SVM classifies the training data more accurately; however, it becomes more sensitive to the particular features of the training data, which can lower the classification accuracy on test data. The γ kernel parameter represents the inverse of the radius of influence of the samples selected by the SVM as support vectors. If γ is too small, the SVM cannot capture the complexity of the training data. If γ is too large, the region of influence of each support vector includes only the support vector itself, which results in over‐fitting for any value of C.
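The role of γ as an inverse radius of influence can be made concrete with a minimal sketch. It assumes the common parameterization of the radial basis function kernel, k(x, x′) = exp(−γ‖x − x′‖²), which the text does not spell out; the sample points and the function names `rbf_kernel` and `kernel_matrix` are hypothetical and serve only as illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(points, gamma):
    """Pairwise kernel values between all points."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical two-dimensional samples, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

# Very small gamma: every pair of points looks almost identical to the kernel,
# so the model cannot resolve structure in the data.
small_gamma = kernel_matrix(points, gamma=2 ** -15)

# Large gamma: the kernel is close to zero for any two distinct points,
# so each sample effectively influences only itself.
large_gamma = kernel_matrix(points, gamma=2 ** 3)
```

With the small γ, all off‐diagonal entries are close to 1; with the large γ, they are close to 0 and the kernel matrix approaches the identity, which is the regime in which no value of C prevents over‐fitting.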
Two hundred trials of parameter tuning were performed with Bayesian optimization (Snoek et al., 2012) for (C, γ) pairs in the ranges C ∈ [2⁻⁵, 2¹⁵] and γ ∈ [2⁻¹⁵, 2³]. A five‐fold cross‐validation procedure was applied to avoid over‐fitting. A test set was set aside, and the remaining data were divided into five equal subsets. Four of the five subsets were used to train the SVM and the fifth served as the validation set. The validation set was cyclically permuted among the five subsets, and the average score over the five cases was calculated. The SVM model achieved the best AUC score with the hyperparameters C = 925.83 and γ = 1.75 (Fig. 4.5a). The result of the ROC analysis of the trained SVM is shown in Fig. 4.5b. The AUC score of the SVM reaches a moderate value of 0.6.
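The bookkeeping of this procedure can be sketched in a few lines of standard‐library Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the candidate sampler draws (C, γ) uniformly on a log₂ scale as a simple stand‐in and is not Bayesian optimization; the data are assumed to be shuffled already; and `sample_candidate`, `five_fold_splits`, `auc_score`, `mean_cv_score`, and the callable `fit_and_score` are hypothetical names.

```python
import random

# Search ranges quoted in the text, expressed as exponents of 2.
C_EXP_RANGE = (-5, 15)
GAMMA_EXP_RANGE = (-15, 3)


def sample_candidate(rng):
    """Draw one (C, gamma) pair uniformly on a log2 scale.

    Plain random sampling, used only as a simple stand-in;
    it is NOT the Bayesian optimization used in the text.
    """
    c = 2.0 ** rng.uniform(*C_EXP_RANGE)
    gamma = 2.0 ** rng.uniform(*GAMMA_EXP_RANGE)
    return c, gamma


def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_idx, val_idx), rotating the validation fold cyclically.

    Assumes the samples are already shuffled; folds are built by striding.
    """
    indices = list(range(n_samples))
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx


def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


def mean_cv_score(fit_and_score, n_samples, c, gamma):
    """Average the validation score over the five folds.

    fit_and_score is a hypothetical user-supplied callable
    (train_idx, val_idx, c, gamma) -> float, for example one that trains
    an SVM on train_idx and returns the AUC on val_idx.
    """
    scores = [fit_and_score(train_idx, val_idx, c, gamma)
              for train_idx, val_idx in five_fold_splits(n_samples)]
    return sum(scores) / len(scores)


rng = random.Random(0)
candidate = sample_candidate(rng)            # one (C, gamma) pair inside the ranges
splits = list(five_fold_splits(10))          # five (train, validation) index pairs
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # made-up labels and scores
```

For reference when reading the AUC values, a score of 0.5 corresponds to a random ranking of positive and negative samples and 1.0 to a perfect ranking.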