Rank-Based Methods for Shrinkage and Selection - A. K. Md. Ehsanes Saleh
List of Figures
1.1 Four plots using different versions of the telephone data set with fitted lines.
1.2 Histograms and ordered residual plots of LS and Theil estimators.
1.3 Effect of a single outlier on LS and rank estimators.
1.4 Gradients of absolute value (Bn′(θ)) and dispersion (Dn′(θ)) functions.
1.5 Scoring functions ϕ(u)=√12(u−0.5) and ϕ+(u)=√3u.
1.6 Dispersion functions and derivative plots for 1.1(d).
1.7 Key shrinkage characteristics of LASSO and ridge.
1.8 Geometric interpretation of ridge.
1.9 Geometric interpretation of LASSO.
2.1 The first-order nature of shrinkage due to ridge.
2.2 Two outliers found in the Q–Q plot for the Swiss data set.
2.3 Sampling distributions of rank estimates.
2.4 Shrinkage of β5 due to increase in ridge tuning parameter, λ2.
2.5 Ridge traces for orthonormal, diagonal, LS, and rank estimators (m = 40).
2.6 MSE Derivative plot to find optimal λ2 for the diagonal case.
2.7 Bias, variance and MSE for the Swiss data set (optimal λ2 = 70.8).
2.8 MSE for training, CV and test sets, and coefficients from the ridge trace.
2.9 The first-order nature of shrinkage due to LASSO.
2.10 Diamond-warping effect of weights in the aLASSO estimator for p = 2.
2.11 Comparison of LASSO and aLASSO traces for the Swiss data set.
2.12 Variable ordering from R-LASSO and R-aLASSO traces for the Swiss data set.
2.13 Ranked residuals of the diabetes data set. (Source: Rfit package in R.)
2.14 Rank-aLASSO trace of the diabetes data set showing variable importance.
2.15 Diabetes data set showing variable ordering and adjusted R2 plot.
2.16 Rank-aLASSO cleaning followed by rank-ridge estimation.
2.17 R-ridge traces and CV scheme with optimal λ2.
2.18 MSE and MAE plots for five-fold CV scheme producing similar optimal λ2.
2.19 LS-Enet traces for α = 0.0, 0.2, 0.4, 0.8, 1.0.
2.20 LS-Enet traces and five-fold CV results for α = 0.6 from glmnet().
3.1 Key shrinkage R-estimators to be considered.
3.2 The ADRE of the shrinkage R-estimator using the optimal c and URE.
3.3 The ADRE of the preliminary test (or hard threshold) R-estimator for different Δ2 based on λ*=2ln(2).
3.4 The ADRE of nEnet R-estimators.
3.5 The ADRE of all R-estimators for different Δ2.
4.1 Boxplot and Q–Q plot using ANOVA table data.
4.2 LS-ridge and ridge R traces for the fertilizer problem from the ANOVA table data.
4.3 LS-LASSO and LASSO R traces for the fertilizer problem from the ANOVA table data.
4.4 Effect of variance on shrinkage using ridge and LASSO traces.
4.5 Hard threshold and positive-rule Stein–Saleh traces for ANOVA table data.
8.1 Left: the Q–Q plot for the diabetes data set; Right: the distribution of the residuals.
11.1 Sigmoid function.
11.2 Outlier in the context of logistic regression.
11.3 LLR vs. RLR with one outlier.
11.4 LLR vs. RLR with no outliers.
11.5 LLR vs. RLR with two outliers.
11.6 Binary classification – nonlinear decision boundary.
11.7 Binary classification comparison – nonlinear boundary.
11.8 Ridge comparison of number of correct solutions with n = 337.
11.9 LLR-ridge regularization showing the shrinking decision boundary.
11.10 LLR, RLR and SVM on the circular data set with mixed outliers.
11.11 Histogram of passengers: (a) age and (b) fare.
11.12 Histogram of residuals associated with the null, LLR, RLR, and SVM cases for the Titanic data set. SVM probabilities were extracted from the sklearn.svm package.
11.13 RLR-ridge trace for Titanic data set.
11.14 RLR-LASSO trace for the Titanic data set.
11.15 RLR-aLASSO trace for the Titanic data set.
12.1 Computational unit (neuron) for neural networks.
12.2 Sigmoid and relu activation functions.
12.3 Four-layer neural network.
12.4 Neural network example of back propagation.
12.5 Forward propagation matrix and vector operations.
12.6 ROC curve and random guess classifier line based on the RLR classifier on the Titanic data set of Chapter 11.
12.7 Neural network architecture for the circular data set.
12.8 LNNs and RNNs on the circular data set (n = 337) with nonlinear decision boundaries.
12.9 Convergence plots for LNNs and RNNs for the circular data set.
12.10 ROC plots for LNNs and RNNs for the circular data set.
12.11 Typical setup for supervised learning methods. The training set is used to build the model.
12.12 Examples from test data set with cat = 1, dog = 0.
12.13 Unrolling of an RGB image into a single vector.
12.14 Effect of over-fitting, under-fitting and regularization.
12.15 Convergence plots for LNNs and RNNs (test size = 35).
12.16 ROC plots for LNNs and RNNs (test size = 35).
12.17 Ten representative images from the MNIST data set.
12.18 LNN and RNN convergence traces – loss vs. iterations (×100).
12.19 Residue histograms for LNNs (0 outliers) and RNNs (50 outliers).
12.20 These are 49 potential outlier images reported by RNNs.
12.21 LNN (0 outliers) and RNN (144 outliers) residue histograms.
12.22 LNN and RNN confusion matrices and MCC scores.