4.1.3 Model Extraction
Model‐specific interpretability methods are limited to specific model classes. Here, when we require a particular type of interpretation, we are limited in our choice to models that provide it, potentially at the expense of using a more predictive and representative model. For that reason, there has been a recent surge of interest in model‐agnostic interpretability methods, as they are model‐free and not tied to a particular type of ML model. This class of methods separates prediction from explanation. Model‐agnostic interpretations are usually post‐hoc; they are generally used to interpret ANN and can be local or global interpretable models. In the interest of improving the interpretability of AI models, a large number of model‐agnostic methods have been developed recently using a range of techniques from statistics, ML, and data science. Here, we group them into four technique types: visualization, knowledge extraction, influence methods, and example‐based explanation.
1 Visualization: A natural way to understand an ML model, especially a DNN, is to visualize its representations to explore the patterns hidden inside its neural units. The largest body of research employs this approach, with the help of different visualization techniques, in order to see inside these black boxes. Visualization techniques are essentially applied to supervised learning models. The most popular visualization techniques are surrogate models, the partial dependence plot (PDP), and individual conditional expectation (ICE).

A surrogate model is an interpretable model (such as a linear model or decision tree) that is trained on the predictions of the original black‐box model in order to interpret the latter. However, there are almost no theoretical guarantees that the simple surrogate model is highly representative of the more complex model. The aforementioned approach of [37] is a prescribed method for building local surrogate models around single observations. In [59], a surrogate model approach was used to extract a decision tree that represents model behavior, and in [60] an approach to building TreeView visualizations using a surrogate model was proposed.

PDP is a graphical representation that helps in visualizing the average partial relationship between one or more input variables and the predictions of a black‐box model. In [61], PDP is used to understand the relationship between predictors and the conditional average treatment effect for a voter mobilization experiment, with the predictions being made by Bayesian Additive Regression Trees [62]. The approach in [63], which relies on stochastic gradient boosting, used PDPs to understand how different environmental factors influence the distribution of a particular freshwater species. Work in [12] demonstrates the advantage of using random forests and the associated PDPs to accurately model predictor‐response relationships under the asymmetric classification costs that often arise in criminal justice settings. Work in [64] proposed a methodology called Forest Floor to visualize and interpret random forest models; the proposed techniques rely on the feature contributions method rather than on PDP. The authors argued that the advantage of Forest Floor over PDP is that interactions are not masked by averaging, making it possible to locate interactions that are not visualized in a given projection.

ICE plots extend PDP. Whereas PDPs provide a coarse view of a model's workings, ICE plots reveal interactions and individual differences by disaggregating the PDP output. Recent works use ICE rather than the classical PDP. For instance, [65] introduced ICE techniques and demonstrated their advantages over PDP. Later, [66] proposed a local feature importance‐based approach that uses both partial importance (PI) and individual conditional importance (ICI) plots as visual tools. A minimal code sketch of the surrogate and PDP/ICE ideas is given below.
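The following sketch is illustrative only (the dataset, feature choices, and hyperparameters are assumptions, not taken from the works cited above): it fits a black‐box random forest, trains a shallow decision tree as a global surrogate on the black‐box predictions, reports the surrogate's fidelity, and draws combined PDP/ICE curves with scikit‐learn.

```python
# Minimal sketch: global surrogate model plus PDP/ICE visualization.
# All modeling choices below are illustrative assumptions.
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=0)

# Black-box model whose behavior we want to interpret.
black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Global surrogate: an interpretable tree trained on the black-box predictions.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity check: how well does the surrogate mimic the black box?
print("surrogate fidelity (R^2):",
      r2_score(black_box.predict(X), surrogate.predict(X)))

# PDP (average curve) and ICE (one curve per instance) for features 0 and 1.
PartialDependenceDisplay.from_estimator(black_box, X, features=[0, 1],
                                        kind="both")
plt.show()
```

The fidelity score makes explicit the caveat noted above: the surrogate is only trustworthy to the extent that it actually reproduces the black‐box predictions.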
2 Knowledge extraction: It is difficult to explain how ML models work, especially when the models are based on ANN. The task is therefore to extract the knowledge acquired by an ANN during training and encoded as an internal representation. Several works propose methods to extract the knowledge embedded in an ANN; they mainly rely on two techniques: rule extraction and model distillation.

Rule extraction is an effort to gain insight into highly complex models [67–69] by extracting rules that approximate the decision‐making process in an ANN, utilizing the input and output of the ANN. The survey in [27], based on [70, 71], distinguishes three modes of rule extraction: decompositional, pedagogical, and eclectic. Decompositional approaches focus on extracting rules at the level of individual units within the trained ANN; that is, the underlying ANN is viewed as transparent [46–48]. On the other hand, pedagogical approaches treat the trained ANN as a black box; that is, the underlying ANN is viewed as opaque. The Orthogonal Search‐based Rule Extraction (OSRE) algorithm from [72] is a successful pedagogical methodology often applied in biomedicine. The third type, eclectic rule extraction, is a hybrid approach that incorporates elements of the previous two techniques [73].

Model distillation is another technique that falls in the knowledge extraction category. Distillation is a model‐compression technique that transfers information (dark knowledge) from deep networks (the “teacher”) to shallow networks (the “student”) [74, 75]. Model compression was originally proposed to reduce the computational cost of a model at runtime, but it has later been applied to interpretability. Tan et al. [10] investigated how model distillation can be used to distill complex models into transparent models. Che et al. [76] introduced a knowledge‐distillation approach called Interpretable Mimic Learning to learn interpretable phenotype features for making robust predictions while mimicking the performance of deep learning models. A recent work by Xu et al. [77] presented DarkSight, a visualization method for interpreting the predictions of a black‐box classifier on a dataset in a way that is inspired by the notion of dark knowledge. The proposed method combines ideas from knowledge distillation, dimension reduction, and visualization of DNN [78–80]. A distillation‐style code sketch is given below.
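The sketch below conveys the distillation idea only in spirit; it is not the method of [74–76]. A gradient‐boosting “teacher” is compressed into a shallow decision‐tree “student” by training the student on the teacher's soft probability outputs (the dark knowledge) rather than on the hard labels; the choice of models and thresholds is an assumption.

```python
# Minimal distillation-style sketch (illustrative assumption, not the cited methods):
# the student regresses the teacher's soft class probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Teacher: accurate but opaque.
teacher = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
soft_targets = teacher.predict_proba(X_train)[:, 1]   # class-1 probabilities

# Student: a shallow, interpretable tree fitted to the soft targets.
student = DecisionTreeRegressor(max_depth=4, random_state=0)
student.fit(X_train, soft_targets)

# How faithfully does the student mimic the teacher on held-out data?
student_labels = (student.predict(X_test) >= 0.5).astype(int)
teacher_labels = teacher.predict(X_test)
print("agreement with teacher:", (student_labels == teacher_labels).mean())
print("student accuracy on true labels:", (student_labels == y_test).mean())
```

Training on soft targets rather than hard labels is what distinguishes this from a plain surrogate: the probabilities carry information about how confident the teacher is on each instance.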
3 Influence techniques estimate the importance or the relevance of a feature by changing the input or internal components and recording how much the changes affect model performance. Influence techniques are often visualized. There are three alternative methods for obtaining an input variable's relevance: sensitivity analysis (SA), layer‐wise relevance propagation (LRP), and feature importance.

SA addresses the question of how an ANN output is influenced by its input and/or weight perturbations [81]. It is used to verify whether model behavior and outputs remain stable when data are intentionally perturbed or other changes are simulated in the data. Visualizing the results of SA is considered an agnostic explanation technique, since displaying a model's stability as data change over time enhances trust in ML results. SA has been increasingly used for explaining ANN in general and DNN classification of images in particular [82, 83]. However, it is important to note that SA does not produce an explanation of the function value itself, but rather of a variation of it. The purpose of performing SA is thus usually not to explain the relationship itself but to test models for stability and trustworthiness, either as a tool to find and remove unimportant input attributes or as a starting point for a more powerful explanation technique (e.g., decomposition).

LRP was proposed in [84]. It redistributes the prediction function backward, starting from the output layer of the network and backpropagating up to the input layer. The key property of this redistribution process is referred to as relevance conservation. In contrast to SA, this method explains predictions relative to the state of maximum uncertainty; that is, it identifies the properties that are pivotal for the prediction.

Feature importance quantifies the contribution of each input variable (feature) to the predictions of a complex ML model. The increase in the model's prediction error is calculated after permuting the feature in order to measure its importance: permuting the values of an important feature increases the model error, whereas permuting the values of an unimportant feature does not affect the model and thus leaves the model error unchanged (a minimal sketch is given below). Using this technique, Fisher et al. [85] proposed a model‐agnostic version of the algorithm called Model Class Reliance (MCR), while [66] proposed a local version of the algorithm called SFIMP for permutation‐based Shapley feature importance. LOCO [39] uses local feature importance as well.
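The following is a bare‐bones sketch of permutation feature importance as described above, not of the MCR or SFIMP algorithms; the dataset and model are illustrative assumptions.

```python
# Minimal sketch of permutation feature importance: permute one feature at a
# time and record the resulting drop in held-out accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

baseline = model.score(X_test, y_test)       # accuracy with intact features
rng = np.random.default_rng(0)
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, j])                # break the feature-target link
    drop = baseline - model.score(X_perm, y_test)
    print(f"feature {j}: importance (accuracy drop) = {drop:.4f}")
```

A large accuracy drop flags an important feature; a drop near zero indicates the model does not rely on that feature, exactly as the text above explains.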
4 Example‐based explanation (EBE) techniques select particular instances of the dataset to explain the behavior of ML models. The approach is mostly model‐agnostic, since these methods can make any ML model more interpretable. The slight difference from other model‐agnostic methods is that EBE methods interpret a model by selecting instances of the dataset, not by acting on features or transforming the model.
Two versions of this technique are (i) Prototypes and Criticisms and (ii) Counterfactual Explanations (CE).
Prototypes and criticisms: Prototypes are a selection of representative instances from the data [86–88]; item membership is thus determined by similarity to the prototypes, which can lead to overgeneralization. To avoid this, exceptions, called criticisms, also have to be identified: instances that are not well represented by those prototypes. Kim [89] developed an unsupervised algorithm for automatically finding prototypes and criticisms for a dataset, called Maximum Mean Discrepancy‐critic (MMD‐critic). When applied to unlabeled data, it finds prototypes and criticisms that characterize the dataset as a whole; a simplified sketch of this selection idea is given below.
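The sketch below is a heavily simplified illustration in the spirit of MMD‐critic, not the authors' implementation: prototypes are chosen greedily to minimize the squared maximum mean discrepancy (MMD) between the dataset and the prototype set, and criticisms are the points where the kernel witness function deviates most. Kernel choice, bandwidth, and the numbers of prototypes and criticisms are assumptions.

```python
# Simplified prototypes-and-criticisms sketch (illustrative, not MMD-critic itself).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
K = rbf_kernel(X, gamma=0.1)          # kernel matrix k(x_i, x_j)
n = len(X)

def mmd2(proto_idx):
    """Squared MMD between the full dataset and the prototype subset."""
    p = np.asarray(proto_idx)
    return K.mean() - 2.0 * K[:, p].mean() + K[np.ix_(p, p)].mean()

# Greedy prototype selection: add the point that lowers MMD^2 the most.
prototypes = []
for _ in range(4):
    candidates = [i for i in range(n) if i not in prototypes]
    best = min(candidates, key=lambda i: mmd2(prototypes + [i]))
    prototypes.append(best)

# Criticisms: points where data density and prototype density (in the kernel
# feature space) disagree the most, i.e. points badly covered by the prototypes.
witness = K.mean(axis=1) - K[:, prototypes].mean(axis=1)
criticisms = [int(i) for i in np.argsort(-np.abs(witness))
              if i not in prototypes][:4]

print("prototype indices:", prototypes)
print("criticism indices:", criticisms)
```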
Counterfactual explanations (CE): Wachter et al. [90] presented the concept of “unconditional counterfactual explanations” as a novel type of explanation of automated decisions. CEs describe the minimum conditions that would have led to an alternative decision (e.g., a bank loan being, or not being, approved), without the need to describe the full logic of the algorithm. The focus here is on explaining a single prediction, in contrast to adversarial examples, where the emphasis is on reversing the prediction rather than explaining it [91]. As the research contributions in this class of methods are actively growing, new model‐agnostic techniques are regularly proposed. In the next section, we present some of the techniques listed here in more detail, after a brief counterfactual‐search sketch below.
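The sketch below illustrates the counterfactual idea in the spirit of [90] only: it searches for a small perturbation of an input that flips the classifier's decision by minimizing a weighted sum of prediction loss and distance to the original instance. The model, loss weighting, and optimizer are assumptions, not the authors' formulation.

```python
# Minimal counterfactual-search sketch (illustrative assumptions throughout).
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

x0 = X[0]                                    # instance to explain
target_prob = 0.9 if clf.predict([x0])[0] == 0 else 0.1
lam = 10.0                                   # weight on reaching the target class

def objective(x_cf):
    # prediction loss (push toward the alternative class) + distance to x0
    pred = clf.predict_proba(x_cf.reshape(1, -1))[0, 1]
    return lam * (pred - target_prob) ** 2 + np.sum((x_cf - x0) ** 2)

res = minimize(objective, x0, method="Nelder-Mead")
x_cf = res.x

print("original prediction:   ", clf.predict([x0])[0])
print("counterfactual outcome:", clf.predict([x_cf])[0])
print("feature changes:       ", np.round(x_cf - x0, 3))
```

The printed feature changes are the explanation: the smallest edits (under this distance measure) that would have produced the alternative decision.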