SCADA Security - Xun Yi
1.4 BOOK FOCUS
This section summarizes the important lessons learned from developing robust unsupervised SCADA data‐driven Intrusion Detection Systems (IDSs), which are detailed in the chapters of this book. The first lesson concerns the design of a SCADA security testbed for evaluating the practicality and efficiency of SCADA security solutions; the remaining three focus on the individual elements of a robust unsupervised SCADA data‐driven IDS.
Evaluating and testing security solutions tailored to SCADA systems is a challenging problem for researchers and practitioners, for several reasons. First, privacy, security, and legal constraints prevent organizations from publishing their SCADA data. Second, it is not feasible to run experiments on live systems, because doing so is highly likely to degrade their availability and performance. Third, building a real SCADA lab is costly and space‐constrained, so such a lab is not available to every researcher and practitioner. This book therefore describes a framework for a SCADA security testbed that builds a complete SCADA system from a hybrid of emulation and simulation methods. Because a real SCADA protocol is implemented, the testbed generates realistic SCADA network traffic. A key benefit of this framework is that it is a realistic alternative to real‐world SCADA systems; in particular, it can be used to evaluate the accuracy and efficiency of unsupervised SCADA data‐driven IDSs.
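To make "realistic SCADA network traffic" concrete, the sketch below builds and decodes a single request frame of the kind such a testbed would exchange. The text does not say which protocol the testbed implements; Modbus/TCP is assumed here purely as a widely used SCADA protocol, and the function names `build_read_request` and `parse_read_request` are hypothetical. The byte layout (an MBAP header followed by a "Read Holding Registers" PDU) follows the public Modbus/TCP specification; this is an illustration, not part of the book's testbed.

```python
import struct

# Assumption: the book's text does not name its protocol; Modbus/TCP is used
# here only as a familiar example of a real SCADA protocol.
READ_HOLDING_REGISTERS = 0x03  # Modbus function code 3


def build_read_request(transaction_id: int, unit_id: int,
                       start_address: int, quantity: int) -> bytes:
    """Build a Modbus/TCP 'Read Holding Registers' request (12 bytes)."""
    if not 1 <= quantity <= 125:
        raise ValueError("a single read may request 1..125 registers")
    # PDU: function code (1 byte), start address (2 bytes), quantity (2 bytes)
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0 for Modbus),
    # length of the remaining bytes (unit id + PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_read_request(frame: bytes) -> dict:
    """Decode a frame produced by build_read_request back into its fields."""
    tid, pid, length, unit, fc, addr, qty = struct.unpack(">HHHBBHH", frame)
    return {"transaction_id": tid, "protocol_id": pid, "length": length,
            "unit_id": unit, "function_code": fc,
            "start_address": addr, "quantity": qty}


if __name__ == "__main__":
    frame = build_read_request(transaction_id=1, unit_id=17,
                               start_address=0x006B, quantity=3)
    print(frame.hex())  # 0001000000061103006b0003
    print(parse_read_request(frame))
```

A testbed would send many such frames between emulated masters and simulated field devices; an IDS then learns from features extracted from this traffic.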
Unsupervised anomaly‐detection methods are time‐ and cost‐efficient because they learn from unlabeled data: no human expert has to label each observation in a large training data set as normal or abnormal. Anomaly scoring methods are considered promising automatic methods for assigning a degree of anomaly to each observation (Chandola et al., 2009). The k‐NN method is one of the most widely used and effective methods for computing the degree of anomaly of an observation from the density of its neighborhood (Wu et al., 2008). However, it is computationally expensive, especially on the large, high‐dimensional data expected when developing an unsupervised SCADA data‐driven IDS. This book therefore describes an efficient k‐nearest neighbor‐based method, called kNNVWC (k‐Nearest Neighbor approach based on Various‐Widths Clustering), which uses a novel various‐width clustering algorithm together with the triangle inequality to reduce the cost of the neighbor search.
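The sketch below illustrates the two ideas in this paragraph under simplifying assumptions. The k‐NN anomaly score is taken to be the mean distance to the k nearest neighbors (one common variant), and clusters are used to skip distance computations via the triangle inequality: for a query q and a cluster with centre c and radius r, every member x satisfies d(q, x) ≥ d(q, c) − r. It uses fixed‐width "leader" clustering rather than the book's various‐widths algorithm, so it is not kNNVWC itself; all function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = Tuple[Point, List[int], float]  # (centre, member indices, radius)


def knn_score_bruteforce(data: Sequence[Point], i: int, k: int) -> float:
    """Mean distance from data[i] to its k nearest neighbours (naive scan)."""
    ds = sorted(math.dist(data[i], data[j]) for j in range(len(data)) if j != i)
    return sum(ds[:k]) / k


def build_clusters(data: Sequence[Point], width: float) -> List[Cluster]:
    """Fixed-width leader clustering (a stand-in for various-widths clustering):
    a point joins the first cluster whose centre lies within `width`,
    otherwise it starts a new cluster with itself as the centre."""
    centres: List[Point] = []
    members: List[List[int]] = []
    radii: List[float] = []
    for idx, p in enumerate(data):
        for c, centre in enumerate(centres):
            d = math.dist(p, centre)
            if d <= width:
                members[c].append(idx)
                radii[c] = max(radii[c], d)
                break
        else:
            centres.append(p)
            members.append([idx])
            radii.append(0.0)
    return list(zip(centres, members, radii))


def knn_score_pruned(data: Sequence[Point], clusters: List[Cluster],
                     i: int, k: int) -> float:
    """Same score as knn_score_bruteforce, but skips whole clusters whose
    triangle-inequality lower bound d(q, c) - r cannot beat the current
    k-th smallest distance."""
    if len(data) <= k:
        raise ValueError("need more than k observations")
    q = data[i]
    best: List[float] = []  # ascending list of the k smallest distances so far
    # Visit clusters nearest-centre first so the k-th distance shrinks early.
    for centre, idxs, radius in sorted(clusters, key=lambda c: math.dist(q, c[0])):
        if len(best) == k and math.dist(q, centre) - radius >= best[-1]:
            continue  # no member of this cluster can enter the k-NN list
        for j in idxs:
            if j == i:
                continue
            d = math.dist(q, data[j])
            if len(best) < k or d < best[-1]:
                bisect.insort(best, d)
                del best[k:]
    return sum(best) / k


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    data.append((8.0, 8.0))  # one obvious outlier at index 300
    clusters = build_clusters(data, width=1.0)
    scores = [knn_score_pruned(data, clusters, i, k=5) for i in range(len(data))]
    print(max(range(len(data)), key=scores.__getitem__))  # 300
```

The pruned search returns the same score as the naive scan; how much work it saves depends on how compact the clusters are relative to each query's k‐NN distance.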
Retaining all the training data in SCADA data‐driven anomaly detection methods is not feasible, especially when they are built from a large training data set. Because such methods are used for on‐line monitoring, the more information they retain, the more memory and computation they require. To address this issue, this book describes a clustering‐based method, called SDAD (SCADA Data‐Driven Anomaly Detection), that extracts proximity‐based detection rules for each behavior (normal and abnormal). The rules are expected to be only a tiny fraction of the size of the training data, and each rule represents a subset of observations that all exhibit a single behavior.
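A minimal sketch of the rule idea, assuming that each rule is a labeled hypersphere (centre and radius) and that the training observations already carry a normal/abnormal label. SDAD's actual clustering algorithm and rule format are not reproduced here; fixed‐width leader clustering and the names `Rule`, `extract_rules`, and `classify` are hypothetical stand‐ins.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Rule:
    """Hypothetical proximity rule: a hypersphere labelled with one behaviour."""
    centre: Point
    radius: float
    label: str  # "normal" or "abnormal"
    size: int   # number of training observations summarised by the rule


def extract_rules(points: Sequence[Point], labels: Sequence[str],
                  width: float) -> List[Rule]:
    """Cluster each behaviour separately (fixed-width leader clustering) so
    that every rule covers observations of exactly one behaviour."""
    rules: List[Rule] = []
    for behaviour in sorted(set(labels)):
        leaders: List[Point] = []
        counts: List[int] = []
        for p, lab in zip(points, labels):
            if lab != behaviour:
                continue
            for c, leader in enumerate(leaders):
                if math.dist(p, leader) <= width:
                    counts[c] += 1
                    break
            else:
                leaders.append(tuple(p))
                counts.append(1)
        # Every member lies within `width` of its leader, so `width` is a valid radius.
        rules.extend(Rule(leader, width, behaviour, n)
                     for leader, n in zip(leaders, counts))
    return rules


def classify(x: Point, rules: Sequence[Rule]) -> Optional[str]:
    """Label of the nearest rule whose hypersphere contains x, or None if no
    rule covers x (how to treat uncovered observations is left open here)."""
    covering = [r for r in rules if math.dist(x, r.centre) <= r.radius]
    if not covering:
        return None
    return min(covering, key=lambda r: math.dist(x, r.centre)).label


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.1, 4.9)]
    labs = ["normal", "normal", "normal", "abnormal", "abnormal"]
    rules = extract_rules(pts, labs, width=0.5)
    print(len(rules))                   # 2 rules summarise 5 observations
    print(classify((0.1, 0.1), rules))  # normal
    print(classify((5.0, 5.1), rules))  # abnormal
    print(classify((2.5, 2.5), rules))  # None
```

Only the rules, not the training observations, need to be kept for on‐line monitoring, which is the memory saving the paragraph describes.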
Unsupervised anomaly‐detection methods rely mainly on assumptions to find a near‐optimal anomaly detection threshold, so their accuracy depends on the validity of those assumptions. In contrast, this book describes an efficient method, called GATUD (Global Anomaly Threshold to Unsupervised Detection), which first labels as “abnormal” the observations whose anomaly scores deviate significantly from the rest. Conversely, a tiny portion of observations with the smallest anomaly scores is labeled “normal”. An ensemble‐based decision‐making method then uses the information from both the “normal” and “abnormal” observations to find a global, efficient anomaly threshold.
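The sketch below mimics the two stages just described under explicitly stated assumptions; it is not GATUD. "Deviates significantly" is interpreted as a robust z‐score (median and MAD) above a cut‐off, "a tiny portion" as the lowest 5% of scores, and the ensemble as bootstrap members that each propose the midpoint between their largest "normal" and smallest "abnormal" score, combined by taking the median. All names and parameter values (`pseudo_label`, `ensemble_threshold`, `z_cut`, `normal_frac`, `n_members`) are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def pseudo_label(scores: Sequence[float], z_cut: float = 5.0,
                 normal_frac: float = 0.05) -> Tuple[List[float], List[float]]:
    """Return (normal, abnormal) pseudo-labelled scores.

    abnormal: scores whose robust z-score (median/MAD) exceeds z_cut.
    normal:   the smallest normal_frac of all scores."""
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores) or 1e-12
    # 1.4826 * MAD estimates the standard deviation for Gaussian data.
    abnormal = [s for s in scores if (s - med) / (1.4826 * mad) > z_cut]
    n_normal = max(1, int(len(scores) * normal_frac))
    normal = sorted(scores)[:n_normal]
    return normal, abnormal


def ensemble_threshold(normal: Sequence[float], abnormal: Sequence[float],
                       n_members: int = 25, seed: int = 0) -> float:
    """Each member bootstraps both pseudo-labelled sets and proposes the
    midpoint between its largest 'normal' and smallest 'abnormal' score;
    the ensemble returns the median proposal."""
    if not normal or not abnormal:
        raise ValueError("need both pseudo-labelled sets")
    rng = random.Random(seed)
    proposals = []
    for _ in range(n_members):
        n = rng.choices(normal, k=len(normal))
        a = rng.choices(abnormal, k=len(abnormal))
        proposals.append((max(n) + min(a)) / 2)
    return statistics.median(proposals)


if __name__ == "__main__":
    rng = random.Random(1)
    scores = [rng.gauss(1.0, 0.1) for _ in range(500)] + [4.0, 4.5, 5.0]
    normal, abnormal = pseudo_label(scores)
    t = ensemble_threshold(normal, abnormal)
    print(f"threshold={t:.2f}, flagged={sum(s > t for s in scores)}")
```

Any observation whose anomaly score exceeds the returned threshold is then treated as abnormal.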