Computational Statistics in Data Science
3 Issues in Data Stream Mining
One of the challenges of data stream mining is concept drift, a phenomenon that concerns how a data stream evolves over time [19]. Concept drift alters the underlying characteristics that the learning system seeks to uncover, so the classifier's results degrade as the change progresses [20].
Concept drift in a data stream can be broadly classified into two main categories: concept drift based on classification boundaries and concept drift based on types of change. The former concerns the classification boundaries and is further subdivided into virtual concept drift and real concept drift. Virtual concept drift affects the conditional probability density functions, but its influence on the decision boundary of the learning model in use is insignificant. Real concept drift, on the other hand, often impacts the unconditional probability density functions as well, leading to degraded results from the learning model. Concept drift based on types of change is subdivided into sudden, gradual, and incremental concept drift. Other categories based on types of change include blip, noise, mixed, local, global, feature, and adversarial concept drift [21]. The taxonomy of concept drift is presented in Figure 1.
Figure 1 Taxonomy of concept drift in a data stream.
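To make the sudden, gradual, and incremental categories more concrete, the following minimal sketch generates the mean of a one-dimensional concept at each time step. It is an illustration only, under assumptions that are not from the text: the function name `concept_mean`, the two concept means `old` and `new`, and the transition interval given by `start` and `end` are all hypothetical. Sudden drift is modeled as an abrupt switch, gradual drift as a growing probability of drawing from the new concept, and incremental drift as a slow shift through intermediate concepts.

```python
import random


def concept_mean(t, n, kind, old=0.0, new=1.0, start=0.4, end=0.6):
    """Return the mean of the concept that generates instance t of an n-instance stream.

    All parameter names and default values are illustrative assumptions.
    """
    if kind not in ("sudden", "gradual", "incremental"):
        raise ValueError(f"unknown drift type: {kind}")
    frac = t / n
    if frac < start:
        return old                      # before the drift: old concept only
    if kind == "sudden" or frac >= end:
        return new                      # abrupt switch, or transition finished
    progress = (frac - start) / (end - start)
    if kind == "gradual":
        # Both concepts coexist; the new one is sampled with rising probability.
        return new if random.random() < progress else old
    # Incremental: the concept itself moves through intermediate values.
    return old + progress * (new - old)


# Example: a noisy stream of 1000 instances with incremental drift.
random.seed(0)
stream = [random.gauss(concept_mean(t, 1000, "incremental"), 0.1) for t in range(1000)]
```

In this sketch only the location of the data changes; a drift that also moves the decision boundary would additionally require changing how labels are assigned.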
Three standard solutions to concept drift are (i) detecting changes and retraining the classifier when the degree of change is significantly high, (ii) retraining the classification model whenever a new chunk or instance arrives, and (iii) using adaptive learning methods. However, option (ii) is not feasible in practice because of its computational cost. The four main approaches for addressing concept drift are (i) concept drift detectors [22], (ii) sliding windows [23], (iii) online learners [24], and (iv) ensemble learners [25]. Other challenges for data streams are briefly highlighted below.
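The first solution, detecting a change and retraining only when it is large enough, can be sketched by combining a drift detector with sliding windows. The code below is a simplified illustration and not one of the published detectors cited above: the class name `WindowDriftDetector`, the window size, and the threshold are assumptions chosen for the example. It compares the classifier's error rate in a recent window with the error rate in a reference window collected after the last (re)training, and flags drift when the gap exceeds the threshold.

```python
from collections import deque


class WindowDriftDetector:
    """Flag drift when the recent error rate exceeds the reference rate by a margin.

    Hypothetical helper for illustration; window and threshold are assumed values.
    """

    def __init__(self, window=50, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)  # errors right after (re)training
        self.recent = deque(maxlen=window)     # sliding window of latest errors

    def update(self, error):
        """error is 1 for a misclassified instance, 0 otherwise."""
        if len(self.reference) < self.window:
            self.reference.append(error)       # still building the reference
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        ref_rate = sum(self.reference) / self.window
        rec_rate = sum(self.recent) / self.window
        if rec_rate - ref_rate > self.threshold:
            # Drift flagged: the caller would retrain here; start fresh windows.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


# Simulated error stream: 10% errors for 200 instances, then 60% errors.
errors = [1 if i % 10 == 0 else 0 for i in range(200)]
errors += [1 if i % 10 < 6 else 0 for i in range(200)]

detector = WindowDriftDetector()
detections = [i for i, e in enumerate(errors) if detector.update(e)]
print(detections)  # positions at which retraining would be triggered
```

Retraining only at the flagged positions avoids the cost of option (ii), which would rebuild the model at every new chunk or instance.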