Computational Statistics in Data Science
6 Streaming Data Algorithms
Data streams pose significant challenges to mining algorithms and to the research community because of the high traffic, high velocity, and brief life span of streaming data [79]. Many algorithms that work well for mining data at rest are unsuitable for streaming data because of these inherent characteristics [80]. The constraints that streaming data naturally imposes on mining algorithms include: (i) the data can be read in a single pass only, (ii) the probability distribution of a data chunk is not known in advance, (iii) there is no limit on the amount of data generated, (iv) the size of incoming data may vary, (v) the incoming data may belong to various sub‐clusters, and (vi) access to correct class labels is limited because of the overhead incurred by querying a label for each arriving instance [81]. These constraints in turn raise further problems, which include: (i) capturing sub‐cluster data within a bounded learning time complexity, (ii) determining the minimum number of epochs required to achieve that learning time complexity, and (iii) making the algorithm robust to dynamically evolving and irregular streaming data.
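To make constraints (i) and (iii) concrete, the following is a minimal sketch of a single‐pass, constant‐memory computation over a stream of unknown length. It uses Welford's running mean and variance update, chosen here purely as an illustration and not taken from the chapter or its cited works. The names `RunningStats` and `consume` are hypothetical, and the stream is assumed to yield plain floating‐point values.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Running mean and variance with O(1) memory (Welford's update).

    Illustrative helper only; the class and its field names are assumptions.
    """
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one arriving instance into the summary; x is not stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined with fewer than two observations.
        if self.count < 2:
            return math.nan
        return self.m2 / (self.count - 1)


def consume(stream: Iterable[float]) -> RunningStats:
    """Read the stream exactly once, keeping only the summary statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    # A generator stands in for a live stream: values arrive one at a time
    # and cannot be revisited once consumed.
    readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
    summary = consume(readings)
    print(summary.count, summary.mean, summary.variance)
```

Because each instance is folded into three numbers and then discarded, memory use does not grow with the length of the stream. The sketch does not address the remaining constraints, such as unknown or drifting distributions and limited access to labels.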
Streaming data mining tasks include clustering, similarity search, prediction, classification, and object detection, among others [82, 83]. Algorithms used for streaming data analysis can be grouped into four categories: unsupervised learning, semi‐supervised learning, supervised learning, and ontology‐based techniques. Each category is described in turn below.