Computational Statistics in Data Science (group of authors)

6.2 Semi‐Supervised Learning

Semi‐supervised learning belongs to a class of AI frameworks that train on a combination of labeled and unlabeled data [89]. In a data stream context it is challenging because data are generated in real time and labels may be missing for various reasons, including communication errors, network delays, and expensive labeling processes [90]. According to Zhu and Li [91], a semi‐supervised learning problem in a data stream context is defined as follows. Let […] be the data in the first T₀ time period and let S denote the streaming data. Let Y = {1, 2, …, K} be the known label set. Each arriving instance x_t carries a label y_t ∈ Y′ = {−1, 1, 2, …, K}. If y_t = −1, x_t is an unlabeled instance, but its true label is in set Y. As time goes on, the stream evolves and may contain novel classes; that is, an instance x_t may arrive whose true label is not in set Y. Note that if […] holds forever.
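
The labeling convention above can be made concrete with a small sketch. The Python code below is not the method of Zhu and Li [91]; it is a minimal illustration, under assumed names, of how a learner might consume pairs (x_t, y_t) with y_t ∈ Y′. The class `CentroidStreamLearner`, its `radius` threshold, and the toy data are all hypothetical. Labeled instances update a per‐class centroid; unlabeled instances (y_t = −1) are pseudo‐labeled by self‐training when they fall within `radius` of the nearest centroid, and are otherwise flagged as candidates for a novel class.

```python
from dataclasses import dataclass, field
from math import sqrt

UNLABELED = -1  # y_t = -1 marks an instance whose label did not arrive


@dataclass
class CentroidStreamLearner:
    """Nearest-centroid learner with self-training (illustrative assumption)."""
    radius: float                               # hypothetical distance threshold
    sums: dict = field(default_factory=dict)    # label -> per-feature sums
    counts: dict = field(default_factory=dict)  # label -> instances absorbed

    def _update(self, x, label):
        # Incrementally fold x into the running sum for this class.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def _nearest(self, x):
        # Return the closest known class and its Euclidean distance to x.
        best_label, best_d = None, float("inf")
        for label, s in self.sums.items():
            centroid = [v / self.counts[label] for v in s]
            d = sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))
            if d < best_d:
                best_label, best_d = label, d
        return best_label, best_d

    def observe(self, x, y):
        """Process one stream element (x_t, y_t) and report the action taken."""
        if y != UNLABELED:
            self._update(x, y)
            return ("labeled", y)
        label, d = self._nearest(x)
        if label is not None and d <= self.radius:
            self._update(x, label)  # self-training: trust the pseudo-label
            return ("pseudo-labeled", label)
        return ("novel-candidate", None)  # too far from every known class


# Toy data (assumed): an initial labeled period, then the stream S.
initial = [((0.0, 0.0), 1), ((0.2, 0.0), 1), ((5.0, 5.0), 2), ((5.0, 5.2), 2)]
stream = [((0.1, 0.1), -1), ((5.1, 5.0), -1), ((10.0, -10.0), -1), ((4.9, 5.1), 2)]

learner = CentroidStreamLearner(radius=1.0)
for x, y in initial:
    learner.observe(x, y)
results = [learner.observe(x, y) for x, y in stream]
for (x, y), outcome in zip(stream, results):
    print(x, y, outcome)
```

In this toy run the first two unlabeled instances fall close to known centroids and are absorbed with pseudo‐labels, while the third lies far from both classes and is only flagged. A real system would need an explicit policy for confirming a novel class and for limiting the drift that wrong pseudo‐labels can cause.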

Semi‐supervised learning on streaming data may return results similar to those of the supervised approach. However, two observations have been reported for semi‐supervised learning on streaming data [19]: (i) to stabilize the classifiers, considerably more objects need to be labeled, and (ii) a larger threshold adversely affects the stability of the classifiers, with the standard deviation increasing as the threshold grows. Semi‐supervised learning techniques for data streams include ensemble techniques, graph‐based methods, deep learning, active learning, and linear neighborhood propagation.
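
As an illustration of the graph‐based family mentioned above, the following Python sketch runs a simple clamped label propagation over one chunk of a stream. It is a generic textbook‐style scheme, not linear neighborhood propagation and not a method taken from the cited works. The function name `propagate_labels`, the Gaussian similarity with bandwidth `sigma`, the fixed iteration count, and the toy chunk are all assumptions made for the example. Labeled points keep their labels, and each unlabeled point (marked −1) repeatedly takes the similarity‐weighted average of its neighbors' class scores.

```python
from math import exp


def propagate_labels(points, labels, sigma=1.0, n_iter=50):
    """Clamped label propagation on a fully connected similarity graph.

    labels[i] == -1 marks an unlabeled point. Returns one predicted label
    per point. All parameter names here are illustrative assumptions.
    """
    classes = sorted({y for y in labels if y != -1})
    n = len(points)

    def similarity(p, q):
        # Gaussian kernel on squared Euclidean distance.
        sq = sum((a - b) ** 2 for a, b in zip(p, q))
        return exp(-sq / (2.0 * sigma ** 2))

    # Edge weights between every pair of distinct points (no self-loops).
    w = [[0.0 if i == j else similarity(points[i], points[j])
          for j in range(n)] for i in range(n)]

    # One score per class; labeled points start one-hot and stay that way.
    f = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]

    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if labels[i] != -1 or total == 0.0:
                new_f.append(f[i])  # clamp labeled points; skip isolated ones
                continue
            new_f.append([
                sum(w[i][j] * f[j][k] for j in range(n)) / total
                for k in range(len(classes))
            ])
        f = new_f

    # Pick the highest-scoring class for every point.
    return [classes[max(range(len(classes)), key=lambda k: f[i][k])]
            for i in range(n)]


# Toy chunk (assumed): two well-separated groups, one labeled point in each.
chunk = [(0.0,), (0.5,), (1.0,), (4.0,), (4.5,), (5.0,)]
chunk_labels = [1, -1, -1, 2, -1, -1]
predicted = propagate_labels(chunk, chunk_labels)
print(predicted)
```

On this chunk each unlabeled point inherits the label of the labeled point in its own group. In a streaming setting such a procedure would typically be applied chunk by chunk, since building a similarity graph over an unbounded stream is not feasible.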
