Читать книгу Computational Statistics in Data Science - Группа авторов - Страница 108

6.3 Supervised Learning

Supervised learning is the type of machine learning that infers function from trained labeled data. The training examples contain a couple of input (vector) and output (supervisory signal). Let data stream S = {…, d_{t − 1}, d_t, d_{t + 1}, …}, where d_t = {x_i, y_i}, x_i is the value set of the ith datum in each attribute and y_i is the class of the instance. Data stream classification aims to train a classifier f : x → y that establishes a mapping relationship between feature vectors and class labels [92].

Supervised learning approaches can be subdivided into two major categories, which are regression and classification. When the class attribute is continuous, it is called regression, but when the class attribute is discrete, it is referred to as classification. Manual labeling is difficult, time‐consuming, and could be very costly [93]. In a streaming scenario with high velocity and volume, label data are very scarce, thus leading to poorly trained classifiers as a result of the constrained measure of labeled data accessible for building the models [94].

Some of the supervised learning algorithms for streaming scenario are grouped as presented in [95] (i) Tree‐based algorithms: OLIN, Ultra‐Fast Forest Tree system (UFFT), Very Fast Decision Tree learner (VFDT), VFDTc, Random Forest, and Vertical Hoeffding Tree, Concept‐adapting Evolutionary Algorithm for Decision Tree (CEVOT); (ii) Rule‐based algorithms: On‐demand classifier, Fuzzy Passive‐aggressive classification, Similarity‐based data stream classification (SimC), Prequential area under curve (AUC) based classifier, one‐class classifier with incremental learning and forgetting, and Classifying recurring concept using fuzzy similarity function; (iii) Ensemble‐based algorithms: Streaming ensemble algorithm, Weighted classifier ensemble, Distance‐based ensemble online classifier with kernel clustering; (iv) Nearest‐neighbor: Adaptive nearest neighbor classification algorithm, anytime nearest neighbor algorithm; (v) Statistical: Evolving Naïve Bayes; (vi) Deep learning: Activity recognition [96].

Computational Statistics in Data Science

Подняться наверх