Читать книгу Simulation and Analysis of Mathematical Methods in Real-Time Engineering Applications - Группа авторов - Страница 22

1.2.5 kNN is a Case-Based Learning Method

Оглавление

For classification, it holds all the training details. In several applications, such as dynamic web mining for a wide repository, being a lazy learning technique prevents it. One way to enhance its effectiveness is to find some representatives to represent the entire classification training data, viz. Building an inductive learning model from the dataset of training and using this model for classification (representatives). There are several existing algorithms initially built to construct such a model, such as decision trees or neural networks. Their efficiency is one of the assessment benchmarks for various algorithms. Since kNN is a simple but effective classification method and is convincing as one of the most effective methods in text categorization for Reuters corpus of newswire articles, it motivates us to create a model for kNN to boost its efficiency while also maintaining its accuracy of classification [9]. Figure 1.3 shows frequency distribution in statistics is a representation that displays the number of observations within a given interval.

B. Support Vector Machine (SVM)

Support vector machines [7] (SVMs, also support vector networks) analyse knowledge used for classification and regression analysis in machine learning. An SVM training algorithm generates a model that assigns new examples to one category or another, given a set of training examples, each marked as belonging to one or the other of two categories. The picture below has two data forms, red and blue. In kNN, we used the test data to measure its distance to all training samples and to take the minimum distance sample. It takes a lot of time to calculate all the distances and a lot of memory to store all the samples from the training.


Figure 1.3 Distribution of data points and first obtained representative.


Figure 1.4 SVM.

Our primary objective is to find a line that divides the data uniquely into two regions. Such knowledge that can be split into two with a straight line (or high dimension hyperplanes) is called Linear Separable.

In the above Figure 1.4, intuitively, the line should pass as far as possible from all the points as there can be noise in the incoming data. The accuracy of the classification does not affect this data. Thus, the farthest line would offer more immunity against noise. Therefore, SVM discovers a straight line (or hyperplane) with the highest minimum distance from the training samples [10].

Simulation and Analysis of Mathematical Methods in Real-Time Engineering Applications

Подняться наверх