Example 2.2
Figure 2.8 illustrates the NN classification algorithm in two dimensions (features x1 and x2). In the left subpanel, the training examples are shown as blue dots, and a query point that we want to classify is shown as a question mark. In the right subpanel, the class labels are also indicated, and the dashed line indicates the nearest neighbor of the query point, assuming a Euclidean distance metric. The predicted class label is the class label of the closest data point in the training set (here: class 0).
Figure 2.8 Illustration of the nearest neighbor (NN) classification algorithm in two dimensions (features x1 and x2).
More formally, we can define the 1‐NN algorithm as follows:
Training algorithm:
for i = 1, …, n over the training dataset D, where ∣D∣ = n:
•store the training example 〈x[i], f(x[i])〉
Prediction algorithm:
closest point ≔ None
closest distance ≔ ∞
•for i = 1, …, n:
‐current distance ≔ d(x[i], x[q])
‐if current distance < closest distance:
* closest distance ≔ current distance
* closest point ≔ x[i]
The prediction h(x[q]) is the target value of the closest point.
Unless noted otherwise, the default distance metric (in the context of this section) of NN algorithms is the Euclidean distance (also called L2 distance), which computes the distance between two points x[a] and x[b] with m features as

d\left(x^{[a]}, x^{[b]}\right) = \sqrt{\sum_{j=1}^{m}\left(x_j^{[a]} - x_j^{[b]}\right)^2}.
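As a minimal sketch (not taken from the book), the 1-NN training and prediction steps above can be expressed in Python; the helper names euclidean_distance and predict_1nn and the toy data are illustrative assumptions:

```python
import numpy as np

def euclidean_distance(x_a, x_b):
    """L2 (Euclidean) distance between two feature vectors."""
    return np.sqrt(np.sum((x_a - x_b) ** 2))

def predict_1nn(X_train, y_train, x_query):
    """Return the target value of the stored example closest to x_query."""
    closest_distance = np.inf
    closest_index = None
    for i in range(len(X_train)):                # scan all n stored examples
        current_distance = euclidean_distance(X_train[i], x_query)
        if current_distance < closest_distance:  # keep the running minimum
            closest_distance = current_distance
            closest_index = i
    return y_train[closest_index]

# Toy two-feature example (x1, x2), in the spirit of Figure 2.8
X_train = np.array([[1.0, 2.0], [2.5, 1.0], [4.0, 4.5], [5.0, 3.0]])
y_train = np.array([0, 0, 1, 1])
print(predict_1nn(X_train, y_train, np.array([1.5, 1.5])))  # -> 0
```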
Decision boundary: Assuming a Euclidean distance metric, the decision boundary between any two training examples a and b is a straight line (the perpendicular bisector of the segment joining them). If a query point is located on the decision boundary, it is equidistant from training examples a and b. Although the decision boundary between a pair of points is a straight line, the decision boundary of the NN model on a global level, considering the whole training set, is a set of connected, convex polyhedra. All points within a polyhedron are closest to the training example inside it, and all points outside the polyhedron are closer to a different training example. Figure 2.9 illustrates the plane partitioning of a two‐dimensional dataset (features x1 and x2) via linear segments between pairs of training examples (a & b, a & c, and c & d) and the resulting Voronoi diagram (lower‐right corner).
Figure 2.9 Illustration of the plane partitioning of a two‐dimensional dataset.
This partitioning of regions in the 2D plane is also called a Voronoi diagram or Voronoi tessellation. Given a discrete set of points, the Voronoi diagram can also be obtained from the Delaunay triangulation of those points by connecting the centers of the circumcircles of adjacent triangles.
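This duality can be explored, for example, with SciPy: the vertices of the computed Voronoi diagram are the circumcenters of the Delaunay triangles. The point set below is an illustrative assumption, not the data shown in Figure 2.9:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, Delaunay, voronoi_plot_2d

# Illustrative 2D training points (features x1, x2)
points = np.array([[1.0, 1.0], [2.0, 4.0], [4.0, 2.0], [5.0, 5.0], [3.0, 0.5]])

vor = Voronoi(points)   # Voronoi tessellation: one cell per training example
tri = Delaunay(points)  # Delaunay triangulation of the same point set

# Each Voronoi vertex is the circumcenter of a Delaunay triangle,
# i.e. it is equidistant from the three surrounding training examples.
print("Voronoi vertices (circumcenters):\n", vor.vertices)
print("Delaunay triangles (point indices):\n", tri.simplices)

voronoi_plot_2d(vor)    # cell boundaries are the pair-wise bisector segments
plt.triplot(points[:, 0], points[:, 1], tri.simplices, linestyle="--")
plt.show()
```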
Whereas each linear segment is equidistant from two different training examples, a vertex (or node) in the Voronoi diagram is equidistant from three training examples. To draw the decision boundary of a two‐dimensional NN classifier, we take the union of the polyhedra of training examples belonging to the same class; the NN decision boundary then consists of the pair‐wise boundary segments that separate training examples of different classes. An illustration of the NN decision boundary as the union of the polyhedra of training examples belonging to the same class is shown in Figure 2.10.
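One practical way to visualize such a decision boundary is to classify every point of a dense grid with a 1-NN model and plot where the predicted label changes. The sketch below assumes scikit-learn is available, and the two-class toy data are made up for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Illustrative two-class training set in the (x1, x2) plane
X = np.array([[1.0, 2.0], [2.5, 1.0], [2.0, 3.0],
              [4.0, 4.5], [5.0, 3.0], [4.5, 1.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # 1-NN, Euclidean distance

# Classify a dense grid of query points; the color change marks the
# decision boundary, i.e. the union of the relevant bisector segments.
xx, yy = np.meshgrid(np.linspace(0, 6, 300), np.linspace(0, 6, 300))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```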
Figure 2.10 Illustration of the nearest neighbor (NN) decision boundary.
k‐Means clustering: In this section, we will cover k‐means clustering and its components. We will look at clustering, why it matters, and its applications, and then dive into k‐means clustering (including how to perform it in Python on a real‐world dataset).