Читать книгу The Digital Agricultural Revolution - Группа авторов - Страница 24
iv. K-means Clustering
ОглавлениеIt uses categorization to determine the likelihood of a data point belonging to one of two groups based on its proximity to other data points. The first stage in the k-means clustering algorithm is to determine the number of clusters (K) that will be obtained as a final result. The cluster’s centroids are then chosen at random from a set of k items or objects. Based on a distance metric, all remaining items (objects) are assigned to their nearest centroid (mostly Euclidean Distance Metric). The algorithm then calculates the new mean value of each cluster. The term “centroid update” cluster is used to build this stage. Now that the centers have been recalculated, each observation is evaluated once more to see if it is closer to a different cluster. The cluster updated means are used to reassign all of the objects. The cluster assignment and centroid update processes are done iteratively until the cluster assignments do not change anymore (until a convergence criterion is met). That is, the clusters created in the current iteration are identical to those obtained in the prior iteration. Generally, K-means clustering is used in predicting crop yields.
Figure 1.5 Cotton leaf disease using DT algorithm.