Semantic Web for Effective Healthcare Systems
1.4.3 Clustering Techniques
Clustering methods identify groups of similar items in a data set. K-Means, a centroid-based model, is an iterative clustering algorithm that assigns each data point to the cluster whose centroid is closest. Because the algorithm takes the number of clusters as input, some prior knowledge of the data set is important. It partitions the “n” data points into “k” clusters such that each data point belongs to the cluster with the nearest mean. Many variations of K-Means exist, for example in the distance measure used between a centroid and a data point (Euclidean distance being the usual choice), along with related methods such as fuzzy C-Means clustering. Like LDA, K-Means is an unsupervised learning algorithm in which the user must specify the number of groups required (clusters for K-Means, topics for LDA). The main difference is that K-Means produces “k” disjoint clusters, whereas LDA assigns a document to a mixture of topics. Problems such as synonymy and polysemy can be better resolved with LDA than with K-Means.
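The assign-then-update iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `k_means`, the toy two-dimensional `data`, and the choice to seed the centroids with the first `k` points are all assumptions made for the example (practical implementations usually use random or k-means++ initialisation).

```python
import math


def k_means(points, k, max_iter=100):
    """Partition points into k disjoint clusters using Euclidean distance."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups, interleaved.
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
labels, centroids = k_means(data, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Each point receives exactly one label, which is the “disjoint clusters” property contrasted with LDA's topic mixtures; the value of `k` has to be supplied by the caller, as the text notes.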