Читать книгу Data Analytics in Bioinformatics - Группа авторов - Страница 39

2.3.3 Partition Algorithms 2.3.3.1 k-Means Clustering

This algorithm partitions the dataset of n objects into k clusters in the feature space. The value of k (number of clusters) is fixed beforehand. Initially the algorithm randomly assumes k number of cluster centers (cluster mean) in the feature space. The distance (computed using Euclidean distance) of each gene sample is computed from all these cluster means. The sample is then grouped under the cluster that has minimum distance from the sample. The same process is followed for all the samples. Once all the samples are grouped under one of these clusters, each cluster mean is computed again considering the entire samples in that cluster and the cluster mean is updated. The above steps are iterated until the algorithm forms k stable clusters.

The drawback of such algorithm is that determining the clusters (k) prior to implementation reduces the effectiveness of cluster formation and is prone to noisy data in gene expression values [13].

Подняться наверх