Artificial Intelligence and Data Mining Approaches in Security Frameworks (group of authors)

2.3 Clustering

Clustering is a data mining technique for grouping a set of objects so that objects in the same cluster are more similar to each other than to objects in other clusters. In other words, intra-cluster similarity is maximized and inter-cluster similarity is minimized. Clustering is a form of unsupervised learning. The main types of clustering algorithms are:

a) Distribution-Based
b) Density-Based
c) Centroid-Based
d) Connection-Based or Hierarchical Clustering
e) Recent Clustering Techniques
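The definition of a good clustering given above can be checked numerically. The following minimal sketch assumes Euclidean distance as an inverse measure of similarity (smaller distance means more similar) and compares the mean intra-cluster and inter-cluster distances of two candidate labelings of the same four points. The function name `mean_intra_inter_distance` and the sample data are hypothetical.

```python
import math
from itertools import combinations

def mean_intra_inter_distance(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Euclidean distance is used; a lower distance means a higher similarity.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        # Pairs with the same label are intra-cluster, all others inter-cluster
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # groups the two nearby pairs together
bad = [0, 1, 0, 1]   # mixes far-apart points in the same cluster
print(mean_intra_inter_distance(points, good))
print(mean_intra_inter_distance(points, bad))
```

The labeling `good` yields a small intra-cluster distance and a large inter-cluster distance, as the definition requires; the mixed labeling `bad` reverses this relationship.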

a) Distribution-Based Clustering

In this model of clustering, data are grouped on the basis of probability, i.e., according to how likely it is that objects belong to the same distribution. The clusters formed are therefore described by probability distributions, most commonly the normal (Gaussian) distribution.
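A common way to fit such a model is a Gaussian mixture estimated with the expectation-maximization (EM) algorithm. The sketch below is a minimal one-dimensional version under stated assumptions: the number of components is given by the length of the initial `means`, all variances start at the overall data variance, and a fixed number of iterations is run. The function names `normal_pdf` and `fit_gmm_1d` and the sample data are illustrative, not the book's method.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the normal (Gaussian) distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, means, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k, n = len(means), len(data)
    means = list(means)
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: probability (responsibility) that component j generated x
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of every component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var_j)  # guard against collapsing to zero
    # Hard assignment: each point joins its most probable component
    labels = [max(range(k), key=lambda j: weights[j] * normal_pdf(x, means[j], variances[j]))
              for x in data]
    return weights, means, variances, labels

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, labels = fit_gmm_1d(data, means=[0.0, 6.0])
print(means, labels)
```

Because each object receives a probability of belonging to every component, the final hard labels are only one way of reading the fitted model; the soft responsibilities can also be used directly.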

b) Density-Based Clustering

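In this type of clustering, a cluster is defined as a region in which the density of objects is higher than in the rest of the data. In algorithms such as DBSCAN, objects in the sparse areas that separate clusters are treated as noise or border points.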

The following are three of the most frequently used density-based clustering techniques:

i) Mean-Shift
ii) OPTICS
iii) DBSCAN
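To make the notion of density concrete, the sketch below implements DBSCAN, the third technique in the list, in its simplest quadratic-time form. It assumes Euclidean distance; `eps` (neighbourhood radius) and `min_pts` (minimum number of points, counting the point itself, required for a core point) are the usual DBSCAN parameters, while the function name and data are illustrative. Mean-Shift and OPTICS are not shown.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        # All points within eps of point i, including i itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point and starts a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neigh)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

With these parameters, the two dense squares become clusters 0 and 1, and the isolated point at (50, 50) is labelled as noise (-1). Unlike centroid-based methods, the number of clusters is not fixed in advance.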

c) Centroid-Based Clustering

In centroid-based clustering, each cluster is represented by a central vector, which need not be a member of the given dataset. In the k-means clustering algorithm, the number of clusters is fixed to k; the task is to find k cluster centres and assign each object to its nearest centre. Because the result depends on the random initialization, the algorithm is run several times with different random initializations of the k centres, and the best of these runs is selected (Giannotti et al., 2013). In k-medoids clustering, the cluster centres are restricted to members of the dataset, whereas in k-medians clustering, the median is used instead of the mean to form a cluster centre. The foremost drawback of these techniques is that the number of clusters must be chosen beforehand.
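The k-means procedure described above, assigning objects to their nearest centre and keeping the best of several randomly initialized runs, can be sketched in pure Python as Lloyd's algorithm. The sketch assumes Euclidean distance, initial centres sampled from the data without replacement, and "best run" meaning the lowest sum of squared errors (SSE). The names `kmeans`, `assign`, `n_init`, and `seed` are illustrative, not taken from the book.

```python
import math
import random

def assign(points, centres):
    """Assignment step: index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: math.dist(p, centres[c])) for p in points]

def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means restarted n_init times; keeps the run with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centres = rng.sample(points, k)  # random initialization from k data points
        for _ in range(max_iter):
            labels = assign(points, centres)
            # Update step: move each centre to the mean of its members
            new_centres = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    new_centres.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
                else:
                    new_centres.append(centres[c])  # keep the centre of an empty cluster
            if new_centres == centres:
                break  # converged: the centres no longer move
            centres = new_centres
        labels = assign(points, centres)
        sse = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, centres, labels)
    return best

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
sse, centres, labels = kmeans(points, k=2)
print(centres, labels)
```

Note that `k` must be supplied by the caller, which is exactly the drawback mentioned above. A k-medoids variant would instead restrict each centre to a member of the dataset, and k-medians would replace the mean in the update step with a coordinate-wise median; neither is shown here.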

d) Connection-Based (Hierarchical) Clustering

As the name suggests, this type of clustering is based on the closeness, or distance, of objects. The key idea is that objects are connected to one another according to the distance between them, and these connections form clusters. Instead of producing a single partition of the dataset, these algorithms produce an extensive hierarchy of clusters that merge with each other at particular distances. This hierarchy is represented by a dendrogram: the y-axis shows the distance at which clusters merge, and the objects are placed along the x-axis so that the clusters do not mix.

Depending on how the distance between clusters is calculated, there are several types of connection-based clustering:

i) Single-Linkage Clustering
ii) Complete-Linkage Clustering
iii) Average-Linkage Clustering
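The three linkage criteria differ only in how the distance between two clusters is derived from the distances between their members: the closest pair (single), the farthest pair (complete), or the mean over all pairs (average). The naive sketch below assumes Euclidean distance, repeatedly merges the two closest clusters, and records each merge height, which is the value plotted on the y-axis of a dendrogram. The function name `agglomerative` and its `linkage` argument are illustrative.

```python
import math
from itertools import combinations

def agglomerative(points, linkage="single"):
    """Naive agglomerative clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, height): clusters are frozensets of
    point indices, and height is the distance at which they merge.
    """
    def cluster_distance(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(dists)  # closest pair of members
        if linkage == "complete":
            return max(dists)  # farthest pair of members
        if linkage == "average":
            return sum(dists) / len(dists)  # mean over all member pairs
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest linkage distance
        a, b = min(combinations(clusters, 2), key=lambda ab: cluster_distance(*ab))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

pts = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
for method in ("single", "complete", "average"):
    print(method, [h for _, _, h in agglomerative(pts, method)])
```

For these five points, the merge heights are 1, 1, 4, 14 with single linkage, 1, 1, 6, 20 with complete linkage, and 1, 1, 5, 17 with average linkage, so the same data produce dendrograms of different shapes. For real datasets, SciPy's `scipy.cluster.hierarchy.linkage` (with `method="single"`, `"complete"`, or `"average"`) and `scipy.cluster.hierarchy.dendrogram` are far more efficient than this sketch.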

e) Recent Clustering Techniques

The standard clustering techniques described above are not well suited to high-dimensional data, so new techniques have been developed. These techniques can be classified into two major categories: subspace clustering and correlation clustering.

Subspace clustering considers only a small subset of the attributes, those relevant to forming a particular cluster, rather than all attributes at once. Correlation clustering goes further and also takes into account correlations between the chosen attributes.
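To illustrate the subspace idea, the sketch below follows the first, bottom-up steps of CLIQUE, a well-known grid-based subspace clustering algorithm (the choice of CLIQUE is an assumption; the text does not name a specific algorithm). Each attribute is split into `n_bins` equal-width intervals, one-dimensional intervals holding at least a fraction `threshold` of the points are kept as dense units, and only pairs of dense units on different attributes are tested as two-dimensional candidates. The function names, parameter values, and data are hypothetical.

```python
from itertools import combinations

def grid_cell(value, lo, hi, n_bins):
    """Index of the equal-width interval of [lo, hi] that contains value."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)

def dense_units(data, n_bins=4, threshold=0.4):
    """Bottom-up search for dense 1-D and 2-D grid units (first steps of CLIQUE).

    A unit is a dict {attribute: interval index}; it is dense if at least
    `threshold` of all points fall inside it.
    """
    n, d = len(data), len(data[0])
    bounds = [(min(r[a] for r in data), max(r[a] for r in data)) for a in range(d)]
    cells = [[grid_cell(r[a], *bounds[a], n_bins) for a in range(d)] for r in data]

    def support(unit):
        # Fraction of points whose grid cell matches the unit on every attribute
        return sum(all(c[a] == i for a, i in unit.items()) for c in cells) / n

    dense_1d = [{a: i} for a in range(d) for i in range(n_bins) if support({a: i}) >= threshold]
    # 2-D candidates are built only from dense 1-D units (Apriori-style pruning)
    dense_2d = []
    for u, v in combinations(dense_1d, 2):
        if set(u) != set(v):
            unit = {**u, **v}
            if support(unit) >= threshold:
                dense_2d.append(unit)
    return dense_1d, dense_2d

# Attributes 0 and 1 contain a cluster near 0; attribute 2 is spread-out noise
data = [(0.0, 0.0, 0.05), (0.1, 0.1, 0.35), (0.05, 0.05, 0.65), (0.1, 0.0, 0.95),
        (0.0, 0.1, 0.15), (0.5, 0.9, 0.45), (0.7, 0.3, 0.75), (0.9, 0.6, 0.25),
        (0.3, 0.8, 0.55), (1.0, 1.0, 0.85)]
print(dense_units(data))
```

In this toy data, only attributes 0 and 1 carry cluster structure, so the search keeps the subspace formed by those two attributes and ignores attribute 2. A complete CLIQUE implementation would continue to higher dimensions and join adjacent dense units into clusters; correlation clustering is not illustrated here.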
