Machine Learning Techniques and Analytics for Cloud Security - Group of authors

2.2 Proposed Methodology


Input: Let the dataset D consist of n glycans, each described by m parameter values such as RFU (relative fluorescence units), STDEV (standard deviation), and SEM (standard error of the mean). Each glycan is thus an m-dimensional vector; the glycans are denoted g1, g2, g3, …, gi, …, gn. The dataset D has two states: normal (denoted DN) and diseased, i.e., H1N1-infected (denoted DI).

Output: The identified set G’ of differentially expressed glycans.

Step-1: Apply a clustering algorithm C separately to the normal-state dataset DN and to the diseased (H1N1-infected) dataset DI.

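Step-2: The result for the normal state is the set of clusters $\{C^N_1, C^N_2, \ldots, C^N_k\}$; similarly, the result for the infected state is $\{C^I_1, C^I_2, \ldots, C^I_k\}$, where k is the number of clusters.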

Step-3: Identify the identical (matched) clusters between the normal state and the infected state.

Step-4: Compare the matched clusters and identify the set G of differentially expressed glycans, i.e., the glycans that have changed significantly between the two states.


Step-5: For multiple glycan datasets D1, D2, …, Dt, the resultant glycan set G’ is formed from G1, G2, …, Gt, where Gj is the differentially expressed glycan set obtained in Step 4 for dataset Dj (for example, G1 for D1).
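Steps 3 and 4 do not specify how clusters are matched or what counts as a significant change. The following is a minimal Python sketch under stated assumptions, not the authors' procedure: each state has already been clustered by C, with the result given as a mapping from glycan ID to cluster label; clusters are paired one-to-one greedily by largest Jaccard overlap (an assumed matching criterion); and a glycan is flagged as differentially expressed when its infected-state cluster is not the match of its normal-state cluster (an assumed comparison rule). The helper names `jaccard`, `groups`, `match_clusters`, and `differential_glycans` and the toy labels are hypothetical. Step 5 is not modelled.

```python
from itertools import product


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def groups(labels: dict) -> dict:
    """Invert {glycan: cluster_label} into {cluster_label: set_of_glycans}."""
    out = {}
    for glycan, label in labels.items():
        out.setdefault(label, set()).add(glycan)
    return out


def match_clusters(normal: dict, infected: dict) -> dict:
    """Step 3 (assumed criterion): greedily pair normal and infected
    clusters, one-to-one, in order of decreasing Jaccard overlap."""
    gn, gi = groups(normal), groups(infected)
    pairs = sorted(
        product(gn, gi),
        key=lambda p: jaccard(gn[p[0]], gi[p[1]]),
        reverse=True,
    )
    matched, used = {}, set()
    for cn, ci in pairs:
        if cn not in matched and ci not in used:
            matched[cn] = ci
            used.add(ci)
    return matched


def differential_glycans(normal: dict, infected: dict) -> set:
    """Step 4 (assumed rule): flag a glycan when the infected cluster it
    lands in is not the match of its normal-state cluster."""
    matched = match_clusters(normal, infected)
    return {g for g in normal
            if g in infected and matched.get(normal[g]) != infected[g]}


# Hypothetical cluster labels for six glycans (k = 2) in each state.
normal = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1, "g6": 1}
infected = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B", "g6": "B"}
print(sorted(differential_glycans(normal, infected)))  # → ['g3']
```

Greedy pairing keeps the sketch short; an optimal one-to-one assignment (for example, the Hungarian method) could replace it without changing the Step 4 rule.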

The entire methodology is depicted in Figure 2.1. Three clustering algorithms are used in this paper:

The first algorithm is k-means clustering, proposed by J. B. MacQueen. The main idea of this algorithm is to define k centroids, one for each cluster. It proceeds as follows:

 (1) First, choose k points to serve as the initial cluster centroids.

 (2) Next, assign each object to the cluster whose centroid is closest.

 (3) When all objects have been assigned, recalculate the positions of the k centroids. Repeat steps (2) and (3) until the centroids no longer move. This separates the objects into clusters, from which the metric to be minimized can be calculated [23].
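The three k-means steps can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the chapter: it assumes the batch (Lloyd-style) form described above, squared Euclidean distance, and initial centroids that are either supplied by the caller or drawn as k random data points. The function name `kmeans`, its parameters, and the toy 2-D vectors are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Return (labels, centroids) for batch k-means."""
    # Step (1): initial centroids, either given or k random data points.
    if init is None:
        init = random.Random(seed).sample(points, k)
    centroids = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        # Step (2): assign each point to the cluster with the closest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        # Unchanged assignments mean the centroids will not move: converged.
        if new_labels == labels:
            break
        labels = new_labels
        # Step (3): recompute each centroid; keep the old one if a cluster empties.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


# Toy 2-D vectors (hypothetical values) forming two well-separated groups.
points = [[1.0, 0.1], [1.2, 0.2], [0.9, 0.1], [8.0, 0.5], [8.3, 0.4], [7.9, 0.6]]
labels, centroids = kmeans(points, 2, init=[points[0], points[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Passing `init` explicitly makes runs reproducible; with random initialization, k-means can settle in a poor local minimum, so practical implementations usually restart several times and keep the lowest-cost result.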

The second algorithm is hierarchical clustering, which groups similar objects into clusters. In the agglomerative form described here, it starts by treating every observation as a separate cluster and then repeats the following steps:

 (1) First, find the two clusters that are closest together.

 (2) Then, merge these two most similar clusters. This process continues until all the clusters have been merged into one [24].
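The agglomerative loop above can be sketched in plain Python as follows. The text does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the smallest Euclidean distance between any two members). It also adds a hypothetical `k` parameter: with the default `k=1` it merges everything, as described, while a larger `k` stops early so that k clusters remain, which is what Step 2 needs. The function names and toy data are illustrative only.

```python
import math
from itertools import combinations


def single_link(a, b, points):
    """Single-linkage distance: closest pair of points across clusters a and b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, k=1):
    """Merge clusters bottom-up until k remain (k=1 merges everything).

    Returns (clusters, merges): clusters is a list of sorted index lists,
    and merges records each merged pair in the order it happened.
    """
    # Start: every observation is its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        # Step (1): find the two closest clusters.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]], points))
        # Step (2): merge them into one cluster.
        merges.append((clusters[a], clusters[b]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Toy 1-D observations (hypothetical values).
points = [[0.0], [0.1], [5.0], [5.2], [10.0]]
clusters, merges = agglomerative(points, k=2)
print(clusters)  # → [[0, 1], [2, 3, 4]]
```

This brute-force search recomputes every pairwise cluster distance on each pass, which is fine for illustration but slow for large datasets; library implementations use more efficient linkage updates.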

The third and last algorithm is fuzzy c-means clustering. Its concept is very similar to k-means clustering, except that each data point belongs to every cluster with a coefficient (degree of membership) rather than to a single cluster. The algorithm is as follows:

 (1) First, choose the number of clusters.

 (2) Then, randomly assign each data point coefficients for being in the clusters.

 (3) Repeat the following until the algorithm has converged:
  (i) Compute the centroid of each cluster.
  (ii) For every data point, compute its coefficient of being in each cluster.
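The fuzzy c-means loop can be sketched in plain Python as follows. It uses the standard update rules: a centroid is the mean of all points weighted by membership raised to a fuzzifier exponent, and a membership is inversely related to the distance to each centroid. The text gives neither the exponent nor the convergence test, so this sketch assumes a fuzzifier of 2 and stops when no coefficient changes by more than `tol`. The function name `fuzzy_cmeans`, its parameters (`fuzzifier`, `tol`, `max_iter`, `seed`), and the toy data are hypothetical.

```python
import math
import random


def fuzzy_cmeans(points, c, fuzzifier=2.0, tol=1e-6, max_iter=300, seed=0):
    """Return (memberships, centroids); memberships[i][j] is point i's degree in cluster j."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Steps (1)-(2): c is fixed; give every point random coefficients summing to 1.
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    power = 2.0 / (fuzzifier - 1.0)
    centroids = []
    for _ in range(max_iter):
        # Step (3)(i): centroid = weighted mean with weights u_ij ** fuzzifier.
        centroids = []
        for j in range(c):
            w = [row[j] ** fuzzifier for row in u]
            sw = sum(w)
            centroids.append([sum(wi * p[d] for wi, p in zip(w, points)) / sw
                              for d in range(dim)])
        # Step (3)(ii): recompute each point's coefficient for every cluster.
        new_u = []
        for p in points:
            dists = [math.dist(p, cj) for cj in centroids]
            if 0.0 in dists:
                # Point sits exactly on a centroid: full membership there.
                hit = dists.index(0.0)
                new_u.append([1.0 if j == hit else 0.0 for j in range(c)])
            else:
                new_u.append([1.0 / sum((dists[j] / dl) ** power for dl in dists)
                              for j in range(c)])
        # Converged when the largest coefficient change falls below tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return u, centroids


# Toy 2-D vectors (hypothetical values) forming two well-separated groups.
points = [[1.0, 0.1], [1.2, 0.2], [0.9, 0.1], [8.0, 0.5], [8.3, 0.4], [7.9, 0.6]]
u, centroids = fuzzy_cmeans(points, c=2)
# Harden the result by taking each point's highest-membership cluster.
hard = [max(range(2), key=lambda j: row[j]) for row in u]
print(hard)
```

Which index ends up labelling which group depends on the random initial coefficients, so comparisons between states should rely on cluster contents rather than on cluster numbers.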
