Data Analytics in Bioinformatics - Group of authors - Page 37

2.3.1 Microarray Analysis

The exploration of genomics has centered on the study of single genes. Microarray analysis, in contrast, is a technology in which the expression levels of thousands of genes are identified at once. The measurements are taken from a microscope slide carrying the chips that collect the data, which are referred to as gene chips or DNA chips [12].

In microarray analysis, mRNA molecules are gathered from a reference sample (e.g., a sample from a diseased patient) and a test sample (molecules from any individual). The data are combined using probes: items found to be similar are placed in one cluster, and items found to be dissimilar are placed in another. The clusters are thus labeled according to their similarities [12].

During microarray analysis, the data gathered from microarray images are used to construct matrices (see Table 2.1) in which rows represent genes and columns represent different conditions, viz. different tissues, clinical conditions, or biological processes. Each cell of the matrix holds the expression level of one gene in one sample and is uniquely indexed by that gene and sample pair.

Table 2.1 Gene expression data matrix representation.

	Sample 1	Sample 2	Sample 3	Sample 4	Sample m
Gene1	C11	C12	C13	C14	C1m
Gene2	C21	C22	C23	C24	C2m
Gene3	C31	C32	C33	C34	C3m
Gene4	C41	C42	C43	C44	C4m
Gene n	Cn1	Cn2	Cn3	Cn4	Cnm
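
The layout of Table 2.1 can be sketched in a few lines of Python. This is only an illustration: the gene names, sample names, expression values, and the helper functions `expression`, `gene_profile`, and `sample_profile` are all hypothetical and do not come from the source.

```python
# Hypothetical expression matrix: rows are genes, columns are samples.
# The numeric values are made up purely for illustration.
genes = ["Gene1", "Gene2", "Gene3"]
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]
matrix = [
    [2.1, 0.4, 1.8, 3.0],  # Gene1
    [0.3, 0.2, 0.5, 0.1],  # Gene2
    [1.9, 0.6, 2.0, 2.7],  # Gene3
]


def expression(gene, sample):
    """Return cell C_ij: the expression level of `gene` in `sample`."""
    i = genes.index(gene)
    j = samples.index(sample)
    return matrix[i][j]


def gene_profile(gene):
    """Return one row: the expression of `gene` across all samples."""
    return list(matrix[genes.index(gene)])


def sample_profile(sample):
    """Return one column: the expression of every gene in `sample`."""
    j = samples.index(sample)
    return [row[j] for row in matrix]
```

Here a row is the expression profile of one gene across all conditions, and a column is the profile of one condition across all genes. Clustering genes compares rows with each other.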

These gene expression matrices are collected into a database, as shown in Figure 2.2, which acts as a repository for genetic information such as gene regulation, the genetic functionality of diseases, drug discovery based on gene structures, projected response to drugs, and so on. When such a database is searched for a particular gene, clustering algorithms are used to retrieve the relevant information with high similarity [32].
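
A similarity search of this kind might look like the sketch below. The source does not say which similarity measure or retrieval method is used, so Pearson correlation is an assumption made for this example, and the in-memory `database` dictionary, its gene names, and the `most_similar` helper are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (spread_x * spread_y)


def most_similar(database, query_gene, top_k=2):
    """Return the names of the `top_k` genes most correlated with the query."""
    query = database[query_gene]
    scores = [
        (pearson(query, profile), name)
        for name, profile in database.items()
        if name != query_gene
    ]
    scores.sort(reverse=True)  # highest correlation first
    return [name for _, name in scores[:top_k]]


# Hypothetical stand-in for a gene expression database.
database = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.1, 6.0, 8.2],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
    "GeneD": [1.0, 1.5, 1.0, 1.5],
}
```

With this toy data, `most_similar(database, "GeneA", top_k=1)` returns `["GeneB"]`, because the profile of GeneB rises across the samples in nearly the same proportion as that of GeneA.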

The sequence of actions for clustering Gene Expression (GE) data includes the following:

Figure 2.2 Example matrix of gene expression (10 genes in rows and 2 samples in columns) [29–31].

A matrix holds the GE data structure, which consists of size, dimension, number, origin, etc.

Features are extracted to identify the similarities (feature extraction). Reducing the input data while retaining as much information as possible, so that it is better suited to the application, is called feature selection or feature reduction. Feature reduction can be achieved with common algorithms such as Principal Component Analysis [11].
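
As a rough illustration of feature reduction, the sketch below computes only the first principal component using power iteration on the sample covariance matrix. This is a simplification chosen for brevity, not the procedure described in [11]; the data values and function names are hypothetical, and each gene is treated as one observation whose features are its expression levels across samples.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (m x m) of already-centered rows."""
    n, m = len(centered), len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
        for a in range(m)
    ]


def first_component(cov, iterations=200):
    """Approximate the dominant eigenvector of `cov` by power iteration.

    Assumes the starting vector is not orthogonal to that eigenvector.
    """
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Reduce each row to a single score along direction `v`."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


# Hypothetical data: 4 genes (rows) measured in 3 samples (columns).
data = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, 0.0],
]
centered = center_columns(data)
component = first_component(covariance(centered))
scores = project(centered, component)  # one number per gene instead of three
```

Each gene is reduced from three values to one score. In this toy data the second sample is exactly twice the first, so a single direction captures almost all of the variation. Real expression data generally needs several components, and some information is lost when the rest are discarded.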

Based on these similarities or features, genes with similar expression are grouped into one cluster, and genes that are distinct are grouped into another. The degree of similarity, or proximity, is computed by applying distance metrics to the gene expression data. Euclidean distance, Manhattan distance, and Mahalanobis distance are a few of the metrics used to quantify similarity when forming a cluster.
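
Two of these metrics, and a very simple way of grouping genes with them, can be sketched as follows. The threshold-based grouping is a hypothetical scheme chosen for brevity and is not the algorithm of the source; the profiles and the threshold value are invented. Mahalanobis distance is left out because it also requires the inverse covariance matrix of the data.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute differences between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def threshold_clusters(profiles, distance, threshold):
    """Group genes greedily by distance to each cluster's first member.

    A gene joins the first cluster whose first member lies within
    `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for name, profile in profiles.items():
        for cluster in clusters:
            representative = profiles[cluster[0]]
            if distance(profile, representative) <= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical expression profiles for four genes across three samples.
profiles = {
    "Gene1": [1.0, 1.0, 1.0],
    "Gene2": [1.0, 2.0, 1.0],
    "Gene3": [8.0, 9.0, 8.0],
    "Gene4": [9.0, 9.0, 8.0],
}
clusters = threshold_clusters(profiles, euclidean, threshold=2.0)
```

With this data, Gene1 and Gene2 fall into one cluster and Gene3 and Gene4 into another. The result of such a greedy scheme depends on the order of the genes and on the chosen threshold.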
