Data Analytics in Bioinformatics

2.3.1 Microarray Analysis

The exploration of genomics has centered on the study of single genes; microarray analysis, by contrast, is a technology that identifies the expression levels of thousands of genes at once. The data are collected from chips placed on a microscope slide, which are referred to as gene chips or DNA chips [12].

In microarray analysis, mRNA molecules are gathered from a reference sample (e.g., a sample from a diseased patient) and a test sample (molecules from any individual). The data are combined using probes: items found to be similar are placed into one cluster, and items found to be dissimilar are placed into another. The clusters are thus labeled according to their similarities [12].

During microarray analysis, the data gathered from microarray images are used to construct matrices (see Table 2.1) in which rows represent genes and columns represent different conditions, such as different tissues, clinical conditions, or biological processes. Each cell of the matrix holds an expression level and is uniquely indexed by its gene and its sample: cell Cij holds the expression level of gene i in sample j.

Table 2.1 Gene expression data matrix representation.

          Sample 1   Sample 2   Sample 3   Sample 4   ...   Sample m
Gene 1    C11        C12        C13        C14        ...   C1m
Gene 2    C21        C22        C23        C24        ...   C2m
Gene 3    C31        C32        C33        C34        ...   C3m
Gene 4    C41        C42        C43        C44        ...   C4m
...
Gene n    Cn1        Cn2        Cn3        Cn4        ...   Cnm
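The layout of Table 2.1 can be sketched in a few lines of Python. This is only an illustration: the gene and sample labels, the expression values, and the helper names (expression, gene_profile, sample_profile) are hypothetical and do not come from the text.

```python
# Hypothetical labels; the expression values below are made up for illustration.
genes = ["Gene1", "Gene2", "Gene3", "Gene4"]
samples = ["Sample1", "Sample2", "Sample3"]

# Row i holds gene i and column j holds sample j,
# so matrix[i][j] plays the role of cell Cij in Table 2.1.
matrix = [
    [2.1, 2.4, 0.3],
    [1.9, 2.2, 0.4],
    [0.2, 0.1, 3.0],
    [0.3, 0.2, 2.8],
]

def expression(gene, sample):
    """Return the expression level stored in the cell for (gene, sample)."""
    i = genes.index(gene)
    j = samples.index(sample)
    return matrix[i][j]

def gene_profile(gene):
    """Return one row: the expression of a gene across all samples."""
    return matrix[genes.index(gene)]

def sample_profile(sample):
    """Return one column: the expression of every gene in one sample."""
    j = samples.index(sample)
    return [row[j] for row in matrix]
```

A row (gene_profile) is the kind of vector that is later compared between genes when clusters are formed, while a column (sample_profile) describes a single condition.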

These gene expression matrices are collected into a database, as shown in Figure 2.2, which acts as a repository for genetic information such as gene regulation, the genetic functionality of diseases, drug discovery based on gene structures, and projected response to drugs. When such a database is searched for a particular gene, clustering algorithms are used to retrieve the relevant, highly similar entries [32].

Figure 2.2 Example matrix of gene expression (10 genes in rows and 2 samples in columns) [29–31].

The sequence of steps for clustering gene expression (GE) data is as follows:

• A matrix is constructed to hold the GE data; its structure is described by attributes such as size, dimension, number, and origin.

• Features are extracted to identify similarities (feature extraction). Reducing the input data with as little loss of information as possible, so that it is better suited to the application, is called feature selection or feature reduction. Feature reduction can be achieved with common algorithms such as Principal Component Analysis [11].

• Based on these similarities or features, genes with similar expression are grouped into one cluster and genes with distinct expression into another. The degree of similarity, or proximity, is computed by applying a distance metric to the gene expression data. Euclidean distance, Manhattan distance, and Mahalanobis distance are among the metrics used to quantify similarity when forming clusters.
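The distance-based grouping described in the last step can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's algorithm: the expression profiles are invented, the threshold value is arbitrary, and cluster_by_threshold is a hypothetical helper that performs a simple single-linkage grouping. Feature reduction (e.g., PCA) and the Mahalanobis distance are left out to keep the sketch dependency-free.

```python
import math

# Made-up expression profiles: one list of per-sample values per gene.
profiles = {
    "Gene1": [2.1, 2.4, 0.3],
    "Gene2": [1.9, 2.2, 0.4],
    "Gene3": [0.2, 0.1, 3.0],
    "Gene4": [0.3, 0.2, 2.8],
}

def euclidean(a, b):
    """Square root of the summed squared differences between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster_by_threshold(profiles, distance, threshold):
    """Group genes so that a gene joins any cluster containing a gene
    closer than `threshold` (an assumed, arbitrary cut-off)."""
    clusters = []
    for gene, profile in profiles.items():
        # Collect every existing cluster that has a member close to this gene.
        close = [c for c in clusters
                 if any(distance(profile, profiles[g]) < threshold for g in c)]
        # Merge the gene with all of those clusters into a single cluster.
        merged = [gene]
        for c in close:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return clusters

clusters = cluster_by_threshold(profiles, euclidean, threshold=1.0)
```

With these invented values, Gene1 and Gene2 fall into one cluster and Gene3 and Gene4 into another. Passing manhattan instead of euclidean changes only how proximity is measured, which is why the choice of metric can change the resulting clusters on real data.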
