Data Analytics in Bioinformatics

2.1 Introduction

Machine learning can be described as equipping machines (computers) to learn from their environment through experience: the machine is given tasks whose performance can be measured with some metric, and algorithms that improve that performance as experience accumulates. This broad field is subdivided into a few areas, described below.

Supervised learning: A learning system in which the input data is provided together with the known output, on the assumption that the output depends on the input. Having learned from the data provided, this approach predicts labels for newly given data.
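As a minimal sketch of this idea, the following uses a nearest-centroid classifier, chosen here only for brevity and not taken from the chapter. The function names and the toy data are hypothetical. The model learns one mean vector per known label and then predicts the label of a new sample.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Learn one mean vector (centroid) per label from labeled examples."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign the label whose centroid lies closest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical toy data: two features per sample, with known labels.
train_x = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.2, 3.9)]
train_y = ["A", "A", "B", "B"]

model = fit_centroids(train_x, train_y)
new_label = predict(model, (3.8, 4.0))  # label predicted for unseen data
```

The known labels in `train_y` are what make this supervised: without them there would be nothing to compute the centroids from.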

Reinforcement learning: This approach is goal-oriented and operates in an interactive environment. It relies on a feedback system of rewards and punishments, determined by the outcomes of the learner's interactions with that environment.
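The reward-and-punishment feedback loop can be illustrated with a small epsilon-greedy agent. This is a sketch under stated assumptions: the two-action environment, the reward values and the function name `run_bandit` are all hypothetical, and epsilon-greedy is only one of many possible strategies.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that estimates action values from feedback."""
    rng = random.Random(seed)                      # seeded for repeatability
    values = {action: 0.0 for action in rewards}   # estimated value per action
    counts = {action: 0 for action in rewards}     # times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(rewards))     # explore: random action
        else:
            action = max(values, key=values.get)   # exploit: best estimate so far
        feedback = rewards[action]                 # environment rewards or punishes
        counts[action] += 1
        # Incremental mean of the feedback observed for this action.
        values[action] += (feedback - values[action]) / counts[action]
    return values


# Hypothetical environment: one action is rewarded (+1), the other punished (-1).
values = run_bandit({"good": 1.0, "bad": -1.0})
best = max(values, key=values.get)
```

No labels are supplied at any point; the agent only sees the consequence of each action it takes.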

Unsupervised learning: This approach explores the hidden patterns in the input provided, as the output is unknown. The algorithms are applied directly to the dataset, and the structure they find is the resulting outcome [1].

Because biological data is vast, owing to complex protein structures and genome sequences, understanding and deciphering the function of cells is challenging. For the study of basic biological processes, machine learning approaches pave the way for tools, software and algorithms that ease this work. This chapter introduces unsupervised learning approaches, algorithms and their use in bioinformatics, an interdisciplinary field that brings together biology, statistics and computer science to analyse and assess huge amounts of biological data [2].

In the unsupervised learning approach, the machine learns from the dataset given as input and labels or groups the data accordingly [1]. This can also be referred to as self-organization: the algorithm structures the data based on the input provided, with minimal human intervention. The approach draws out the hidden patterns that exist in the data and reveals the relationships among them.
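To make the notion of grouping unlabeled data concrete, here is a minimal k-means sketch. K-means is one common clustering algorithm and is used here as an illustration, not as the chapter's prescribed method. The data points are hypothetical, and initialising the centroids from the first `k` points is a simplifying assumption.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and updating centroids."""
    centroids = list(points[:k])  # naive deterministic initialisation (assumption)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


# Hypothetical unlabeled data forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.1), (1.2, 0.9), (0.9, 1.1), (5.2, 4.9), (4.9, 5.0)]
groups, centres = kmeans(points, k=2)
```

The only human input is the number of groups `k`; the group memberships themselves emerge from the data.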

Unsupervised learning commonly relies on a few families of algorithms [3]:

Clustering

Association

Anomaly detection

Latent variable

Dimensionality reduction.
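As an illustration of one family from the list above, anomaly detection, the following sketch flags values that lie far from the mean in units of standard deviation (a z-score rule). The threshold of 2.0, the function name and the readings are assumptions made for this example; practical detectors are usually more robust.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the indices of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical measurements with one obviously unusual reading.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0, 10.1]
outliers = zscore_outliers(readings)
```

Here only the reading 25.0 (index 6) is flagged, because it is the only value more than two standard deviations from the mean.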

Figure 2.1 Machine learning in bioinformatics.

Of the approaches above, this chapter explores the algorithmic techniques that are widely applied in bioinformatics.

Unsupervised learning in bioinformatics: Machine learning in bioinformatics is applied across six areas [6], as shown in Figure 2.1.

Genomics and proteomics: The complete set of genes in a cell of an organism is called the genome. Genes are segments of DNA; they are transcribed into RNA (mRNA, messenger RNA), which is in turn translated into proteins [7]. The set of proteins in a cell is dynamic, because different tissues produce different sets of proteins, and this variation is captured by gene expression data. Unlike the proteome, the genome is essentially constant. The set of proteins present in a cell provides insight into the structure and function of that cell [8]. Gene expression data is too large to handle manually. Hence machine learning approaches such as clustering algorithms are applied to varied gene expression data, so as to group tissues with similar functions and structures and to uncover hidden information.
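A rough sketch of how clustering might group tissue samples by their expression profiles is given below. It makes several assumptions not found in the text: the sample names and expression values are invented, similarity is measured by Pearson correlation, and samples are merged by a simple single-linkage rule with a hypothetical cut-off of 0.9.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(va * vb)


def cluster_by_correlation(profiles, min_corr=0.9):
    """Single linkage: a sample joins every cluster holding a correlated sample."""
    clusters = []
    for name, profile in profiles.items():
        linked = [
            c for c in clusters
            if any(pearson(profile, profiles[other]) >= min_corr for other in c)
        ]
        merged = {name}.union(*linked)  # merge all linked clusters with the sample
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


# Hypothetical expression levels of four genes in four tissue samples.
expression = {
    "liver_1": [1.0, 2.0, 3.0, 4.0],
    "liver_2": [2.1, 3.9, 6.2, 8.0],
    "muscle_1": [4.0, 3.1, 2.0, 0.9],
    "muscle_2": [8.2, 6.0, 4.1, 1.9],
}
clusters = cluster_by_correlation(expression)
```

Correlation is used instead of Euclidean distance so that samples with the same expression pattern group together even when their absolute levels differ, as with `liver_1` and `liver_2` here.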
