Introduction
One of the cornerstones of bioinformatics is the process of comparing nucleotide or protein sequences in order to deduce how the sequences are related to one another. Through this type of comparative analysis, one can draw inferences regarding whether two proteins have similar function, contain similar structural motifs, or have a discernible evolutionary relationship. This chapter focuses on pairwise alignments, where two sequences are directly compared, position by position, to deduce these relationships. Another approach, multiple sequence alignment, is used to identify important features common to three or more sequences; this approach, which is often used to predict secondary structure and functional motifs and to identify conserved positions and residues important to both structure and function, is discussed in Chapter 8.
Before entering into any discussion of how relatedness between nucleotide or protein sequences is assessed, two important terms need to be defined: similarity and homology. These terms tend to be used interchangeably when, in fact, they mean quite different things and imply quite different biological relationships.
Similarity is a quantitative measure of how related two sequences are to one another. Similarity is always based on an observable, usually a pairwise alignment of two sequences. When two sequences are aligned, one can simply count how many aligned positions contain identical residues, and this raw count can then be converted to the most commonly used measure of similarity: percent identity, the fraction of aligned positions that are identical, expressed as a percentage. Measures of similarity are used to quantify changes that occur as two sequences diverge over evolutionary time, considering the effect of substitutions, insertions, or deletions. They can also be used to identify residues that are crucial for maintaining a protein's structure or function. In short, a high percentage of sequence similarity may imply a common evolutionary history or a possible commonality in biological function.
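As a rough illustration of how a raw count of matching residues becomes a percent identity, the following minimal Python sketch works on two sequences that have already been aligned, with gaps written as `-`. The function name `percent_identity` and the example sequences are hypothetical, and the choice of denominator (every alignment column, including columns with a gap in one sequence) is an assumption; other conventions, such as dividing by the length of the shorter sequence, are also in use and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity for two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of alignment columns,
    counting columns where one sequence has a gap. Columns where both
    sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Pair up residues column by column, dropping gap-versus-gap columns.
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no columns to compare")

    # A column counts as a match only when both residues are identical
    # and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned nucleotide sequences: 9 columns, 7 identical.
seq_a = "ACGT-ACGT"
seq_b = "ACGTTAC-T"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")  # 77.8% identity
```

Note that the result describes only the alignment it is computed from: a different alignment of the same two sequences would generally yield a different percent identity.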
In contrast, homology implies an evolutionary relationship and is the putative conclusion reached based on examining the optimal alignment between two sequences and assessing their similarity. Genes (and their protein products) either are or are not homologous – homology is not measured in degrees or percentages. The concept of homology and the term homolog may apply to two different types of relationships, as follows.
If genes are separated by the event of speciation, they are termed orthologous. Orthologs are direct descendants of a sequence in a common ancestor, and they may have similar domain structure, three-dimensional structure, and biological function. Put simply, orthologs can be thought of as the same gene (or protein) in different species.
If genes within the same species are separated by a gene duplication event, they are termed paralogous. The examination of paralogs provides insight into how pre-existing genes may have been adapted or co-opted toward providing a new or modified function within a given species.
The concepts of homology, orthology, and paralogy and methods for determining the evolutionary relationships between sequences are covered in much greater detail in Chapter 9.