Bioinformatics, by a group of authors, page 62

PSI-BLAST

The variation of the BLAST algorithm known as PSI-BLAST (position-specific iterated BLAST) is particularly well suited for identifying distantly related proteins, including proteins that may not have been found using the traditional BLASTP method (Altschul et al. 1997; Altschul and Koonin 1998). PSI-BLAST relies on position-specific scoring matrices (PSSMs), which are also often called profiles and are closely related to hidden Markov models (Schneider et al. 1986; Gribskov et al. 1987; Staden 1988; Tatusov et al. 1994; Bücher et al. 1996). A PSSM is, quite simply, a numerical representation of a multiple sequence alignment, of the kind that will be discussed in Chapter 8. Embedded within a multiple sequence alignment is intrinsic sequence information that represents the common characteristics of that particular collection of sequences, frequently a protein family. A PSSM captures these embedded, common characteristics, so it can find similarities between sequences with little or no absolute sequence identity. This makes it possible to identify and analyze distantly related proteins. PSSMs are constructed by taking a multiple sequence alignment representing a protein family and then asking a series of questions, as follows.

 What residues are seen at each position of the alignment?

 How often does a particular residue appear at each position of the alignment?

 Are there positions that show absolute conservation?

 Can gaps be introduced anywhere in the alignment?
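
The four questions above can be answered mechanically, one alignment column at a time. The following is a minimal Python sketch under stated assumptions: the toy alignment and the names `ALIGNMENT` and `summarize_columns` are hypothetical and chosen only for illustration, and "absolute conservation" is taken to mean a gap-free column containing a single residue type.

```python
from collections import Counter

# Hypothetical toy alignment; '-' marks a gap. Not a real protein family.
ALIGNMENT = [
    "MKV-LA",
    "MKIGLA",
    "MRV-LS",
    "MKVGLA",
]

def summarize_columns(alignment):
    """Answer the four questions for every column of the alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    summary = []
    for pos in range(length):
        column = [seq[pos] for seq in alignment]
        counts = Counter(res for res in column if res != "-")
        gaps = column.count("-")
        n_residues = len(column) - gaps
        # Frequencies are computed over non-gap residues only (an assumption).
        freqs = {res: n / n_residues for res, n in counts.items()}
        summary.append({
            "residues": sorted(counts),                     # which residues occur
            "frequencies": freqs,                           # how often each occurs
            "conserved": gaps == 0 and len(counts) == 1,    # absolute conservation
            "has_gap": gaps > 0,                            # gaps seen at this position
        })
    return summary

SUMMARY = summarize_columns(ALIGNMENT)

for pos, info in enumerate(SUMMARY, start=1):
    print(pos, info["residues"], info["conserved"], info["has_gap"])
```

In this toy alignment the first and fifth columns are absolutely conserved, the second mixes K and R, and the fourth is the only column in which gaps appear.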

Once these questions are answered, the PSSM is constructed, and the numbers in the resulting table represent the multiple sequence alignment (Figure 3.13). The numbers within the PSSM reflect the probability of any given amino acid occurring at each position. They also reflect the effect of a conservative or non-conservative substitution at each position in the alignment, much as the PAM or BLOSUM matrices do. The PSSM can then be compared against single sequences, or used in an iterative approach in which newly found sequences are incorporated into the original PSSM to find additional sequences that may be of interest.
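
To make the idea of position-specific scores concrete, here is a minimal Python sketch that turns a small ungapped alignment into log-odds scores and then scores single sequences against them. It is an illustration, not PSI-BLAST's actual procedure: the alignment is a hypothetical toy, the background frequency is assumed uniform (1/20), a flat pseudocount is assumed for every residue, and sequence weighting and gap handling are left out. The names `build_pssm` and `score_sequence` are likewise hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical ungapped toy alignment, for illustration only.
ALIGNMENT = ["MKVLA", "MKILA", "MRVLS", "MKVLA"]

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions: uniform background frequency and a flat pseudocount
    added to every residue count; both are simplifications.
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_seqs = len(alignment)
    total = n_seqs + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            # Positive score: residue is more common here than by chance.
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score_sequence(pssm, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))

PSSM = build_pssm(ALIGNMENT)

print(score_sequence(PSSM, "MKVLA"))  # matches the family at every position
print(score_sequence(PSSM, "MRILS"))  # built only from minority residues
print(score_sequence(PSSM, "WWWWW"))  # residues never seen in the alignment
```

The same residue receives different scores at different positions, which is what distinguishes a PSSM from a general substitution matrix. In an iterative search, sequences that score well would be added to the alignment and the matrix rebuilt before the next round.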
