
Which Matrices Should be Used When?

Although most bioinformatics software provides a default scoring matrix, the default is not necessarily the most appropriate choice for the biological question being asked. Table 3.1 is intended to provide some guidance on selecting a scoring matrix, based on studies that have examined how effectively these matrices detect known biological relationships (Altschul 1991; Henikoff and Henikoff 1993; Wheeler 2003). Note that the numbering schemes for the two matrix families move in opposite directions: more divergent sequences are found using higher-numbered PAM matrices and lower-numbered BLOSUM matrices. The following equivalencies are useful in relating PAM matrices to BLOSUM matrices (Wheeler 2003):

PAM250 is equivalent to BLOSUM45

PAM160 is equivalent to BLOSUM62

PAM120 is equivalent to BLOSUM80.
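As a small illustration, the three equivalencies listed above can be held in a lookup table. The following Python sketch is not part of the original text; the names `PAM_TO_BLOSUM`, `equivalent_matrix`, and `matrix_number` are hypothetical helpers introduced here, and the mapping covers only the three pairs given above.

```python
# Approximate PAM/BLOSUM equivalencies exactly as listed in the text (Wheeler 2003).
# Only these three pairs are covered; other matrices are not extrapolated.
PAM_TO_BLOSUM = {"PAM250": "BLOSUM45", "PAM160": "BLOSUM62", "PAM120": "BLOSUM80"}
BLOSUM_TO_PAM = {blosum: pam for pam, blosum in PAM_TO_BLOSUM.items()}


def equivalent_matrix(name):
    """Return the listed counterpart in the other family, or None if not listed."""
    key = name.strip().upper()
    return PAM_TO_BLOSUM.get(key) or BLOSUM_TO_PAM.get(key)


def matrix_number(name):
    """Extract the numeric part of a matrix name, e.g. 'PAM250' -> 250."""
    return int("".join(ch for ch in name if ch.isdigit()))


# The numbering schemes run in opposite directions: sorting the PAM matrices
# by increasing number yields BLOSUM counterparts with decreasing numbers.
pams_ascending = sorted(PAM_TO_BLOSUM, key=matrix_number)
blosum_numbers = [matrix_number(PAM_TO_BLOSUM[p]) for p in pams_ascending]
```

Here `equivalent_matrix("BLOSUM62")` gives `"PAM160"`, and a matrix that is not in the list, such as `"PAM40"`, gives `None` rather than a guess.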

In addition to the protein matrices discussed here, there are numerous specialized matrices that are specific to a particular species, concentrate on particular classes of proteins (e.g. transmembrane proteins), focus on structural substitutions, or use hydrophobicity measures to assess similarity (see Wheeler 2003). Given this landscape, the most important take-home message for the reader is that no single matrix is the complete answer for all sequence comparisons. A thorough understanding of what each matrix represents is critical to performing proper sequence-based analyses.

Table 3.1 Selecting an appropriate scoring matrix.

Matrix	Best use	Similarity
PAM40	Short alignments that are highly similar	70–90%
PAM160	Detecting members of a protein family	50–60%
PAM250	Longer alignments of more divergent sequences	∼30%
BLOSUM90	Short alignments that are highly similar	70–90%
BLOSUM80	Detecting members of a protein family	50–60%
BLOSUM62	Most effective in finding all potential similarities	30–40%
BLOSUM30	Longer alignments of more divergent sequences	<30%

The Similarity column gives the range of sequence similarities that each matrix is best able to detect (Wheeler 2003).
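The guidance in Table 3.1 can also be expressed as a simple lookup. The Python sketch below is an illustration only, not a method from the text: the function name `suggest_matrices` and the numeric encoding of the ranges are assumptions. In particular, "∼30%" is encoded as exactly 30, "<30%" as 0 to 30 with inclusive bounds, and similarities that fall between the table's ranges return no suggestion.

```python
# Rows of Table 3.1 as (matrix, best use, low %, high %).
# Assumptions: "~30%" is stored as (30, 30) and "<30%" as (0, 30); bounds are
# treated as inclusive. The table itself gives no exact limits for these rows.
TABLE_3_1 = [
    ("PAM40", "Short alignments that are highly similar", 70, 90),
    ("PAM160", "Detecting members of a protein family", 50, 60),
    ("PAM250", "Longer alignments of more divergent sequences", 30, 30),
    ("BLOSUM90", "Short alignments that are highly similar", 70, 90),
    ("BLOSUM80", "Detecting members of a protein family", 50, 60),
    ("BLOSUM62", "Most effective in finding all potential similarities", 30, 40),
    ("BLOSUM30", "Longer alignments of more divergent sequences", 0, 30),
]


def suggest_matrices(percent_similarity):
    """Return the Table 3.1 matrices whose similarity range covers the value."""
    if not 0 <= percent_similarity <= 100:
        raise ValueError("percent_similarity must be between 0 and 100")
    return [
        name
        for name, _best_use, low, high in TABLE_3_1
        if low <= percent_similarity <= high
    ]
```

For example, `suggest_matrices(80)` returns `["PAM40", "BLOSUM90"]`, while `suggest_matrices(45)` returns an empty list because no row of the table covers that value. An empty result is a reminder that the table is a guide rather than a complete partition of the similarity scale.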
