Performing a PSI-BLAST Search
PSI-BLAST searches can be initiated by following the Protein BLAST link on the BLAST landing page (Figure 3.5). The search page shown in Figure 3.14 is identical to the one shown in the BLASTP example discussed earlier in this chapter. Here, the sequence of the human sex-determining protein SRY from UniProtKB/Swiss-Prot (Q05066) will be used as the query, using UniProtKB/Swiss-Prot as the target database and limiting returned results to human sequences. PSI-BLAST is selected in the Program Selection section and, as before, selected changes will be made to the default parameters (Figure 3.15). The maximum number of target sequences has been raised from 500 to 1000, as a safeguard in case a large number of sequences in UniProtKB/Swiss-Prot match the query. In addition, both the E value threshold and the PSI-BLAST threshold have been changed to 0.001, and filtering of low-complexity regions has been enabled. The query can now be issued as before by clicking on the blue “BLAST” button at the bottom of the page.
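For readers who prefer to work outside the web interface, the same parameter choices can be approximated with the standalone `psiblast` program from the NCBI BLAST+ suite. The sketch below only assembles the argument list and does not run anything. The query file name `sry_Q05066.fasta`, the local database name `swissprot`, and the helper function itself are assumptions made for illustration. The `-taxids` option is assumed to be available, which requires a recent BLAST+ release and a version-5 database carrying taxonomy information.

```python
def build_psiblast_command(query_fasta, db, *, evalue=0.001,
                           inclusion_ethresh=0.001, max_target_seqs=1000,
                           num_iterations=2, low_complexity_filter=True,
                           taxid=None):
    """Assemble an argument list for the BLAST+ `psiblast` program.

    Hypothetical helper: it mirrors the web-form settings described in the
    text but is not part of BLAST+ itself.
    """
    cmd = [
        "psiblast",
        "-query", query_fasta,                         # FASTA file holding the query
        "-db", db,                                     # name of a local protein database
        "-evalue", str(evalue),                        # E value threshold for reporting hits
        "-inclusion_ethresh", str(inclusion_ethresh),  # threshold for inclusion in the PSSM
        "-max_target_seqs", str(max_target_seqs),      # maximum number of target sequences
        "-num_iterations", str(num_iterations),        # number of PSI-BLAST rounds
        "-seg", "yes" if low_complexity_filter else "no",  # low-complexity filtering
    ]
    if taxid is not None:
        # Assumption: restricting by taxonomy ID needs a recent BLAST+ and a v5 database.
        cmd += ["-taxids", str(taxid)]
    return cmd


# 9606 is the NCBI taxonomy ID for Homo sapiens.
command = build_psiblast_command("sry_Q05066.fasta", "swissprot", taxid=9606)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ and a suitable database are installed. Unlike the web interface, the command-line program runs its iterations without pausing for manual review of the sequences to be included.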
Figure 3.13 Constructing a position-specific scoring matrix (PSSM). In the upper portion of the figure is a multiple sequence alignment of length 10. Using the criteria described in the text, the PSSM corresponding to this multiple sequence alignment is shown in the lower portion of the figure. Each row of the PSSM corresponds to a column in the multiple sequence alignment. Note that position 8 of the alignment always contains a threonine residue (T), whereas position 10 always contains a glycine (G). Looking at the corresponding scores in the matrix, in row 8, the threonine scores 150 points; in row 10, the glycine also scores 150 points. These are the highest values in the row, corresponding to the fact that the multiple sequence alignment shows absolute conservation at those positions. Now, consider position 9, where most of the sequences have a proline (P) at that position. In row 9 of the PSSM, the proline scores 89 points – still the highest value in the row, but not as high a score as would have been conferred if the proline residue was absolutely conserved across all sequences. The first column of the PSSM provides the deduced consensus sequence.
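The relationship between alignment columns and PSSM rows can be illustrated with a minimal sketch. It does not reproduce the scoring criteria behind Figure 3.13 and will not yield the values of 150 and 89 shown there. Instead, it assumes a simple log-odds score with a uniform background frequency and a pseudocount of 1, and it uses an invented ten-column alignment in which position 8 is always T, position 10 is always G, and position 9 is mostly P.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: score} dict per alignment column.

    Simplified scheme (an assumption, not the book's criteria):
    score = log2(observed frequency with pseudocounts / uniform background).
    Gap characters are simply not counted.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    denominator = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        row = {
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / denominator) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(row)
    return pssm


def consensus(pssm):
    """Deduce the consensus as the highest-scoring residue in each row."""
    return "".join(max(row, key=row.get) for row in pssm)


# Invented toy alignment, used only for illustration.
toy_alignment = [
    "MKVLAEQTPG",
    "MRVIAEHTPG",
    "LKVLSDQTPG",
    "MKILAEQTAG",
    "MKVLGEQTPG",
]

pssm = build_pssm(toy_alignment)
print("consensus:", consensus(pssm))
print("row 8, T: %.2f" % pssm[7]["T"])  # absolutely conserved position
print("row 9, P: %.2f" % pssm[8]["P"])  # mostly conserved position
```

As in the figure, the absolutely conserved threonine receives the highest score in its row, while the mostly conserved proline is still the top score in row 9 but is lower than it would be under absolute conservation.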
The results of the first round of the search are shown in Figure 3.16, with 31 sequences found in the first round (at the time of this writing). The structure of the hit list table is exactly as before, except that it now contains two additional columns that are specific to PSI-BLAST. The first is a column of check boxes that are all selected; this instructs the algorithm to use all the sequences to construct the first PSSM for this particular search. Keeping in mind that the first round of any PSI-BLAST search is simply a BLASTP search and that no PSSM has yet been constructed, the second column is blank. To run the next iteration of PSI-BLAST, simply click the “Go” button at the bottom of this section. At this point, the first PSSM is constructed based on a multiple sequence alignment of the sequences selected for inclusion, and the matrix is now used as the query against Swiss-Prot. The results of this second round are shown in Figure 3.17, with the final two columns indicating which sequences are to be used in constructing the new PSSM for the next round of searches, as well as which sequences were used to build the PSSM for the current round. Also note that a good number of the sequences are highlighted in yellow; here, 26 additional sequences that did not meet the PSI-BLAST threshold in the first round have now been pulled into the search results. This provides an excellent example of how PSSMs can be used to discover new relationships during each PSI-BLAST iteration, thereby making it possible to identify additional homologs that may not have been found using the standard BLASTP approach. Of course, the user should always check the E values and percent identities for all returned results before passing them through to the next round, unchecking inclusion boxes as needed. There may also be cases where prior knowledge would argue for removing some of the found sequences based on their descriptors. As with all computational methods, it is always important to keep biology in mind when reviewing the results.
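The bookkeeping between rounds described above can be sketched in a few lines. This is a simplified model, not the PSI-BLAST implementation: the `Hit` record, the accessions, and the E values are all invented. Hits at or below the inclusion threshold are assumed to be selected by default, the user may uncheck any of them, and sequences selected in the current round but not in the previous one correspond to the entries highlighted in yellow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str  # hypothetical identifier
    evalue: float


def select_for_next_pssm(hits, inclusion_threshold=0.001, unchecked=frozenset()):
    """Accessions to use for the next PSSM.

    A hit is selected when its E value meets the inclusion threshold and the
    user has not unchecked its inclusion box.
    """
    return [
        h.accession
        for h in hits
        if h.evalue <= inclusion_threshold and h.accession not in unchecked
    ]


def newly_included(current_selection, previous_selection):
    """Accessions selected now that were not selected in the previous round."""
    previous = set(previous_selection)
    return [acc for acc in current_selection if acc not in previous]


# Invented results for two rounds, used only for illustration.
round1 = [Hit("HIT_A", 1e-50), Hit("HIT_B", 2e-10), Hit("HIT_C", 0.5)]
round2 = [Hit("HIT_A", 1e-60), Hit("HIT_B", 1e-15),
          Hit("HIT_C", 1e-5), Hit("HIT_D", 3.0)]

selected1 = select_for_next_pssm(round1)
selected2 = select_for_next_pssm(round2)
print("used to build the first PSSM:", selected1)
print("newly pulled in during round 2:", newly_included(selected2, selected1))
```

Passing a set of accessions as `unchecked` models the manual review step, in which results that look implausible on biological grounds are excluded before the next iteration.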
Figure 3.14 Performing a PSI-BLAST search. See text for details.