Read the book Bioinformatics - Group of authors - Page 61

MegaBLAST

MegaBLAST is a variation of the BLASTN algorithm that has been optimized specifically for aligning either long or highly similar (>95% identity) nucleotide sequences. It is the method of choice when looking for exact matches in nucleotide databases. Its greedy gapped alignment routine (Zhang et al. 2000) allows MegaBLAST to handle longer nucleotide sequences approximately 10 times faster than BLASTN. MegaBLAST is particularly well suited to determining whether a sequence is part of a larger contig, detecting potential sequencing errors, and comparing large, similar datasets against each other. MegaBLAST achieves these speeds by changing two aspects of the traditional BLASTN routine. First, it uses longer default word lengths: the BLASTN default word length is 11, whereas MegaBLAST uses a default word length of 28. Because a seed requires an exact match over the whole word, longer words yield fewer seed hits that must be extended. Second, MegaBLAST uses a non-affine gap penalty scheme, meaning that there is no penalty for opening a gap. There is only a penalty for extending it, with a constant charge for each position in the gap. MegaBLAST also accepts batch queries: simply paste multiple sequences in FASTA format, or a list of accession numbers, into the query window.
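The following minimal Python sketch makes the word-length and gap-penalty points concrete; it is not NCBI's implementation. It assumes that seeds are exact word matches found through a simple k-mer index. It counts how many seed hits word sizes of 11 and 28 produce on a toy pair of sequences that are about 98% identical. The helper names (`count_seed_hits`, `affine_gap_cost`, `linear_gap_cost`) are hypothetical. The gap penalty values are made up for illustration and are not BLAST defaults.

```python
import random
from collections import defaultdict


def count_seed_hits(query: str, subject: str, word_size: int) -> int:
    """Count exact word (k-mer) matches between query and subject.

    Each (query position, subject position) pair sharing an identical word of
    length `word_size` is one seed hit that a BLAST-like search would extend.
    """
    index = defaultdict(list)  # word -> list of subject start positions
    for i in range(len(subject) - word_size + 1):
        index[subject[i:i + word_size]].append(i)
    hits = 0
    for j in range(len(query) - word_size + 1):
        hits += len(index.get(query[j:j + word_size], ()))
    return hits


def affine_gap_cost(length: int, open_cost: float, extend_cost: float) -> float:
    """Affine scheme: a one-time opening charge plus a charge per gap position."""
    return open_cost + extend_cost * length if length > 0 else 0.0


def linear_gap_cost(length: int, extend_cost: float) -> float:
    """Non-affine (linear) scheme: no opening charge, constant charge per position."""
    return extend_cost * length


# Toy data: a random 300-base query and a highly similar subject with one
# substitution every 50 positions (about 98% identity).
rng = random.Random(0)
query = "".join(rng.choice("ACGT") for _ in range(300))
subject_bases = list(query)
for pos in range(25, len(subject_bases), 50):
    subject_bases[pos] = "C" if subject_bases[pos] == "A" else "A"
subject = "".join(subject_bases)

for w in (11, 28):  # BLASTN vs MegaBLAST default word lengths
    print(f"word size {w}: {count_seed_hits(query, subject, w)} seed hits")

# Illustrative (made-up) penalties for a 3-position gap.
print("affine:", affine_gap_cost(3, open_cost=5, extend_cost=2))  # 11
print("linear:", linear_gap_cost(3, extend_cost=2.5))             # 7.5
```

On these highly similar sequences, both word sizes find the shared region, but the 28-base words produce fewer hits to extend. Against a large database the difference is much larger, because the chance of a random exact match to a word of length w falls off as 4^-w. The sketch does not model the greedy extension step of Zhang et al. (2000).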

Figure 3.12 Typical output from a BLAST 2 Sequences alignment, based on the query issued in Figure 3.11. The standard graphical view is shown at the top of the figure. Here it indicates two high-scoring segment pairs (HSPs) for the alignment of the sequences for the transcription factor SOX-1 from human and the ctenophore Mnemiopsis leidyi. The dot matrix view is an alternative view of the alignment, with the query sequence represented on the horizontal axis and the subject sequence on the vertical axis. The diagonal indicates the regions of alignment captured within the two HSPs. The detailed alignments are shown at the bottom of the figure, along with the E values and alignment statistics for each HSP.

There is also a variation of MegaBLAST called discontiguous MegaBLAST. This version has been designed for comparing divergent sequences from different organisms, where low sequence identity is expected. It uses a discontiguous word approach that is quite different from those used by the other programs in the BLAST suite. A conventional search seeds on query words of a certain length, which require a run of consecutive matching positions. Discontiguous MegaBLAST instead examines non-consecutive positions spread over a longer sequence segment (Ma et al. 2002), so mismatches at the skipped positions do not prevent a seed. The approach has been shown to find statistically significant alignments even when the similarity between sequences is very low.
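A small Python sketch can illustrate the contrast between contiguous words and discontiguous (spaced) seeds. It is a toy that assumes a template string in which '1' marks a position that must match and '0' marks a position that is ignored, in the spirit of the spaced seeds of Ma et al. (2002). The template `110110110110110110` and the helper `spaced_seed_hits` are hypothetical. They are not claimed to be the templates or code that NCBI's discontiguous MegaBLAST actually uses. In the example, the subject differs from the query at every third base, mimicking synonymous codon changes.

```python
from collections import defaultdict


def spaced_seed_hits(query: str, subject: str, template: str) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs whose bases agree at every '1' in template.

    Positions marked '0' are "don't care": mismatches there do not block a seed.
    A template made only of '1's reduces to ordinary contiguous word matching.
    """
    care = [k for k, c in enumerate(template) if c == "1"]
    span = len(template)
    index = defaultdict(list)  # sampled bases -> subject start positions
    for i in range(len(subject) - span + 1):
        index["".join(subject[i + k] for k in care)].append(i)
    hits = []
    for j in range(len(query) - span + 1):
        key = "".join(query[j + k] for k in care)
        hits.extend((j, i) for i in index.get(key, ()))
    return hits


# Toy coding-region example: the subject carries a transition at every third
# (wobble) position of the query. Sequences are made up for illustration.
query = "ATGGCTAAAGGTGAACGT"
transition = {"A": "G", "G": "A", "C": "T", "T": "C"}
subject = "".join(transition[b] if k % 3 == 2 else b for k, b in enumerate(query))

contiguous = "1" * 11           # ordinary 11-base word
spaced = "110110110110110110"   # hypothetical template: 12 sampled bases over 18

print("contiguous:", spaced_seed_hits(query, subject, contiguous))  # []
print("spaced:    ", spaced_seed_hits(query, subject, spaced))      # [(0, 0)]
```

No run of 11 consecutive bases is conserved here, so the ordinary word finds no seed. The spaced template skips the variable third positions and still seeds the alignment.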
