Bioinformatics (group of authors)
FASTA
While the most commonly used technique for detecting similarity between sequences is BLAST, it is not the only heuristic method that can be used to rapidly and accurately compare sequences with one another. In fact, the first widely used program designed for database similarity searching was FASTA (Lipman and Pearson 1985; Pearson and Lipman 1988; Pearson 2000). Like BLAST, FASTA enables the user to rapidly compare a query sequence against large databases, and various versions of the program are available (Table 3.3). In addition to the main implementations, a variety of specialized FASTA versions are available, described in detail in Pearson (2016). An interesting historical note is that the FASTA format for representing nucleotide and protein sequences originated with the development of the FASTA algorithm.
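The FASTA format mentioned above is simple enough to read with a few lines of code. Each record begins with a header line starting with `>` that holds the sequence identifier and an optional description. The sequence itself follows on one or more lines until the next `>` line. The Python sketch below is a minimal, hypothetical reader (the function name `read_fasta` and the example records are illustrative assumptions, not part of the FASTA package). It joins wrapped sequence lines, skips blank lines, and does not check that the sequence characters are valid nucleotide or amino acid codes.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration: a record starts with a '>' line,
    and every following non-blank line up to the next '>' is sequence data.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between or inside records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Two made-up records: one nucleotide sequence wrapped over two lines, one protein.
example = """>seq1 hypothetical nucleotide record
ACGTACGT
ACGT
>seq2 hypothetical protein record
MKTAYIAKQR
"""

for name, seq in read_fasta(io.StringIO(example)):
    print(name, len(seq), seq)
```

Because the reader is a generator, it can process large database files one record at a time instead of loading them into memory all at once. For production work, an established parser such as Biopython's `Bio.SeqIO` is the more robust choice.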
Figure 3.19 Results of a BLAT query. For the query submitted in Figure 3.18, the highest-scoring hit is to a sequence on chromosome 5 of the rat genome, with 98.1% sequence identity. Clicking on the “details” hyperlink takes the user to additional information on the matched sequence, shown in the lower panel. Matching bases in the cDNA and genomic sequences are colored dark blue and capitalized. Lighter blue uppercase bases mark the boundaries of aligned regions and often signify splice sites. Gaps are indicated by lowercase black type. In the side-by-side alignment, exact matches are indicated by vertical lines between the sequences.
Table 3.3 Main FASTA algorithms.
| Program | Query | Database | Corresponding BLAST program |
|---|---|---|---|
| FASTA | Nucleotide | Nucleotide | BLASTN |
| FASTA | Protein | Protein | BLASTP |
| FASTX/FASTY | DNA | Protein | BLASTX |
| TFASTX/TFASTY | Protein | Translated DNA | TBLASTN |