Bioinformatics - Group of authors - Page 68

Running a FASTA Search


The University of Virginia provides a web front-end for issuing FASTA queries. Various protein and nucleotide databases are available, and up to two databases can be selected for use in a single run. From this page, the user can also specify the scoring matrix to be used, the gap opening and gap extension penalties, and the value for ktup, the word length used when looking for initial matches between the query and each database sequence. The default values for ktup are 2 for protein-based searches and 6 for nucleotide-based searches; lowering the value of ktup increases the sensitivity of the run, at the expense of speed. The user can also restrict the results returned to hits falling within a particular range of E values.
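To make the ktup trade-off concrete, the following is a minimal Python sketch of the word-lookup idea only, not the FASTA program's actual implementation. The function names and the two short sequences are hypothetical and chosen purely for illustration. It counts the positions at which two sequences share an identical word of length ktup; shorter words produce more candidate matches to evaluate, which is the source of both the added sensitivity and the added run time.

```python
from collections import defaultdict


def index_words(seq, ktup):
    """Map each word of length ktup to the positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def shared_word_hits(query, subject, ktup):
    """Return (query_pos, subject_pos) pairs where identical ktup-words occur."""
    index = index_words(query, ktup)
    hits = []
    for j in range(len(subject) - ktup + 1):
        word = subject[j:j + ktup]
        # .get() avoids adding empty entries to the defaultdict.
        for i in index.get(word, ()):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Made-up toy sequences, not real database entries.
    query = "MKTAYIAKQR"
    subject = "MKSAYLAKQR"
    for ktup in (1, 2, 3):
        n_hits = len(shared_word_hits(query, subject, ktup))
        print(f"ktup={ktup}: {n_hits} shared word positions")
```

Running the script on any pair of related sequences should show the number of shared word positions shrinking as ktup grows, so fewer regions are carried forward for scoring.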

The results returned by a FASTA query are formatted quite differently from those returned by BLAST. Consider a FASTA search using the sequence of histone H2B.3 from the highly regenerative cnidarian Hydractinia, one of four novel H2B variants used in place of protamines to compact sperm DNA (KX622131.1; Török et al. 2016), as the query. The first part of the FASTA output, from a search using BLOSUM62 as the scoring matrix and Swiss-Prot as the target database, is shown in Figure 3.21 and summarizes the results as a histogram.

The histogram is intended to convey the distribution of all similarity scores computed in the course of this particular search. The first column represents bins of similarity scores, with the scores increasing as one moves down the page. The second column gives the actual number of sequences observed to fall into each one of these bins. This count is also represented by the length of each line in the histogram, with each equals sign representing a fixed number of sequences; in the figure, each equals sign corresponds to 130 sequences from UniProtKB/Swiss-Prot. The third column gives the number of sequences that would be expected to fall into each bin; this expected count is marked by the asterisks in the histogram.

The hit list immediately follows the histogram, and a portion of the hit list for this search is shown in Figure 3.22. Here, the accession number and partial definition line for each hit are given, along with its optimal similarity score (opt), a normalized score (bit), the expectation value (E), percent identity and percent similarity, and the aligned length. Not shown here are the individual alignments of each hit to the original query sequence, which are found by scrolling further down in the output. In the pairwise alignments, exact matches are indicated by a colon, while conservative substitutions are indicated by a dot.
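The histogram layout described above can be illustrated with a short Python sketch that draws a single row from a score bin, an observed count, and an expected count. This is an approximation for illustration and not the code FASTA uses: the function name, the column widths, the rounding rule, and the example counts are all assumptions. Only the scale of 130 sequences per equals sign comes from the figure discussed in the text.

```python
import math


def histogram_row(score_bin, observed, expected, seqs_per_symbol):
    """Render one histogram row: score bin, observed count, expected count, bar."""
    # One '=' per seqs_per_symbol observed sequences, rounded up
    # (the rounding rule is an assumption of this sketch).
    bar = ["="] * math.ceil(observed / seqs_per_symbol)

    # Column at which the expected count falls, on the same scale.
    star = math.ceil(expected / seqs_per_symbol)
    if star > 0:
        # Pad with spaces when the expected count lies beyond the observed bar.
        while len(bar) < star:
            bar.append(" ")
        bar[star - 1] = "*"

    return f"{score_bin:>4} {observed:>6} {expected:>6}:{''.join(bar)}"


if __name__ == "__main__":
    # Hypothetical (bin, observed, expected) triples, not taken from Figure 3.21.
    rows = [(30, 260, 390), (32, 1300, 1170), (34, 650, 650)]
    for score_bin, observed, expected in rows:
        print(histogram_row(score_bin, observed, expected, seqs_per_symbol=130))
```

Reading a row this way makes the comparison direct: where the asterisk sits inside the run of equals signs, more sequences were observed in that bin than expected, and where it sits beyond the end of the bar, fewer were observed.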
