Suggested BLAST Cut-Offs
As noted earlier, the listing of a hit in a BLAST report does not automatically mean that the hit is biologically significant. Over time, on the basis of both methodical testing and the personal experience of many investigators, a number of guidelines have been proposed for drawing the boundary that separates meaningful hits from the rest. For nucleotide-based searches, one should look for hits with E values of 10⁻⁶ or less and sequence identities of 70% or more. For protein-based searches, one should look for hits with E values of 10⁻³ or less and sequence identities of 25% or more. Using less-stringent cut-offs risks entry into what is called the “twilight zone,” the low-identity region where any conclusions regarding the relationship between two sequences may be questionable at best (Doolittle 1981, 1989; Vogt et al. 1995; Rost 1999).
The reader is cautioned not to use these cut-offs (or any other set of suggested cut-offs) blindly, particularly in the region right around the dividing line. Users should always keep in mind whether the correct scoring matrix was used. Likewise, they should manually inspect the pairwise alignments and investigate the biology behind any putative homology by reading the literature to convince themselves whether hits on either side of the suggested cut-offs actually make good biological sense.
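As a rough illustration, the suggested cut-offs can be applied as a first-pass filter over BLAST results. The sketch below is only one way to do this, written in Python. It assumes the hits are in the default 12-column tabular output of BLAST+ (-outfmt 6), where percent identity is the third column and the E value is the eleventh. The names CUTOFFS, Hit, parse_tabular, and passes_cutoffs are hypothetical, and the sample rows are invented for demonstration.

```python
from typing import Iterable, List, NamedTuple

# Suggested cut-offs from the text:
# search type -> (maximum E value, minimum percent identity).
CUTOFFS = {
    "nucleotide": (1e-6, 70.0),
    "protein": (1e-3, 25.0),
}


class Hit(NamedTuple):
    """The subset of tabular fields needed for the cut-off check."""
    query: str
    subject: str
    pident: float  # percent identity
    evalue: float  # expect (E) value


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated rows, assuming the default -outfmt 6 column order."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def passes_cutoffs(hit: Hit, search_type: str) -> bool:
    """Return True if the hit meets both suggested cut-offs for its search type."""
    max_evalue, min_identity = CUTOFFS[search_type]
    return hit.evalue <= max_evalue and hit.pident >= min_identity


# Invented example rows (not real BLAST output).
example_rows = [
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t350",
    "q1\ts2\t22.0\t150\t117\t0\t1\t150\t1\t150\t0.5\t30.1",
    "q1\ts3\t30.0\t180\t126\t0\t1\t180\t5\t184\t2e-05\t48.0",
]

hits = parse_tabular(example_rows)
for hit in hits:
    verdict = "passes" if passes_cutoffs(hit, "protein") else "fails"
    print(f"{hit.subject}: identity={hit.pident}%, E={hit.evalue:g} -> {verdict}")
```

A filter of this kind only sorts hits into those above and below the suggested thresholds. Hits near the dividing line on either side still call for manual inspection of the alignment and of the underlying biology.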