Principles of Microbial Diversity - James W. Brown

Alignment based on conserved sequence

Most alignments are generated by computer programs that use algorithms (e.g., CLUSTAL) designed to maximize the similarity, measured in a variety of ways, among all of the sequences. Where the sequences in an alignment are very similar, this approach can generate very good alignments. This is especially true for protein-encoding sequences, with 20 possible amino acids and good scoring matrices that quantify how similar or different any two amino acids are. It is less true for DNA or RNA sequences, with only four possible bases and where similarity between pairs of bases is less meaningful in the context of the encoded macromolecule.
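As a rough illustration of what "maximizing similarity" can mean, the sketch below computes one of the simplest possible measures: percent identity between aligned rows, averaged over all pairs. This is not the scoring used by CLUSTAL or any specific program; the function names, the use of `-` as the gap character, and the choice to skip gapped columns are assumptions made for the example.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned rows of equal length.

    Columns where either row has a gap ('-') are skipped (an assumption
    of this sketch; other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def mean_pairwise_identity(rows: list[str]) -> float:
    """Average percent identity over every pair of rows in an alignment."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two aligned rows")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)
```

With only four bases, two unrelated nucleotide sequences already match at roughly a quarter of their positions by chance, which is one reason a simple identity measure like this is less informative for DNA or RNA than a substitution-matrix score is for proteins.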

Very often, however, RNA alignments are either created by hand or at least adjusted manually. Sequences must be fairly similar in sequence and length to be readily aligned by eye or even by computer alignment programs. However, most of the length of SSU rRNAs is highly conserved and can with experience be manually aligned without much trouble.

Figure 3.6 A small window into an alignment of SSU rRNA sequences. doi:10.1128/9781555818517.ch3.f3.6

Some of the tricks to aligning sequences by hand are the following.

 Sequences are often aligned sequentially; start by aligning the two most similar sequences, then add sequences to the alignment one at a time, starting with the sequences most similar to those already aligned and finishing with the most distantly related sequences. Likewise, if you are adding a single sequence to an existing alignment, start by identifying the most similar sequence in the alignment and use that sequence as a guide.

 Alternatively, you can identify conserved blocks of sequence in all of the sequences and align these. You have now broken the alignment problem into smaller, easier chunks. Add gaps as needed to align the space between prealigned chunks according to the criteria below.

 Start by finding patches of very similar sequence and align these, then work outward in both directions, adding gaps sparingly when needed. Everything after this is about rearranging (and potentially adding or removing) these gaps.

 Where there are sequence differences, slide the gaps around to keep purines (G, A) aligned with purines, and pyrimidines (C, U/T) aligned with pyrimidines.

 Try also to keep differences together in variable sequence positions, and align gaps together in columns wherever possible. A single gap of two positions is a lot better than two separate gaps of one position each.

 Try to keep what look like conserved positions (columns) conserved, and all things being equal, put differences into positions already known to be variable.
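Two of the rules of thumb above (keep purines with purines and pyrimidines with pyrimidines; prefer one longer gap over several short ones) can be expressed as a simple penalty for comparing candidate placements of gaps. The following is only a sketch: the function names and all penalty weights are hypothetical values chosen for illustration, not taken from the text or from any alignment program.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}


def substitution_kind(x: str, y: str) -> str:
    """Classify a pair of aligned bases.

    Any mismatch that is not purine/purine or pyrimidine/pyrimidine
    (including ambiguity codes) is treated as a transversion here.
    """
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def gap_runs(row: str) -> list[int]:
    """Lengths of each run of consecutive gap characters ('-') in a row."""
    runs, current = [], 0
    for ch in row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def pair_penalty(a: str, b: str, transition: int = 1, transversion: int = 2,
                 gap_open: int = 4, gap_extend: int = 1) -> int:
    """Penalty for one pairwise alignment; lower is better.

    Weights are illustrative assumptions. Opening a gap costs more than
    extending one, so a single two-position gap beats two one-position gaps.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    a, b = a.upper(), b.upper()
    penalty = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gapped columns are charged through gap_runs below
        kind = substitution_kind(x, y)
        if kind == "transition":
            penalty += transition
        elif kind == "transversion":
            penalty += transversion
    for row in (a, b):
        for run in gap_runs(row):
            penalty += gap_open + gap_extend * (run - 1)
    return penalty
```

For example, aligning `GACU` against `GAUUCU` as `GA--CU` (one gap of two positions) scores lower, and therefore better, than `GA-C-U` (two separate gaps), even though both placements leave every aligned base matching.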

Figure 3.7A Comparison of two RNase P RNAs with very different sequences and very similar secondary structures. RNase P RNAs are the catalytic subunits, associated with one or more accessory proteins, that remove the 5′ leaders from tRNA and other RNA precursors. (Adapted from Harris JK, Haas ES, Williams D, Frank DN, Brown JW, RNA 7:220–232, 2001, with permission.) doi:10.1128/9781555818517.ch3.f3.7A

Figure 3.7B doi:10.1128/9781555818517.ch3.f3.7B
