Principles of Microbial Diversity - James W. Brown

Alignment based on conserved structure

In the case of RNAs, however, advanced alignment algorithms (e.g., infeRNAl) can use the secondary structures of the RNAs to align sequences. The ability to use well-defined secondary structures to identify homologous residues (i.e., to align sequences) is one of the key advantages of RNA over protein for phylogenetic analysis. In other words, you can use the secondary structure of the RNA to identify homologous parts of the RNA, rather than relying only on sequence similarity (Fig. 3.7).

Figure 3.8 An RNA alignment based on secondary structure. If residue n (e.g., 24, highlighted) of any sequence pairs to residue m (e.g., 29, also highlighted), then so should the corresponding homologous residues in all sequences. The alignment shown is stem-loop P3 of RNase P RNA. In this example, the first six rows are annotations, not sequences. The first three are a reference numbering; in this case, the Methanothermobacter thermautotrophicus (Mthermo) sequence is the reference sequence. The row marked “helices” indicates the secondary structure: the 5′ strand of P3 followed by the loop and then the 3′ strand. Each base pair in this stem-loop is indicated by matching right- and left-facing parentheses in the following row and is labeled alphabetically (for human readability) in the subsequent row. doi:10.1128/9781555818517.ch3.f3.8

This works because it usually does not matter to the RNA what the bases in the helices are; what matters is that opposing bases are complementary so that they can form the helix. As a result, the secondary structure of an RNA is much more highly conserved than its sequence, because coevolution of bases that form base pairs maintains the secondary structure as the sequence changes. Variation in the length of the RNA usually takes the form of hairpin lengthening or shortening. Therefore, it is usually possible to keep track of homologous parts of RNA structures even if the sequences are quite different.
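The point that a helix can be conserved while its sequence is not can be illustrated with a minimal Python sketch. The two hairpins below are made up for illustration and are not taken from the figures, and the set of allowed pairs (Watson-Crick plus the G-U wobble pair) is an assumption of this sketch.

```python
# Pairs treated as "complementary" in this sketch: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """True if bases a and b can form a base pair under the PAIRS assumption."""
    return (a.upper(), b.upper()) in PAIRS


def helix_is_maintained(five_prime, three_prime):
    """True if the 5' strand pairs base by base with the 3' strand.

    The strands are antiparallel, so the first base of the 5' strand
    is opposite the last base of the 3' strand.
    """
    if len(five_prime) != len(three_prime):
        return False
    return all(can_pair(a, b) for a, b in zip(five_prime, reversed(three_prime)))


def identity(seq1, seq2):
    """Fraction of positions at which two equal-length strands are identical."""
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return matches / len(seq1)


# Hypothetical helices, each given as (5' strand, 3' strand).
hairpin_1 = ("GGCAC", "GUGCC")
hairpin_2 = ("CAGUG", "CACUG")

for name, (five, three) in (("hairpin_1", hairpin_1), ("hairpin_2", hairpin_2)):
    print(name, "forms a helix:", helix_is_maintained(five, three))

print("identity of the 5' strands:", identity(hairpin_1[0], hairpin_2[0]))
```

In this toy case the two 5′ strands share no identical positions, yet both helices are fully base paired, which is the situation in which sequence similarity alone would fail to identify the homologous residues.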

In this type of alignment, the secondary structures of all of the RNAs are directly encoded in the alignment (Fig. 3.8). If residue n (e.g., 24 in Fig. 3.8) of any sequence pairs to residue m (e.g., 29), then so should the corresponding homologous residues in all sequences (Fig. 3.9).
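The rule that residue n pairs with residue m in every sequence can be checked mechanically once the structure is written as a row of parentheses. The sketch below is only an illustration: the alignment, the sequence names, and the helper functions are hypothetical, "-" is assumed to be the gap character, and G-U is assumed to count as a valid pair.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAP = "-"


def paired_columns(structure_row):
    """Return sorted (i, j) alignment columns joined by matching parentheses."""
    stack, pairs = [], []
    for col, symbol in enumerate(structure_row):
        if symbol == "(":
            stack.append(col)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at column %d" % col)
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError("unmatched '(' at column %d" % stack[-1])
    return sorted(pairs)


def pairing_violations(alignment, structure_row):
    """List (name, i, j) for each annotated pair whose residues cannot pair.

    A pair in which both columns are gaps is skipped: that sequence simply
    lacks the base pair (for example, a shorter hairpin).
    """
    violations = []
    for name, row in alignment.items():
        for i, j in paired_columns(structure_row):
            a, b = row[i], row[j]
            if a == GAP and b == GAP:
                continue
            if (a, b) not in PAIRS:
                violations.append((name, i, j))
    return violations


# Hypothetical four-base-pair stem-loop; columns are numbered from 0.
structure = "((((....))))"
alignment = {
    "seqA": "GGCAUUCGUGCC",
    "seqB": "CAGUGAAAACUG",
    "seqC": "-GCAUUCGUGC-",  # shorter helix: outermost pair absent
}

print(paired_columns(structure))
print(pairing_violations(alignment, structure))
```

A misaligned sequence shows up as a violation: for example, a row such as "GGCAUUCGAGCC" places A opposite A at columns 3 and 8, which suggests that the residues in those columns are not the homologous ones.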

Figure 3.9 RNase P RNA helix “P3” in a variety of Archaea. The base pairs corresponding to the highlighted bases in the sequence alignment in Fig. 3.8 are highlighted. P3 is present in all archaeal (and bacterial) RNase P RNAs, but both the sequence and structure of this helix are highly variable. doi:10.1128/9781555818517.ch3.f3.9

Given this type of alignment, a computer can readily derive the secondary structure of any of the RNAs. Conversely, given a preexisting alignment and an RNA sequence with the same secondary structure, a computer algorithm can add this sequence correctly to the alignment. This is what infeRNAl does; it takes a sequence and tries to fold it into the correct secondary structure. If it can do so, it then threads this sequence into the alignment based on this structure.
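The first direction, reading a single RNA's secondary structure out of a structure-annotated alignment, can be sketched in a few lines of Python. This is a simplified illustration rather than how infeRNAl works: the function name and example rows are hypothetical, "-" is assumed to be the gap character, and the structure is assumed to be one row of balanced parentheses with any other symbol meaning unpaired. The reverse direction, adding a new sequence to the alignment, requires a scoring model and is not attempted here.

```python
GAP = "-"


def project_structure(aligned_seq, structure_row):
    """Return (ungapped sequence, dot-bracket structure) for one alignment row."""
    if len(aligned_seq) != len(structure_row):
        raise ValueError("sequence and structure rows differ in length")

    # Map every paired column to its partner column.
    partner, stack = {}, []
    for col, symbol in enumerate(structure_row):
        if symbol == "(":
            stack.append(col)
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at column %d" % col)
            open_col = stack.pop()
            partner[open_col] = col
            partner[col] = open_col
    if stack:
        raise ValueError("unmatched '(' at column %d" % stack[-1])

    bases, symbols = [], []
    for col, base in enumerate(aligned_seq):
        if base == GAP:
            continue  # this sequence has no residue in this column
        symbol = structure_row[col]
        if symbol not in "()":
            symbol = "."  # unpaired column
        elif aligned_seq[partner[col]] == GAP:
            symbol = "."  # partner is missing, so this residue is unpaired
        bases.append(base)
        symbols.append(symbol)
    return "".join(bases), "".join(symbols)


# Hypothetical consensus structure and aligned rows.
structure = "((((....))))"
for row in ("GGCAUUCGUGCC", "-GCAUUCGUGC-", "GGCAU--GUGCC"):
    seq, dot_bracket = project_structure(row, structure)
    print(seq)
    print(dot_bracket)
```

Each printed pair is a sequence with its own dot-bracket structure, from which a secondary-structure drawing could be produced; gaps in the alignment translate into a shorter helix or a shorter loop in the individual RNA.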

PROBLEMS

1. Align the following two sequences:
   Now add the following sequence to this alignment:
   Now add the following sequence to this alignment:
2. Align the following sequences:
3. Align the following sequences:
4. Align the following sequences (note that these are in FASTA format, commonly used for the electronic transfer of sequence data):
5. Draw the secondary structures of the sequences in this alignment:
6. Create an alignment of the following RNA structures:
7. Add the following Seq V RNA structure to the preexisting alignment:
