Astrobiology - Charles S. Cockell

Discussion Point: Why Is There Degeneracy in the Genetic Code?

The degeneracy of the genetic code (genetic degeneracy) follows from the simple arithmetic of the code. Because the code uses base pairs, which allow the DNA molecule to be opened down the middle and two identical helices to be synthesized, it necessarily has an even number of bases. Consider a genetic code with only two bases. If it had a codon with three positions, like our own code, it would produce 2 × 2 × 2 = 8 possible codons. This is not enough to code for the 20 amino acids required by the life that we know. The only way such a two-base genetics could produce enough codons for 20 amino acids would be to have at least five positions in a codon, giving 2 × 2 × 2 × 2 × 2 = 32 codons and leaving a degeneracy of 10 (assuming that we use one codon as a Start and one as a Stop codon).

You can also consider a code with six bases instead of four. With only two positions in a codon, it would give 6 × 6 = 36 codons, which is enough to code for the 20 amino acids known in life, with 16 left over for redundancy (14 if one Start and one Stop codon are again set aside). Our own code, with four bases and three positions, gives 4 × 4 × 4 = 64 codons. The main point to realize here is that however we construct the genetic code, we end up with either too few codons or some left over, in other words degeneracy.

Another interesting consideration is that a code with only two bases would carry very little information in each position. The DNA molecule might have to be longer, or there would need to be more of it, to carry the same information as in terrestrial life. A greater number of bases than our four (such as six or eight) leads to other potential problems, such as a greater frequency of mismatches between bases. It may not be chance that our code has four bases; perhaps it represents a process of biochemical optimization. What do you think?
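The arithmetic in this discussion point can be checked with a few lines of Python. This is only an illustrative sketch: the function names (`codon_count`, `spare_codons`, `min_codon_length`) are invented for this example, and the default of 20 amino acids and the optional reserved Start/Stop codons follow the assumptions stated in the text.

```python
def codon_count(bases: int, length: int) -> int:
    """Number of distinct codons for an alphabet of `bases` letters
    and a codon with `length` positions."""
    return bases ** length


def spare_codons(bases: int, length: int,
                 amino_acids: int = 20, reserved: int = 0) -> int:
    """Codons left over after assigning one codon to each amino acid and
    setting aside `reserved` codons (for example one Start and one Stop).
    A negative result means the code is too small."""
    return codon_count(bases, length) - amino_acids - reserved


def min_codon_length(bases: int, amino_acids: int = 20,
                     reserved: int = 0) -> int:
    """Shortest codon length that gives enough codons."""
    length = 1
    while codon_count(bases, length) < amino_acids + reserved:
        length += 1
    return length


# Cases discussed in the text
print(codon_count(2, 3))               # two bases, three positions: 8
print(spare_codons(2, 5, reserved=2))  # two bases, five positions: 10 spare
print(spare_codons(6, 2))              # six bases, two positions: 16 spare
print(codon_count(4, 3))               # our own code: 64
```

Whatever values are tried, the number of codons is a power of the number of bases, so it will rarely equal exactly the number of amino acids plus control signals. That mismatch is the degeneracy described above.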

Some codons code for the instruction to Stop reading (UAA, UAG and UGA in the standard code), and one codon (AUG, which also codes for methionine) signals where to Start reading the mRNA strand.

Each amino acid brought to the mRNA in this way forms a peptide bond with the existing chain, and so as new tRNAs bind to the mRNA, a polypeptide or protein is synthesized, with the ribosome continuing to move along the mRNA strand. Thus, the mRNA sequence has been translated into the primary protein sequence. This primary sequence folds together to make the three-dimensional structure of a useful functioning protein.
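The reading of an mRNA strand from a Start codon to a Stop codon can be sketched in Python as follows. This is a simplified illustration, not a model of the ribosome: the codon table is deliberately partial (only a handful of entries from the standard genetic code), and the function name `translate` and the example sequence are invented for this sketch.

```python
# Partial codon table: a few entries from the standard genetic code.
# A complete table would have all 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the Start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCA": "Ala",
    "AAA": "Lys",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def translate(mrna: str) -> list:
    """Translate an mRNA string into a list of amino acid names.

    Reading begins at the first AUG and proceeds three bases at a time
    until a Stop codon or the end of the strand is reached. A codon that
    is missing from the partial table raises KeyError.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []  # no Start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


print(translate("GGAUGUUUGGCAAAUAAGC"))  # ['Met', 'Phe', 'Gly', 'Lys']
```

In the example, the two bases before AUG and the two bases after UAA are ignored, which mirrors the idea that only the region between Start and Stop is translated into the primary sequence.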

The sequence of bases that codes for a single protein is called a gene, and we call the entire complement of genes within an organism its genome. Genome size varies enormously between organisms. The human genome contains about 3240 million bases (3240 megabases or Mb; sometimes also written as megabase pairs or Mbp) of DNA. Bacteria have up to about 13 Mb of DNA, depending on the species, and typically have about 4000 genes. One of the smallest genomes of a self-replicating cellular organism belongs to Carsonella ruddii, a bacterium that lives within the psyllids, a family of sap-feeding insects. It has a genome of just 160 000 bases (160 kilobases, written kb or kbp) of DNA and 182 protein-coding genes. The smallest flu virus (which cannot replicate on its own) has only 11 genes. Although genome size is very loosely linked to complexity (bacteria tend to have smaller genomes than animals), this relationship is by no means reliable. Some protozoa (single-celled eukaryotes) have larger genomes than humans. This mismatch between genome size and apparent complexity is called the C-value paradox. The “C” refers to the quantity of DNA in the genome, which early researchers thought must be related to complexity.

Some of the DNA in an organism is referred to as non-coding DNA, as it has no known translation into protein. The amount of this non-coding DNA varies between species. In bacteria, it can be around 2%, and in humans it is about 98.5%. Non-coding DNA is sometimes called junk DNA, but this is a misnomer, since it is becoming increasingly understood that a proportion of it has biochemical functions, for example producing RNA molecules, including the ribosomal RNA in ribosomes, or carrying sequences of viral origin. Some of the sequences are pseudogenes. These are sequences that resemble protein-coding genes but whose proteins are not produced by the cell, or non-functional copies of other genes. Much of the so-called C-value paradox is explained by non-coding DNA.
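To get a feel for these percentages, the amount of protein-coding DNA can be estimated from the genome size and the non-coding fraction. The short Python sketch below uses the human figures quoted in the text; the 5 Mb bacterial genome is a hypothetical value chosen only for illustration, and the function name `coding_megabases` is invented here.

```python
def coding_megabases(genome_mb: float, noncoding_fraction: float) -> float:
    """Megabases of DNA left once the non-coding share is removed."""
    return genome_mb * (1.0 - noncoding_fraction)


# Human genome: about 3240 Mb, about 98.5% non-coding (figures from the text)
human_coding = coding_megabases(3240, 0.985)

# Hypothetical bacterium: 5 Mb genome (assumed), about 2% non-coding
bacterial_coding = coding_megabases(5, 0.02)

print(f"Human coding DNA: about {human_coding:.1f} Mb")
print(f"Bacterial coding DNA: about {bacterial_coding:.1f} Mb")
```

On these rough numbers the human genome is several hundred times larger than the bacterial one, but its protein-coding portion is only about ten times larger, which illustrates how non-coding DNA accounts for much of the difference in genome size.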

The process of reading DNA to RNA to protein is sometimes called the “central dogma of molecular biology” (Figure 5.13). “Dogma” is always a troubling word in science, but the overall scheme broadly shows the two fundamental steps of reading the genetic code: the transcription of DNA into RNA and the translation of RNA into protein. The word was used to capture the observation that once genetic information is turned into protein, it cannot go in the reverse direction. The information in protein is not transferred back into nucleic acid in any known life.

Figure 5.13 A summary of the two steps in reading from DNA to RNA to protein.
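The first of the two steps, from DNA to RNA, relies on base pairing and can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the DNA template strand is written so that its first base pairs with the first base of the mRNA, strand directionality is otherwise ignored, and the function name `transcribe` and the example sequence are invented for this sketch.

```python
# Base pairing between a DNA template strand and the mRNA built on it.
# In RNA, uracil (U) takes the place of thymine (T).
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA complementary to a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_strand)


print(transcribe("TACAAACCG"))  # AUGUUUGGC
```

The resulting mRNA begins with AUG, the Start codon, and would then be read three bases at a time in the second step. No corresponding function runs from protein back to nucleic acid, which is the one-way flow that the central dogma describes.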
