An Introduction to Molecular Biotechnology (group of authors)

4.2 Transcription: From Gene to Protein

Originally, mutation and recombination units were regarded as genes; in the 1950s the “one gene, one protein” hypothesis was developed (DNA makes RNA, which makes proteins). Today, the gene is defined as a transcription unit. Since then, the intron/exon structure and the noncoding regulatory sequences, which also belong to the gene, have been recognized. Because mRNAs can be alternatively spliced, the statement “one gene, one protein” is no longer true in the strictest sense. In all organisms, genetic information flows from the gene to the mRNA and then to the protein (Figure 4.15). Only retroviruses can reverse‐transcribe RNA into DNA, using a reverse transcriptase, but in no case has a translation of the amino acid sequence of a protein into a nucleotide sequence been shown.


Figure 4.15 From gene to protein: comparison of prokaryotes and eukaryotes. (a) Simple prokaryotic gene: the mRNA is translated into a protein. (b) Bacterial operon: the primary transcript holds the genetic information for many genes (polycistronic mRNA). In protein biosynthesis, the protein units are synthesized separately. (c) Eukaryotic system: in the nucleus a primary transcript is synthesized by RNA polymerase II, from which the intron regions are removed in subsequent steps. A 7‐methylguanosine cap is added to the 5′‐end and a poly(A) tail to the 3′‐end. The completed mRNA is transported through the nuclear pore complex into the cytosol, where it is translated into proteins by the ribosomes. NCS, noncoding sequence.

In eukaryotes, three different RNA polymerases exist. RNA polymerase I transcribes DNA into ribosomal RNA (rRNA); RNA polymerase II into mRNA (plus small nucleolar RNA [snoRNA], miRNA, siRNA, lncRNA, and most small nuclear RNA [snRNA]); and RNA polymerase III into other functional RNAs (e.g. tRNAs, 5S rRNA, some snRNAs, and other small RNAs). In prokaryotes, only one RNA polymerase is present. The synthesis of RNA from a DNA template is termed transcription.

As with replication, in transcription the DNA double helix is locally unwound so that the RNA polymerase can synthesize the RNA (mRNA, rRNA, tRNA, and other RNAs) complementary to the template DNA strand (Figure 4.16). The DNA strand bearing a sequence identical to the mRNA (except that T is replaced with U) is referred to, somewhat confusingly, as the coding strand. By convention, the sequence of the coding strand is written in the 5′ → 3′ orientation and is also stored in this format in sequence databases.


Figure 4.16 Schematic overview of the function of RNA polymerase and transcription.

Coding strand    5′‐ GGC TCC CTA TTA GCA GTC TGC CTC ATG ACC ‐3′
Template strand  3′‐ CCG AGG GAT AAT CGT CAG ACG GAG TAC TGG ‐5′
mRNA             5′‐ GGC UCC CUA UUA GCA GUC UGC CUC AUG ACC ‐3′
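
The relationship between the three sequences above can be checked with a few lines of Python. This is only a minimal sketch: the function names are chosen here for illustration, and the sequences are the ones shown above with the spaces removed.

```python
# Base-pairing rules (illustrative helper tables, not from any library).
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
DNA_TO_RNA_PAIR = str.maketrans("ACGT", "UGCA")


def template_strand(coding_5to3):
    """Return the template strand written 3'->5', aligned under the coding strand."""
    return coding_5to3.upper().translate(DNA_COMPLEMENT)


def mrna_from_coding(coding_5to3):
    """The mRNA has the coding-strand sequence with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


def mrna_from_template(template_3to5):
    """Build the mRNA by base pairing against the template strand (read 3'->5')."""
    return template_3to5.upper().translate(DNA_TO_RNA_PAIR)


coding = "GGCTCCCTATTAGCAGTCTGCCTCATGACC"
template = template_strand(coding)
mrna = mrna_from_template(template)

print(template)  # CCGAGGGATAATCGTCAGACGGAGTACTGG
print(mrna)      # GGCUCCCUAUUAGCAGUCUGCCUCAUGACC
```

Both routes to the mRNA give the same result, which is why the coding strand rather than the template strand is the one stored in databases.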

The bacterial RNA polymerase is a multisubunit enzyme complex containing a removable sigma factor. The sigma factor recognizes promoter regions of genes and assists the RNA polymerase in finding the transcription start. In Escherichia coli, the promoter is made up of two hexamer sequence motifs, which are positioned about 35 and 10 bases upstream of the transcription start site. Their consensus sequences are …TTGACA… (−35) and …TATAAT… (−10). Prokaryotic genes are usually organized in the form of operons (Figure 4.15b): genes that belong together, such as those that code for the enzymes of a biosynthesis pathway, lie beside one another and are controlled by a common promoter, which contains an operator as the control element (Figure 4.17a). Well‐known examples include the lac operon and the tryptophan operon in bacteria, which are regulated by transcriptional activators and transcriptional repressors. These operons are important tools in biotechnology for controlling the expression of recombinant genes.
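
As a rough illustration of how the two hexamers could be located in a sequence, the following sketch searches for exact matches to both consensus motifs. The allowed spacer of 15 to 19 bases between the boxes is an assumption made here (the text only gives the approximate positions), the example sequence is hypothetical, and real promoters usually deviate from the consensus, so an exact-match search will miss most of them.

```python
import re

# -35 and -10 consensus hexamers from the text.
# The 15-19 bp spacer between them is an assumption of this sketch.
PROMOTER_PATTERN = re.compile(r"TTGACA([ACGT]{15,19})TATAAT")


def find_consensus_promoters(dna):
    """Return (start index of the -35 box, spacer length) for each exact match."""
    return [
        (match.start(), len(match.group(1)))
        for match in PROMOTER_PATTERN.finditer(dna.upper())
    ]


# Hypothetical test sequence, assembled so that both boxes are easy to see.
spacer = "ATCGATCGATCGATCGA"  # 17 bp
example = "CCGG" + "TTGACA" + spacer + "TATAAT" + "GCGCATG"

hits = find_consensus_promoters(example)
print(hits)  # [(4, 17)]
```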


Figure 4.17 Simplified schematic illustration of the control of gene expression in prokaryotes and eukaryotes. (a) Bacteria: example of the tryptophan operon. When the amino acid tryptophan (TRP) is available in excess, transcription of the genes for the tryptophan biosynthesis enzymes is inhibited by a repressor that is activated by tryptophan and blocks the operator in the promoter. If no tryptophan is available, the repressor dissociates from the operator, and RNA polymerase can begin transcription (bottom illustration). (b) Eukaryotes: transcription can only begin when an activator protein has bound to the enhancer and the complete set of transcription factors (Table 4.6) has formed a transcription complex together with RNA polymerase II. The connections between the activator protein and the transcription complex are established through a mediator protein, which collaborates with a chromatin remodeling complex (CC) and a histone‐modifying enzyme (H). In addition, proteins are present that dissolve nucleosome complexes so that the DNA is accessible to the RNA polymerase.

Control of gene expression in eukaryotes is very complex. Eukaryotic genomes contain considerably more genes than a single cell requires as proteins. Therefore, it is necessary to express genes in a cell‐, tissue‐, and development‐specific fashion. This means that out of the estimated 21 000 genes encoding proteins in humans and 9000 noncoding RNA genes, only 30–60% are activated in individual differentiated cells. Researching and documenting differential gene expression patterns is an enormous task for current molecular biology. More recently, techniques such as RNA‐seq have revolutionized transcriptomics (Chapter 21).

The transcription of eukaryotic genes (Figure 4.17b) is controlled by neighboring regulatory DNA regions (promoter regions) that are themselves controlled by transcription regulators, which are responsible for the activation or inactivation of a gene. In addition to the promoter region, which lies in close proximity to the coding sequences, further cis‐regulatory elements (enhancers, silencers) can be positioned at a greater distance (Figure 4.17b). The eukaryotic RNA polymerase II is only activated when diverse transcription factors/regulators have bound to the promoter (Figure 4.17b). Table 4.6 reviews the most important control elements and the associated consensus sequences.

Table 4.6 Consensus sequences in eukaryotic promoter regions.

Box    Consensus sequence         Transcription factor
BRE    G/C G/C G/A C G C C        TFIIB
TATA   T A T A A/T A A/T          TBP
INR    C/T C/T A N T/A C/T C/T    TFIID
DPE    A/G G A/T C G T G          TFIID
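
The slash notation in Table 4.6 maps directly onto regular-expression character classes. The sketch below converts each consensus into a pattern and tests candidate sites against it. It assumes that N stands for any nucleotide and that the TATA row reads T A T A A/T A A/T; the helper names are illustrative only.

```python
import re

# Consensus sequences as in Table 4.6; "X/Y" means either base, "N" any base.
# The TATA entry is read here as T A T A A/T A A/T (assumption of this sketch).
CONSENSUS = {
    "BRE": "G/C G/C G/A C G C C",
    "TATA": "T A T A A/T A A/T",
    "INR": "C/T C/T A N T/A C/T C/T",
    "DPE": "A/G G A/T C G T G",
}


def consensus_to_regex(spec):
    """Turn a space-separated consensus into a regular expression string."""
    parts = []
    for position in spec.split():
        if position == "N":
            parts.append("[ACGT]")
        elif "/" in position:
            parts.append("[" + position.replace("/", "") + "]")
        else:
            parts.append(position)
    return "".join(parts)


PATTERNS = {box: re.compile(consensus_to_regex(spec)) for box, spec in CONSENSUS.items()}


def matches_box(box, site):
    """True if the candidate site fits the consensus of the given box exactly."""
    return PATTERNS[box].fullmatch(site.upper()) is not None


print(consensus_to_regex(CONSENSUS["BRE"]))  # [GC][GC][GA]CGCC
print(matches_box("TATA", "TATAAAA"))        # True
print(matches_box("INR", "TCAGTCT"))         # True
```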

As most genes in eukaryotic cells are expressed in a cell‐, tissue‐, and development‐specific manner, additional specific transcription regulators play a decisive role. Many of these factors have not yet been discovered. Transcription regulators apparently do not work alone but together in complex networks, which include not only transcription factors but also modifications at the DNA (methylation) and chromatin (histone modifications) level. Gene expression can also be controlled by various other effectors, such as small noncoding RNAs, lncRNAs, miRNAs, and siRNAs, as well as by the speed of RNA transport and degradation.

In contrast to bacterial genes, eukaryotic protein‐coding genes usually consist of exons (expressed sequences) and introns (intervening sequences) (Figure 4.18) and are therefore referred to as mosaic genes. Exons often encode protein domains; it has been suggested that the exon/intron arrangement has facilitated the emergence of new genes during evolution: a “modern” gene may combine exons that originated in several earlier, smaller genes.


Figure 4.18 Structure of a eukaryotic gene. NCS, noncoding sequence.

The primary transcript produced by transcription is completely processed in the nucleus. It is spliced in a multienzyme complex, the spliceosome, so that each noncoding intron, which begins with a GU sequence and ends with an AG sequence, is removed. snRNAs are catalytically involved in splicing and can be seen as a type of ribozyme (see Section 2.4). In eukaryotes, differential or alternative splicing of genes is common (Figure 4.19); that is, not all exons will be present in the final mRNA. Due to alternative splicing, a single gene can give rise to several proteins (isoforms), depending on the tissue in which it is expressed. This is the reason why the number of proteins in humans is several times higher than the number of genes.
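
The effect of intron removal and exon skipping can be sketched in a few lines. This is a deliberately simplified model: intron positions are supplied by hand rather than recognized by a spliceosome, all sequences are hypothetical, and the only rule checked is the GU…AG boundary mentioned above.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) index pairs (end exclusive).

    Each removed segment must start with GU and end with AG.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"segment {start}-{end} lacks GU...AG boundaries")
        mature.append(pre_mrna[position:start])
        position = end
    mature.append(pre_mrna[position:])
    return "".join(mature)


# Hypothetical primary transcript: exon A, intron 1, exon B, intron 2, exon C.
exon_a, exon_b, exon_c = "AUGGCC", "UUUCAC", "GGAUAA"
intron_1, intron_2 = "GUAAGUUUCAG", "GUGAGCCCCAG"
pre_mrna = exon_a + intron_1 + exon_b + intron_2 + exon_c

i1_start = len(exon_a)
i1_end = i1_start + len(intron_1)
i2_start = i1_end + len(exon_b)
i2_end = i2_start + len(intron_2)

# Constitutive splicing keeps all three exons.
full_isoform = splice(pre_mrna, [(i1_start, i1_end), (i2_start, i2_end)])
# Exon skipping: exon B is removed together with both flanking introns.
skipped_isoform = splice(pre_mrna, [(i1_start, i2_end)])

print(full_isoform)     # AUGGCCUUUCACGGAUAA
print(skipped_isoform)  # AUGGCCGGAUAA
```

The same primary transcript yields two different mRNAs, which is the basis for the isoforms described above.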


Figure 4.19 Schematic representation of alternative splicing processes. The letters A, B, C, and so on indicate exons. After the complete primary transcript is produced, further selection occurs in the splicing process, in which not all exons remain but a few are removed with the introns. In this way, many different proteins are synthesized from one gene, which differ in domain composition. NCS, noncoding sequence.

The assignment of template or coding strand does not apply to a complete chromosome; the orientation within chromosomes can change from gene to gene, meaning that gene A can be read from one strand and the neighboring gene B from the strand lying opposite. In eukaryotes, the genes are arranged in a linear manner, one after the other, on chromosomes. In prokaryotes, overlapping genes are found, which are encoded either by the same DNA strand or by the complementary DNA strand lying opposite. This results in a denser packing of information but prevents the independent evolution of the DNA sequences.

Note to Table 4.6: for the position of the consensus boxes, see Figure 4.17.

In eukaryotes, the mRNA is further modified by the addition of a cap structure (to the nascent RNA molecule) at the 5′‐end and a poly(A) tail at the 3′‐end (Figure 4.15). The poly(A) polymerase, which does not require a template, adds around 200 A nucleotides to the 3′‐end. The fully processed mRNA is complexed by several proteins (poly(A)‐binding proteins, nuclear export receptor, hnRNP proteins, CBC, and SR proteins). The mRNA–protein complex is recognized by the nuclear pore complex (NPC) and transported into the cytoplasm (see Chapter 5). Damaged RNA molecules are degraded in the nucleus by the exosome.

In gene regulation, the methylation of cytosine (5‐methylcytosine in plants and animals) and adenine (N6‐methyladenine in prokaryotes) also plays an important role. As a rule, genes that are transcribed are less methylated than genes that are turned off (silent). After each replication, the newly replicated DNA strands must be methylated; an inhibition of the corresponding methyltransferases strongly influences gene expression and cell differentiation. DNA methylation is also important for DNA repair, because the repair enzymes can recognize a newly synthesized and defective DNA strand by the absence of methylation. Methylation and changes in chromatin structure change the expression patterns of genes; these changes are passed on to daughter cells (so‐called genomic imprinting or epigenetic inheritance) (Figure 4.20). Usually, epigenetic changes are not transferred via the germline to the next generation (although this is currently under debate), whereas mutations in gametes are inherited.


Figure 4.20 Differences between genetic and epigenetic inheritance.

The nucleotide sequence of mRNA is translated using the genetic code into amino acid sequences. tRNA, with its specific anticodon, serves as a mediator between the mRNA and the protein. A central event in the progress of molecular biology was the discovery of the unit‐less, comma‐less, nonoverlapping code in all living organisms. In each case, three nucleotides code for a specific amino acid in each protein (Table 2.4). Using a triplet code with four bases, there are 4³ = 64 available combinations. As only 20 amino acids are used to synthesize proteins (Table 2.4), there are more codons than are actually necessary. Evolution has solved this problem in such a way that most amino acids are encoded not by a single codon but by two to at most six different synonymous codons (Table 2.4).
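
The codon arithmetic can be verified by enumerating all triplets. The sketch below builds the standard genetic code from the compact one-letter layout commonly used for codon tables (bases in the order U, C, A, G; `*` marks a stop codon) and counts the synonymous codons per amino acid. The variable names are chosen here for illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order for all three positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

# Number of synonymous codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")

print(len(codons))               # 64
print(len(degeneracy))           # 20
print(max(degeneracy.values()))  # 6
print(stop_codons)               # ['UAA', 'UAG', 'UGA']
```

Methionine (M) and tryptophan (W) are the two amino acids with only a single codon in this table, while leucine, serine, and arginine each have six.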

The nearly universal triplet code has a specific start signal. Since methionine (in eukaryotes) and N‐formylmethionine (in bacteria and chloroplasts) are the first amino acids to be built into polypeptides, the universal start codon is AUG (far more rarely, GUG is used). In most cases, however, the initial methionine is removed by specific proteases following translation. When the start of translation shifts by only one or two nucleotides, the reading frame shifts (frameshift) (Figure 4.13) and a totally different protein results. This means that the start codon must be strictly preserved in order to produce proteins reproducibly. In animal (but not in plant) mitochondria, there are deviations from the universal genetic code: AUA is used for translation initiation and codes for methionine, whereas on cytosolic ribosomes this codon codes for isoleucine; AGA and AGG are used as termination codons by vertebrate mitochondria, while they usually code for arginine; and UGA, which is usually a stop codon, codes for tryptophan in animal mtDNA.
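
The consequences of these deviations can be made concrete by translating one and the same mRNA with both codes. In this sketch the vertebrate mitochondrial table is derived from the standard one by applying only the reassignments named above; the mRNA is hypothetical, and translation is simply started at the first AUG, which ignores initiation at AUA.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

# Vertebrate mitochondrial code: only the deviations mentioned in the text.
VERTEBRATE_MITO = dict(STANDARD)
VERTEBRATE_MITO.update({"AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"})


def translate(mrna, table):
    """Translate from the first AUG up to the first stop codon ('*')."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


example = "AUGAUAUGGUGAAGAUAA"  # hypothetical mRNA: AUG AUA UGG UGA AGA UAA

print(translate(example, STANDARD))         # MIW
print(translate(example, VERTEBRATE_MITO))  # MMWW
```

With the standard code, UGA terminates the chain after three residues; with the mitochondrial code, UGA is read as tryptophan and AGA acts as the stop signal instead.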

Codons that code for the same amino acid usually differ in the third codon position. Every codon is recognized by a tRNA via its anticodon sequence. For a set of so‐called degenerate codons that all code for the same amino acid, usually only one tRNA exists, which tolerates a mismatch in the third codon position. Overall, about 31 tRNAs have been discovered in the eukaryotic system and 22 tRNAs in mitochondria.
