Molecular Biotechnology - Bernard R. Glick
Contents, Chapter 3
Production of Recombinant Proteins
Protein Production in Prokaryotic Hosts
Increasing Translation Efficiency
Decreasing Protein Degradation
Facilitating Protein Purification
DNA Integration into the Host Chromosome
Heterologous Protein Production in Eukaryotic Cells
Posttranslational Modification of Eukaryotic Proteins
General Features of Eukaryotic Expression Systems
Saccharomyces cerevisiae Expression Systems
Secretion of Heterologous Proteins by S. cerevisiae
Other Yeast Expression Systems
Baculovirus–Insect Cell Expression Systems
Baculovirus Expression Vectors
Mammalian Glycosylation and Processing of Proteins in Insect Cells
Mammalian Cell Expression Systems
Selectable Markers for Mammalian Expression Vectors
Engineering Mammalian Cell Hosts for Enhanced Productivity
Chromosomal Integration and Environment
Site-Directed Mutagenesis by Overlap Extension PCR
Site-Directed Mutagenesis by Inverse PCR
Mutant Proteins with Unusual Amino Acids
Random Insertion/Deletion Mutagenesis
Random Mutagenesis with Degenerate Oligonucleotide Primers
Examples of Protein Engineering
Modifying Cofactor Requirements
Decreasing Protease Sensitivity
For many biotechnology applications, a primary objective is to produce high levels of a protein from a cloned gene in a selected host organism. There is no single strategy for obtaining maximal expression of every cloned gene, and many biological parameters must be manipulated to obtain optimal levels of gene expression. These include the genetic elements for controlling transcription, translation, protein stability, and secretion of the product of the cloned gene from the host cell. The level of foreign-gene expression also depends on the host organism. Initially, many of the commercially important proteins produced by recombinant DNA technology were synthesized in Escherichia coli. Today many other host systems, such as other bacterial strains, yeasts, and insect and mammalian cells, are employed to produce heterologous proteins. Each of these systems has advantages and disadvantages (Table 3.1). For example, while cloned genes may be expressed at high levels and low cost in E. coli, the proteins produced are not glycosylated. Posttranslational glycosylation is essential for the function of many human therapeutic proteins, and therefore these proteins are often produced in cultured mammalian cells even though the costs are higher and the yields lower.
Table 3.1 Production of recombinant human proteins in various biological hosts
Protein Production in Prokaryotic Hosts
There are several good reasons to employ prokaryotic cells for production of heterologous proteins, and many bacterial hosts are commercially available, including the Gram-negative E. coli and Pseudomonas fluorescens and the Gram-positive Bacillus subtilis, Lactococcus lactis, and Corynebacterium glutamicum. The genetics, molecular biology, biochemistry, and physiology of these bacteria are well understood, they can often be grown to high cell densities in large-scale bioreactors, the growth medium is relatively inexpensive, protocols for their manipulation have been optimized, and vectors are available that carry signals for high levels of gene expression. These signals include the promoter and other transcription regulatory sequences, and sequences that control translation efficiency, such as a strong ribosome-binding site. Production of a foreign protein in a bacterium may require manipulation of the coding sequence to increase protein stability or direct it to be secreted. To ensure that the cloned gene is maintained in the host cell, it may be necessary to integrate it into the chromosome of the host cell.
Regulation of Transcription
The minimal requirement for an effective gene expression system is the presence of a strong and regulatable promoter sequence upstream from a cloned gene. A strong promoter is one that has high affinity for RNA polymerase, with the consequence that the adjacent downstream region is frequently transcribed (Fig. 3.1). However, a high level of transcription is not always desirable, and the presence of regulatory sequences in the promoter region enables the cell (and the researcher) to control the extent of transcription in a precise manner. Many different promoters with distinctive properties that make them useful for controlling expression have been isolated from a range of organisms (Table 3.2).
Figure 3.1 A strong E. coli promoter resembles the consensus sequences for the −35 and −10 boxes that bind to RNA polymerase. The consensus sequence was determined by aligning many E. coli promoters and identifying conserved nucleotides in the sequences centered −35 and −10 bp upstream of the transcription start site (+1). The distance, but not the nucleotide sequence, between the two boxes is also conserved.
Table 3.2 Promoters commonly used for expression of cloned genes in prokaryotic hosts
A high, constitutive (continuous) level of expression of a cloned gene from a strong, unregulated promoter is often detrimental to the host cell because it creates an energy drain, thereby impairing essential host cell functions and growth (Table 3.3). In addition, all or a portion of the plasmid carrying a constitutively expressed cloned gene may be lost after several division cycles, since cells without a plasmid grow faster and eventually take over the culture. Such plasmid instability is a major problem that may prevent the efficient production of a plasmid-borne gene product on a large scale. To overcome this drawback, it is desirable to use a strong regulatable promoter to control transcription in such a way that a cloned gene is expressed only at a specific stage in the host cell growth cycle and only for a specified duration. The production process is performed in two stages. During the first, or growth, stage, the promoter controlling the transcription of the target gene is turned off, while during the second, or induction, stage, this promoter is turned on.
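The takeover of a culture by plasmid-free cells can be illustrated with a minimal growth sketch. The per-division loss probability and the growth penalty (`burden`) below are hypothetical values chosen only for illustration; they are not measurements from the text.

```python
def plasmid_free_fraction(generations, loss_per_division=0.01, burden=0.2):
    """Fraction of plasmid-free cells after a number of generations.

    Assumptions (hypothetical, for illustration only):
    - plasmid-free cells double once per generation;
    - plasmid-bearing cells grow more slowly, by the factor 2**(1 - burden);
    - a fraction `loss_per_division` of newly grown plasmid-bearing cells
      loses the plasmid in each generation.
    """
    bearing, free = 1.0, 0.0
    for _ in range(generations):
        free *= 2.0                                # plasmid-free cells double
        grown = bearing * 2.0 ** (1.0 - burden)    # burdened cells grow less
        lost = grown * loss_per_division           # cells that lose the plasmid
        bearing = grown - lost
        free += lost
        total = bearing + free                     # renormalize to avoid overflow
        bearing, free = bearing / total, free / total
    return free / (bearing + free)


if __name__ == "__main__":
    for n in (0, 10, 25, 50):
        print(n, round(plasmid_free_fraction(n), 3))
```

Under these assumptions, even a small loss rate is amplified by the growth advantage of plasmid-free cells, which is why keeping the promoter off during the growth stage helps maintain the plasmid.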
Table 3.3 Factors that may increase the metabolic burden on a prokaryotic host cell that is expressing high levels of a cloned gene
Widely used, strong, regulatable promoters are those from the E. coli lactose (lac) and tryptophan (trp) operons, and from bacteriophage genes such as the gene 10 promoter from E. coli bacteriophage T7 (Table 3.2). Each of these promoters interacts with regulatory proteins (e.g., repressors or activators), which provide a controllable switch for either turning on or turning off transcription of adjacent cloned genes. The lac and trp promoters are recognized by the major form of the E. coli RNA polymerase holoenzyme. This holoenzyme is formed when a sigma factor protein, in this case the sigma factor RpoD, combines with the core proteins of RNA polymerase. The sigma factor directs the binding of the holoenzyme to promoter regions on the DNA.
The E. coli lac promoter is negatively regulated (turned off) by the LacI repressor protein, which prevents the lac operon from being transcribed in the absence of lactose in the growth medium (Fig. 3.2). Induction (turning on) of the lac promoter is achieved by the addition of either lactose or isopropyl-β-D-thiogalactopyranoside (IPTG), a synthetic inducer, to the medium (Fig. 3.3). Before lactose can act as an inducer, it must be converted in the cell to allolactose by the low levels of β-galactosidase that are synthesized even when the system is repressed. The enzyme β-galactosidase is encoded by the lacZ gene of the lac operon, and it is primarily involved in the cleavage of lactose into glucose and galactose. Both allolactose and IPTG can bind to LacI and prevent it from binding to the lac operator, thereby enabling transcription to occur.
Figure 3.2 Diagrammatic representation of the effects of the concentrations of glucose, lactose, and cAMP in the growth medium on the level of transcription from the E. coli lac promoter. The arrow indicates the direction of transcription. The lac repressor is a tetramer. The cAMP–CAP complex binds to a CAP recognition site (CAP box) on the DNA.
Figure 3.3 Inducers of the lac promoter. (A) Lactose, which must be converted to allolactose to be effective; (B) IPTG.
Transcription from the lac promoter is also positively regulated by the binding of the catabolite activator protein (CAP) (also sometimes referred to as the cyclic AMP [cAMP] receptor protein, or CRP) to a region of the DNA (the CAP box) just upstream of the promoter region (Fig. 3.2). When CAP binds to the CAP box, it increases the affinity of the promoter for RNA polymerase, thereby increasing transcription of the genes downstream from the promoter. The affinity of CAP for its binding site on the DNA is enhanced by its association with cAMP, whose level is high when the amount of glucose in the medium is low. Thus, when inducer (lactose or IPTG) is present and there is no repressor bound to the operator, a high intracellular concentration of cAMP can lead to a high level of transcription of the genes downstream of the lac promoter. In practice, lacUV5, a variant of the lac promoter that contains an altered nucleotide sequence in the −10 region and is a stronger promoter than the wild-type lac promoter, is usually used in plasmid expression vectors. In addition, a mutant form of the lacI gene (lacIq) that produces much higher levels of the lac repressor is often employed to decrease basal (background) levels of transcription (transcriptional leakiness) under noninduced conditions (i.e., transcription of a cloned gene in the absence of inducer).
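The combined negative (LacI) and positive (cAMP-CAP) control of the wild-type lac promoter can be summarized as a small decision function. This is only a qualitative sketch: the three output levels ("off", "low", "high") are a simplification introduced here, and basal leakiness is ignored.

```python
def lac_transcription(inducer_present, glucose_high):
    """Qualitative transcription level from the wild-type lac promoter.

    Returns "off", "low", or "high". The three-level output is an
    illustrative simplification, not a quantitative model.
    """
    # Without lactose (allolactose) or IPTG, LacI stays bound to the operator.
    if not inducer_present:
        return "off"
    # High glucose means low cAMP, so CAP binds its site poorly and
    # RNA polymerase is recruited inefficiently.
    if glucose_high:
        return "low"
    # Inducer present and glucose low: cAMP-CAP enhances transcription.
    return "high"


if __name__ == "__main__":
    for inducer in (False, True):
        for glucose in (True, False):
            print(inducer, glucose, lac_transcription(inducer, glucose))
```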
The promoters of bacteriophage genes drive the production of high levels of viral proteins in a host bacterial cell and are often used in biotechnology applications that require a strong promoter. One such promoter from gene 10 of the E. coli bacteriophage T7 activates very high levels of gene expression. However, this promoter is not recognized by the E. coli RNA polymerase but rather it requires the T7 RNA polymerase for transcription. Therefore, to utilize this promoter for transcription of cloned genes, the T7 RNA polymerase gene is inserted in the E. coli chromosome (Fig. 3.4). To regulate expression of the T7 RNA polymerase gene, it is placed under the control of the E. coli lac promoter and operator. During the growth phase, LacI prevents the synthesis of T7 RNA polymerase and the cloned gene product is not produced. When the culture has reached a suitable cell density, IPTG is added to the medium to induce expression of the T7 RNA polymerase gene. The T7 RNA polymerase binds to the T7 gene 10 promoter and transcribes the cloned gene.
Figure 3.4 Regulation of gene expression controlled by the promoter for gene 10 from bacteriophage T7 (pT7). In the absence of the inducer IPTG, the constitutively produced lac repressor, the product of the lacI gene, which is under the control of the lacI promoter, placI, represses the synthesis of the T7 RNA polymerase that is transcriptionally controlled by the lac operator (olac) and lac promoter (plac). In the absence of T7 RNA polymerase, the target gene, which is under the transcriptional control of the T7 gene 10 promoter (pT7), is not transcribed. When lactose or IPTG is added to the medium, it binds to the lac repressor, thereby preventing it from repressing the transcription of T7 RNA polymerase. In the presence of T7 RNA polymerase, the target gene is transcribed. TT, transcription termination sequence.
Promoters must be chosen carefully, especially for the large-scale production of foreign proteins. Chemical inducers can be costly, toxic, or difficult to remove; thermal induction of promoters may induce the production of heat shock proteins, including proteases; nutrient-responsive promoters limit the types of media that can be used for cell growth and induction; and oxygen-regulated promoters often have significant basal levels of activity as a consequence of the inherent difficulty in precisely controlling dissolved oxygen levels in the growth medium. Promoters that are induced when cells enter the stationary phase of the growth cycle may be useful in the design of expression vectors that are employed for large-scale applications. In addition, E. coli is not necessarily the bacterium of choice for the expression of some foreign proteins. While E. coli promoters can regulate the expression of cloned genes in some other bacteria, different promoters may be required in some host strains.
Increasing Translation Efficiency
Expressing a cloned gene from a strong, regulatable promoter, although essential, may not be sufficient to maximize the yield of the cloned gene product. Other factors, such as the efficiency of translation and the stability of the newly synthesized target protein, may also affect the amount of product. In prokaryotic cells, various proteins are not necessarily synthesized with the same efficiency. In fact, they may be produced at very different levels (up to several hundred-fold) even if they are encoded within the same polycistronic messenger RNA (mRNA). Differences in translational efficiency, as well as differences in transcriptional regulation, enable the cell to have hundreds or even thousands of copies of some proteins and only a few copies of others.
The molecular basis for differential translation is, in part, the sequence of a translational initiation signal called a ribosome-binding site in the transcribed mRNA. A ribosome-binding site is a sequence of 6 to 8 nucleotides (e.g., UAAGGAGG) in mRNA that can base pair with a complementary sequence (AUUCCUCC for E. coli) on the RNA component of the small ribosomal subunit. Generally, the greater the complementarity between the sequences, the stronger the binding of the mRNA to the ribosomal RNA, and the greater the efficiency of translational initiation.
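As a rough illustration of this complementarity, the following sketch counts Watson-Crick pairs between a ribosome-binding site and the ribosomal RNA sequence quoted above. It assumes that the rRNA sequence (AUUCCUCC) is written in the antiparallel 3′-to-5′ orientation so that the two strings align position by position; G · U wobble pairs and binding energies are ignored.

```python
# Watson-Crick pairs in RNA; G-U wobble pairs are ignored in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complementarity(rbs, anti_sd="AUUCCUCC"):
    """Count base-paired positions between an RBS and the rRNA sequence.

    `rbs` is written 5'->3'; `anti_sd` is assumed to be written 3'->5',
    so that position i of one string faces position i of the other.
    """
    if len(rbs) != len(anti_sd):
        raise ValueError("sequences must have the same length")
    return sum((m, r) in PAIRS for m, r in zip(rbs.upper(), anti_sd.upper()))


if __name__ == "__main__":
    print(complementarity("UAAGGAGG"))  # fully complementary example from the text
    print(complementarity("UAAGCAGG"))  # hypothetical variant with one mismatch
```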
Many E. coli expression vectors have been designed to ensure that the mRNA of a cloned gene contains a strong ribosome-binding site. However, for proper translation of heterologous genes in E. coli, certain other conditions must also be satisfied. First, the ribosome-binding sequence must be located within a short distance (usually 2–20 nucleotides) from the translational start codon of the cloned gene. Second, after transcription, the 5′ end of the mRNA sequence that includes the ribosome-binding site through the first few codons of the gene of interest must not contain nucleotide sequences that have regions of complementarity. Intrastrand base-pairing that leads to the formation of secondary structure in this region may shield the ribosome-binding site (Fig. 3.5) and therefore affect the extent to which the mRNA can bind to the appropriate sequence on the ribosome and initiate translation. Thus, for each cloned gene, it is important to establish that the ribosome-binding site is properly placed and that the secondary structure of the mRNA does not prevent its access to the ribosome.
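The first condition, the spacing between the ribosome-binding site and the start codon, is easy to check on a candidate mRNA sequence. The sketch below assumes a core motif of AGGAGG (part of the UAAGGAGG example) and uses the 2 to 20 nucleotide window from the text; the test sequence is hypothetical. It does not examine secondary structure, which requires an RNA folding program.

```python
def rbs_spacing(mrna, rbs="AGGAGG", start="AUG"):
    """Number of nucleotides between the end of the RBS and the first
    start codon downstream of it, or None if either element is missing."""
    mrna = mrna.upper()
    i = mrna.find(rbs)
    if i < 0:
        return None
    rbs_end = i + len(rbs)
    j = mrna.find(start, rbs_end)
    if j < 0:
        return None
    return j - rbs_end


def spacing_ok(mrna, shortest=2, longest=20):
    """True if the RBS-to-start-codon spacing lies in the accepted window."""
    spacing = rbs_spacing(mrna)
    return spacing is not None and shortest <= spacing <= longest


if __name__ == "__main__":
    example = "UAAGGAGGACCUUCCUAUGCAGCAU"  # hypothetical 5' end of an mRNA
    print(rbs_spacing(example), spacing_ok(example))
```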
Figure 3.5 Example of secondary structure of the 5′ end of an mRNA that would prevent efficient translation. The ribosome-binding site is GGGGG, the start codon is AUG (shown in red), and the first few codons are CAG-CAU-GAU-UUA-UUU. The mRNA is oriented with its 5′ end to the left and its 3′ end to the right. Note that in addition to the traditional A · U and G · C base pairs in mRNA, G can also base-pair to some extent with U.
A number of convenient vectors that incorporate both transcriptional and translational signals for the expression of cloned genes in E. coli have been developed. An example is the expression vector pKK233-2 that includes the tac promoter (a hybrid of the strong trp promoter and the regulatable lac promoter; see the Milestone box), the lacZ ribosome-binding site, an ATG start codon located 8 nucleotides downstream from the ribosome-binding site, the transcription terminators T1 and T2 from bacteriophage λ, and an ampicillin resistance gene as a selectable marker (Fig. 3.6). The cloned gene is inserted into an NcoI, PstI, or HindIII site that lies between the ribosome-binding site and the transcription terminators so that it is in the same reading frame as the ATG start codon. After induction with IPTG and transcription, the mRNA of a cloned gene is efficiently translated. However, since the nucleotide sequences that encode the amino acids in the N-terminal region of the target protein vary from one gene to another, it is not possible to design a vector that will eliminate the possibility of mRNA secondary structure in all instances. Therefore, no single optimized translational initiation region can guarantee a high rate of translation initiation for all cloned genes. Consequently, the expression vectors described above are merely starting points for the optimization of translation initiation.
Milestone. The tac Promoter: A Functional Hybrid Derived from the trp and lac Promoters
De Boer and his colleagues began their efforts to construct the tac promoter with the idea of combining portions of two different strong and regulatable promoters to create an even stronger promoter that would direct very high levels of foreign-gene expression. When they undertook their studies, although the DNA sequences of a number of prokaryotic promoters, mostly from E. coli, were known, the precise features that enabled a promoter to be efficient at directing transcription were not well understood. It was known that almost all mutations that affected the strength of a prokaryotic promoter were found in either the −10 region or the −35 region, which is approximately 10 or 35 bp upstream of the mRNA transcription start site, respectively. Moreover, only mutations that made an existing promoter more like the consensus sequences for each of these regions (i.e., 5′-TATAAT-3′ for the −10 region and 5′-TTGACA-3′ for the −35 region) increased the strength of the promoter. The consensus sequences had been deduced by comparing the DNA sequences of all known promoters and determining which nucleotides occurred most often. De Boer and his colleagues also knew that the lacUV5 promoter, which is a stronger variant of the lac promoter, had a consensus sequence at its −10 but not its −35 region, while the trp promoter, which normally controls the transcription of genes involved in the biosynthesis of tryptophan, has a consensus sequence at its −35 but not its −10 region. They decided to create a fusion promoter that included the −10 region from the lacUV5 promoter and the −35 region from the trp promoter. They tested this new “tac” promoter, as they called it, for its ability to direct the synthesis of the enzyme galactokinase in E. coli and compared it in the same assay system with the trp and lac promoters.
In agreement with their initial idea, the tac promoter was found to be approximately 5 times stronger than the trp promoter and 10 times stronger than the lac promoter (de Boer et al., Proc. Natl. Acad. Sci. USA 80:21–25, 1983). In addition, like the lac promoter, the tac promoter was repressed by the lac repressor and derepressed by IPTG. Thus, this new promoter was not only strong, but also regulatable.
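The reasoning behind the hybrid can be sketched by counting matches to the two consensus hexamers given above. The consensus sequences come from the text; the nonconsensus hexamers assigned here to the lacUV5 −35 region and the trp −10 region are illustrative assumptions that are not given in the text, and a match count is only a crude proxy for promoter strength.

```python
CONSENSUS_35 = "TTGACA"  # consensus for the -35 region (from the text)
CONSENSUS_10 = "TATAAT"  # consensus for the -10 region (from the text)

# (-35 hexamer, -10 hexamer). The nonconsensus hexamers are assumed
# values used only to illustrate the comparison.
PROMOTERS = {
    "lacUV5": ("TTTACA", "TATAAT"),
    "trp": ("TTGACA", "TTAACT"),
}


def matches_to_consensus(hexamer, consensus):
    """Number of positions at which a hexamer matches the consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))


def consensus_score(promoter):
    """Total matches (0 to 12) of a (-35, -10) pair to both consensus boxes."""
    box35, box10 = promoter
    return (matches_to_consensus(box35, CONSENSUS_35)
            + matches_to_consensus(box10, CONSENSUS_10))


def hybrid(source_35, source_10):
    """Combine the -35 box of one promoter with the -10 box of another."""
    return (PROMOTERS[source_35][0], PROMOTERS[source_10][1])


if __name__ == "__main__":
    tac = hybrid("trp", "lacUV5")
    for name, promoter in [("lacUV5", PROMOTERS["lacUV5"]),
                           ("trp", PROMOTERS["trp"]),
                           ("tac", tac)]:
        print(name, promoter, consensus_score(promoter))
```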
Figure 3.6 The expression vector pKK233-2. The plasmid pKK233-2 codes for the ampicillin resistance (Ampr) gene as a selectable marker gene, the tac promoter (ptac), the lacZ ribosome-binding site (rbs), three restriction endonuclease cloning sites (NcoI, PstI, and HindIII), and two transcription termination sequences (T1 and T2). The arrow indicates the direction of transcription. The plasmid is not drawn to scale.
While the genetic code is universal among organisms, the codons that specify amino acids are used to different extents in various organisms (Table 3.4). Cells produce transfer RNAs (tRNAs) corresponding to a specific codon in approximately the same relative amount as that particular codon is used in the production of proteins. When a cloned gene has codons that are rarely used by the host cell, a cellular incompatibility occurs that decreases translation efficiency. For example, AGG, AGA, AUA, CUA, and CGA are the least-used codons in E. coli. The host cell may not produce enough of the tRNAs that recognize these rarely used codons, and consequently the yield of the protein produced from the cloned gene will be much lower than expected. Any codon that is used less than 5% to 10% of the time by the host organism may cause problems. An insufficient supply of tRNAs may also lead to the incorporation of incorrect amino acids into the protein. Errors in the amino acid sequence of a protein may diminish its usefulness if the specific activity and stability are reduced.
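A simple way to screen a cloned gene for this problem is to count the rare codons in its reading frame. The sketch below uses the five least-used E. coli codons named above; the example sequence is hypothetical, and the function assumes that the input is an in-frame coding sequence.

```python
# Least-used E. coli codons listed in the text (mRNA alphabet).
RARE_CODONS = {"AGG", "AGA", "AUA", "CUA", "CGA"}


def rare_codon_report(cds):
    """Return (positions, fraction) of rare codons in a coding sequence.

    `cds` must be in frame; DNA input (T) is converted to RNA (U).
    Positions are zero-based codon indices paired with the codon itself.
    """
    cds = cds.upper().replace("T", "U")
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a nonempty multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    rare = [(i, codon) for i, codon in enumerate(codons) if codon in RARE_CODONS]
    return rare, len(rare) / len(codons)


if __name__ == "__main__":
    positions, fraction = rare_codon_report("AUGAGGCUAGGUAGAUAA")  # hypothetical
    print(positions, fraction)
```

Codons are read only in frame, so a rare triplet that appears across a codon boundary is not counted.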
Table 3.4 Genetic code and codon usage in E. coli and humans
To alleviate this problem, the target gene may be expressed in a different host, or chemically synthesized (chapter 2) or engineered by directed mutagenesis (described later in this chapter) to contain codons that are more commonly used by the host cell. Codon optimization has enabled the production of large quantities of a variety of heterologous proteins that are otherwise difficult to express in E. coli (Table 3.5). Alternatively, a host cell that has been engineered to overexpress several rare tRNAs is commercially available (Fig. 3.7). The E. coli strain overproduces the tRNAs encoded by argU, ileY, and leuW, which are specific for the codons AGG/AGA, AUA, and CUA, respectively. This strain is used to express a high level of foreign proteins that use these rare E. coli codons. For example, it was possible to overexpress the Ara h2 protein, a peanut allergen, approximately 100-fold over the amount that was synthesized in conventional E. coli cells.
Table 3.5 Increases in gene expression that result from altering the codon usage of the wild-type gene (or cDNA) to more closely correspond to the host E. coli cell
Figure 3.7 Two commercially available plasmids may be used to increase the pool of certain rare tRNAs in E. coli. (A) Plasmid pSJS1244 carries 3 and pRARE carries 10 rare E. coli tRNA genes. p15A represents the replication origin of these plasmids. Spectinomycin and chloramphenicol are the antibiotics for which resistance genes are carried within these plasmids. (B) Expression of foreign proteins in a typical E. coli host cell; the concentration of rare tRNAs is shown schematically. (C) Expression of foreign proteins in an E. coli host cell that has been engineered (by introduction of one of the plasmids shown in panel A) to overexpress several rare tRNAs. Sørensen and Mortensen, J. Biotechnol. 115:113-128, 2005.
Increasing Protein Stability
High levels of expression of some foreign proteins in bacterial hosts often result in the formation of inclusion bodies, which are insoluble, inactive protein aggregates. Foreign proteins may misfold in a cellular environment in which the pH, osmolarity, and redox status differ from those of the natural host. Misfolding exposes hydrophobic amino acids, which leads to aggregation of proteins, especially at high concentrations. Moreover, there are large differences in the intrinsic stabilities of different proteins. Under normal growing conditions, the half-lives of different proteins range from a few minutes to hours. The basis for this differential stability is the extent of disulfide bond formation, the presence of certain amino acids at the N terminus, and the susceptibility to cleavage by proteases.
Facilitating Protein Folding
One simple strategy to increase the amount of recoverable active protein is to cultivate recombinant strains at low temperatures, which facilitates proper protein folding. However, mesophilic bacteria like E. coli grow extremely slowly at low temperatures. In one study, the chaperonin 60 gene (cpn60) and the cochaperonin 10 gene (cpn10) from the psychrophilic bacterium Oleispira antarctica were introduced into a host strain of E. coli with the result that the E. coli strain gained the ability to grow at a high rate at low temperatures (4 to 10°C). This strain was subsequently transformed with a plasmid encoding a target protein, a temperature-sensitive esterase. The expression of the esterase in the E. coli strain carrying the two chaperone genes at 4 to 10°C yielded esterase specific activity that was 180-fold higher than the activity from the native E. coli strain (without chaperonins) grown at 37°C. Although very high levels of expression of the cloned esterase were not attained, this work illustrates an expression system for proteins that are sensitive to high temperature and might otherwise be difficult to produce.
While the psychrophile chaperones enhanced the growth of E. coli at low temperatures, they did not directly participate in proper folding of the foreign protein. It is also possible to coexpress the target gene with one or more molecular chaperones that interact with and mediate correct folding of proteins. E. coli produces several chaperones that function in protein folding. The “folding chaperones” utilize ATP cleavage to promote conformational changes that enable refolding of their substrates (Table 3.6). The “holding chaperones” bind to partially folded proteins until the folding chaperones have done their job. The “disaggregating chaperone” promotes the solubilization of proteins that have become aggregated. Protein folding also involves the “trigger factor,” which binds to nascent polypeptide chains, acting as a holding chaperone. The proper folding of proteins has been facilitated by coexpression with some of these chaperones. In a study in which the chaperones DnaK and GroEL (and their cochaperonin protein molecules) were overexpressed, the yields of several target proteins expressed at the same time were increased up to 5-fold.
Table 3.6 E. coli proteins that facilitate the correct folding of recombinant proteins
Correct disulfide bond formation is essential for many proteins to fold properly and achieve an active configuration (Fig. 3.8). Covalent disulfide bonds form by oxidation of sulfhydryl groups on cysteine residues that are adjacent in the folded protein, a reaction that is catalyzed by periplasmic (DsbA and DsbC) and membrane-bound (DsbB and DsbD) enzymes in E. coli. Disulfide bond formation in the reducing environment of the cytoplasm is rare. Thus, foreign proteins that tend to form inclusion bodies may be directed to the periplasm. Overexpression of the disulfide bond isomerase DsbC also promotes disulfide bond formation. The human therapeutic protein tissue plasminogen activator is a 527-amino-acid serine protease that requires formation of 17 disulfide bonds to attain an active state. The cDNA for this protein was cloned downstream of a DNA sequence that encodes a signal peptide (described below) to facilitate expression and secretion to the periplasm in E. coli. However, only trace amounts of the protein were produced. Coexpression of high levels of DsbC resulted in more than a 100-fold increase in the production of functional human tissue plasminogen activator. To realize the maximum benefit from DsbC overproduction, it was necessary to induce the synthesis of this protein approximately 30 minutes prior to the induction of human tissue plasminogen activator expression. Alterations in the levels of the other Dsb proteins did not affect the amount of active human tissue plasminogen activator that could be recovered; however, overproduction of all four Dsb proteins yielded the greatest amount of properly folded and active horseradish peroxidase.
Figure 3.8 Disulfide bond in a protein. (A) A covalent, disulfide bond forms by oxidation of sulfhydryl (SH) groups on cysteines. (B) Disulfide bonds between cysteines (represented by brown balls) within a polypeptide (ribbon diagram shown) contribute to the structural stability of the protein.
Another strategy to avoid formation of inclusion bodies is to express the target protein as a fusion protein. Fusion proteins are constructed at the DNA level by ligating a portion of the coding regions of two or more genes, such that a single polypeptide is synthesized. It is essential that the combined coding sequences have the correct reading frame; otherwise an incomplete or an incorrect translation product will result and the protein will not have the desired function. Fusion proteins that contain thioredoxin, a small (12-kilodalton [kDa]), highly soluble protein, as the fusion partner remain soluble even when up to 40% of the cellular protein consists of the fusion protein. With this system, the target gene is cloned just downstream from the thioredoxin gene and both genes are transcribed from a single promoter (Fig. 3.9). Fusion proteins containing thioredoxin accumulate preferentially at the cytoplasmic face of the host E. coli inner membrane at sites known as adhesion zones (regions where the inner and outer membranes adhere). This facilitates selective release of the soluble fusion protein by osmotic shock from E. coli cells into the growth medium. The presence of the host protein segment makes most fusion proteins unsuitable for clinical use and may affect the biological functioning of the target protein. In addition, fusion proteins require more extensive testing before being approved by regulatory agencies, such as the U.S. Food and Drug Administration. Thus, strategies have been developed to remove the unwanted amino acid sequence from the target protein following purification (described below).
Figure 3.9 Expression of a soluble fusion protein. To avoid formation of insoluble inclusion bodies, the gene encoding the target protein is cloned adjacent to, and in the same reading frame as, the thioredoxin gene and expressed as a single polypeptide. ATG encodes the start codon; TGA encodes a stop codon; the arrow indicates the transcription start site.
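The reading-frame requirement for a fusion construct can be checked at the DNA level before cloning. The following sketch joins a fusion partner and a target coding sequence and verifies that the result is a single open reading frame. The sequences in the example are short hypothetical stand-ins, not the thioredoxin gene or any real target gene, and linker or protease cleavage sequences are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(partner_cds, target_cds):
    """Join two DNA coding sequences into one open reading frame.

    `partner_cds` starts with ATG and may end with a stop codon, which is
    removed so that translation continues into `target_cds`.
    `target_cds` must end with a stop codon.
    Raises ValueError if the fusion would not be a single in-frame ORF.
    """
    partner, target = partner_cds.upper(), target_cds.upper()
    if len(partner) % 3 != 0 or len(target) % 3 != 0:
        raise ValueError("each coding sequence must be a multiple of 3 nt")
    if partner[-3:] in STOP_CODONS:
        partner = partner[:-3]          # drop the partner's own stop codon
    fused = partner + target
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    if codons[0] != "ATG":
        raise ValueError("fusion must begin with an ATG start codon")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("fusion must end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("internal stop codon would truncate the fusion")
    return fused


if __name__ == "__main__":
    print(fuse_in_frame("ATGAGCGATTGA", "AAAGGTTGA"))  # hypothetical sequences
```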
Decreasing Protein Degradation
Specific amino acids at the N terminus can increase the stability of a protein, probably by making it less susceptible to degradation by cellular proteases. For example, alteration of the N-terminal amino acids increased the in vitro survival time of β-galactosidase from approximately 2 minutes to more than 20 hours (Table 3.7). Amino acid additions that extend the intrinsic survival of a protein can be readily incorporated into cloned genes. Often the presence of a single extra amino acid at the N-terminal end is sufficient to stabilize a target protein. Long-lived proteins can accumulate in cells and thereby increase the yield of the product. This phenomenon occurs in both prokaryotes and eukaryotes.
Table 3.7 Stability of β-galactosidase with certain amino acids added to its N terminus
Specific internal amino acid sequences can also make a protein more susceptible to proteolytic degradation. For example, proteins that contain PEST sequences, which are rich in proline (P), glutamic acid (E), serine (S), and threonine (T), and are often flanked by clusters of positively charged amino acids such as lysine and arginine, are marked for degradation within the cell. In some instances, it is possible to enhance the stability of a protein by altering its PEST regions by genetic manipulation. Such changes, of course, must not alter the function of the target protein.
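Candidate PEST-like regions can be flagged with a crude sliding-window scan for segments rich in P, E, S, and T. The window length and threshold below are arbitrary values chosen for illustration, the flanking positively charged residues are not considered, and this is not a validated PEST prediction method.

```python
def pest_rich_windows(protein, window=12, min_fraction=0.5):
    """Start indices of windows whose P/E/S/T content meets a threshold.

    `window` and `min_fraction` are arbitrary illustrative settings.
    `protein` is a one-letter amino acid sequence.
    """
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        fraction = sum(residue in "PEST" for residue in segment) / window
        if fraction >= min_fraction:
            hits.append(start)
    return hits


if __name__ == "__main__":
    example = "KK" + "PESTPESTPEST" + "RR" + "A" * 12  # hypothetical sequence
    print(pest_rich_windows(example))
```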
In a number of studies, proteins synthesized from cloned genes have been found to be more resistant to degradation by host cell proteases when they are part of a fusion protein, in which the fusion partner is not especially susceptible to proteolysis. Alternatively, stable foreign proteins can be produced in bacterial host strains that are deficient in the production of proteases. However, this is not as simple as it might appear as E. coli has at least 25 different proteases. Moreover, these proteases are important for the degradation of abnormal or defective proteins, which is a housekeeping function that is necessary for the continued viability of the cells. In one study, strains with mutations in one or more protease genes were constructed. Generally, the strains that were most deficient in overall protease activity grew most slowly. Thus, decreasing protease activity caused cells to be debilitated. However, an E. coli strain with mutations in both the gene for the RNA polymerase sigma factor that is responsible for heat shock protein synthesis (rpoH) and the gene for a protease that is required for cell growth at high temperatures (degP) secreted target proteins that had a 36-fold-greater specific activity than when they were produced in wild-type host cells. This increase in activity reflects a decrease in the proteolytic degradation of these secreted proteins.
Increasing Protein Secretion
E. coli and other Gram-negative bacteria have multiple pathways for the secretion of various proteins. One of these systems (general secretion pathway) uses a membrane-bound protein complex for transmitting a secretory protein through the inner membrane into the periplasm (Fig. 3.10A). Some secretory proteins are then secreted across the outer membrane to the external environment. Directing a foreign protein to the periplasm or the growth medium makes its purification easier and less costly, as many fewer proteins are present outside the cytoplasmic membrane than in the cytoplasm. Moreover, the stability of a cloned protein depends on its cellular location in E. coli. For example, recombinant proinsulin is approximately 10 times more stable if it is exported into the periplasm than if it is localized in the cytoplasm. Secretion of proteins to the periplasm facilitates the correct formation of disulfide bonds because the periplasm provides an oxidative environment, in contrast to the more reducing environment of the cytoplasm. Table 3.8 indicates the amounts of secreted recombinant pharmaceutical proteins attainable in various bacterial systems.
Figure 3.10 Protein secretion in bacteria. (A) Type II secretion pathway in Gram-negative bacteria. The SecB protein binds to a secretory protein in the cytoplasm (1). SecB attaches to the SecA protein that is part of the Sec complex of the inner membrane (2), and the secretory protein is translocated through the inner membrane (3). A signal peptidase removes the signal peptide and the secretory protein is properly folded in the periplasm (4). The secretory protein combines with the protein complex of the general secretory pathway (Gsp) (5) and is translocated across the outer membrane to the external environment (6). (B) Protein secretion in Gram-positive bacteria. A signal recognition particle (SRP) binds to the signal peptide of a secretory protein, and this complex binds to a membrane protein that directs the secretory protein (1) to the Sec complex. There is also an SRP-independent pathway (2), where a signal peptide alone makes contact with the Sec complex. The secretory protein is translocated through a channel within the Sec complex (3), and the signal peptide is removed by a signal peptidase. Proper folding of the secretory protein occurs as it passes through the cell wall (4).
Table 3.8 Yields of several secreted recombinant proteins produced in different bacteria
Secretion into the Periplasm
An amino acid sequence called the signal peptide (also called the signal sequence, or leader peptide), located at the N-terminal end of a secretory protein, facilitates its export by enabling the protein to pass through the secretion (Sec) complex in the cell membrane (Fig. 3.10). It is sometimes possible to engineer a protein for secretion to the periplasm by adding the DNA sequence encoding a signal peptide to the cloned gene. When the recombinant protein is secreted into the periplasm, the signal peptide is precisely removed by a signal peptidase that is a component of the secretion apparatus so that the N-terminal end of the target protein is identical to that of the natural protein.
The presence of a signal peptide sequence does not necessarily guarantee a high rate of secretion. When the fusion of a target gene to a DNA fragment encoding a signal peptide is ineffective in producing a secreted protein product, alternative strategies need to be employed. One approach that was found to be successful for the secretion of the protein interleukin-2 (a cytokine that stimulates both T-cell growth and B-cell antibody synthesis) was the fusion of the interleukin-2 gene downstream from the gene for the entire maltose-binding protein, rather than just the maltose-binding protein signal sequence (Fig. 3.11). When this genetic fusion was introduced into E. coli cells on a plasmid vector, a large fraction of the fusion protein was localized in the periplasm. DNA encoding the recognition site for a protease (such as factor Xa) was included between the two genes to release functional interleukin-2 from the fusion protein by digestion with the protease (see below).
Figure 3.11 Engineering the secretion of interleukin-2. (A) Interleukin-2 fused to the E. coli maltose-binding protein (MBP) signal peptide is not secreted. (B) When interleukin-2 is fused to the E. coli maltose-binding protein and its signal peptide, with the two proteins joined by a linker peptide, secretion occurs. Subsequently, the maltose-binding protein and the linker peptide are removed by digestion with factor Xa.
In many instances, when foreign proteins engineered for secretion are overproduced in E. coli, the precursor form is only partially processed, with about half of the secreted proteins retaining the leader peptide and the other half being fully processed to the mature form. This is probably the result of overloading some of the components involved in the secretion process. If this is the case, then it might be possible to increase the ratio of processed to unprocessed proteins by increasing the level of expression of some of the limiting components of the protein secretion pathway. This hypothesis was tested in a series of experiments in which a plasmid containing both the prlA4 and secE genes, which encode major components of the molecular apparatus that physically moves proteins across the membrane, was introduced into E. coli host cells. Following this augmentation of the host cell secretory machinery, the fraction of the cloned protein (in this case, the cytokine interleukin-6) that was secreted to the periplasm as the processed form with the signal peptide removed increased from about 50% to more than 90%.
Secretion into the Medium
The outer membrane of Gram-negative bacteria restricts the secretion of proteins into the surrounding medium. One solution is to produce the secretory protein in a Gram-positive bacterium such as L. lactis, which lacks an outer membrane and therefore can secrete proteins directly into the medium (Fig. 3.10B). Alternatively, the Gram-negative bacterium can be modified to secrete proteins directly into the growth medium.
In general, relatively few proteins pass through the outer membrane of E. coli. However, some Gram-negative bacteria can secrete a bactericidal protein called a bacteriocin into the medium. A bacteriocin release protein activates phospholipase A, which is present in the bacterial inner membrane and cleaves membrane phospholipids so that both the inner and outer membranes are permeabilized. Some cytoplasmic and periplasmic proteins are also released into the culture medium. Thus, by expressing the bacteriocin release protein gene from a strong regulatable promoter, E. coli cells may be permeabilized at will. The cells that carry the bacteriocin release protein gene are transformed with another plasmid carrying a cloned gene that has been fused to a secretion signal peptide sequence for transport across the inner membrane. The cloned gene is placed under the same transcriptional-regulatory control as the bacteriocin release protein gene so that the two genes can be induced simultaneously. Once in the periplasm, the target protein is released to the external medium via the permeabilized outer membrane (Fig. 3.12).
Figure 3.12 E. coli cells engineered to secrete a foreign protein to the periplasm by fusing the gene of interest (green) to a secretion signal (A) and to the growth medium by permeabilizing cell membranes with a bacteriocin release protein encoded on another plasmid (red) (B).
Although secretion of E. coli proteins to the growth medium is quite rare, the small protein YebF is naturally secreted to the medium without lysing the cells or permeabilizing the membranes. When various proteins are fused to the C-terminal end of YebF, the entire fusion protein is secreted to the medium by an unidentified outer membrane receptor (Fig. 3.13). To date, researchers have reported the secretion to the medium of human interleukin-2 (a 15-kDa hydrophobic protein), bacterial α-amylase (a 48-kDa hydrophilic protein), and alkaline phosphatase (94 kDa), demonstrating that a wide range of proteins may be secreted to the medium using this system. It may be possible to employ other naturally secreted proteins in a similar manner.
Figure 3.13 Secretion, following expression in E. coli, of YebF–interleukin-2 fusion protein into the growth medium. The protein synthesized in the cytoplasm includes a signal peptide (yellow) that is excised when the fusion protein is secreted to the periplasm. The YebF–interleukin-2 fusion protein is then secreted from the periplasm, across the outer membrane, to the growth medium. YebF is shown in blue and interleukin-2 in green.
Facilitating Protein Purification
A number of purification tags have been developed to simplify the purification of recombinant proteins (Table 3.9). The basis for this approach is the expression of a target protein as a fusion protein with a short peptide sequence that has a high affinity for a protein, antibody, carbohydrate, or other ligand; for this reason, the peptide tags are often referred to as affinity tags. For example, the coding sequence for a target protein such as human interleukin-2 is joined to DNA encoding the affinity tag sequence Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys (where Asp, Tyr, and Lys indicate aspartic acid, tyrosine, and lysine, respectively) that has the dual function of reducing the degradation of the expressed interleukin-2 and then enabling the product to be purified. Following expression and secretion (during which the signal peptide is removed), the target protein can be purified from the medium in a single step by immunoaffinity chromatography. Monoclonal antibodies that bind to the affinity tag are immobilized on a polypropylene (or other solid) support and specifically capture the fusion protein (Fig. 3.14). In this case, the fusion protein was found to have the same biological activity as native interleukin-2.
Table 3.9 Some protein fusion systems used to facilitate the purification of foreign proteins in E. coli and other host organisms
Figure 3.14 Purification of a fusion protein by immunoaffinity chromatography. (A) An antibody (anti-tag antibody) that binds to a short peptide sequence (affinity tag) of the fusion protein (interleukin-2 in this example) is attached to a solid polypropylene support. The mixture of secreted proteins is passed through the column containing the bound antibody. (B) The affinity tag of the fusion protein binds to the antibody, and the other proteins pass through. The immunopurified fusion protein can then be selectively eluted from the column by the addition of pure affinity tag peptide.
In many instances, antigen–antibody complexes that form during the immunoaffinity process are difficult to separate without the use of denaturing chemicals. As an alternative, it has become very popular to express a protein with an affinity tag that contains six or eight histidines attached to either the N- or C-terminal end of the target protein. Following expression and host cell lysis, the histidine-tagged protein, along with other cellular proteins, is then passed over an affinity column of nickel–nitrilotriacetic acid. The histidine-tagged protein, but not the other cellular proteins, binds tightly to the column. The bound protein is eventually eluted from the column by the addition of imidazole (the side chain of histidine). With this protocol, some cloned and overexpressed proteins have been purified up to 100-fold with greater than 90% recovery in a single step. In addition, this system can be utilized to purify denatured proteins, for example, following solubilization of inclusion bodies and before the solubilized proteins are renatured.
While purification tags and other fusion partners may not disrupt the function of a protein, to satisfy the government agencies that regulate the use of pharmaceuticals, it is still necessary to remove these sequences if the product is to be used for human immunotherapy or other medical purposes. One way to do this is to join the gene for the target protein to the DNA sequence for the affinity tag/fusion partner with oligonucleotides that encode short stretches of amino acids that are recognized by a specific nonbacterial protease. For example, a sequence encoding the amino acid sequence isoleucine-glutamic acid-glycine-arginine (Ile-Glu-Gly-Arg) can be inserted between the target and fusion partner sequences. Following synthesis and purification of the fusion protein, factor Xa can be used to release the target protein from the fusion partner, because factor Xa is a specific protease that cleaves peptide bonds uniquely on the C-terminal side of the Ile-Glu-Gly-Arg sequence (Fig. 3.15). Moreover, because this peptide sequence occurs rather infrequently in native proteins, this approach can be used to recover many different cloned gene products. The proteases most commonly used to cleave a fusion partner/affinity tag from a protein of interest are enterokinase, tobacco etch virus protease, thrombin, and factor Xa. However, following this cleavage, it is necessary to perform additional purification steps in order to separate both the protease and the cleaved peptide from the protein of interest.
Figure 3.15 Proteolytic cleavage of a fusion protein by blood coagulation factor Xa. (A) The factor Xa recognition sequence (Xa linker sequence) lies between the amino acid sequences of two different proteins. A functional target protein (with valine at the N terminus) is released after cleavage. (B) A tripartite fusion protein including a stable fusion partner, a Xa linker peptide, and the target protein.
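The cleavage rule for factor Xa (a cut on the C-terminal side of Ile-Glu-Gly-Arg) can be sketched as a simple string operation on one-letter amino acid codes. The function name and the fusion partner and target sequences below are hypothetical and serve only to illustrate the idea; real proteases may also cut at secondary sites or be hindered by local protein structure, which this sketch does not model.

```python
FACTOR_XA_SITE = "IEGR"  # Ile-Glu-Gly-Arg in one-letter code


def cleave_after_site(protein, site=FACTOR_XA_SITE):
    """Split a protein on the C-terminal side of every occurrence of site.

    Returns the fragments in N- to C-terminal order. The recognition
    sequence stays attached to the fragment on its N-terminal side.
    """
    fragments = []
    start = 0
    position = protein.find(site)
    while position != -1:
        cut = position + len(site)
        fragments.append(protein[start:cut])
        start = cut
        position = protein.find(site, cut)
    if start < len(protein):
        # Whatever follows the last cut is the C-terminal fragment.
        fragments.append(protein[start:])
    return fragments


# Hypothetical tripartite fusion: partner + Xa linker + target.
fusion_partner = "MKTLLVAGDSQ"
target = "VLSPADKTNV"  # begins with valine, as in the figure's example
fusion = fusion_partner + FACTOR_XA_SITE + target

released = cleave_after_site(fusion)
print(released)  # partner with linker attached, then the intact target
```

Because the linker remains on the fusion partner fragment, the target is released with its own N-terminal residue, which is the reason the recognition sequence is placed immediately upstream of the target protein.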
To avoid additional purification steps that are costly and may reduce yields of target protein, self-cleaving purification tags can be used. An intein (intervening protein) is an amino acid segment found in some natural proteins that can excise itself and join the flanking segments (exteins) with a peptide bond. Excision is mediated by a cysteine or serine amino acid at the N-terminal end of the intein sequence, an asparagine at the C-terminal end, and a cysteine, serine or threonine immediately after the asparagine (Fig. 3.16). For affinity tag removal, the target protein can be expressed with an intein-affinity tag sequence at either the N or C terminus. For removal of an N-terminal tag, the codon for the intein N-terminal cysteine/serine is replaced with the codon for alanine to prevent rejoining of the affinity tag and target protein sequences that flank the intein (Fig. 3.17A). Similarly, for cleavage of a C-terminal tag, the codon for the intein C-terminal asparagine is replaced with that for alanine (Fig. 3.17B). Following purification of a target protein, cleavage of the intein-affinity tag is induced by increasing the pH of the solution or by treatment with a thiol reagent such as dithiothreitol or β-mercaptoethanol (Fig. 3.17).
Figure 3.16 Self-splicing proteins. Excision of an intein sequence from a precursor protein is facilitated by specific amino acids at the N- and C-terminal ends of the intein. Following excision, the N- and C-exteins are joined to produce the mature protein. C, cysteine; S, serine; N, asparagine; T, threonine. Adapted from Elleuche and Pöggeler, 2010. Appl. Microbiol. Biotechnol. 87:479–489.
Figure 3.17 Removal of protein purification tags using inteins. Affinity tags can be removed by inserting the coding sequence for an intein between the coding sequences for the affinity tag and the target protein. Following translation and purification, self-excision of the intein is induced by increasing pH or addition of a thiol reagent such as dithiothreitol. (A) Cleavage of an N-terminal tag. (B) Cleavage of a C-terminal tag. A, alanine; N, asparagine; C, cysteine. Adapted from Fong et al., 2010. Trends Biotechnol. 28:272–279.
DNA Integration into the Host Chromosome
A plasmid vector imposes a metabolic load on the cell because of the energy that is used for its replication and for the transcription of RNA and translation of the proteins that it encodes. As a consequence, a fraction of the cell population often loses its plasmids during cell growth. Cells that lack plasmids generally grow faster than those that retain them, so plasmidless cells eventually dominate the culture. After a number of generations of cell growth, the loss of plasmid-containing cells diminishes the yield of the cloned gene product. On a laboratory scale, plasmid-containing cells are maintained by growing the cells in the presence of either an antibiotic or an essential metabolite that enables only plasmid-bearing cells to thrive. But the addition of either antibiotics or metabolites to pilot plant- or industrial-scale fermentations can be extremely costly, and it is imperative that anything that is added to the fermentation be completely removed before the product is approved for human use. For genetically engineered microorganisms that are designed to be released into the environment to remain both effective and environmentally safe, it is essential that the cloned DNA be retained and be neither easily lost nor transferred to other microorganisms. The introduction of cloned DNA directly into the chromosomal DNA of the host organism can overcome the problem of plasmid loss or transfer. When DNA is part of the host chromosomal DNA, it is relatively stable and consequently can be maintained for many generations in the absence of selective agents.
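The takeover of a culture by plasmid-free cells can be illustrated with a minimal numerical sketch. The model below is an assumption made for illustration, not one given in the text: at each generation of the plasmid-bearing population, a fixed fraction of daughter cells loses the plasmid, and plasmid-free cells divide somewhat faster. The parameter values (1% loss per division, 1.2 divisions per plasmid-bearing generation) are arbitrary.

```python
def plasmid_bearing_fraction(generations, loss_per_division=0.01,
                             growth_advantage=1.2):
    """Track the fraction of plasmid-bearing cells over successive generations.

    loss_per_division: fraction of daughters of plasmid-bearing cells that
        end up without a plasmid (illustrative value).
    growth_advantage: divisions completed by plasmid-free cells during one
        generation of plasmid-bearing cells (illustrative value).
    Returns a list with one entry per generation, starting at generation 0.
    """
    bearing, free = 1.0, 0.0
    fractions = [1.0]
    for _ in range(generations):
        newly_free = 2 * bearing * loss_per_division
        bearing = 2 * bearing * (1 - loss_per_division)
        free = free * 2 ** growth_advantage + newly_free
        total = bearing + free
        # Renormalize so the two subpopulations stay expressed as fractions.
        bearing, free = bearing / total, free / total
        fractions.append(bearing)
    return fractions


history = plasmid_bearing_fraction(50)
for generation in (0, 10, 20, 30, 40, 50):
    print(generation, round(history[generation], 3))
```

Under these assumed parameters the plasmid-bearing fraction declines slowly at first and then rapidly, which is the behavior that selective agents or chromosomal integration are meant to prevent.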
The chromosomal integration site of a cloned gene must not be within an essential coding gene. Consequently, the input DNA sequence must be targeted to a specific nonessential site within the chromosome. In addition, to ensure efficient production of the target protein, the input gene should be under the control of a regulatable promoter.
Chromosomal integration of the input DNA occurs by homologous recombination. For this, the input DNA must be flanked by sequences of at least 50 nucleotides that are similar to the sequence at the integration site. Recombination, mediated by specific host enzymes, results in physical exchange between the two DNA molecules. Briefly, a generalized protocol for DNA integration includes the following steps.
1 Identify the desired chromosomal integration site (i.e., a segment of DNA on the host chromosome that can be disrupted without affecting the normal functions of the cell).
2 Clone the sequence from the chromosomal integration site into a plasmid vector. The chromosomal sequence may be obtained by PCR (described in chapter 2). The plasmid vector must not have an origin of replication that enables it to be replicated in the host bacterium.
3 Ligate a target gene and a regulatable promoter into the cloned chromosomal integration site on the plasmid (Fig. 3.18).
4 Transfer the plasmid containing the chromosomal integration fragment–cloned-gene construct into the host cell, usually by conjugation (described in chapter 2).
5 Select and perpetuate host cells that express the cloned gene. Propagation of the cloned gene can occur only if it has been integrated into the chromosomal DNA of the host cell.
Figure 3.18 Integration of a cloned gene into a chromosomal site. The cloned gene has been inserted, on a plasmid, in the middle of a cloned segment of DNA (ab) from the host chromosome. Homologous DNA pairing occurs between plasmid-borne DNA regions a and b and host chromosome DNA regions a′ and b′, respectively. A double-crossover event (×) results in the integration of the cloned gene.
When a host cell is transformed with a nonreplicating plasmid that carries the cloned gene in the middle of a portion of the cloned chromosomal integration site, the DNA on the plasmid can base-pair with complementary sequences in the homologous region of the host chromosome (Fig. 3.18). The integration occurs as a result of a host enzyme-catalyzed double crossover. Alternatively, a single crossover that incorporates the entire input plasmid into the host chromosome may occur (not shown).
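The outcome of a double crossover can be sketched as a string operation: the cloned gene, flanked on the plasmid by regions a and b, ends up between the matching regions a′ and b′ of the chromosome. The function name and the toy sequences are hypothetical, exact matching stands in for sequence similarity, and the 50-nucleotide minimum simply reflects the flank length mentioned earlier; no recombination enzymology is modeled.

```python
def integrate_by_double_crossover(chromosome, arm_a, arm_b, insert,
                                  min_arm=50):
    """Return the chromosome after exchange at both homology arms.

    arm_a and arm_b are the plasmid-borne flanking regions; each must be
    present in the chromosome (arm_a upstream of arm_b) and be at least
    min_arm nucleotides long. Any chromosomal DNA between the two arms is
    replaced by the insert.
    """
    if len(arm_a) < min_arm or len(arm_b) < min_arm:
        raise ValueError("homology arms are too short")
    a_start = chromosome.find(arm_a)
    if a_start == -1:
        raise ValueError("arm a not found in chromosome")
    a_end = a_start + len(arm_a)
    b_start = chromosome.find(arm_b, a_end)
    if b_start == -1:
        raise ValueError("arm b not found downstream of arm a")
    return chromosome[:a_end] + insert + chromosome[b_start:]


# Toy sequences (hypothetical): a nonessential segment "ab" in the chromosome.
upstream, downstream = "G" * 20, "C" * 20
arm_a = "ACGT" * 13    # 52 nucleotides
arm_b = "TTGCA" * 11   # 55 nucleotides
chromosome = upstream + arm_a + arm_b + downstream
cloned_gene = "ATGAAACCCGGGTAA"

recombinant = integrate_by_double_crossover(chromosome, arm_a, arm_b,
                                            cloned_gene)
print(len(chromosome), len(recombinant))
```

In this sketch only the cloned gene is transferred; the plasmid backbone is not, which corresponds to the double-crossover case rather than the single crossover that integrates the entire plasmid.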
This protocol is exemplified by the integration of the α-amylase gene from Bacillus amyloliquefaciens into the chromosome of B. subtilis. The α-amylase gene, under the control of a B. subtilis promoter, was inserted into the middle of a chromosomal DNA fragment from B. subtilis that had been cloned in an E. coli plasmid. The plasmid carries the bla gene that confers resistance to ampicillin and cannot replicate in B. subtilis. The plasmid can, however, transform B. subtilis. Transformants expressing α-amylase, an enzyme involved in the hydrolysis of starch, were identified by a zone of clearance around colonies that grew on solid medium containing starch. This indicated that the α-amylase gene had been integrated into the B. subtilis chromosome and was expressed (and secreted). The selected transformants were sensitive to ampicillin, indicating that a double crossover had occurred, and that following the exchange of DNA, the plasmid was lost from the population. Resistance to ampicillin would indicate that a single recombination event had occurred and caused the integration of the entire plasmid into the B. subtilis chromosomal DNA. Integration at the correct chromosomal site may be confirmed by PCR using primers that anneal to sequences flanking the integration site.
For cloned genes whose protein products do not have an activity that can be easily detected in a screen for transformants, a two-step procedure may be employed for chromosomal integration (Fig. 3.19). First, a selectable marker gene, usually an antibiotic resistance gene, is inserted into the middle of a nonessential piece of host cell chromosomal DNA that had previously been cloned in a plasmid. Following transformation of a bacterial host with this construct, cells expressing the antibiotic resistance gene are selected by plating on solid medium containing the antibiotic. Because the plasmid cannot replicate in the host cell, antibiotic resistant transformants have integrated the marker gene in the chromosomal DNA by homologous recombination. Next, a target gene with its transcriptional and translational control sequences is inserted into the middle of a cloned fragment of chromosomal DNA with the same sequence that flanks the marker gene and the construct is introduced into the cell, again on a nonreplicatable plasmid. Following transformation with the target gene–plasmid construct, transformants are screened for loss of the antibiotic resistance marker gene. Cells are first plated on medium that lacks the antibiotic and then colonies are replica plated onto medium containing the antibiotic to identify those that have reverted to antibiotic sensitivity. These transformants have exchanged the marker gene for the target gene, which is integrated into the chromosomal DNA.
Figure 3.19 Insertion of a foreign gene into a unique predetermined site on the B. subtilis chromosome. In step 1, a marker gene is integrated into the host cell chromosomal DNA by homologous recombination. In step 2, the selectable marker gene is replaced by the target gene.
When nonessential host genes have not been identified, a cloned gene may be introduced into the host chromosome by means of a single crossover that incorporates the entire input plasmid into the host chromosome (Fig. 3.20). This occurs when the cloned gene is inserted (on a plasmid) adjacent to the cloned chromosomal integration site. In this case, any selectable markers on the plasmid (including antibiotic resistance genes) will also be inserted into the host chromosomal DNA. While the integration of a selectable marker gene along with a gene of interest is helpful for identifying transformed cells, the presence of a selectable marker gene for antibiotic resistance is often undesirable, for example, for organisms that are intended for deliberate release into the environment (e.g., to degrade pollutants). To avoid the problems associated with these approaches, methods for selective removal of marker genes from host cell chromosomes have been developed. An overview of one of these methods is described here.
Figure 3.20 Integration of a plasmid containing a cloned gene into a chromosomal site. The cloned gene is inserted adjacent to the cloned DNA from the host chromosome (c). Homologous DNA pairing occurs between plasmid DNA region c and host chromosome DNA region c′. A single recombination event (×) within the paired c–c′ DNA region results in the integration of the entire plasmid, including the cloned gene.
When a marker gene is flanked by certain short specific DNA sequences and then inserted into either a plasmid or chromosomal DNA, the gene may be excised by a site-specific recombinase that recognizes the flanking DNA sequences and catalyzes recombination that results in excision of the intervening marker sequence (Fig. 3.21). One combination of an enzyme and DNA sequence that is useful for this sort of manipulation is the Cre–loxP recombination system, which consists of the Cre recombinase enzyme and two 34-bp loxP recombination sites. The marker gene to be removed is flanked by loxP sites, and after integration of the plasmid into the chromosomal DNA, the marker gene is removed by the Cre enzyme. A gene encoding the Cre enzyme is located on its own plasmid, which can be introduced into the chromosomally transformed host cells. Marker gene excision is triggered by the addition of IPTG to the growth medium, which turns on the E. coli lac promoter–operator that is present upstream of the cre gene and causes the Cre enzyme to be synthesized. Once there is no longer any need for the Cre enzyme, the plasmid that contains the gene for this enzyme may be removed from the host cells by raising the temperature. This plasmid has a temperature-sensitive replicon that allows it to be maintained in the cell at 30°C but not above 37°C. Given the alarming increase in antibiotic-resistant strains of many bacteria, it is essential to avoid the introduction and spread of antibiotic resistance genes in the environment. Removing antibiotic resistance genes from genetically engineered bacteria is an important step in that direction.
Figure 3.21 Removal of a selectable marker gene following integration of plasmid DNA into a bacterial chromosome. A single crossover event (×) occurs between chromosomal DNA and a homologous DNA fragment (hatched) on a plasmid, resulting in the integration of the entire plasmid into the chromosomal DNA. The selectable marker gene, which is flanked by loxP sites, is excised by the action of the Cre enzyme, leaving one loxP site on the integrated plasmid. The Cre enzyme is on a separate plasmid within the same cell under the transcriptional control of the E. coli lac promoter so that excision is induced when IPTG is added to the growth medium.
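The net result of Cre-mediated excision can be sketched in a few lines: everything between two loxP sites in the same orientation is removed, together with one of the sites, so that a single loxP site remains. The loxP sequence used below is the commonly cited 34-bp site (two 13-bp inverted repeats around an 8-bp spacer); the function name and the marker and flanking sequences are hypothetical, and the sketch handles only sites on the same strand in direct orientation.

```python
# Commonly cited loxP site: 13-bp repeat + 8-bp spacer + 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def excise_between_loxp(dna, site=LOXP):
    """Remove the DNA between the first two directly repeated sites.

    One copy of the site is left behind, as after Cre-mediated excision.
    """
    first = dna.find(site)
    if first == -1:
        raise ValueError("no recombination site found")
    second = dna.find(site, first + len(site))
    if second == -1:
        raise ValueError("two directly repeated sites are required")
    # Keep the first site; drop the intervening DNA and the second site.
    return dna[:first + len(site)] + dna[second + len(site):]


# Hypothetical integrated construct: flank, loxP, marker gene, loxP, flank.
marker_gene = "ATGAAACCCGGGTAA"
construct = "GGCC" + LOXP + marker_gene + LOXP + "TTAA"

print(len(LOXP))                       # length of the site in base pairs
print(excise_between_loxp(construct))  # marker removed, one loxP retained
```

The excised marker leaves the cell as a nonreplicating circle in the biological reaction; the sketch tracks only what remains in the chromosome.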
Heterologous Protein Production in Eukaryotic Cells
Proteins from a wide variety of organisms have been successfully produced using prokaryotic expression systems. Although expression of any gene from any source organism in a prokaryotic host is theoretically possible, in practice, the eukaryotic proteins produced in bacteria do not always have the desired biological activity or stability. In addition, despite careful purification procedures, bacterial compounds that are toxic or that cause a rise in body temperature in humans and animals (pyrogens) may contaminate the final product. To avoid these problems, investigators have developed eukaryotic expression systems in fungal, insect, and mammalian cells for the production of therapeutic agents for either humans or animals; large quantities of stable, biologically active proteins for biochemical, biophysical, and structural studies; and proteins for industrial processes. Moreover, any human protein intended for medical use must be identical to the natural protein in all its properties. The inability of prokaryotic organisms to produce authentic versions of eukaryotic proteins is, for the most part, due to improper posttranslational protein processing, including improper protein cleavage and folding, and to the absence of appropriate mechanisms that add chemical groups to specific amino acid acceptor sites.
Posttranslational Modification of Eukaryotic Proteins
In prokaryotes, the steps in protein synthesis are not compartmentalized, and therefore, translation of mRNA occurs concurrently with transcription; as soon as the nascent transcript emerges from RNA polymerase, it is accessible to the ribosome to begin translation. With the aid of folding proteins, known as chaperones, that bind to polypeptides as they are being synthesized, proteins are folded into their proper three-dimensional configuration during synthesis. In contrast, eukaryotes transport mRNA from the nucleus to ribosomes in the cytoplasm or on the endoplasmic reticulum, where translation occurs. Proteins produced on ribosomes associated with the endoplasmic reticulum either are inserted in the membrane of the endoplasmic reticulum or are secreted into the lumen of the endoplasmic reticulum during synthesis, where they are processed further.
Many proteins, including most of those that are of interest as therapeutic agents for the treatment of human or animal diseases, undergo some type of posttranslational processing that is often required for protein activity and stability. Some proteins are produced as inactive precursor polypeptides that must be cleaved by proteases at specific sites to produce the active form of the protein. For example, the small peptide hormone insulin is produced in animal pancreatic cells as a single polypeptide, preproinsulin, that is cleaved to produce two shorter peptides that are joined by disulfide bonds (Fig. 3.22). Production of inactive preproinsulin ensures that the peptide is not active in the pancreatic cells that produce it, but upon secretion, cleaved mature insulin can act on other cells. Similarly, the digestive enzyme trypsin, which degrades proteins, is produced as an inactive precursor polypeptide, trypsinogen, to avoid digestion of components of the producing cell. Upon secretion into the small intestine, trypsinogen is cleaved by an enteropeptidase to yield the active enzyme trypsin.
Figure 3.22 Cleavage of inactive preproinsulin to yield active mature insulin. Proteases remove the leader peptide (L) and an internal peptide (C), yielding a peptide that consists of chains A and B.
Similar to prokaryotes, proper folding of proteins in eukaryotic cells requires the assistance of chaperones. In the endoplasmic reticulum, the chaperones BiP and calnexin bind nascent polypeptides, and protein disulfide isomerases catalyze the formation of disulfide bonds between cysteines. Proper folding is important, not only for the protein to attain a configuration for optimal activity, but also to protect regions of the protein that would otherwise be recognized by proteases that can destroy the protein. Quality control systems ensure that only correctly folded proteins are released from the endoplasmic reticulum and transported within vesicles to the Golgi apparatus for further processing. Proteins intended for secretion from the cell are subsequently transported to the cell membrane within specific transport vesicles and released by exocytosis.
The addition of specific sugars (glycosylation) to certain amino acids is a major posttranslational modification that provides stability and distinctive binding properties to a protein. Proper protein glycosylation is important because it contributes to protein conformation by influencing protein folding; can target a protein to a particular location, for example, through interaction with a specific receptor molecule; or can increase protein stability by protecting it from proteases. In the cell, oligosaccharides are attached to newly synthesized proteins in the endoplasmic reticulum and in the Golgi apparatus by specific enzymes known as glycosidases and glycosyltransferases. Different tissues may differentially glycosylate the same protein, thereby increasing protein heterogeneity. Because different sugar modifications can alter the properties of a protein, this presents opportunities for protein engineering to improve the efficacy or to alter the activity of a protein. About 50% of all human proteins are glycosylated. Human therapeutic proteins that require glycosylation for optimal activity include antibodies, blood factors, some interferons, and some hormones.
The most common glycosylations entail the attachment of specific sugars to the hydroxyl group of either serine or threonine (O-linked glycosylation) (Fig. 3.23) and to the amide group of asparagine (N-linked glycosylation) (Fig. 3.24). The initial core sugar groups that are added to these amino acid acceptor sites tend to be similar among eukaryotes, although the subsequent elaborations among yeasts, insects, and mammals are quite diverse, especially for N-linked glycosylation. Other amino acid modifications include phosphorylation, acetylation, sulfation, acylation, γ-carboxylation, and the addition of C14 and C16 fatty acids (i.e., myristoylation [or myristylation] and palmitoylation [or palmitylation], respectively).
Figure 3.23 Examples of some O-linked oligosaccharides in yeasts (A), insects (B), and mammals (C). O-linked oligosaccharides have a number of arrangements with different combinations of sugars. Some of the more prevalent forms are shown here. S, serine; T, threonine; red circles, mannose; dark-blue squares, N-acetylglucosamine; light-blue squares, N-acetylgalactosamine; green squares, galactose; orange squares, sialic acid.
Figure 3.24 Examples of some N-linked oligosaccharides in yeasts (A), insects (B), and mammals (C). All N-linked glycosylations in eukaryotes start with the same initial group, which is subsequently trimmed and then elaborated in diverse ways within and among species. Some yeast sites have 15 or fewer mannose units (core series), and others have more (outer-chain family). In S. cerevisiae, the chains frequently have 50 or more mannose units. An asparagine (N) next to any amino acid (X) followed by either threonine (T) or serine (S) can be targeted for glycosylation. Red circles, mannose; dark blue squares, N-acetylglucosamine; yellow triangles, glucose; green squares, galactose; orange squares, sialic acid; maroon triangle, fucose.
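The N-X-S/T rule stated in the caption above can be turned into a simple sequence scan. The following is a minimal sketch, not a validated prediction tool: the function name, the example peptide, and the `exclude_proline` option are all assumptions introduced for illustration. The option reflects a common refinement (proline at position X is generally not tolerated) that the text above does not state.

```python
import re

def find_n_glycosylation_sequons(protein, exclude_proline=False):
    """Return (1-based Asn position, motif) for every N-X-S/T site.

    exclude_proline is an illustrative option, not taken from the text:
    when True, motifs with proline at position X are skipped.
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping motifs are all reported.
    pattern = re.compile(rf"(?=(N{middle}[ST]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]

# Hypothetical peptide, invented for illustration only.
peptide = "MKNGTLLNPSAANNTSQ"
for position, motif in find_n_glycosylation_sequons(peptide):
    print(position, motif)
```

A motif found this way is only a candidate site; whether it is actually glycosylated, and with which sugars, depends on the host cell, as the figure shows.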
Unfortunately, there is no universally effective eukaryotic host cell that performs the correct modifications on every protein. In some cases, a host cell may add unusual sugars to either authentic or spurious amino acid sites and, consequently, create an extremely antigenic protein or possibly one that lacks its proper function. Even though a recombinant protein may fall short of the stringent properties that are required for a therapeutic agent, it may still be useful for either research or industrial purposes. Different eukaryotic expression systems must be tested to determine which one synthesizes the largest amount of a functional recombinant protein. The choice of an expression system depends primarily on the quality of the recombinant protein that is produced, but the yield of product, ease of use, and cost of production and purification are also important considerations.
General Features of Eukaryotic Expression Systems
The basic requirements for expression of a target protein in a eukaryotic host are similar to those required in prokaryotes. Vectors into which the target gene is cloned for delivery into the host cell can be specialized plasmids designed to be maintained in the eukaryotic host, such as the yeast 2μm plasmid; host-specific viruses, such as the insect baculovirus; or artificial chromosomes, such as the yeast artificial chromosome (YAC). The vector must have a eukaryotic promoter that drives the transcription of the cloned gene of interest, eukaryotic transcriptional and translational stop signals, a sequence that enables polyadenylation of the mRNA, and a selectable eukaryotic marker gene (Fig. 3.25). Because recombinant DNA procedures are technically more difficult to carry out with eukaryotic cells, they are typically carried out in prokaryotic cells before transferring the vectors to the eukaryotic host for protein expression. Therefore, most eukaryotic vectors are shuttle vectors with two origins of replication and two separate selectable marker genes; one set of these genes functions in the bacterium E. coli, and the other set functions in the eukaryotic host cell. If a eukaryotic expression vector is to be used as a plasmid (i.e., as extrachromosomal replicating DNA), then it must also have a eukaryotic origin of replication. Alternatively, if the vector is designed for stable integration into the host chromosomal DNA, then it must have a sequence that is complementary to a segment of host chromosomal DNA to facilitate insertion into a chromosomal site.
Figure 3.25 Generalized eukaryotic expression vector. The major features of a eukaryotic expression vector are a eukaryotic transcription unit with a promoter (p), a multiple cloning site (MCS) in which to insert a target gene, and a DNA segment with transcription termination and polyadenylation signals (t); a eukaryotic selectable marker (ESM) gene; an origin of replication that functions in the eukaryotic host cell (orieuk); an origin of replication that functions in E. coli (oriE); and an E. coli selectable marker (e.g., Ampr) gene.
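The vector requirements described above amount to a checklist that depends on whether the vector is maintained as a plasmid or integrated into the chromosome. The sketch below encodes that checklist. The feature labels, the mode names, and the function name are hypothetical choices made for this example; only the requirements themselves come from the text.

```python
# Features the text lists for any eukaryotic shuttle expression vector.
ALWAYS_REQUIRED = (
    "eukaryotic promoter",
    "termination and polyadenylation signals",
    "eukaryotic selectable marker",
    "E. coli origin of replication",
    "E. coli selectable marker",
)

# Additional feature required by each way of maintaining the vector.
EXTRA_BY_MODE = {
    "episomal": ("eukaryotic origin of replication",),
    "integrating": ("host chromosomal homology segment",),
}

def missing_features(vector_features, mode):
    """List the required features that are absent from vector_features."""
    required = ALWAYS_REQUIRED + EXTRA_BY_MODE[mode]
    return [name for name in required if name not in vector_features]

# Hypothetical vector carrying everything needed for plasmid maintenance.
example_vector = set(ALWAYS_REQUIRED) | {"eukaryotic origin of replication"}
print(missing_features(example_vector, "episomal"))
print(missing_features(example_vector, "integrating"))
```

The same vector passes the check for episomal use but lacks the homology segment needed for integration, which mirrors the distinction drawn in the paragraph above.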
The introduction of DNA into bacterial or fungal cells is called transformation. In these systems, the term describes an inherited change due to the acquisition of exogenous (foreign) DNA. However, in animal cells, transformation refers to changes in the growth properties of cells in culture after they become cancerous. To avoid confusion, the term transfection is used to denote inherited changes in animal cells that are due to the addition of exogenous DNA.
Three techniques are commonly used to transform yeasts: electroporation, lithium acetate treatment, and cell wall removal (protoplast formation). Transfection of cultured animal cells is achieved by incubating cells with DNA that has been coprecipitated with either calcium phosphate or diethylaminoethyl (DEAE)–dextran or by electroporation. Electroporation entails subjecting cells to short pulses of electric current, thus creating transient pores through which DNA enters the cell (Fig. 2.7). Viruses, lipid–DNA complexes, and protein–DNA aggregates are also used to transfer exogenous DNA into a recipient animal cell.
Yeast Expression Systems
Yeasts share many of the molecular, genetic, and biochemical features of other, “higher” eukaryotes and are therefore a good choice for heterologous protein production. They have growth advantages similar to those of prokaryotes, such as rapid growth in low-cost medium; generally do not require growth factors to be added to the growth medium; can correctly process eukaryotic proteins; and can secrete large amounts of heterologous proteins. Initially, the yeast Saccharomyces cerevisiae was used extensively as a host cell for the expression of cloned eukaryotic genes. It has a long history of use in traditional biotechnologies in the brewing and baking industries. Today, a variety of yeast and other fungal expression systems are available, and they have been optimized for recombinant protein expression. Versatile expression vectors with broad host ranges have been constructed because the optimal host for production of a particular target protein must often be determined experimentally in a number of different systems.
High levels of recombinant protein production have been achieved using S. cerevisiae. There are several advantages to using this single-celled yeast. First, a great deal is known about its biochemistry, genetics, and cell biology. The genome sequence of S. cerevisiae was completed in 1996, and it is used extensively in studies as a model organism for cell function. Second, it can be grown rapidly to high cell densities on relatively simple media in both small culture vessels and large-scale bioreactors. Third, several strong promoters have been isolated from the yeast and characterized, and a naturally occurring plasmid, called the 2µm plasmid, can be used as part of an endogenous yeast expression vector system. Fourth, S. cerevisiae is capable of carrying out many posttranslational modifications. Fifth, S. cerevisiae normally secretes so few proteins that, when it is engineered for extracellular release of a recombinant protein, the product can be easily purified. Sixth, because of its many years of use in the baking and brewing industries, S. cerevisiae has been listed by the U.S. Food and Drug Administration as a “generally recognized as safe” (GRAS) organism. It does not harbor human pathogens or produce fever-stimulating pyrogens. Therefore, the use of this organism for the production of human therapeutic agents (drugs or pharmaceuticals) does not require the same extensive experimentation demanded for unapproved host cells. A number of proteins that have been produced in S. cerevisiae are currently being used commercially as vaccines, pharmaceuticals, and diagnostic agents (Table 3.10). For example, at present, more than 50% of the world supply of insulin is produced by S. cerevisiae. Engineered S. cerevisiae strains are also major producers of a hepatitis B vaccine.
Table 3.10 Recombinant proteins produced by S. cerevisiae expression systems
Vaccines: Hepatitis B virus surface antigen; Malaria circumsporozoite protein; HIV-1 envelope protein
Diagnostics: Hepatitis C virus protein; HIV-1 antigens
Human therapeutic agents: Epidermal growth factor; Insulin; Insulin-like growth factor; Platelet-derived growth factor; Proinsulin; Fibroblast growth factor; Granulocyte-macrophage colony-stimulating factor; α1-Antitrypsin; Blood coagulation factor XIIIa; Hirudin; Human growth factor; Human serum albumin
HIV-1, human immunodeficiency virus type 1.
Saccharomyces cerevisiae Expression Vectors
There are three main classes of S. cerevisiae expression vectors: episomal, or plasmid, vectors, integrating vectors, and YACs. Of these, episomal plasmid vectors have been used extensively for the production of either intra- or extracellular heterologous proteins. Typically, the vectors contain features that allow them to function in both bacteria and S. cerevisiae. An E. coli origin of replication and bacterial antibiotic resistance genes are usually included on the vector, enabling all manipulations to first be performed in E. coli before the vector is transferred to S. cerevisiae for expression.
The yeast episomal plasmid vectors are based on the high-copy-number 2μm plasmid, a small, circular plasmid found in most natural strains of S. cerevisiae. The vector replicates independently of the host chromosome via a single origin of replication (autonomous replicating sequence [ARS]), and is maintained in more than 30 copies per cell. Many S. cerevisiae selection schemes rely on mutant host strains that require a particular amino acid (histidine, tryptophan, or leucine) or nucleotide (uracil) for growth. Such strains are said to be auxotrophic because minimal growth medium must be supplemented with a specific nutrient. In practice, the vector is equipped with a functional (wild-type) version of a gene that complements the mutated gene in the host strain. For example, when a plasmid vector with a wild-type LEU2 gene is transformed into a mutant leu2 host cell and plated onto medium that lacks leucine, only cells that carry the plasmid will grow.
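The logic of auxotrophic selection can be expressed as a small rule: a cell grows only if every nutrient it cannot synthesize is either supplied in the medium or made possible by a wild-type gene on the plasmid. The sketch below is a simplified model under that assumption. The function name and the gene-to-nutrient table are illustrative, and the table covers only marker genes named in this chapter.

```python
# Wild-type marker gene -> nutrient that a mutant host must otherwise be fed.
NUTRIENT_FOR_GENE = {
    "LEU2": "leucine",
    "URA3": "uracil",
    "TRP1": "tryptophan",
}

def can_grow(host_mutant_genes, plasmid_genes, medium_supplements):
    """True if each host mutation is complemented or its nutrient is supplied."""
    for mutant in host_mutant_genes:
        wild_type = mutant.upper()  # e.g. leu2 -> LEU2
        complemented = wild_type in plasmid_genes
        supplied = NUTRIENT_FOR_GENE[wild_type] in medium_supplements
        if not (complemented or supplied):
            return False
    return True

# A leu2 host on medium lacking leucine, with and without a LEU2 plasmid.
print(can_grow({"leu2"}, {"LEU2"}, set()))
print(can_grow({"leu2"}, set(), set()))
```

Only the plasmid-bearing cell grows on the unsupplemented medium, which is the basis of the selection described above.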
A number of promoters derived from S. cerevisiae genes are available for engineering efficient transcription of heterologous genes in yeast vectors (Table 3.11). Generally, tightly regulatable, inducible promoters are preferred for producing large amounts of recombinant protein at a specific time during large-scale growth. In this context, the galactose-regulated promoters respond rapidly to the addition of galactose with a 1,000-fold increase in transcription. Repressible, constitutive, and hybrid promoters that combine the features of different promoters are also available. Maximal expression also depends on efficient termination of transcription. Often, for plasmid vectors, the terminator sequence is from the same gene as the promoter.
Table 3.11 Promoters for S. cerevisiae expression vectors
Many heterologous genes are provided with a DNA coding sequence for a signal peptide that facilitates the passage of the recombinant protein through cell membranes and its release to the external environment. The main reason for this modification is that it is much easier to purify a secreted protein than one from a cell lysate. The most commonly used signal sequence for S. cerevisiae is derived from the yeast mating factor α gene. Also, synthetic signal sequences have been created to increase the amount of secreted protein. Other sequences that stabilize the recombinant protein, protect it from proteolytic degradation, and provide a purification tag can be fused onto the coding sequence of the heterologous gene. These extra amino acid sequences are equipped with a cleavage site so that they can be removed from the recombinant protein after it is purified.
Plasmid-based yeast expression systems are often unstable under large-scale (≥10 liters) growth conditions even in the presence of selection pressure. To remedy this problem, a heterologous gene is integrated into the host genome to provide a more reliable production system. Different approaches have been devised for the integration of a cloned gene together with a selectable marker gene into an S. cerevisiae chromosome. Briefly, a functional selectable marker gene and a heterologous gene equipped with yeast-specific transcription and translation control sequences are inserted between two DNA segments derived from the ends of a nonessential yeast gene. In this instance, the integrating plasmid does not usually carry an origin of replication that functions in yeast cells. The plasmid is cleaved, and the linear fragment is transformed into S. cerevisiae. A double recombination event between homologous sequences on the linearized plasmid and a chromosome in the host inserts the piece of DNA with both target and marker genes into a specific chromosome site (Fig. 3.26). The plasmid DNA is linearized because DNA in this form is more likely than circular DNA to recombine with chromosome DNA. The DNA that is not integrated is lost during successive cell divisions. The major drawback of this strategy is the low yield of recombinant protein from a single gene copy.
Figure 3.26 Integration of DNA into a S. cerevisiae chromosome. A selectable marker gene (LEU2) and a gene of interest (GOI) with transcription and translation control elements (not shown) are inserted into a yeast integrating plasmid between two segments (A1 and A2) from the ends of a nonessential yeast gene A. The ampicillin resistance (Ampr) gene and the origin of replication (oriE) function in E. coli. A leucine-requiring (leu2) yeast strain is transformed with restriction endonuclease-digested vector DNA because chromosomal DNA is more likely to recombine with linearized DNA than with circular DNA. The restriction endonuclease (RE) sites flank the segments from the nonessential gene. The DNA sequences at the ends of nonessential gene A undergo recombination (×) that leads to the incorporation of both the gene of interest and the LEU2 gene into the corresponding chromosome site. Transformants grow on medium that is not supplemented with leucine. Nonrecombined DNA is degraded.
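The double recombination event in Fig. 3.26 can be modeled by treating the chromosome and the linearized fragment as ordered lists of named segments. This is a deliberately coarse sketch that ignores sequence-level detail and recombination frequency. The function name and the flanking labels "gene X" and "gene Y" are hypothetical; A1, A2, LEU2, and GOI follow the figure.

```python
def integrate_by_double_recombination(chromosome, fragment):
    """Replace the chromosomal region between the fragment's end segments.

    The first and last segments of the fragment are taken to be the
    homologous arms. If either arm is absent from the chromosome, or they
    occur in the wrong order, the chromosome is returned unchanged.
    """
    chromosome = list(chromosome)
    left_arm, right_arm = fragment[0], fragment[-1]
    if left_arm not in chromosome or right_arm not in chromosome:
        return chromosome
    start, end = chromosome.index(left_arm), chromosome.index(right_arm)
    if start >= end:
        return chromosome
    # Everything between the two arms is exchanged for the fragment's cargo.
    return chromosome[:start] + list(fragment) + chromosome[end + 1:]

chromosome = ["gene X", "A1", "A (internal region)", "A2", "gene Y"]
fragment = ["A1", "LEU2", "GOI", "A2"]
print(integrate_by_double_recombination(chromosome, fragment))
```

In this model the internal region of nonessential gene A is replaced by LEU2 and the gene of interest, so the transformant becomes able to grow without leucine while carrying a single chromosomal copy of the target gene.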
A YAC is designed to clone a large segment of DNA (100 kilobase pairs [kb]), which is then maintained as a separate chromosome in the host yeast cell. The YAC system is highly stable and has been used for the physical mapping of human genomic DNA, the analysis of large transcription units, and the formation of genomic libraries containing DNA from individual human chromosomes. A YAC vector mimics a chromosome because it has a sequence that acts as an origin of DNA replication (ARS), a yeast centromere sequence to ensure that during cell division each daughter cell receives a copy of the YAC, and telomere sequences that are present at both ends after linearization of the YAC DNA for stability (Fig. 3.27). In some cases, the input DNA is cloned into a site that disrupts a yeast marker gene. In the absence of the product of the marker gene, a colorimetric response is observed when recipient cells are grown on a specialized medium. Alternatively, some YAC vectors contain a selectable marker gene that is independent of the cloning site.
Figure 3.27 YAC cloning system. The YAC plasmid (pYAC) has an E. coli selectable marker (Ampr) gene; an origin of replication that functions in E. coli (oriE); and yeast DNA sequences, including URA3, CEN, TRP1, and ARS. CEN provides centromere function, ARS is a yeast autonomous replicating sequence that is equivalent to a yeast origin of replication, URA3 is a functional gene of the uracil biosynthesis pathway, and TRP1 is a functional gene of the tryptophan biosynthesis pathway. The T regions are yeast chromosome telomeric sequences. The SmaI site is the cloning insertion site. pYAC is first treated with SmaI, BamHI, and alkaline phosphatase and then ligated with size-fractionated (100-kb) input DNA. The final construct carries cloned DNA and can be stably maintained in double-mutant ura3 and trp1 cells.
Secretion of Heterologous Proteins by S. cerevisiae
All glycosylated proteins of S. cerevisiae are secreted. Consequently, the coding sequences of recombinant proteins that require either O-linked or N-linked sugars for biological activity must be equipped with a signal peptide to pass through the secretory system (Fig. 3.28). Usually, the signal sequence from the yeast mating type α-factor gene (prepro-α-factor) is inserted immediately in front (upstream) of the cDNA of the gene of interest. Under these conditions, correct disulfide bond formation, proteolytic removal of the signal sequence, and appropriate posttranslational modifications often occur, and an active recombinant protein is secreted. During this process, the signal peptide is removed by an endoprotease (signal peptidase) that recognizes the dipeptide lysine-arginine (Lys-Arg). The Lys-Arg codons must be located adjacent to the cDNA sequence so that, following removal of the leader peptide, the recombinant protein will have the correct amino acid at its N terminus. For example, a properly processed and active form of the protein hirudin was synthesized and secreted by an S. cerevisiae strain containing a plasmid vector that had the prepro-α-factor sequence added to the hirudin coding sequence. The gene for hirudin is from an invertebrate, the leech Hirudo medicinalis. This protein is a powerful blood anticoagulant that is not immunogenic in humans.
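The requirement that the Lys-Arg codons sit directly next to the cDNA can be illustrated with a toy model of leader removal. The sketch below assumes that cleavage occurs immediately after the first Lys-Arg (KR) dipeptide in the fusion, which holds only if no earlier KR is present. Both sequences are invented placeholders, not the real prepro-α-factor or hirudin sequences, and the function name is hypothetical.

```python
def process_prepro_fusion(fusion):
    """Split a leader-target fusion immediately after the first Lys-Arg (KR)."""
    site = fusion.find("KR")
    if site == -1:
        raise ValueError("no Lys-Arg processing site found")
    leader = fusion[:site + 2]   # leader, including the Lys-Arg dipeptide
    mature = fusion[site + 2:]   # protein released with its own N terminus
    return leader, mature

# Placeholder sequences, invented for illustration only.
hypothetical_leader = "MSFTALLAGSLEKR"
hypothetical_target = "GSTPEDQ"
leader, mature = process_prepro_fusion(hypothetical_leader + hypothetical_target)
print(leader, mature)
```

If extra residues were placed between the Lys-Arg dipeptide and the target sequence, they would remain on the N terminus of the released protein, which is why the junction must be constructed precisely.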
Figure 3.28 Protein secretion pathway in eukaryotes. (A) A signal recognition particle (SRP) binds to the signal sequence of a secretory protein. (B) The SRP attaches to a SRP receptor on the endoplasmic reticulum (ER) membrane. (C) The secretory protein is translocated into the lumen of the ER, and a signal peptidase removes the signal sequence. (D) The secretory protein is folded, partially modified, and packaged in a transport vesicle intended for the Golgi network. (E) The ER-released vesicle carrying the secretory protein enters the Golgi network at the cis face and passes through the Golgi stack where it is further modified. After it is sorted, a plasma membrane-specific vesicle is formed at the trans face of the Golgi network. The secretory transport vesicle fuses with the plasma membrane and releases the secretory protein to the extracellular environment.
Overexpressed proteins tend to form undesirable intracellular aggregates rather than being secreted into the medium, where they would be easier to purify. Major problems that must be addressed to increase heterologous-protein secretion in yeast cells are the incorrect folding of the polypeptide, the activation of cellular mechanisms to cope with the stress of protein overproduction, and the aberrant processing and release of the protein of interest from the endoplasmic reticulum. Correct protein folding occurs in the endoplasmic reticulum in eukaryotes and is facilitated by a number of different proteins, including molecular chaperones, enzymes that promote disulfide bond formation, signal transduction proteins that monitor the demand and capacity of the protein-folding machinery, and proteases that clear away improperly folded or aggregated proteins (Fig. 3.29). The eukaryotic enzyme protein disulfide isomerase is instrumental in forming the correct disulfide bonds within a protein. Aberrant disulfide bond formation changes a protein’s configuration, which abolishes protein activity and causes instability. Poor yields of overexpressed proteins often occur because the capacity of the cell to properly fold and secrete proteins has been exceeded.
Figure 3.29 Summary of protein folding in the endoplasmic reticulum of yeast cells. During synthesis on ribosomes associated with the endoplasmic reticulum (ER), nascent proteins are bound by the chaperones BiP and calnexin, which aid in the correct folding of the protein. Protein disulfide isomerases (PDI) catalyze the formation of disulfide bonds between cysteine amino acids that are nearby in the folded protein. Quality control systems ensure that only correctly folded proteins are released from the ER. Proteins released from the ER are transported to the Golgi apparatus for further processing. Prolonged binding of BiP to misfolded proteins leads to activation of the S. cerevisiae transcription factor Hac1, which controls the expression of several proteins that mediate the unfolded-protein response (UPR). Adapted from Gasser et al., Microb. Cell Fact. 7:11–29, 2008.
Several strategies have been implemented to increase the host cell’s capacity to process higher than normal levels of proteins. The overproduction of molecular chaperones and protein disulfide isomerases may increase the yield of recombinant proteins, especially those with disulfide bonds. To test this hypothesis, the yeast protein disulfide isomerase gene was cloned between the constitutive glyceraldehyde phosphate dehydrogenase promoter and a transcription terminator sequence in a yeast integrating vector, and the entire construct was integrated into a chromosomal site. The modified strain showed a 16-fold increase in protein disulfide isomerase production compared with the wild-type strain. When protein disulfide isomerase-overproducing cells were transformed with a plasmid vector carrying the gene for human platelet-derived growth factor B, there was a 10-fold increase in the secretion of recombinant protein over that of transformed cells with normal levels of protein disulfide isomerase. The overproduction of protein disulfide isomerase specifically increases the secretion of proteins with disulfide bonds. Higher levels of secreted products were also obtained for the recombinant proteins human erythropoietin, bovine prochymosin, and leech hirudin in S. cerevisiae cells that overexpress the chaperone BiP.
Overexpression of the molecular chaperone BiP or protein disulfide isomerase increased the secretion of some heterologous proteins; however, overexpression of a single chaperone may not have the desired outcome and, in some instances, may increase the degradation of the target protein. This is because proper protein folding requires the coordinated efforts of many interacting factors (Fig. 3.29). Even when levels of one chaperone are adequate, the levels of cochaperones or cofactors may be limiting. The unfolded-protein response of yeast cells coordinates the expression of several chaperones, as well as cochaperones. When the demand for protein folding exceeds the folding capacity of the endoplasmic reticulum, the unfolded-protein response increases the expression of chaperones, protein disulfide isomerase, and other proteins involved in protein secretion. Engineering the proteins of the unfolded-protein response may be a better approach to increase the overall capacity of the cell to fold proteins in a coordinated way by maintaining appropriate ratios of all factors required. Accumulation of unfolded proteins in the endoplasmic reticulum activates the S. cerevisiae transcription factor Hac1, which activates the expression of proteins of the unfolded-protein response, and therefore expression of Hac1 was targeted for genetic manipulation. Overexpression of Hac1 in S. cerevisiae improved secretion of the important industrial enzyme α-amylase, which is used for starch hydrolysis in a wide range of processes.
Other Yeast Expression Systems
Recombinant proteins have been produced successfully in S. cerevisiae from cloned genes from many sources. However, in many cases, expression levels are low and protein yields are modest. One of the major drawbacks of using S. cerevisiae is the tendency for the yeast to hyperglycosylate heterologous proteins by adding 50 to 150 mannose residues in N-linked oligosaccharide side chains that often alter protein function. Although the initial stages of addition of glycan chains to proteins in the lumen of the endoplasmic reticulum are similar in yeast and humans, following transfer of the protein to the Golgi apparatus, further processing differs significantly. The outcome is the production of a sialylated protein in humans and a hypermannosylated protein in yeast, with α-1,3 bonds between the sugar residues that can make the heterologous protein antigenic in humans (Fig. 3.24). Also, proteins that are designed for secretion frequently are retained in the periplasmic space, increasing the time and cost of purification. Finally, S. cerevisiae produces ethanol at high cell densities, which is toxic to the cells and, as a consequence, lowers the quantity of secreted protein. For these reasons, researchers have examined other yeast species and eukaryotic cells that could act as effective host cells for recombinant protein production.
Pichia pastoris is a methylotrophic yeast that is able to utilize methanol as a source of energy and carbon. It is an attractive host for recombinant protein production because glycosylation occurs to a lesser extent than in S. cerevisiae and the linkages between sugar residues are of the α-1,2 type, which are not allergenic to humans. With these natural characteristics as a starting point, a P. pastoris strain was extensively engineered with the aim of developing a “humanized” strain that glycosylates proteins in a manner identical to that of human cells. Both human and yeast cells add the same small (10-residue), branched oligosaccharide to nascent proteins in the endoplasmic reticulum (Fig. 3.30). However, this is the last common precursor between the two cell types, because once the protein is transported to the Golgi apparatus, further processing is different. In the Golgi apparatus, yeast cells add an α-1,6 mannose residue to the oligosaccharide, which subsequently leads to hypermannosylation. Mammalian cells, on the other hand, remove some mannose residues from the precursor (trimming) and then sequentially add specific sugars to yield a glycoprotein with an oligosaccharide that terminates in sialic acid. To create a “humanized” strain, the enzyme responsible for addition of the α-1,6 mannose was first eliminated from P. pastoris to prevent hypermannosylation. Next, the gene encoding a mannose-trimming enzyme (a mannosidase) from the filamentous fungus Trichoderma reesei was inserted into the yeast genome and was found to trim the oligosaccharide to a human-like precursor. Genes encoding enzymes for the sequential addition of sugar residues that terminate the oligosaccharide chains in galactose were also added. It should be noted that the coding sequences for all engineered genes contained a secretion signal for localization of the encoded protein to the Golgi apparatus.
Finally, several genes for proteins that catalyze the synthesis, transport to the Golgi apparatus, and addition of sialic acid to the terminal galactose on the protein precursor were inserted into the P. pastoris genome. Properly sialylated recombinant proteins, including erythropoietin and antibodies, that can be used as human therapeutic agents have been produced by the “humanized” P. pastoris.
Figure 3.30 Differential processing of glycoproteins in P. pastoris, humans, and “humanized” P. pastoris. Initial additions of sugar residues to glycoproteins in the endoplasmic reticulum are similar in human and P. pastoris cells (A). However, further N-glycosylation in the Golgi apparatus differs significantly between the two cell types. N-glycans are hypermannosylated in P. pastoris (B), while in humans, mannose residues are trimmed and specific sugars are added, leading to termination of the oligosaccharide in sialic acid (C). P. pastoris cells have been engineered to produce enzymes that process glycoproteins in a manner similar to that of human cells. In “humanized” P. pastoris, a recombinant glycoprotein produced in the endoplasmic reticulum (D) is transported to the Golgi apparatus, where it is further processed to yield a properly sialylated glycoprotein (E). Blue squares, N-acetylglucosamine; red circles, mannose; green squares, galactose; orange squares, sialic acid. Adapted from Hamilton and Gerngross, Curr. Opin. Biotechnol. 18:387–392, 2007.
During growth on methanol, the enzymes required for catabolism of this substrate are expressed at very high levels. Alcohol oxidase, the first enzyme in the methanol utilization pathway, is encoded by the gene AOX1 and can represent as much as 30% of the cellular protein. Transcription of AOX1 is tightly regulated; in the absence of methanol, the AOX1 gene is completely turned off, but it responds rapidly to the addition of methanol to the medium. Therefore, the AOX1 promoter is an excellent candidate for producing large amounts of recombinant protein under controlled conditions. Moreover, the induction of the cloned gene can be timed to maximize recombinant protein production during large-scale fermentations. In contrast to S. cerevisiae, P. pastoris does not synthesize ethanol; therefore, very high cell densities of P. pastoris are attained, with the concomitant secretion of large quantities of protein. In addition, P. pastoris normally secretes very few proteins, thus simplifying the purification of secreted recombinant proteins.
Many P. pastoris expression vectors have been devised, each one having more or less the same format. The basic features include a gene of interest under the control of promoter and transcription termination sequences from the P. pastoris AOX1 gene, an E. coli origin of replication and selectable marker gene, and a yeast selectable marker gene (Fig. 3.31). The addition of a signal sequence from either the P. pastoris phosphatase PHO1 gene or another yeast gene facilitates the secretion of a recombinant protein. To avoid the problems of plasmid instability during long-term growth, most P. pastoris vectors are designed to be integrated into the host genome, often within the AOX1 gene or the HIS4 gene for histidine biosynthesis. Both the engineered gene of interest and a yeast selectable marker gene are inserted together into a specific chromosome site by either a single (Fig. 3.32A) or a double (Fig. 3.32B) recombination event. The P. pastoris expression system has been used to produce more than 100 different biologically active proteins from bacteria, fungi, invertebrates, plants, and mammals, including humans. Many of these proteins, such as the hepatitis B virus surface antigen, human serum albumin, and bovine lysozyme, are identical to the native proteins and thus authentic.
Figure 3.31 P. pastoris integrating expression vector. The gene of interest (GOI) is cloned between the promoter (AOX1p) and transcription termination-polyadenylation sequence (AOX1t) of the P. pastoris alcohol oxidase 1 gene. The HIS4 gene encodes a functional histidinol dehydrogenase of the histidine biosynthesis pathway. The ampicillin resistance (Ampr) gene and an origin of replication (oriE) function in E. coli. The segment marked 3′ AOX1 is a piece of DNA from the 3′ end of the alcohol oxidase 1 gene of P. pastoris. A double recombination event between the AOX1p and 3′ AOX1 regions of the vector and the homologous segments of chromosome DNA results in the insertion of the DNA carrying the gene of interest and the HIS4 gene.
Figure 3.32 Integration of DNA into a specific P. pastoris chromosome site by single (A) or double (B) recombination. (A) A single recombination (dashed line) between the HIS4 gene of an intact circular plasmid and a chromosome his4 mutant gene results in the integration of the entire vector, including the gene of interest (GOI) with the AOX1 promoter in the 5′ AOX1 DNA segment and the transcription termination–polyadenylation sequence from the AOX1 gene (TT), into the chromosome. The inserted DNA is flanked by recombined mutant his4 and functional HIS4 genes. The dot in the his4 gene represents the mutation. (B) A double recombination (dashed lines) between the cloned 5′ AOX1 and 3′ AOX1 DNA segments of a restriction endonuclease (RE) linearized DNA fragment from the vector and the corresponding chromosome regions results in the integration of the gene of interest (GOI) with the AOX1 promoter in the 5′ AOX1 segment, the transcription termination–polyadenylation sequence from the AOX1 gene (TT), and a functional HIS4 gene. The chromosome AOX1 coding region is lost as a result of the recombination event.
Authentic heterologous proteins for industrial and pharmaceutical uses have also been generated in other yeasts. For example, the α- and β-globin chains of human hemoglobin A were produced from cDNAs in the methylotrophic yeast Hansenula polymorpha. The thermotolerant dimorphic yeasts Arxula adeninivorans and Yarrowia lipolytica have demonstrated promising potential as hosts for high levels of heterologous-protein expression. These yeasts can grow at temperatures up to 48°C and can survive at higher temperatures (55°C) for several hours. At higher temperatures, the fungi grow in a mycelial form and revert to budding cells below 42°C. Some secreted proteins, such as glucoamylase and invertase, are produced at higher levels in mycelia. Cell morphology also influences posttranslational modification, with O-linked glycosylation predominating in budding cells while N-glycosylation occurs in both mycelial and budding cells. An additional advantage of A. adeninivorans is the ability to grow on a wide range of inexpensive carbon and nitrogen sources.
It is often necessary to try several host types in order to find the one that produces the highest levels of a biologically active recombinant protein. Differences in the processing and productivity of a particular protein can occur among different yeast strains. For example, both S. cerevisiae and H. polymorpha produced a truncated version of the human protein interleukin-6 (IL-6), whereas A. adeninivorans produced a full-length version of the protein. The construction of a wide-range yeast vector for expression in several fungal species has facilitated this trial-and-error process (Fig. 3.33). The basic vector contains features for propagation and selection in E. coli and a multiple cloning site for insertion of interchangeable modules that are chosen for a particular yeast host, including a sequence for vector integration into the fungal genome, a suitable origin of replication, a promoter to drive expression of the heterologous gene, and selectable markers to complement a range of nutritional auxotrophies or to confer resistance to antifungal compounds, such as hygromycin B (Table 3.12). In other words, by selecting from a range of available modules, customized vectors can be rapidly and easily constructed for expression of the same gene in several different yeast cells to determine which host is optimal for heterologous-protein production.
Figure 3.33 A wide-range yeast vector system for expression of heterologous genes in several different yeast hosts. The basic vector contains a multiple cloning site (MCS) for insertion of selected modules containing appropriate sequences for chromosomal integration (rDNA module), replication (ARS module), selection (Selection module), and expression (Expression module) of a target gene in a variety of yeast host cells (Table 3.12 shows examples of interchangeable modules). Sequences for maintenance (oriE) and selection (Ampr) of the vector in E. coli are also included.
Table 3.12 Examples of modules available for wide-range yeast vector systems
Expression vectors with appropriate transcription and translation control elements for the expression of recombinant proteins in filamentous fungi are also commercially available. Unlike unicellular yeasts, filamentous fungi are multicellular, microscopic fungi that produce long, branching strands of cells called hyphae. This group of fungi includes the common mold genera Penicillium, Rhizopus, Trichoderma, and Aspergillus. Many species of these genera are a rich natural resource for commercially important metabolites and enzymes and have also been used as cell factories for the production of recombinant proteins for the food, beverage, pulp and paper, and pharmaceutical industries (Table 3.13). Like yeasts, filamentous fungi can grow rapidly on inexpensive media, secrete large amounts of proteins, process eukaryotic mRNA, and carry out many posttranslational modifications. An additional advantage of using filamentous fungi as hosts for the production of mammalian proteins is their ability to add mammalian-like sugars to proteins.
Table 3.13 Some recombinant proteins produced by filamentous fungal expression systems
In sum, fungal expression systems play an important role in the production of heterologous proteins for research, industrial, and medical applications. However, experience has shown that no one system is able to produce an authentic version of every heterologous protein. For this and other reasons, gene expression systems that use insect or mammalian cells have been developed.
Baculovirus–Insect Cell Expression Systems
Baculoviruses are a large, diverse group of double-stranded DNA viruses that specifically infect arthropods, including many insect species, and are not infectious to other animals. During the infection cycle, two forms of baculovirus are produced (Fig. 3.34). Infection is initiated when the occluded form of the virus is ingested by the insect larvae. In this form, the viral nucleocapsids (virions) are clustered in a matrix that is made up of the protein polyhedrin, which protects the virions from degradation in the environment. The occluded virions packaged in this protein matrix are referred to as a polyhedron. Following ingestion, the virus is taken up into the midgut of the insect, the polyhedrin matrix dissolves due to the alkaline gut environment, and the virions enter midgut cells. The virions migrate to the nucleus, where they are uncoated, releasing the DNA for genome replication, synthesis of viral proteins, and production of new virions. Within the insect midgut, the infection can spread from cell to cell as viral particles (single nucleocapsids) bud off from an infected cell and infect other midgut cells. This form of the virus, known as the budding form, is not embedded in a polyhedrin matrix and is not infectious to other individual insect hosts, although it can infect cultured insect cells. Plaques produced in insect cell cultures by the budding form of baculovirus have a different morphology from those produced by the occluded form. During the late stages of the infection cycle in the insect host, beginning about 36 to 48 hours after infection, the polyhedrin protein is produced in massive quantities. Production continues for 4 to 5 days, until the infected cells rupture and the host organism dies. Occluded virions are then released and can infect new hosts.
Figure 3.34 Budded (A) and occluded (B) forms of baculovirus. During budding, a nucleocapsid becomes enveloped by the membrane of an infected cell. A polyhedron consists of clusters of nucleocapsids (occluded virions) embedded in various orientations in a polyhedrin matrix.
The promoter for the polyhedrin (polyh) gene is exceptionally strong, and transcription from this promoter can account for as much as 25% of the mRNA produced in cells infected with the virus. Moreover, the polyhedrin protein is not required for virus production. Consequently, it was reasoned that replacement of the polyhedrin gene with a coding sequence for a heterologous protein, followed by infection of cultured insect cells, would result in the production of large amounts of the heterologous protein. Furthermore, because of the similarity of posttranslational modification systems between insects and mammals, it was thought that the recombinant protein would mimic closely, if not precisely, the authentic form of the original protein. Baculoviruses have been highly successful as delivery systems for introducing target genes for production of high levels of heterologous proteins in insect cells. More than a thousand different proteins have been produced using this system, including several vaccines that have been approved for veterinary or human use (Table 3.14).
Table 3.14 Some recombinant products produced by baculovirus-insect cell expression systems
The specific baculovirus that has been used extensively as an expression vector is Autographa californica multiple nuclear polyhedrosis virus (AcMNPV). A. californica (the alfalfa looper) and over 30 other insect species are infected by AcMNPV. This virus also grows well on many insect cell lines. The most commonly used cell line for genetically engineered AcMNPV is derived from the fall armyworm, Spodoptera frugiperda. In these cells, the polyhedrin promoter is exceptionally active, and during infections with wild-type baculovirus, high levels of polyhedrin are synthesized.
Baculovirus Expression Vectors
The first step in the production of a recombinant AcMNPV that will be used to deliver the gene of interest into the insect host cell is to create a transfer vector. The transfer vector is an E. coli-based plasmid that carries a segment of DNA from AcMNPV (Fig. 3.35A) consisting of the polyhedrin promoter region with flanking upstream AcMNPV DNA, a multiple cloning site, and the polyhedrin transcription termination and polyadenylation signal regions with flanking downstream AcMNPV DNA. The coding region for the polyhedrin gene has been deleted from this segment of DNA. The flanking AcMNPV DNA sequences provide regions for homologous recombination with the AcMNPV genome. A gene of interest is inserted into the multiple cloning site between the polyhedrin promoter and termination sequences, and the transfer vector is propagated in E. coli.
Figure 3.35 (A) Organization of the expression unit of a baculovirus (AcMNPV) transfer vector. The gene of interest is inserted into the multiple cloning site (MCS) that lies between the polyhedrin gene promoter (Pp) and polyhedrin gene transcription termination (Pt) sequences. The AcMNPV DNA upstream from the polyhedrin promoter (5′ AcMNPV DNA) and downstream from the polyhedrin transcription termination sequence (3′ AcMNPV DNA) provides sequences for integration of the expression unit by homologous recombination into an AcMNPV genome. (B) Replacement of the AcMNPV polyhedrin gene with an expression unit from a transfer vector. A double-crossover event (×) between homologous DNA segments of the transfer vector and the AcMNPV genome results in the integration of the expression unit into the AcMNPV genome. GOI, gene of interest.
Next, insect cells in culture are cotransfected with AcMNPV DNA and the transfer vector carrying the cloned gene. Within some of the doubly transfected cells, a double-crossover recombination event occurs at homologous sequences on the transfer vector and in the AcMNPV genome, and the cloned gene with polyhedrin promoter and termination regions becomes integrated into the AcMNPV DNA (Fig. 3.35B) with the concomitant loss of the polyhedrin gene. Virions lacking the polyhedrin gene produce distinctive zones of cell lysis (occlusion-negative plaques), from which recombinant baculovirus is isolated.
Linearization of the AcMNPV genome before transfection into insect cells substantially increases the frequency of recombinant plaques. The AcMNPV genome was engineered with two Bsu36I sites that were placed on either side of the polyhedrin gene (Fig. 3.36). One is in gene 603, and the other is in gene 1629, which is essential for viral replication. When DNA from this modified baculovirus is treated with Bsu36I and transfected into insect cells, no viral replication occurs because a segment of the essential gene (1629) is missing. As part of this system, a transfer vector is constructed with the gene of interest between intact versions of gene 603 and gene 1629. This transfer vector is introduced into insect cells that were previously transfected with linearized, replication-defective AcMNPV DNA that is missing the segment between the two Bsu36I sites. A double-crossover event both reestablishes a functional version of gene 1629 and incorporates the cloned gene into the AcMNPV genome (Fig. 3.36). With this system, over 90% of the baculovirus plaques are recombinant.
Figure 3.36 Production of recombinant baculovirus. Single Bsu36I sites are engineered into gene 603 and a gene (1629) that is essential for AcMNPV replication. These genes flank the polyhedrin gene in the AcMNPV genome. After a baculovirus with two engineered Bsu36I sites is treated with Bsu36I, the segment between the Bsu36I sites is deleted. Insect cells are cotransfected with Bsu36I-treated baculovirus DNA and a transfer vector with a gene of interest under the control of the promoter (p) and terminator (t) elements of the polyhedrin gene and the complete sequences of both genes 603 and 1629. A double-crossover event (dashed lines) generates a recombinant baculovirus with a functional gene 1629. With this system, almost all of the progeny baculoviruses are recombinant.
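The Bsu36I linearization step can be illustrated with a minimal Python sketch. It relies on the enzyme's recognition sequence, CC^TNAGG, which is not stated in the text above, and it models only the top-strand cut. The function names and the toy sequence are hypothetical and are not real AcMNPV DNA; the sketch simply shows how cutting a circular molecule at two sites yields a linear backbone that lacks the segment between them.

```python
import re

# Bsu36I recognition sequence: CC^TNAGG (N = any base; ^ = top-strand cut).
BSU36I_SITE = re.compile(r"CCT[ACGT]AGG")
CUT_OFFSET = 2  # the top strand is cut after the second C of the site


def bsu36i_cut_positions(seq):
    """Return the top-strand cut index for every Bsu36I site in seq."""
    return [m.start() + CUT_OFFSET for m in BSU36I_SITE.finditer(seq.upper())]


def digest_circular(seq):
    """Cut a circular molecule that carries exactly two Bsu36I sites.

    Returns (linear_backbone, excised_segment). Assumes neither site
    spans the arbitrary start/end of the string representation.
    """
    cuts = bsu36i_cut_positions(seq)
    if len(cuts) != 2:
        raise ValueError(f"expected 2 Bsu36I sites, found {len(cuts)}")
    first, second = cuts
    excised = seq[first:second]            # segment between the two cuts
    backbone = seq[second:] + seq[:first]  # remainder, joined across the origin
    return backbone, excised


# Hypothetical toy "genome": two sites flanking a stand-in for the
# polyhedrin region (illustrative only, not real AcMNPV sequence).
toy_genome = "AAAA" + "CCTGAGG" + "TTTTGGGGTTTT" + "CCTCAGG" + "AAAA"
backbone, excised = digest_circular(toy_genome)
print("cut positions:", bsu36i_cut_positions(toy_genome))
print("excised segment:", excised)
print("linear backbone:", backbone)
```

In the system described above, the excised segment includes part of the essential gene 1629, which is why the linear backbone cannot replicate until recombination with the transfer vector restores the gene.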
To eliminate the need to use plaque assays to identify and purify recombinant viruses, several methods have been developed that introduce the target gene into the baculovirus genome at a specific nucleotide sequence by recombination, either in an intermediate bacterial host, such as E. coli, or in vitro outside of a host cell. Transfection of insect cells is required only for the production of the heterologous protein. AcMNPV DNA can be maintained in E. coli as a plasmid known as a bacmid, which is a baculovirus–plasmid hybrid molecule. In addition to AcMNPV genes, the bacmid contains an origin of replication for propagation in E. coli, a kanamycin resistance gene for selection of the bacmid, and an integration site (attachment site) that is inserted into the lacZ′ gene without impairing its function (Fig. 3.37A and B). Another component of this system is the transfer vector that carries the gene of interest cloned between the polyhedrin promoter and a terminator sequence. In the transfer vector, the target gene expression unit (expression cassette) and a gentamicin resistance gene are flanked by DNA attachment sequences that can bind to the attachment site in the bacmid (Fig. 3.37B). An ampicillin resistance gene lies outside the expression cassette for selection of the transfer vector.
Figure 3.37 Construction of a recombinant bacmid. (A) An E. coli plasmid is incorporated into the AcMNPV genome by a double-crossover event (dashed lines) between DNA segments (5′ and 3′) that flank the polyhedrin gene to create a shuttle vector (bacmid) that replicates in both E. coli and insect cells. The gene for resistance to kanamycin (Kanr), an attachment site (att) that is inserted in frame in the lacZ′ sequence, and an E. coli origin of replication (oriE) are introduced as part of the plasmid DNA. (B) The transposition proteins encoded by genes of the helper plasmid facilitate the integration (transposition) of the DNA segment of the transfer vector that is bounded by two attachment sequences (attR and attL). The gene for resistance to gentamicin (Genr) and a gene of interest (GOI) that is under the control of the promoter (p) and transcription terminator (t) elements of the polyhedrin gene are inserted into the attachment site (att) of the bacmid. The helper plasmid and transfer vector carry the genes for resistance to tetracycline (Tetr) and ampicillin (Ampr), respectively. (C) The recombinant bacmid has a disrupted lacZ′ gene (*). The right-angled arrow denotes the site of initiation of transcription of the cloned gene after transfection of the recombinant bacmid into an insect cell. Cells that are transfected with a recombinant bacmid are not able to produce functional β-galactosidase.
Bacterial cells carrying a bacmid are cotransformed with the transfer vector and a helper plasmid (Fig. 3.37B). The helper plasmid carries a tetracycline resistance gene and encodes the specific proteins (transposition proteins) that mediate recombination between the attachment sites on the transfer vector and on the bacmid. After recombination, the DNA segment that is bounded by the two attachment sites on the transfer vector (the expression cassette carrying the target gene) is transposed into the attachment site on the bacmid, destroying the reading frame of the lacZ′ gene (Fig. 3.37C). Consequently, bacteria with recombinant bacmids produce white colonies in the presence of IPTG and 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal). White colonies that are resistant to kanamycin and gentamicin and sensitive to both ampicillin and tetracycline carry only a recombinant bacmid and neither the transfer vector nor the helper plasmid. After all of these manipulations, the integrity of the cloned gene can be confirmed by PCR. Finally, recombinant bacmid DNA can be transfected into insect cells, where the cloned gene is transcribed and the heterologous protein is produced.
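The colony-screening logic in this paragraph can be written out as a short Python sketch. The `Colony` record and the function name are hypothetical and introduced only for illustration. The rule itself restates the text: a colony carrying only a recombinant bacmid is white on IPTG and X-Gal, resistant to kanamycin and gentamicin, and sensitive to ampicillin and tetracycline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colony:
    """Observed phenotype of one E. coli colony (hypothetical record type)."""
    color: str           # "white" or "blue" on IPTG + X-Gal plates
    kan_resistant: bool  # kanamycin resistance: bacmid backbone present
    gen_resistant: bool  # gentamicin resistance: expression cassette present
    amp_resistant: bool  # ampicillin resistance: transfer vector still present
    tet_resistant: bool  # tetracycline resistance: helper plasmid still present


def carries_only_recombinant_bacmid(colony):
    """Apply the screening rule described in the text."""
    return (
        colony.color == "white"       # lacZ' reading frame disrupted
        and colony.kan_resistant
        and colony.gen_resistant
        and not colony.amp_resistant  # transfer vector lost
        and not colony.tet_resistant  # helper plasmid lost
    )


colonies = {
    "recombinant bacmid only": Colony("white", True, True, False, False),
    "non-recombinant bacmid": Colony("blue", True, False, False, False),
    "transfer vector retained": Colony("white", True, True, True, False),
}
for label, colony in colonies.items():
    print(label, "->", carries_only_recombinant_bacmid(colony))
```

Each antibiotic reports on one component of the system, so the combined phenotype distinguishes a clean recombinant bacmid from cells that still carry the transfer vector or the helper plasmid.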
The simultaneous expression of two or more cloned genes can lead to the formation of functional multimeric protein complexes. This can be accomplished by infecting insect cells with a single recombinant baculovirus expressing multiple proteins. AcMNPV is particularly amenable to carrying large insertions, up to 38 kb, with several foreign genes due to its flexible envelope. In one study, the genes for three different envelope structural proteins from the human severe acute respiratory syndrome coronavirus (SARS-CoV) were expressed simultaneously at a high level from a single baculovirus vector (Fig. 3.38A). The proteins were found to assemble spontaneously and stably into virus-like particles (Fig. 3.38B). Virus-like particles, which are composed of the assembled protein coat of a virus but lack the nucleic acid genome, are the basis for some subunit vaccines (chapter 7).
Figure 3.38 Production of virus-like particles using a baculovirus-insect cell expression system. (A) Viral spike (S), membrane (M), and envelope (E) proteins, which comprise the envelope of the human SARS-CoV, are expressed in insect cells from a single recombinant baculovirus vector carrying all three viral genes. (B) Following expression, the S, M, and E proteins self-assemble to form a SARS-CoV virus-like particle that resembles the original virus but does not contain the viral genetic material. The virus-like particle is a candidate vaccine for protection against SARS. Pp, polyhedrin promoter; 10p, baculovirus p10 promoter.
Mammalian Glycosylation and Processing of Proteins in Insect Cells
Although insect cells can process proteins in a manner similar to that of other eukaryotes, some mammalian proteins produced in S. frugiperda cell lines are not authentically glycosylated. For example, insect cells do not normally add galactose and terminal sialic acid residues to N-linked glycoproteins. Where these glycans are normally added to mannose residues during the processing of some proteins in mammalian cells, insect cells will trim the oligosaccharide to produce paucimannose (Fig. 3.39). Consequently, the baculovirus system cannot be used for the production of several important mammalian glycoproteins. To ensure the production of “humanized” glycoproteins with accurate glycosylation patterns, insect cell lines have been constructed that express a combination of mammalian glycosyltransferases (Fig. 3.39).
Figure 3.39 N-glycosylation of proteins in the Golgi apparatus of insect, human, and “humanized” insect cells. While the sugar residues added to N-glycoproteins in the endoplasmic reticulum are similar in insect and human cells, further processing in the Golgi apparatus yields a trimmed oligosaccharide (paucimannose) in insect cells and an oligosaccharide that terminates in sialic acid in human cells. To produce recombinant proteins for use as human therapeutic agents, “humanized” insect cells have been engineered to express several enzymes that process human glycoproteins accurately. Blue squares, N-acetylglucosamine; red circles, mannose; green squares, galactose; orange squares, sialic acid.
Further improvements to prevent undesirable processing of heterologous proteins in insect cells are the removal of the genes encoding chitinase and the protease v-cathepsin from the AcMNPV genome. v-Cathepsin is normally produced late in the infection cycle to facilitate the release of new virions from the insect host. It also reduces the yield of heterologous proteins through proteolytic cleavage. Chitinase is produced in conjunction with v-cathepsin and is thought to function in the proper folding of v-cathepsin and in the degradation of the host exoskeleton. It is secreted at very high levels from baculovirus-infected insect host cells and can compete with secreted target proteins for the secretory apparatus, thereby reducing yields of the target protein. Coexpression of chaperones to ensure proper folding of the target protein has also resulted in increased yields of functional heterologous proteins.
Mammalian Cell Expression Systems
Currently, about half of the commercially available therapeutic proteins are produced in mammalian cells. Chinese hamster ovary (CHO) cells are most commonly used because they produce proteins with human-like glycans and have been adapted for growth in high-density suspension cultures in serum-free medium, which not only reduces costs but also facilitates purification of the target protein and reduces the risk of contamination with animal-derived material. They are receptive to transfection and can achieve long-term (stable) gene expression and high yields of heterologous proteins. Other host cell lines are derived from mouse myelomas, baby hamster kidney (BHK), and human embryonic kidney (HEK 293). Mammalian cells have been used for some time to produce therapeutic proteins, especially antibodies, and vectors carrying suitable expression signals have been developed. Current efforts are aimed at improving productivity through the development of high-producing cell lines, increasing the stability of production over time, and increasing expression by manipulating the chromosomal environment in which the recombinant genes are integrated.
Vector Design
Most cloning vectors constructed for the expression of heterologous genes in mammalian cells are based on the genomes of viruses that infect mammalian cells. Many vectors are derived from simian virus 40 (SV40), which can replicate in several mammalian species. The genome of this virus is a double-stranded DNA molecule of 5.2 kb that carries genes expressed early in the infection cycle that function in the replication of viral DNA (early genes) and genes expressed later in the infection cycle that function in the production of viral capsid proteins (late genes). For use as a cloning vector, some of the early and late genes are removed and replaced with a target gene under the control of appropriate mammalian expression signals. Although many cloning vectors are based on SV40 DNA, its use is restricted to small inserts because only a limited amount of DNA can be packaged into the viral capsid. Other vectors that can accommodate larger amounts of cloned DNA are derived from adenovirus; bovine papillomavirus, which can be maintained as a multicopy plasmid in some mammalian cells; and adeno-associated virus, which can integrate into specific sites in the host chromosome. Baculovirus delivery systems have also been developed to express target proteins in mammalian cells. Although baculoviruses cannot replicate in mammalian cells, they can transduce these cells, in some cases with an efficiency reaching 100%. Once inside the cell, they enter the nucleus and transiently express heterologous genes that were inserted into the viral genome. In addition, stable artificial chromosome expression systems have been developed for some mammalian cell lines. These carry specific sequences for integration of one or more copies of a target gene by recombination.
All mammalian expression vectors tend to have similar features and are not very different in design from other eukaryotic expression vectors. A typical mammalian expression vector (Fig. 3.40) contains a eukaryotic origin of replication, usually from an animal virus, such as SV40. The promoter sequences that drive expression of the cloned gene(s) and the selectable marker gene(s), and the transcription termination sequences (including polyadenylation signals), must be eukaryotic and are frequently taken from either human viruses or mammalian genes. Strong constitutive promoters and efficient polyadenylation signals are preferred. Inducible promoters are often used when continuous synthesis of the heterologous protein is toxic to the host cell. Expression of a gene of interest may be increased by placing the sequence for an intron between the promoter and the multiple cloning site, within the transcribed region. The sequences that are required for selection and propagation of a mammalian expression vector in E. coli are derived from a standard E. coli cloning vector.
Figure 3.40 Generalized mammalian expression vector. The multiple cloning site (MCS) and selectable marker gene (SMG) are under the control of eukaryotic promoter (p), polyadenylation (pa), and termination of transcription (TT) sequences. An intron (I) enhances the production of heterologous protein. Propagation of the vector in E. coli and mammalian cells depends on the origins of replication oriE and orieuk, respectively. The ampicillin resistance (Ampr) gene is used for selecting transformed E. coli.
For the best results, a gene of interest must be equipped with translation control sequences (Fig. 3.41). Initiation of translation in higher eukaryotic organisms depends on a specific sequence of nucleotides surrounding the start (AUG) codon in the mRNA, called the Kozak sequence (in vertebrates, e.g., GCCGCC(A or G)CCAUGG). The corresponding DNA sequence for the Kozak sequence is placed at the 5′ end of the gene of interest, often followed by a signal sequence to facilitate secretion, a protein sequence (tag) to enhance the purification of the heterologous protein, and a proteolytic cleavage sequence that enables the tag to be removed from the heterologous protein. A stop codon is required for translation to cease at the correct location. Finally, the sequence content of the 5′ and 3′ untranslated regions (UTRs) is important for efficient translation and mRNA stability. Either synthetic 5′ and 3′ UTRs or those from the human β-globin gene are used in mammalian expression vectors. The codon content of the gene of interest may also require modification to suit the translational preferences of the host cell.
Figure 3.41 Translation control elements. A gene of interest can be fitted with various sequences that enhance translation and facilitate both secretion and purification, such as a Kozak sequence (K), signal sequence (S), protein affinity tag (T), proteolytic cleavage site (P), and stop codon (SC). The 5′ and 3′ UTRs increase the efficiency of translation and contribute to mRNA stability.
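As an illustration of the Kozak sequence described above, the following minimal Python sketch scans a sequence for start codons that sit within the full vertebrate consensus quoted in the text, GCCGCC(A or G)CCAUGG. The function name and the example sequences are hypothetical. Requiring a perfect match to the whole consensus is a simplifying assumption made here, since functional start sites often match it only in part.

```python
import re

# Full vertebrate consensus from the text: GCCGCC(A or G)CCAUGG.
# The capture group marks the AUG start codon inside the consensus.
KOZAK_CONSENSUS = re.compile(r"GCCGCC[AG]CC(AUG)G")


def find_kozak_starts(seq):
    """Return 0-based positions of AUG codons embedded in the consensus.

    Accepts either mRNA or the corresponding DNA (T is read as U).
    """
    rna = seq.upper().replace("T", "U")
    return [m.start(1) for m in KOZAK_CONSENSUS.finditer(rna)]


# Hypothetical 5' ends of two constructs (illustrative sequences only).
with_kozak = "GGGCCGCCACCATGGCT"   # DNA form; consensus precedes the ATG
without_kozak = "TTTTTTTTTTATGCCT"  # ATG in a non-consensus context

print(find_kozak_starts(with_kozak))
print(find_kozak_starts(without_kozak))
```

A check of this kind could be run on a designed construct before synthesis to confirm that the intended start codon, and no other AUG, lies in the consensus context.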
The majority of mammalian cell expression vectors carry a single gene of interest that encodes a functional polypeptide. However, the active form of some commercially important proteins consists of two different protein chains. For example, human thyroid-stimulating hormone is a two-chain protein (heterodimer), and both hemoglobin and antibodies are tetramers with two copies of each subunit, α2β2 and H2L2, respectively. It is possible to clone the gene or cDNA for each subunit of a multimeric protein, synthesize and purify each subunit separately, and then mix the subunits together in a test tube. Unfortunately, relatively few multisubunit proteins are properly assembled in vitro. By contrast, in vivo assembly of dimeric and tetrameric proteins is quite efficient. Consequently, various strategies have been devised for the production of two different recombinant proteins within the same cell.
To produce hetero-dimeric or -tetrameric proteins, single vectors that carry two cloned genes have been developed. The two genes are placed under the control of independent promoters and polyadenylation signals (double-cassette vectors) (Fig. 3.42A). Alternatively, to ensure that equal amounts of the recombinant proteins are synthesized, “bicistronic” vectors have been constructed with the two cloned genes separated from each other by a DNA sequence that contains an internal ribosomal entry site (IRES) (Fig. 3.42B). IRESs are found in mammalian virus genomes, and after transcription, they allow simultaneous translation of different proteins from a polycistronic mRNA molecule. Transcription of a “gene α–IRES–gene β” construct is controlled by one promoter and polyadenylation signal. Under these conditions, a single “two-gene” (bicistronic) transcript is synthesized, and translation proceeds from the 5′ end of the mRNA to produce one of the subunits (subunit α) and internally from the IRES element to produce the second subunit (subunit β) (Fig. 3.42B).
Figure 3.42 Expression vectors for production of hetero-dimeric or -tetrameric proteins in mammalian cells. (A) Two-gene expression vector. The cloned genes (gene α and gene β) encode subunits of a protein dimer (αβ). The cloned genes are inserted into a vector and are under the control of different eukaryotic promoter (p), polyadenylation (pa), and termination of transcription (TT) sequences. Each subunit is translated from a separate mRNA, and a functional protein dimer (αβ) is assembled. (B) Bicistronic expression vector. Each cloned gene (gene α and gene β) is inserted into a vector on either side of a sequence for an IRES. The two genes and the IRES sequence form a transcription unit under the control of a single eukaryotic promoter. Translation of the mRNA occurs from the 5′ end and internally (right-angled arrows). Both subunits (α and β) are synthesized and assembled into a functional protein dimer (αβ). Both expression vectors carry origins of replication for E. coli (oriE) and mammalian cells (orieuk), a selectable marker (Ampr) for selecting transformed E. coli, and a selectable marker gene (SMG) that is under the control of eukaryotic promoter (p), polyadenylation (pa), and termination of transcription (TT) sequences.
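The difference between the two vector designs can be summarized in a small Python sketch. The class and function names are hypothetical. The model encodes only what the text states: a double-cassette vector yields one mRNA per gene, whereas a bicistronic vector yields a single mRNA on which the first gene is translated from the 5′ end and the second from the IRES.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """One mRNA, listing its coding regions in 5'-to-3' order."""
    cistrons: tuple


def transcripts_for(layout, genes=("alpha", "beta")):
    """Return the mRNAs produced by a two-gene vector of the given layout."""
    if layout == "double-cassette":
        # Each gene has its own promoter and polyadenylation signal.
        return [Transcript((gene,)) for gene in genes]
    if layout == "bicistronic":
        # One promoter drives a single gene-IRES-gene transcript.
        return [Transcript(tuple(genes))]
    raise ValueError(f"unknown layout: {layout}")


def initiation_modes(transcript):
    """Map each coding region to the way translation initiates on it."""
    return {
        gene: ("5' end" if index == 0 else "IRES")
        for index, gene in enumerate(transcript.cistrons)
    }


for layout in ("double-cassette", "bicistronic"):
    for mrna in transcripts_for(layout):
        print(layout, mrna.cistrons, initiation_modes(mrna))
```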
Selectable Markers for Mammalian Expression Vectors
For the most part, the systems that are used to select transfected mammalian cells are the same as those for other eukaryotic host cells (Table 3.15). A number of bacterial marker genes have been adapted for eukaryotic cells. For example, the bacterial neo gene, which encodes neomycin phosphotransferase, is often used to select transfected mammalian cells. In eukaryotic cells, G-418 (Geneticin), which is phosphorylated and thereby inactivated by neomycin phosphotransferase, replaces neomycin as the selective agent because neomycin is not an effective inhibitor of eukaryotic protein synthesis.
Table 3.15 Selective marker gene systems for mammalian cells
Some selection schemes are designed not only to identify transfected cells, but also to increase heterologous-protein production by amplifying the copy number of the expression vector. The dihydrofolate reductase–methotrexate system falls into this category. Dihydrofolate reductase catalyzes the reduction of dihydrofolate to tetrahydrofolate, which is required for the production of purines. The enzyme is inhibited by methotrexate. Sensitivity to methotrexate can be overcome if the cell produces excess dihydrofolate reductase, and as the methotrexate concentration is increased over time, the dihydrofolate reductase gene in cultured cells is amplified. It is not unusual for methotrexate-resistant cells to have hundreds of dihydrofolate reductase genes. The standard dihydrofolate reductase–methotrexate protocol entails transfecting dihydrofolate reductase-deficient cells with an expression vector carrying a dihydrofolate reductase gene as the selectable marker gene and treating the cells with methotrexate. After the initial selection of transfected cells, the concentration of methotrexate is gradually increased, and eventually cells with very high copy numbers of the expression vector are selected.
Engineering Mammalian Cell Hosts for Enhanced Productivity
In large-scale bioreactors, depleted nutrients and accumulation of toxic cell waste can limit the viability and density of cells as they respond to stress by inducing cell death, also known as apoptosis. One method to improve cell growth and viability under culture conditions in bioreactors is to prevent the tumor suppressor protein p53, which is a transcription factor, from activating the cell death response pathway. The mouse double minute 2 protein (MDM2) binds to protein p53 and prevents it from acting as a transcription factor (Fig. 3.43). MDM2 also marks p53 for degradation. CHO cells were transfected with plasmids containing a regulatable MDM2 gene and cultured under conditions that mimicked the late stages of cell culture and in nutrient-limited medium. Cultures expressing MDM2 had higher cell densities and delayed cell death compared to nontransfected cells, especially in nutrient-deprived medium.
Figure 3.43 Strategy to increase yields of recombinant mammalian cells. Cell death (apoptosis), stimulated by the transcription factor p53, can lead to decreased yields of recombinant mammalian cells grown under stressful conditions in large bioreactors. To prevent cell death, the gene encoding MDM2 is introduced into mammalian cells. The MDM2 protein binds to p53 and prevents it from inducing expression of proteins required for apoptosis. Engineered cells not only showed delayed cell death, but also achieved higher cell densities in bioreactors.
Many cultured mammalian cells are unable to achieve high cell densities in cultures because toxic metabolic products accumulate in the culture medium and inhibit cell growth. Although efforts are made to optimize the culture conditions, inevitably nutrients essential for optimal cell growth, including oxygen, are reduced. Under low-oxygen conditions, many cells, including CHO cells, secrete the acidic waste product lactate as they struggle to obtain energy from glucose. Under these conditions, pyruvate, an intermediate compound produced during the metabolism of glucose, is converted to lactate by lactate dehydrogenase rather than entering into the tricarboxylic acid cycle, where it is further oxidized through the activity of pyruvate carboxylase (Fig. 3.44). To counteract the acidification of the medium from lactate secretion, the human pyruvate carboxylase gene was cloned into an expression vector under the control of the cytomegalovirus (CMV) promoter and the SV40 polyadenylation signals and transfected into CHO cells. When the pyruvate carboxylase gene was stably integrated into the CHO genome and expressed, the enzyme was detected in the mitochondria, where glucose is degraded. After 7 days in culture, the rate of lactate production decreased by up to 40% in the engineered cells.
Figure 3.44 When oxygen is present, pyruvate, which is formed from glucose during glycolysis, is converted by the enzyme pyruvate carboxylase to an intermediate compound in the tricarboxylic acid (TCA) cycle. This metabolic pathway is important for the generation of cellular energy and for the synthesis of biomolecules required for cell proliferation. However, under low-oxygen conditions, such as those found in large bioreactors, pyruvate carboxylase has a low level of activity. Under these conditions, lactate dehydrogenase converts pyruvate into lactate, which yields a lower level of energy. Cultured cells secrete lactate, thereby acidifying the medium.
Many of the eukaryotic DNA viruses from which the vectors used in mammalian cells are derived maintain their genomes as multicopy episomal DNA (plasmids) in the host cell nucleus. These viruses produce proteins, such as the large-T antigen in SV40 and the nuclear antigen 1 protein in Epstein-Barr virus, that help to maintain the plasmids in the host nucleus and to ensure that each host cell produced after cell division receives a copy of the plasmid. To increase the copy number of the target gene by increasing the plasmid copy number, cell lines have been engineered to express the SV40 large-T antigen or Epstein-Barr nuclear antigen 1.
Many proteins of therapeutic value, such as antibodies and interferon, are secreted. However, the high levels of these proteins that are desirable from a commercial standpoint can quickly overwhelm the capacity of the cell secretory system. Thus, protein processing is a major limiting step in the achievement of high target protein yields. Although high levels of recombinant protein production have been found to increase the levels of proteins associated with proper protein folding and secretion in the endoplasmic reticulum, the levels are usually not sufficient for optimal protein processing. Researchers have therefore devised methods to increase the capacity for protein secretion by engineering cell lines with enhanced production of components of the secretion apparatus. In this regard, an effective strategy may be to simultaneously overexpress several, if not all, of the proteins that make up the secretory mechanism. This can be achieved through the enhanced production of the transcription factor X box protein 1 (Xbp-1), a key regulator of the secretory pathway. Normally, full-length, unspliced xbp-1 mRNA is found in nonstressed cells and is not translated into a stable, functional protein (Fig. 3.45A). However, when unfolded or misfolded proteins accumulate in the endoplasmic reticulum, a ribonuclease is activated that specifically cleaves xbp-1 mRNA (Fig. 3.45B). This results in the production of a functional transcription factor that activates the expression of a number of proteins of the secretion apparatus. A truncated xbp-1 gene that encodes an actively translated form of xbp-1 mRNA (Fig. 3.45C) was overexpressed under the control of the CMV promoter in recombinant CHO cell lines that were previously constructed to express human erythropoietin, human γ-interferon, and human monoclonal antibodies either stably or transiently. 
Expression of the genes encoding proteins of the secretion apparatus that are controlled by Xbp-1 was found to increase in response to elevated levels of Xbp-1. Although overexpression of Xbp-1 did not increase the production of recombinant proteins in stable cell lines in which the target gene is inserted in a chromosome, a significant increase was observed in cell lines engineered to express the target proteins transiently from a plasmid-encoded gene.
Figure 3.45 Strategy to increase yields of secreted recombinant proteins from mammalian cells by simultaneously upregulating the expression of several proteins in the secretion apparatus. The expression of chaperones and other proteins of the secretion apparatus is controlled by the transcription factor Xbp-1. (A) In unstressed cells, the intron (green box) is not cleaved from the xbp-1 transcript, and therefore, functional Xbp-1 transcription factor is not produced. (B) However, in stressed cells that have accumulated misfolded proteins, an endoribonuclease cleaves the transcript to yield mature xbp-1 mRNA (the red and blue boxes represent exons) that is translated into a stable, functional transcription factor. (C) Recombinant CHO cells were transfected with a truncated gene including only the xbp-1 exons and overproduced a functional Xbp-1 transcription factor that directed the production of high levels of proteins required for protein secretion.
Chromosomal Integration and Environment
A major consideration for high levels and long-term stability of heterologous-protein production is the site of integration of the gene of interest into the mammalian cell genome. Expression of high levels of protein from plasmid vectors is transient and inevitably results in loss of the vector, which cannot be propagated in mammalian cells, or death of the host cell. Stable cell lines in which the target gene is integrated into a chromosome have been generated to overcome this problem. However, the site of integration can have a significant impact on the levels of target protein produced. Genomic DNA is associated with a great number of proteins, including the major histone proteins, around which the DNA is coiled, that compact (condense) the DNA so that it can fit inside the nucleus. The DNA and associated packaging proteins are known as chromatin. While much of the genome is highly condensed (heterochromatin) and contains silent genes or genes with low levels of expression, other regions are less condensed (euchromatin) and contain actively transcribed genes. For enhanced expression and stability, the target gene should be integrated into euchromatin, rather than heterochromatin. Because a larger portion of the genome is in the heterochromatin form, there is a greater chance that the target gene will be inserted into one of these regions.
Techniques to relax chromatin structure and thereby increase the expression of introduced genes include modifying host strains to express proteins that alter chromatin structure at the site of vector integration or inserting DNA elements that prevent chromosome condensation together with the target gene. One approach to alter the epigenetic environment surrounding the inserted gene is to increase histone acetylation. The extent of histone acetylation is determined by the relative activities of two host cell enzymes, histone acetyltransferase, which adds acetyl groups to lysines on histone proteins, and histone deacetylase, which removes acetyl groups from the histone. The relative influences of these two enzymes at a given promoter are determined by specific transcription factors that recruit one or the other of the enzymes to the promoter. Increased histone acetylation, which leads to increased gene transcription, can be accomplished either by increasing the expression of histone acetyltransferase or by decreasing the activity of histone deacetylase. One effective strategy to do this is to target histone acetyltransferase specifically to the site of target gene insertion to ensure that the target gene is actively and continuously transcribed. One group of researchers created a stable CHO cell line in which histone acetyltransferase was produced as a fusion protein with the LexA protein that binds to specific DNA sequences (Fig. 3.46). To test this fusion protein, the green fluorescent protein (GFP) reporter gene was employed as a target gene and was integrated into a CHO chromosome under the control of the CMV promoter with the LexA-binding sequence inserted upstream. A gene encoding resistance to the antibiotic Zeocin was coupled to the reporter gene by an IRES element and therefore was also under the control of the CMV promoter. Stable cells with an active CMV promoter were established by addition of Zeocin to the culture medium. 
Production of GFP, determined by measuring the emission of green fluorescence, was severalfold higher in cells that expressed the LexA–histone acetyltransferase fusion protein than in those that expressed the LexA protein alone (Fig. 3.46A). The LexA protein specifically binds to the LexA recognition site upstream of the gene encoding GFP and brings with it the fused histone acetyltransferase protein that acetylates histones associated with the promoter region and promotes a higher level of GFP transcription. Moreover, expression remained stable, although at a lower level, for at least 4 months in some of the clones.
Figure 3.46 Strategies to increase expression of recombinant proteins in mammalian cells by altering chromatin structure. Local “relaxation” of chromosome condensation, which leads to increased transcription of genes in the region, can be achieved by the addition of an acetyl group to DNA-packing proteins known as histones. Histone acetylation is catalyzed by the enzyme histone acetyltransferase (HAT). (A) To increase the expression of a recombinant protein, HAT was directed to the site of target gene (GFP gene) insertion in a mammalian chromosome. HAT was expressed as a fusion protein with the LexA protein that binds to a specific DNA sequence (LexA-BS) inserted upstream of the CMV promoter (PCMV) that directs expression of GFP. Production of the HAT-LexA fusion protein under the control of the SV40 promoter (PSV40) increased expression of GFP 6-fold compared to production of the LexA protein alone. (B) Insertion of STAR elements on both sides of the expression cassette further increased GFP expression. The gene encoding resistance to the antibiotic Zeocin was included as a selectable marker and was expressed from an IRES. The arrows above the promoter boxes indicate the direction of transcription.
To improve expression levels over a longer period, the construct was further modified to include a DNA segment known as a stabilizing and antirepressor (STAR) element on both sides of the expression cassette to block repression (Fig. 3.46B). Repression can occur when heterochromatin forms due to the association of the heterochromatin protein HP1 with methylated histones. This stimulates further histone deacetylation and methylation and, consequently, greater HP1 activity. Insertion of the relatively small (<2-kb) STAR elements was found to counteract the activity of HP1 and other heterochromatin-associated repressor proteins. Flanking the expression cassette with the antirepressor elements resulted in higher levels of GFP expression that were maintained over a longer period of time.
Other DNA elements that improve heterologous-protein expression by modifying heterochromatin structure are the ubiquitous chromatin-opening elements and matrix-associated regions. Ubiquitous chromatin-opening elements are sequences of DNA normally found near the promoters of housekeeping genes that are constitutively expressed at high levels due to enhanced histone acetylation. Inclusion of the ubiquitous chromatin-opening element from the promoter of the highly expressed CHO elongation factor 1 alpha gene in an expression vector increased recombinant protein expression in CHO cells 6- to 35-fold. Matrix-associated regions were also found to enhance the production of heterologous protein in CHO cells. These elements, found in the chromosomes of many eukaryotes, bind to protein complexes in the nucleus that arrange regions of the chromosome into loops. It is thought that these DNA loops contain transcriptionally active genes that are regulated in a coordinated fashion. Although matrix-associated regions from the human β-globin gene and the chicken lysozyme gene were found to increase expression of a target gene, not all matrix-associated regions have a positive effect on gene expression.
In sum, mammalian cell expression systems are as versatile and effective as other eukaryotic expression systems. However, industrial production of a recombinant protein with engineered mammalian cells is costly. Consequently, less expensive expression systems are favored unless authenticity of an important recombinant protein can be obtained only with mammalian cells.
Protein Engineering
The physical and chemical properties of natural proteins are sometimes not well suited to a medical, industrial, or other application. In some instances, a protein that is better suited to a particular task may be obtained by using a gene from an organism that grows in an unusual, often extreme, environment. In addition to isolating natural genes that encode proteins with useful properties, directed or random mutagenesis and selection schemes can be used to create a mutant form of a gene that encodes a protein with the desired properties. In directed mutagenesis, a specific amino acid is targeted for change by substituting nucleotides in the corresponding coding sequence in the gene. In random mutagenesis, the amino acid changes that will result in the desired properties are unknown. A large number of mutant proteins, each with a different amino acid change, are generated by randomly altering individual nucleotides within a structural gene, and then tested for the desired properties.
By using directed or random mutagenesis, proteins with enhanced characteristics can be created for therapeutic or industrial applications. For example, the catalytic efficiency, allosteric regulation, cofactor requirement, or substrate specificity of an enzyme may be improved. The latter may decrease undesirable side effects of a therapeutic enzyme. The thermal tolerance and/or pH stability of a protein may be increased, enabling the mutant protein to be used under conditions that would inactivate the native version. The sensitivity of a protein to cellular proteases may be decreased, which would increase the recoverable yield of the protein.
Directed Mutagenesis
In some cases, it is possible to predict in advance which individual amino acids or short sequences of amino acids contribute to a particular physical, kinetic, or chemical property. The process for generating specific amino acid changes by changing the coding sequence at a targeted site in a gene is called site-directed mutagenesis. It is important to keep in mind that a particular property of a protein may be the consequence of two or more amino acids that are far apart from each other in the linear sequence but are juxtaposed as a result of the folding of the protein. In this case, two or more amino acids may have to be changed to produce a protein with the desired properties. Predicting which amino acids of a protein should be changed to attain a specific property generally requires that the three-dimensional structure of the protein, or a similar protein, has been well characterized by X-ray crystallographic analysis. However, for many proteins, such detailed information is often lacking, so site-directed mutagenesis becomes a trial-and-error strategy in which changes are made to those nucleotides that are most likely to yield a particular change in a protein property. Then, of course, the protein encoded by each mutated gene has to be tested to ascertain whether the mutagenesis process has indeed generated the desired change.
Site-Directed Mutagenesis by Overlap Extension PCR
A straightforward site-directed mutagenesis protocol introduces defined nucleotide substitutions into a gene by overlap extension PCR. Two pairs of oligonucleotide primers are used in the PCR; one set of flanking primers anneals to the ends of the target gene (often a cloned gene), and the other set consists of overlapping, internal primers that carry the mutation. While the 3′ end of a primer must be perfectly complementary to the annealing site on the template DNA to prime DNA synthesis, mismatched nucleotides at the 5′ end do not affect the reaction. The target gene is initially amplified in two separate reactions, to generate overlapping (“left” and “right”) fragments (Fig. 3.47). In each of the two reactions, one of the PCR primers (an internal primer) carries the mutation and the other is a flanking primer. After PCR amplification, the products are purified and the left and right fragments are combined. Denaturation and reannealing of the mixed fragments produce some DNA molecules that hybridize in the overlapping, complementary, mutated sequences. DNA polymerase is added to extend the strands to form double-stranded DNA molecules. These molecules are amplified by PCR with the flanking primers to enrich for full-length DNA molecules. The amplified DNA is then cloned into a suitable plasmid vector; this is facilitated by inclusion of suitable restriction enzyme sites in the 5′ ends of the flanking primers. This procedure results in the production of an altered gene that has mutated sites in the region of the overlap of the internal oligonucleotides.
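The design of the two internal primers can be sketched in a few lines of Python. This is only an illustration, assuming a single-nucleotide substitution and a fixed number of matching nucleotides on each side of the mismatch; the function names and the 10-nucleotide flank are hypothetical choices, not part of any published protocol.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence (top strand, 5' to 3')."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def internal_primers(template, pos, new_base, flank=10):
    """Design the overlapping internal primers for one substitution.

    template : top strand of the target gene, 5' to 3'
    pos      : 0-based position of the nucleotide to change
    new_base : replacement nucleotide
    flank    : matching nucleotides kept on each side of the mismatch
               (an assumed value chosen for illustration)
    """
    if template[pos] == new_base:
        raise ValueError("new_base is identical to the template nucleotide")
    if pos < flank or pos + flank >= len(template):
        raise ValueError("mutation is too close to the end of the template")
    mutated = template[:pos] + new_base + template[pos + 1:]
    # The mismatch sits in the middle of the primer, so both 3' ends
    # remain perfectly complementary to the template.
    forward = mutated[pos - flank:pos + flank + 1]
    reverse = revcomp(forward)
    return forward, reverse


template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTT"
fwd, rev = internal_primers(template, 18, "C")
print("internal forward primer:", fwd)
print("internal reverse primer:", rev)
```

In this sketch the forward internal primer would be paired with the downstream flanking primer to give the right fragment, and the reverse internal primer with the upstream flanking primer to give the left fragment; the two products then overlap across the full length of the internal primers.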
Figure 3.47 Site-directed mutagenesis by overlap extension PCR. The left and right portions of the target DNA are amplified separately by PCR. The primers are shown by horizontal arrows. Primers that carry the mutation are depicted as a line with a spike; a spike denotes a position that contains a nucleotide that is not found in the native gene. The amplified fragments are purified, denatured to make them single stranded, and then reannealed. Regions of overlap are formed between complementary mutation-producing sequences. The single-stranded regions are made double-stranded with DNA polymerase, and then the entire fragment is amplified by PCR. The resultant product is digested with restriction endonucleases A and B and then cloned into a vector that has been digested with the same enzymes.
Site-Directed Mutagenesis by Inverse PCR
Nucleotide substitutions, deletions, or insertions can be introduced into a target gene that has been cloned into a plasmid in a procedure known as inverse PCR. In this case, the entire plasmid is amplified, which restricts the size of the plasmid to less than about 10 kb. The oligonucleotide primers used in the inverse PCR anneal to adjacent sequences in the target gene but are divergently oriented, that is, their 3′ ends are directed away from each other. For point mutations, nucleotide changes are introduced in the middle of one of the primers (Fig. 3.48). To create deletion mutations, primers must flank the region of target DNA to be deleted and be perfectly matched to their annealing sites (Fig. 3.48). To create mutations with long insertions, a stretch of mismatched nucleotides is added to the 5′ end of one or both primers, while for mutations with short insertions, a stretch of nucleotides is added in the middle of one of the primers (Fig. 3.48). PCR amplification yields linear double-stranded DNA products that are circularized by ligation with T4 DNA ligase. Ligation requires that the 5′ ends of the linear DNA molecules are phosphorylated and therefore either the primers must be phosphorylated or the PCR products must be phosphorylated using the enzyme polynucleotide kinase. Finally, the recircularized plasmid DNA is transformed into E. coli by any standard procedure. Since this protocol yields a very high frequency of plasmids with the desired mutation, screening three or four clones by sequencing the target DNA is usually sufficient to find the desired mutation. Given its simplicity and effectiveness, this procedure has come to be widely used to introduce a specified point mutation, insertion, or deletion into a cloned gene.
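The geometry of the divergent primers for a deletion can be illustrated with a short Python sketch. It assumes that the plasmid is represented as a string for the top strand, that the region to be deleted does not span the arbitrary start of that string, and that an 18-nucleotide primer is adequate; all names and values here are hypothetical.

```python
import random


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def deletion_primers(plasmid, start, end, length=18):
    """Divergent primers that flank the region plasmid[start:end].

    The forward primer anneals immediately downstream of the deletion and
    the reverse primer immediately upstream, so their 3' ends point away
    from each other. Both are perfectly matched to the template.
    """
    forward = plasmid[end:end + length]
    reverse = revcomp(plasmid[start - length:start])
    return forward, reverse


def linear_product(plasmid, start, end):
    """Top strand of the linear PCR product: the whole plasmid except the
    deleted region, beginning at the 5' end of the forward primer."""
    return plasmid[end:] + plasmid[:start]


rng = random.Random(0)
plasmid = "".join(rng.choice("ACGT") for _ in range(60))  # toy circular plasmid
fwd, rev = deletion_primers(plasmid, 20, 30)
product = linear_product(plasmid, 20, 30)
print(len(plasmid), "->", len(product), "nucleotides after the deletion")
```

Ligating the two ends of the linear product rejoins the sequences on either side of the deleted region, which is why the primers for a deletion must abut it exactly.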
Figure 3.48 Overview of the basic methodology to introduce point mutations, insertions, or deletions into DNA cloned into a plasmid. The forward and reverse primers are shown in red and green, respectively. The solid circles represent template DNA. The dotted lines represent newly synthesized DNA. The X indicates an altered nucleotide(s).
Mutant Proteins with Unusual Amino Acids
Essentially any protein can be altered by substituting one amino acid for another using site-directed mutagenesis. However, this approach is limited to the 20 amino acids that are normally used in protein synthesis. One way to increase the diversity of the proteins formed after mutagenesis is to introduce synthetic amino acids with unique side chains at specific sites. To do this, E. coli was engineered to produce both a novel transfer RNA (tRNA) that is not recognized by any of the existing E. coli aminoacyl-tRNA synthetases but nevertheless functions in translation and a new aminoacyl-tRNA synthetase that aminoacylates only that novel tRNA. A novel tRNA and unique aminoacyl-tRNA synthetase pair from the archaebacterium Methanococcus jannaschii was used as a starting point for this system. The tyrosine-tRNA synthetase from M. jannaschii can add an amino acid to an amber suppressor tRNA that is a mutant form of tyrosine-tRNA. An amber suppressor tRNA is a modified tRNA that can insert an amino acid into a protein in places where the mRNA contains an amber codon, UAG, which normally is a stop codon that directs the cessation of protein synthesis. To prevent the translational fusion of proteins whose mRNAs normally contain a UAG stop codon with downstream proteins, in vivo suppression is always less than 100% and is often dependent upon the nucleotides surrounding the stop codon. The amino acid specificity of the tyrosine-tRNA synthetase from M. jannaschii is altered by random mutagenesis of its gene so that, instead of tyrosine, it catalyzes the addition of O-methyl-L-tyrosine onto the tRNA. A cloned version of the target gene is modified by site-directed mutagenesis so that it contains a 5′-TAG-3′ in that portion of the DNA that encodes the amino acid that is targeted for change to O-methyl-L-tyrosine (Fig. 3.49). Once the modified DNA has been created, it is used to transform an E. coli strain that was previously engineered to produce the O-methyl-L-tyrosine-tRNA.
The engineered E. coli strain inserts O-methyl-L-tyrosine into proteins that contain a UAG stop codon, resulting in a full-length target protein containing the modified amino acid. Had the mutant gene been expressed in wild-type E. coli, a truncated version of the protein would have been produced. This system may be manipulated to insert a variety of different amino acid analogues into specified sites within proteins in an effort to produce functional proteins with altered activities compared with the native form. In a similar approach to this problem, researchers modified a portion of the valine-tRNA synthetase gene so that the altered enzyme was able to add the nonstandard amino acid aminobutyrate to a specific tRNA for subsequent incorporation into proteins. While the full potential of these approaches has yet to be realized, it is nevertheless clear that it is now possible to produce proteins containing unusual chemical structures and possibly having unique properties.
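The difference between expressing the mutated gene in a wild-type host and in the engineered host can be shown with a toy translation routine in Python. This is a simplified sketch: it assumes complete suppression of the amber codon (in vivo suppression is always less than 100%), uses the single letter X as an arbitrary placeholder for O-methyl-L-tyrosine, and works on a made-up coding sequence.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, amber_residue=None):
    """Translate a coding strand from its first ATG.

    amber_residue : None models a wild-type host, in which TAG (UAG in the
                    mRNA) terminates translation; any other value is
                    inserted at each TAG codon, modelling an amber
                    suppressor tRNA charged with that residue.
    """
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon == "TAG" and amber_residue is not None:
            protein.append(amber_residue)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


gene = "ATGGCTTAGAAATAA"  # hypothetical target gene with an internal TAG
print("wild-type host: ", translate(gene))
print("engineered host:", translate(gene, amber_residue="X"))
```

In the wild-type case translation stops at the internal TAG and yields a truncated product, whereas in the engineered case the placeholder residue is inserted and translation continues to the next stop codon.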
Figure 3.49 Production of a protein with a modified (nonstandard amino acid) side chain. The start codon is highlighted in green, and the stop codons are in red. The inserted amino acid analogue is shown in blue.
Random Mutagenesis
In those cases where the amino acid changes that will result in the desired properties are unknown, a library of mutated sequences is generated by randomly altering individual nucleotides within a structural gene. Most of the mutations will decrease the functioning of the encoded protein, and therefore an efficient screening process is required to identify proteins with the rare mutations that result in beneficial changes.
Error-Prone PCR
Some of the temperature-stable DNA polymerases that are used to amplify target DNA by PCR occasionally insert incorrect nucleotides during DNA replication. If one is attempting to amplify a DNA with high fidelity, this is obviously a problem. On the other hand, if the construction of a library of mutants of the target gene is the objective, then this approach is a useful method for random mutagenesis. Error-prone PCR is performed using DNA polymerases that lack proofreading activity, such as Taq DNA polymerase. The error rate may be increased by increasing the concentration of Mg2+ to stabilize noncomplementary base pairs. Addition of Mn2+ and/or unequal amounts of the four deoxynucleoside triphosphates to the reaction buffer may also increase the error rate. The primer annealing sites on the template DNA define the region to be altered, and the number of nucleotide substitutions per template increases with the number of PCR cycles and the length of the template. Following error-prone PCR, the randomly mutagenized DNA is cloned into an expression vector and screened for altered or improved protein activity. The DNA from those clones that encode the desired activity is isolated and sequenced to determine the relevant changes to the target DNA.
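The way substitutions accumulate with cycle number and template length can be explored with a small simulation in Python. It is a deliberately crude model: it follows a single lineage of copies rather than a whole PCR population, treats every position and every substitution as equally likely, and uses an error rate per nucleotide per copy that is an assumed illustrative value, not a measured one.

```python
import random


def copy_with_errors(seq, error_rate, rng):
    """Copy a sequence, replacing each nucleotide with a different one
    with probability error_rate."""
    copied = []
    for base in seq:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def error_prone_lineage(template, cycles, error_rate, rng):
    """Follow one molecule through successive rounds of error-prone copying."""
    seq = template
    for _ in range(cycles):
        seq = copy_with_errors(seq, error_rate, rng)
    return seq


def differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)
template = "".join(rng.choice("ACGT") for _ in range(1000))
for cycles in (10, 20, 30):
    mutant = error_prone_lineage(template, cycles, error_rate=1e-4, rng=rng)
    print(cycles, "cycles:", differences(template, mutant), "substitutions")
```

Under this model the expected number of substitutions in a lineage is approximately the template length multiplied by the error rate and by the number of copying events, which is consistent with the dependence on cycle number and template length described above.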
Random Insertion/Deletion Mutagenesis
While error-prone PCR is quite commonly used to introduce random changes into a target gene, it is somewhat limited in the types of changes that can be introduced. Since errors are typically introduced into DNA at no more than one or two per 1,000 nucleotides, only single nucleotides are replaced within a triplet codon, yielding a limited number of amino acid changes from each mutated DNA molecule. As an alternative to error-prone PCR, researchers have developed the technique of random insertion/deletion mutagenesis. With this approach, it is possible to delete a small number of nucleotides at random positions along the gene and, at the same time, insert either specific or random sequences into that position. This method entails the following steps (Fig. 3.50).
1 An isolated gene fragment with different restriction endonuclease sites at each end is ligated at one end to a short nonphosphorylated adaptor that leaves a small gap in one strand of the DNA. The gap is a consequence of the fact that the 5′ nucleotide on the adaptor is not phosphorylated and therefore cannot be ligated to an adjacent 3′-OH group on the gene fragment.
2 After restriction enzyme digestion that creates compatible sticky ends, the gene fragment is recircularized with T4 DNA ligase to create a circular double-stranded gene fragment with a gap in one of the strands.
3 The gapped strand is degraded by digestion with the enzyme T4 DNA polymerase, which has exonuclease activity.
4 Each single-stranded DNA molecule is randomly cleaved at a single position by treating it with a cerium(IV)–ethylenediaminetetraacetic acid (EDTA) complex.
5 The linear single-stranded DNA molecules are ligated at each end with adaptors that contain annealing sites for PCR primers, one of which contains several additional nucleotides selected for insertion. The entire mutagenesis library is amplified by PCR.
6 The adaptors are removed by restriction enzyme digestion and the constructs are made blunt ended by filling in the single-stranded overhangs using the Klenow fragment of E. coli DNA polymerase I before the DNA molecules are recircularized by T4 DNA ligase.
7 The products are digested with appropriate restriction enzymes that flank the protein coding sequences and the mutated sequences are cloned into a plasmid vector to test for activity.
Figure 3.50 A random insertion protocol to introduce random mutations into a gene of interest. The inserted DNA is shown in yellow. Adapted from Murakami et al., Nat. Biotechnol. 20:76–81, 2002.
With this approach, it is possible to insert any small DNA fragment (carried on an adaptor) into the randomly cleaved single-stranded DNA, with the result that genes with a much greater number of modified nucleotides may be generated than by error-prone PCR. The mutations that are developed by this procedure may be used to select protein variants with a wide range of activities.
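The limitation of error-prone PCR that motivates this method, namely that replacing a single nucleotide within a codon gives access to only a few alternative amino acids, can be checked directly against the standard genetic code. The following Python sketch lists the amino acids reachable from a codon by one nucleotide substitution; the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reachable_amino_acids(codon):
    """Amino acids obtainable by changing exactly one nucleotide of codon,
    excluding the original amino acid and stop codons."""
    original = CODON_TABLE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor = codon[:pos] + base + codon[pos + 1:]
            residue = CODON_TABLE[neighbor]
            if residue != "*" and residue != original:
                found.add(residue)
    return found


for codon in ("TGG", "ATG"):
    print(codon, CODON_TABLE[codon], "->", sorted(reachable_amino_acids(codon)))
```

Each codon has only nine single-nucleotide neighbors, some of which are synonymous or stop codons, so most of the 19 possible replacements at a given position are out of reach of a single substitution.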
Random Mutagenesis with Degenerate Oligonucleotide Primers
In addition to introducing a specific nucleotide substitution into a gene, overlap extension PCR can be used to incorporate any of the four nucleotides at defined positions to generate all the possible amino acid changes in a particular region of a protein. This pattern of sequence degeneracy is achieved by programming an automated DNA synthesis reaction to add a low level (usually a few percent) of each of the three alternative nucleotides each time a particular nucleotide is added during the synthesis of an oligonucleotide primer (Fig. 3.51). In this way, the oligonucleotide primer preparation contains a heterogeneous set of DNA sequences that will generate a series of mutations that are clustered in a defined portion of the target gene. The degenerate oligonucleotides are employed as “internal” PCR primers to amplify the left and right portions of the target gene in separate reactions (Fig. 3.47). Mixing, denaturing, and annealing the left and right fragments produces some DNA molecules that overlap by complementarity and can be extended by DNA polymerase to produce a library of altered genes that have mutated sites in the region of the overlap of the degenerate oligonucleotides.
Figure 3.51 Chemical synthesis of oligonucleotide primers with any of the four nucleotides at defined positions. In this case, the flask with G phosphoramidite consists of a mixture of nucleotides, such as 94% G, 2% A, 2% C, and 2% T, leading to a mixture of oligonucleotides that may have A, C, or T at the sites where G is the specified nucleotide.
Mutagenesis using degenerate oligonucleotides confers two advantages over targeted mutagenesis: (1) detailed information regarding the roles of particular amino acids in the functioning of the protein is not required, and (2) unexpected mutants encoding proteins with a range of interesting and useful properties may be generated because the introduced changes are not limited to one amino acid. Of course, should none of the mutants yield a protein with the properties that are being sought, then it may be necessary to repeat the entire procedure with a set of degenerate primers that is complementary to a different region of the gene.
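The doping scheme in Fig. 3.51 can be illustrated with a short Python sketch. It assumes that every position of the primer is doped identically (94% of the specified nucleotide and 2% of each alternative, as in the figure legend) and that positions are independent, so the number of substitutions per oligonucleotide follows a binomial distribution. The primer sequence and the function names are hypothetical and serve only as an illustration.

```python
import random
from math import comb

NUCLEOTIDES = "ACGT"


def doped_base(specified, purity=0.94, rng=random):
    """Return the nucleotide added at one doped position.

    With probability `purity` the specified nucleotide is added; otherwise
    one of the three alternatives is added with equal probability.
    """
    if rng.random() < purity:
        return specified
    return rng.choice([n for n in NUCLEOTIDES if n != specified])


def synthesize_doped_oligo(template, purity=0.94, rng=random):
    """Simulate the synthesis of one oligonucleotide from a doped mixture."""
    return "".join(doped_base(base, purity, rng) for base in template)


def mutation_count_distribution(length, purity=0.94):
    """Probability of k substitutions per oligonucleotide, for k = 0..length."""
    p = 1.0 - purity  # chance that a position carries an alternative nucleotide
    return [comb(length, k) * p**k * (1.0 - p) ** (length - k)
            for k in range(length + 1)]


# Hypothetical 30-nucleotide primer region (not taken from any real gene).
template = "ATGGCTAGCAAGGGTGAAGAACTGTTCACC"
rng = random.Random(0)  # fixed seed so that the simulation is repeatable
library = [synthesize_doped_oligo(template, rng=rng) for _ in range(5)]
distribution = mutation_count_distribution(len(template))

for oligo in library:
    print(oligo)
for k in range(4):
    print(f"P({k} substitutions) = {distribution[k]:.3f}")
```

Under these assumptions, lowering the purity of each phosphoramidite mixture shifts the library toward oligonucleotides with multiple substitutions, whereas raising it leaves most primers unchanged.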
DNA Shuffling
Some biologically important proteins, such as α-interferon (IFN-α), are encoded by a family of several related genes, with each protein having slightly different biological activity. If all, or at least several, of the genes or cDNAs for a particular protein have been isolated, it is possible to recombine portions of these genes or cDNAs to produce hybrid or chimeric forms (Fig. 3.52). This “DNA shuffling” is done with the expectation that some of the hybrid proteins will have unique properties or activities that were not encoded in any of the original sequences. Also, some of the hybrid proteins may combine important attributes of two or more of the original proteins (e.g., high activity and thermostability).
Figure 3.52 Amino acid changes may be introduced into a protein by either random mutagenesis or error-prone PCR, both of which cause single-amino-acid substitutions, and by DNA shuffling, in which genes are formed with large regions from different sources.
The simplest way to shuffle portions of similar genes is through the use of common restriction enzyme sites (Fig. 3.53). Digestion of two or more of the DNAs that encode the native forms of similar proteins with one or more restriction enzymes that cut the DNAs in the same place, followed by ligation of the mixture of DNA fragments, can potentially generate a large number of hybrids. For example, two DNAs that share three unique restriction enzyme sites are each cut into four segments, so they can be recombined (shuffled) in 16 ways to produce 14 different hybrids in addition to the two original DNAs (Fig. 3.53).
Figure 3.53 The 14 different hybrid genes that can be generated by combining restriction enzyme fragments from two genes from the same gene family that have three different restriction sites in common. RE, restriction enzyme.
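The count of 14 hybrids can be checked by enumeration. The following Python sketch assumes that the shared restriction sites cut each gene into consecutive segments and that the segments are religated in their original order, so that each segment position is filled from one parent or the other. The labels A and B and the function name are hypothetical.

```python
from itertools import product


def enumerate_hybrids(num_shared_sites, parents=("A", "B")):
    """List hybrid genes formed by swapping restriction fragments.

    Each gene is cut into num_shared_sites + 1 segments. A hybrid is written
    as a string such as "ABBA", which names the parent that supplied each
    segment in order. Combinations drawn from a single parent are the
    original genes and are excluded.
    """
    num_segments = num_shared_sites + 1
    combinations = product(parents, repeat=num_segments)
    return ["".join(c) for c in combinations if len(set(c)) > 1]


hybrids = enumerate_hybrids(3)
print(len(hybrids))
print(hybrids)
```

With two parental genes and n shared sites, the same reasoning gives 2^(n+1) − 2 hybrids, so the number of possible hybrids grows rapidly as more common sites are used.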
Another way to shuffle DNA involves combining several members of a gene family, fragmenting the mixed DNA with deoxyribonuclease I (DNase I), selecting smaller DNA fragments, and amplifying these fragments by PCR (Fig. 3.54). During PCR, gene fragments from different members of a gene family cross-prime each other after DNA fragments bind to one another by complementary base pairing in regions of high homology. The final full-length products are amplified by PCR using terminal primers. After 20 to 30 PCR cycles, a panel of hybrid (full-length) DNAs will be established (Fig. 3.54). The hybrid DNAs are then cloned to create a library that can be screened for the desired activity. Although DNA shuffling (sometimes called molecular breeding) works well with gene families or with genes from different families that nevertheless have a high degree of homology, the technique is not especially useful when proteins have little or no homology. Thus, the DNAs must be very similar to one another or the PCR will not proceed. To remedy this situation and combine the genes of dissimilar proteins, several variations of the DNA-shuffling protocol have been described.
Figure 3.54 Some of the hybrid DNAs that can be generated during PCR amplification of three members of a gene family.
One procedure that was developed to combine the genes of dissimilar proteins and that does not rely on PCR amplification of DNA fragments is called nonhomologous random recombination. In this procedure (Fig. 3.55), DNAs from different sources (either defined or random DNA sequences, or a mixture of both) are combined and then partially digested with DNase I. These DNA fragments, which include a wide variety of sizes, are made blunt ended by digestion with the enzyme T4 DNA polymerase. This enzyme both fills in 5′ overhanging nucleotides and degrades 3′ overhanging nucleotides. The DNA fragments are then mixed with a synthetic DNA fragment that forms a hairpin loop and contains a specific restriction enzyme site. The entire mixture is ligated by the addition of the enzyme T4 DNA ligase, which results in the formation of extended mosaic DNA molecules of variable lengths with a hairpin at each end. Ligation of the hairpins prevents further addition of fragments (concatemerization) to the molecules. The average length of these hairpin structures is dictated by the ratio between the blunt-ended DNA and the DNA hairpins added to the ligation reaction. Finally, restriction enzyme digestion removes the hairpin loops so that the resulting sticky-ended DNA fragments can be inserted into plasmid vectors and tested for various activities. Because this process randomly recombines DNA fragments, only a very small fraction of the recombined DNAs are likely to encode the desired activity.
Figure 3.55 Nonhomologous random recombination. Different DNAs (shown in different colors) are mixed together, partially digested with DNase I, blunted at the ends by digestion with T4 DNA polymerase, size fractionated, ligated with synthetic hairpin DNAs to form extended hairpins, restriction enzyme digested to remove the hairpin ends and generate sticky ends, and then ligated into plasmid vectors.
Examples of Protein Engineering
Increasing Protein Stability
Proteins have evolved to perform a particular function for a microorganism, animal, or plant under natural conditions and are often not well suited for a highly specialized biotechnology application. For example, most enzymes are easily denatured by the high temperatures and the organic solvents that are used in some industrial processes. Although thermotolerant enzymes can be isolated from thermophilic microorganisms, these organisms often lack the particular enzyme that is required for an industrial process. Directed mutagenesis can be used to create a protein that will not readily unfold under the conditions in which it will be employed. The addition of disulfide bonds, through the introduction of specifically placed cysteines, can usually significantly increase the stability of a protein (Fig. 3.8). However, extra disulfide bonds may perturb the normal functioning of a protein, and therefore the activity, as well as the stability, of a modified protein must be tested.
In one example, a receptor antagonist protein was engineered for increased stability to enhance its effectiveness as a therapeutic agent. Low-density lipoprotein receptor-related protein 1 (LRP1) is a cell surface signaling protein that binds lipoproteins and other ligands and removes them from the bloodstream. The receptors also remove blood coagulation proteins, which leads to bleeding episodes in individuals with hemophilia. Following synthesis in the endoplasmic reticulum, LRP1 is escorted to the Golgi by a chaperone protein, receptor-associated protein (RAP). RAP is denatured in the acidic environment of the Golgi, thereby releasing the receptor proteins for subsequent processing and transport to the cell membrane. Exploiting its high affinity for LRP1, RAP can also be administered exogenously to inhibit binding of blood coagulation proteins to the receptors and thereby prevent bleeding episodes in hemophiliacs.
Following administration of RAP, the LRP1-RAP complex that forms at the cell membrane is taken up by the cell in an endosome which results in low pH-induced denaturation of RAP and recycling of LRP1 back to the cell surface (Fig. 3.56A). Thus, the acid sensitivity of RAP limits its potential as a therapeutic agent. Researchers reasoned that introduction of a disulfide bond would increase the acid stability of RAP and thereby prevent its dissociation from LRP1. The structure of RAP is known and computer modeling was used to predict optimal sites for introduction of cysteines. Using site-directed mutagenesis, the coding sequences for two amino acids in domain D3 of RAP, a tyrosine at position 260 and a threonine at position 297, were altered to encode cysteines (Fig. 3.56B). In addition, four histidines in domain D3 were changed to phenylalanine to prevent histidine protonation at low pH that causes RAP to unfold and dissociate from LRP1 (Fig. 3.56C). The two sets of mutations, that is, the introduction of the cysteines and the elimination of the histidines, were introduced into RAP separately and in combination. At pH 7.4, the wild-type and mutant RAP proteins were properly folded (measured as the portion of α-helical content of the protein). At pH 5.5, wild-type RAP unfolded and its affinity for LRP1 was reduced 181-fold (Table 3.16). In contrast, at low pH, mutant RAP with the disulfide bond and mutant RAP with histidines replaced with phenylalanine were substantially folded. Moreover, RAP with the combined mutations remained properly folded. Importantly, the LRP1 binding affinity of the mutant RAP proteins at pH 5.5 was reduced only 22- to 53-fold compared to that at pH 7.4 (Table 3.16). 
Treatment of LRP1-expressing human fibroblast cells with combined mutant RAP significantly inhibited ligand (α2-macroglobulin) uptake compared to treatment with wild-type RAP, indicating that recycling of the receptor was prevented to a greater extent by the acid-stable, mutant RAP. A similar result was found in mice. This study shows that engineering RAP for increased stability generates a more potent inhibitor of LRP1 than wild-type RAP and is a suitable treatment for diseases mediated by LRP1 clearance of blood proteins.
Figure 3.56 Site-directed mutagenesis of receptor-associated protein (RAP) to increase acid stability. (A) Following binding of exogenous RAP to low-density lipoprotein receptor-related protein 1 (LRP1) in the cell membrane, the protein complex is taken into the cell by endocytosis. Acid-sensitive wild-type RAP is denatured in the acidic endosome and releases LRP1, which is recycled back to the cell membrane. (B) To increase the acid stability of RAP, a disulfide bond was introduced into domain D3. Tyrosine at position 260 (Y260) and threonine at position 297 (T297) were changed to cysteines (C260 and C297) by site-directed mutagenesis. (C) To further increase acid stability, four histidines (H257, H259, H268, H290) that are protonated at low pH were changed to phenylalanine (F257, F259, F268, F290). Adapted from Prasad et al., 2015. J. Biol. Chem. 290:17262.
Table 3.16 Binding affinity of LRP1 for mutant RAP
Modifying Protein Specificity
The enzyme tissue plasminogen activator (tPA) is a multidomain serine protease that is medically useful for the dissolution of blood clots. However, tPA is rapidly cleared from the circulation, so that it must be administered by infusion. Therefore, to be effective with this form of delivery, high initial concentrations of tPA must be used. Unfortunately, under these conditions, tPA can cause nonspecific internal bleeding. Thus, a long-lived tPA that has an increased specificity for fibrin in blood clots and is not prone to induce nonspecific bleeding would be desirable. It was found that these three properties could be separately introduced by site-directed mutagenesis into the gene for the native form of tPA. First, changing threonine at position 103 (Thr-103) to asparagine (Asn) causes the enzyme to persist in rabbit plasma approximately 10 times longer than the native form (Table 3.17). Second, changing the amino acids lysine-histidine-arginine-arginine (Lys-His-Arg-Arg) at 296 to 299 to alanines (Ala-Ala-Ala-Ala) produces an enzyme that is much more specific for fibrin than is the native form. Third, changing Asn-117 to glutamine (Gln) causes the enzyme to retain the level of fibrinolytic activity found in the native form. Moreover, combining these three mutations in a single construct allows all three activities to be expressed simultaneously (Table 3.17).
Table 3.17 Stabilities and activities of various modified versions of tPA
Using random mutagenesis, it is possible to generate antibodies in vitro that are directed against a wide range of antigens. The portion of an antibody molecule that contains the ability to bind to an antigen is sometimes called a Fab fragment, and within this fragment are hypervariable complementarity-determining regions (CDRs) separated by relatively invariant framework regions (Fig. 3.57). Together, the six CDRs, three from the variable part of the light chain and three from the variable part of the heavy chain (see chapter 4), determine the specificity of an antibody molecule. Altering one or more of the amino acids in one of the CDRs changes the specificity of the antibody.
Figure 3.57 Structure of a Fab molecule. FR, framework region; CDR, complementarity-determining region. CH1 and CL are constant domains from the heavy and light chains of the antibody molecule, respectively. The N-terminal (NH2) and C-terminal (COOH) ends of each polypeptide, as well as a disulfide bridge (-S-S-), are indicated.
Using degenerate oligonucleotide primers, it was possible to introduce a range of different mutations into the three CDRs of the variable region of an antibody heavy-chain gene (Fig. 3.58). First, one of the CDRs was modified by PCR. Then, in a second PCR, the other two CDRs were modified. Finally, the three altered CDRs were combined in a single DNA fragment. The same changes can also be introduced into the gene for the variable portion of an antibody light chain. Using this approach, a Fab fragment of a monoclonal antibody that was specific for the compound 11-deoxycortisol was altered to produce a Fab fragment that was specific for cortisol and no longer bound 11-deoxycortisol. In theory, Fab fragments directed against any antigen can be generated with this method.
Figure 3.58 Random mutagenesis used to introduce mutations into the three CDR genes of the variable region of a heavy antibody chain. The framework region sequences are shown in green, and the CDR sequences are in blue. (A) The first PCR with a degenerate forward primer (top arrow) introduces random mutations into the DNA encoding CDR1. (B) The second PCR with degenerate primers introduces random mutations into the DNA encoding CDR2 and CDR3. (C) The third PCR combines the DNA that was amplified in panels A and B. The circled portion of the DNA indicates the place where random mutations were introduced.
Modifying Cofactor Requirements
Subtilisins are a group of nonspecific serine proteases that are secreted into growth medium by Gram-positive bacteria and are widely used as biodegradable cleaning agents in laundry detergents. All subtilisins bind tightly (affinity constant [Ka] = ∼10⁷ M⁻¹) to one or more molecules of calcium per molecule of enzyme, and calcium binding stabilizes the enzyme. Unfortunately, subtilisins are used in industrial settings where there are a large number of chelating agents that can bind to and effectively remove calcium, so these enzymes are rapidly inactivated under these conditions. To circumvent this problem, it is necessary first to abolish completely the ability of a subtilisin to bind calcium and then to attempt to increase the stability of this modified enzyme in the absence of bound calcium.
The starting point for the development of a modified subtilisin was an isolated subtilisin gene from Bacillus amyloliquefaciens. Prior to this work, the subtilisin protein had been well characterized, and its high-resolution X-ray crystallographic structure had been determined. Oligonucleotide-directed mutagenesis was used to construct a mutant form of the gene for this enzyme by deleting the nucleotides encoding the portion of the protein—amino acids 75 to 83—that is responsible for binding to calcium (Fig. 3.59). The protein without this stretch of amino acids does not bind calcium and, surprisingly, retains an overall conformation that is similar to that of the native form.
Figure 3.59 Genetic engineering of calcium-independent subtilisin. The native calcium-containing enzyme is highly active but loses almost all of its activity when the loop that binds the calcium is deleted. After several rounds of random mutagenesis, mutants of the deleted enzyme, each with stabilizing mutations and a low level of activity, are selected. Several of these mutations are combined into a single derivative with the result that a subtilisin that does not require calcium and that has a high level of activity is produced.
The next steps in the development of a stable subtilisin from one that lacked a calcium-binding domain entailed predicting which sites might contribute to stability and which amino acids should be placed at these sites. The researchers assumed that any of the amino acids that had previously interacted with the calcium-binding loop in the native form of the enzyme were potential candidates for change. In total, 10 amino acids were considered to be candidates for modification. Moreover, since it was not known a priori which particular amino acids might best contribute to stabilizing the enzyme molecule, random mutagenesis was used for each of these sites.
The amino acids selected for mutagenesis came from four separate regions of the protein: the N terminus (amino acids 2 to 5), the omega loop (amino acids 36 to 44), an α-helical region (amino acids 63 to 85), and a β-pleated region (amino acids 202 to 220). To identify the best amino acid at a particular position, Bacillus subtilis clones expressing the mutated proteins were grown in the wells of microtiter plates, heated to 65°C for 1 hour, allowed to cool, and then assayed for subtilisin activity. It was necessary to express the active calcium-free subtilisin in the host B. subtilis because it was lethal when expressed in E. coli.
After the initial screening, stabilizing mutations were identified at 7 of the 10 positions that were examined (Table 3.18). When these stabilizing mutations were combined into a single gene, the enzyme that was produced had kinetic properties that were very similar to those of the native form of subtilisin. Moreover, the modified form of subtilisin was nearly 10 times more stable than the native form of the enzyme in the absence of calcium and, surprisingly, about 50% more stable than the native enzyme in the presence of calcium. This work demonstrates that complex properties of enzymes that involve a large number of different amino acids can be genetically engineered.
Table 3.18 Effects of random mutations of selected amino acid residues on the stability of a subtilisin lacking a calcium-binding domain
Decreasing Protease Sensitivity
Streptokinase, a 47-kilodalton (kDa) protein produced by pathogenic strains of Streptococcus bacteria, is a blood clot–dissolving agent. Streptokinase forms a complex with plasminogen that results in the conversion of plasminogen to plasmin, the active protease that degrades fibrin in the blood clot. Unfortunately, plasmin also rapidly degrades streptokinase, making it necessary for medical personnel to administer streptokinase as a 30- to 90-minute infusion so that a sufficient level of intact and active streptokinase is maintained. Since it is essential that individuals suffering a heart attack be treated as quickly as possible, a long-lived streptokinase could be administered as a single injection before a person is transported to a hospital. This early treatment might contribute to saving the lives of heart attack victims by quickly restoring blood flow and limiting damage to heart muscles.
Plasmin is a trypsin-like protease that specifically cleaves the peptide bond after a lysine or arginine. Plasmin rapidly digests the 414-amino-acid streptokinase protein by cleaving it at lysine 59, near the N terminus, and at lysine 386, near the C terminus (Fig. 3.60A). The 328-amino-acid peptide that remains following the digestion by plasmin has approximately 16% of the activity of intact streptokinase in activating plasminogen, and it is slowly degraded by plasmin until no activity remains. To make streptokinase less susceptible to proteolysis by plasmin, the lysines at positions 59 and 386 were changed to glutamine by site-directed mutagenesis (Fig. 3.60B–D). Glutamine was chosen to replace lysine because the length of its side chain is similar to that of lysine, so that the three-dimensional structure would not be disturbed, and because glutamine does not have a positive charge. Both single mutants, as well as the double mutant, had the same ability to bind to and activate plasminogen as did the native form of streptokinase. At the same time, in the presence of plasmin, the half-lives of all three mutants were increased compared with native streptokinase, with the double mutant being approximately 21-fold more protease resistant. This work is an important step in the development of variants of streptokinase with significantly longer half-lives.
Figure 3.60 Protease (plasmin) sensitivity of streptokinase and some engineered plasmin-resistant derivatives. The green circles indicate positively charged lysines where plasmin cuts the polypeptide. The red circles indicate glutamines where plasmin does not cut the polypeptide. The horizontal arrows indicate plasmin digestion of streptokinase. The protein size and activity following plasmin digestion are indicated for each derivative. (A) Native protein; (B) the derivative in which glutamine replaces lysine 386; (C) the derivative in which glutamine replaces lysine 59; (D) the derivative in which glutamines replace lysine 59 and lysine 386.
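The logic of removing a protease recognition site can be sketched in a few lines of Python. The sketch assumes a simplified rule in which a trypsin-like protease may cut after every lysine (K) or arginine (R); in practice only some of these sites are accessible, as shown by the two rapidly cleaved lysines of streptokinase. The nine-residue peptide and the function names are hypothetical, and the peptide is not a streptokinase sequence.

```python
def candidate_cleavage_sites(protein):
    """Return 1-based positions of K or R, after which a cut may occur."""
    return [i for i, aa in enumerate(protein, start=1) if aa in "KR"]


def substitute(protein, changes):
    """Apply substitutions given as {position: (expected_residue, new_residue)}."""
    residues = list(protein)
    for position, (expected, new) in changes.items():
        if residues[position - 1] != expected:
            raise ValueError(f"position {position} is not {expected}")
        residues[position - 1] = new
    return "".join(residues)


def digest(protein, sites):
    """Cut the protein after each listed 1-based position."""
    fragments, start = [], 0
    for position in sorted(sites):
        fragments.append(protein[start:position])
        start = position
    fragments.append(protein[start:])
    return [fragment for fragment in fragments if fragment]


# Hypothetical peptide with lysines at positions 3 and 7.
native = "MAKTLQKGA"
mutant = substitute(native, {3: ("K", "Q"), 7: ("K", "Q")})

print(digest(native, candidate_cleavage_sites(native)))
print(digest(mutant, candidate_cleavage_sites(mutant)))
```

In this toy case the native peptide is cut into three fragments, whereas the double mutant has no remaining candidate sites and stays intact, which mirrors the design of the lysine-to-glutamine streptokinase derivatives.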
SUMMARY
The production of a protein requires that the gene be properly transcribed and then that the mRNA be translated. In prokaryotes, a promoter sequence is necessary for the initiation of transcription at the correct nucleotide site, and a terminator sequence at the end of the gene is essential for the cessation of transcription. The aim of many biotechnology applications is to produce large amounts of protein, so it is necessary to use a strong promoter that supports transcription at a high level such as the promoter from gene 10 of the E. coli bacteriophage T7. However, continuous transcription of a cloned gene drains the energy reserves of the host cell; therefore, it is also necessary to use a promoter system whose activity can be regulated, such as the E. coli lac promoter that is induced by addition of lactose or IPTG. For translation, a ribosome-binding site is placed in the DNA region that precedes the translation initiation site (start codon), and a termination sequence (stop codon) is included at the end of the protein coding sequence to ensure that translation stops at the correct amino acid. Codon optimization may be required for production of foreign proteins in some host cells. If secretion of the protein is desired, the DNA sequence preceding the cloned gene should include a signal sequence in the same reading frame as the target gene. In addition, amino acid purification tags are added to purify the recombinant protein by, for example, immunoaffinity chromatography. In these cases, the junction point of a fusion protein is usually designed to be cleaved in vitro either chemically or enzymatically.
High levels of expression of some foreign proteins in bacterial hosts can result in misfolded proteins that form insoluble inclusion bodies. This can be avoided by growing the recombinant bacterial strains at low temperatures, coexpressing chaperone proteins, overexpressing enzymes that catalyze the formation of disulfide bonds, and/or expressing the target protein as a fusion protein. Recombinant proteins may also be degraded by proteolytic enzymes synthesized by the host cell. To overcome this problem, a cloned gene is altered to encode one or more additional amino acids at its N terminus or to remove protease recognition sequences. In this form, the recombinant protein is no longer rapidly degraded. During large-scale production of recombinant proteins, plasmids may be unstable and lost from the population. To overcome this problem, researchers have developed protocols for integrating a cloned gene into a chromosomal site of the host organism. Under these conditions, the gene is maintained stably as part of the DNA of the host organism.
Although many heterologous proteins have been successfully synthesized in prokaryotic host cells, some proteins require eukaryote-specific posttranslational modifications, such as glycosylation, to be functional. Consequently, expression systems were devised for fungal, insect, and mammalian cells. With respect to the ease and likelihood of obtaining an authentic protein from a cloned gene, each of these systems has distinct merits and shortcomings. In other words, there is no single eukaryotic host cell that is capable of producing an authentic protein from every cloned gene.
All eukaryotic expression vectors have the same basic format. The gene of interest, which may be equipped with sequences that facilitate the secretion and purification of the heterologous protein, is under the control of eukaryotic promoter, polyadenylation, and transcription terminator sequences. To simplify both maintenance and recombinant DNA manipulations, eukaryotic expression vectors are routinely maintained in E. coli.
Several different fungal-based expression systems have been developed for the production of heterologous proteins. The yeast S. cerevisiae, which is well characterized genetically and can be grown in large fermenters, has been used extensively for this purpose. Both episomal and integrating expression vectors, as well as artificial chromosomes, have been constructed. However, with S. cerevisiae as the host cell, a number of recombinant proteins are hyperglycosylated, and in some cases, protein yields are low because the capacity of the cell to properly fold and secrete proteins has been exceeded. Other yeast and filamentous fungal systems have been developed for the production of heterologous proteins. Of these, the methylotrophic yeast P. pastoris has been used successfully because of the low occurrence of hyperglycosylation, the ease of obtaining high cell densities, and the rapid and strong response of the AOX1 promoter (usually used to drive the gene of interest) to methanol. A “humanized” strain of P. pastoris has been genetically altered to produce glycoproteins with glycosylation patterns that are identical to those found on the same proteins produced in human cells.
A large number of biologically active heterologous proteins have also been produced in insect cells grown in culture using baculoviruses to deliver the gene of interest into the insect host cell. This system is advantageous because posttranslational protein modification is similar in insects and mammals, and the baculoviruses used in these systems do not infect humans or other insect cells. The baculovirus most commonly used as a vector is AcMNPV. A gene of interest is inserted into the AcMNPV genome by homologous or site-specific recombination between sequences on a transfer vector carrying the target gene and the AcMNPV DNA. Recombination occurs either in insect cells doubly transfected with the transfer vector and viral DNA, in E. coli as an intermediate host, or in an in vitro reaction catalyzed by purified integration enzymes. The last two methods eliminate the need to identify and purify recombinant baculoviruses using plaque assays. Once the target gene has been inserted, recombinant AcMNPV DNA is introduced into insect cells for heterologous-protein production. Improved insect host cells have been developed through genetic engineering to increase protein yields and to ensure that target proteins are properly glycosylated. In addition to production of a single protein of interest, the baculovirus–insect expression system is particularly amenable to producing functional multimeric protein complexes, such as virus-like particles, which are effective vaccines.
Many therapeutic proteins that require a full complement of posttranslational modifications are now produced in cultured mammalian cells, such as CHO cells. Most of the vectors that have been developed to introduce foreign genes into mammalian cells are based on mammalian viruses, especially SV40. The viral genome has been altered to remove some viral genes required for replication and viral-protein production and to include suitable mammalian transcription and translation signals to drive expression of the cloned gene. Expression of chromosomally integrated target genes can be increased by altering the epigenetic state of the insertion site through histone acetylation or insertion of chromatin-relaxing DNA elements. A major challenge for production of high levels of heterologous proteins in mammalian cell lines is preventing cell death, which is often induced by the stressful conditions of large-scale bioreactors. Strategies to improve cell growth and protein yields include genetically engineering host cells to block the transcription factor that induces apoptosis, to prevent accumulation of toxic metabolites in the culture medium, and to increase expression of proteins required for proper protein folding and secretion.
Natural proteins are often not well suited for biotechnology applications. For example, an enzyme may unfold, and therefore be inactivated, at high temperatures employed in an industrial process, or a therapeutic protein may be short lived due to protease sensitivity necessitating administration of high, somewhat toxic, doses. Random or directed mutagenesis can be employed to alter the nucleotide sequence encoding a protein to improve its stability, activity, specificity, cofactor requirements, or protease resistance. Straightforward protocols have been developed to introduce nucleotide substitutions into a gene on an oligonucleotide primer by PCR. When the specific amino acids that contribute to a property are known in advance, the defined nucleotide changes can be introduced on an oligonucleotide by overlap extension or inverse PCR. When the amino acid changes that will result in the desired property of a protein are unknown, libraries of randomly mutated sequences can be generated by performing a PCR under conditions that increase the error rate or by employing degenerate oligonucleotide primers. Most of the mutations will decrease the function of the encoded protein and therefore the libraries must be screened to identify proteins with desired characteristics. Shuffling of DNA segments from two or more genes creates a large number of hybrid proteins that can also be screened for unique biological activity.
REFERENCES
Barnes LM, Dickson AJ. 2006. Mammalian cell factories for efficient and stable protein expression. Curr. Opin. Biotechnol. 17:381–386.
Berger I, Fitzgerald DJ, Richmond TJ. 2004. Baculovirus expression system for heterologous multiprotein complexes. Nat. Biotechnol. 22:1583–1587.
Çelik E, Çalik P. 2012. Production of recombinant proteins by yeast cells. Biotechnol. Adv. 30:1108–1118.
Chatterjee R, Yuan L. 2006. Directed evolution of metabolic pathways. Trends Biotechnol. 24:28–38.
Chen R. 2012. Bacterial expression systems for recombinant protein production: E. coli and beyond. Biotechnol. Adv. 30:1102–1107.
Chong SR, Mersha FB, Comb DG, Scott ME, Landry D, Vence LM, Perler FB, Benner J, Kucera RB, Hirvonen CA, et al. 1997. Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene. 192:271–281.
Condreay JP, Kost TA. 2007. Baculovirus vectors for insect and mammalian cells. Curr. Drug Targets. 8:1126–1131.
de Boer HA, Comstock LJ, Vasser M. 1983. The tac promoter: a functional hybrid derived from the trp and lac promoters. Proc. Natl. Acad. Sci. USA. 80:21–25.
Eijsink VGH, Bjørk A, Gåseidnes S, Sirevåg R, Synstad B, van den Burg B, Vriend G. 2004. Rational engineering of enzyme stability. J. Biotechnol. 113:105–120.
Elleuche S, Pöggeler S. 2010. Inteins, valuable genetic elements in molecular biology and biotechnology. Appl. Microbiol. Biotechnol. 87:479–489.
Ernst JF. 1988. Codon usage and gene expression. Trends Biotechnol. 6:196–199.
Ferrer M, Chernikova TN, KTimmis KN, Golyshin PN. 2004. Expression of a temperature-sensitive esterase in a novel chaperone-based Escherichia coli strain. Appl. Environ. Microbiol. 70:4499–4504.
Fong BA, Wu W-Y, Wood DW. 2010. The potential role of self-cleaving purification tags in commercial-scale processes. Trends Biotechnol. 28:271–279.
Gasser B, Saloheimo M, Rinas U, Dragosits M, Rodríguez-Carmona E, Baumann K, Giuliani M, Parrilli e, Branduardi P, Lang C, et al. 2008. Protein folding and conformational stress in microbial cells producing recombinant proteins: a host comparative overview. Microb. Cell Fact. 7:11–29.
Geisow MJ. 1991. Both bane and blessing—inclusion bodies. Trends Biotechnol. 9:368–369.
Gellissen G, Kunze G, Gaillardin C, Cregg JM, Berardi E, Veenhuis M, van der Klei E. 2005. New yeast expression platforms based on methylotrophic Hansenula polymorpha and Pichia pastoris and on dimorphic Arxula adeninivorans and Yarrowia lipolytica—a comparison. FEMS Yeast Res. 5:1079–1096.
Glick BR. 1995. Metabolic load and heterologous gene expression. Biotechnol. Adv. 13:247–261.
Hamilton SR, Gerngross TU. 2007. Glycosylation engineering in yeast: the advent of fully humanized yeast. Curr. Opin. Biotechnol. 18:387–392.
Heckman KL, Pease LR. 2007. Gene splicing and mutagenesis by PCR-driven overlap extension. Nat. Protoc. 2:924–932.
Kaur J, Sharma R. 2006. Directed evolution: an approach to engineer enzymes. Crit. Rev. Biotechnol. 26:165–199.
Keyt BA, Paoni NF, Refino CJ, Berleau L, Nguyen H, Chow A, Lai J, Peña L, Pater C, Ogez J, et al. 1994. A faster-acting and more potent form of tissue plasminogen activator. Proc. Natl. Acad. Sci. USA. 91:3670–3674.
Kjeldsen T, Hach M, Balschmidt P, Havelund S, Pettersson AF, Markussen J. 1998. Prepro-leaders lacking N-linked glycosylation for secretory expression in the yeast Saccharomyces cerevisiae. Protein Expr. Purif. 14:309–316.
Kurokawa Y, Yanagi H, Yura T. 2000. Overexpression of protein disulfide isomerase DsbS stabilizes multiple-disulfide-bonded recombinant protein produced and transported to the periplasm in Escherichia coli. Appl. Environ. Microbiol. 66:3960–3965.
Kwaks THJ, Otte AP. 2006. Employing epigenetics to augment the expression of therapeutic proteins in mammalian cells. Trends Biotechnol. 24:137–142.
Kwaks THJ, Sewalt RGAB, van Blokland R, Siersma TJ, Kasiem M, Kelder A, Otte AP. 2005. Targeting of a histone acetyltransferase domain to a promoter enhances protein expression levels in mammalian cells. J. Biotechnol. 115:35–46.
Liu X, Constantinescu SN, Sun Y, Bogan JS, Hirsch D, Weinberg RA, Lodish HF. 2000. Generation of mammalian cells stably expressing multiple genes at predetermined levels. Anal. Biochem. 280:20–28.
Lucas BK, Giere LM, DeMarco RA, Shen A, Chisholm V, Crowley CW. 1996. High-level production of recombinant proteins in CHO cells using a dicistronic DHFR intron expression vector. Nucleic Acids Res. 24:1774–1779.
Majander K, Anton L, Antikainen J, Lang H, Brummer M, Korhonen TK, Westerlund-Wikström B. 2005. Extracellular secretion of polypeptides using a modified Escherichia coli flagellar secretion apparatus. Nat. Biotechnol. 23:475–481.
Martinez-Morales F, Borges AC, Martinez A, Shanmugam KT, Ingram LO. 1999. Chromosomal integration of heterologous DNA in Escherichia coli with precise removal of markers and replicons used during construction. J. Bacteriol. 181:7143–7148.
Miyazaki C, Iba Y, Yamada Y, Takahashi H, Sawada J, Kurosawa Y. 1999. Changes in the specificity of antibodies by site-specific mutagenesis followed by random mutagenesis. Protein Eng. 12:407–415.
Murakami H, Hohsaka T, Sisido M. 2002. Random insertion and deletion of arbitrary number of bases for codon-based random mutation of DNAs. Nat. Biotechnol. 20:76–81.
Ness JE, Welch M, Giver L, Bueno M, JCherry JR, Borchert TV, Stemmer WPC, Minshull J. 1999. DNA shuffling of subgenomic sequences of subtilisin. Nat. Biotechnol. 17:893–896.
Palmeros B, Wild J, Szybalski W, LeBorgne S, Hernández-Chávez G, Gosset G, Valle F, Bolivar F. 2000. A family of removal cassettes designed to obtain antibiotic-resistance-free genomic modifications of Escherichia coli and other bacteria. Gene. 247:255–264.
Pina AS, Lowe CR, Roque ACA. 2014. Challenges and opportunities in the purification of recombinant tagged proteins. Biotechnol. Adv. 32:366–381.
Prasad JM, Migliorini M, Galisteo R, Strickland DK. 2015. Generation of a potent low density lipoprotein receptor-related protein 1 (LRP1) antagonist by engineering a stable form of the receptor-associated protein (RAP) D3 domain. J. Biol. Chem. 290:17262-
Punt PJ, van Biezen N, Conesa A, Albers A, Mangnus J, van den Hondel C. 2002. Filamentous fungi as cell factories for heterologous protein production. Trends Biotechnol. 20:200–206.
Qiu J, Swartz JR, Georgiou G. 1998. Expression of active human tissue-type plasminogen activator in Escherichia coli. Appl. Environ. Microbiol. 64:4891–4896.
Robinson AS, Hines V, Wittrup KD. 1994. Protein disulfide isomerase overexpression increases secretion of foreign proteins in Saccharomyces cerevisiae. Bio/Technology. 12:381–384.
Rogers S, Wells R, Rechsteiner M. 1986. Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis. Science. 234:364–368.
Rosano GL, Ceccarelli EA. 2014. Recombinant protein expression in Escherichia coli: advances and challenges. Front. Microbiol. 5:1–17.
Schröder M. 2008. Engineering eukaryotic protein factories. Biotechnol. Lett. 30:187–196.
Seo JH, Bailey JE. 1985. Effects of recombinant plasmid content on growth properties and cloned gene product formation in Escherichia coli. Biotechnol. Bioeng. 27:1668–1674.
Shi X, Jarvis DL. 2007. Protein N-glycosylation in the baculovirus-insect cell system. Curr. Drug Targets. 8:1116–1125.
Simmons LC, Yansura DG. 1996. Translational level is a critical factor for the secretion of heterologous proteins in Escherichia coli. Nat. Biotechnol. 14:629–634.
Steinborn G, Gellissen G, Kunze G. 2007. A novel vector element providing multicopy vector integration in Arxula adeninivorans. FEMS Yeast Res. 7:1197–1205.
Strausberg SL, Alexander PA, Gallagher DT, Gilliland GL, Barnett BL, Bryan PN. 1995. Directed evolution of a subtilisin with calcium-independent stability. Bio/Technology. 13: 669–673.
Tobias JW, Schrader TE, Rocap G, Varshavsky A. 1991. The N-end rule in bacteria. Science. 254:1374–1377.
Van Oers MM, Pijlman GP, Vlak JM. 2015. Thirty years of baculovirus-insect cell protein expression: from dark horse to mainstream technology. J. Gen. Virol. 96:6–23.
Wang L, Brock A, Herberich B, Schultz PG. 2001. Expanding the genetic code of Escherichia coli. Science. 292:498–500.
Wilkinson DL, Harrison RG. 1991. Predicting the solubility of recombinant proteins in Escherichia coli. Bio/Technology. 9:443–448.
Wu X-C, Ye R, Duan Y, Wong S-L. 1998. Engineering of plasmin-resistant forms of streptokinase and their production in Bacillus subtilis: streptokinase with longer functional half-life. Appl. Environ. Microbiol. 64:824–829.
Zhang G, Brokx S, Weiner JH. 2006. Extracellular accumulation of recombinant proteins fused to the carrier protein YebF in Escherichia coli. Nat. Biotechnol. 24:100–104.
REVIEW QUESTIONS
1. What DNA sequence elements are required for expression of a cloned gene in a prokaryotic host?
2. What is a strong promoter? Why is a strong promoter not always desirable for expression of a cloned gene?
3. What is a regulatable promoter? How is the E. coli lac promoter used to regulate the expression of a cloned gene?
4. The promoter for gene 10 of the E. coli bacteriophage T7 is an example of a strong promoter. How is it used to express a cloned gene?
5. Why is codon optimization often required for production of high levels of a recombinant protein?
6. What are inclusion bodies, and how can their formation be avoided?
7. How can a protein of interest be engineered to be secreted to the medium by E. coli?
8. Discuss some strategies to purify a recombinant protein produced in a prokaryotic host. Consider that a protein may be used as a human therapeutic agent.
9. Why is it sometimes advantageous to integrate a target gene into the chromosomal DNA of a prokaryotic host? How might this be achieved?
10. During the course of integrating a target gene into the chromosomal DNA of the host bacterium, a marker gene may also be inserted into the chromosomal DNA. What strategy could be used to excise only the marker gene?
11. What are the major posttranslational modifications of eukaryotic proteins in the endoplasmic reticulum and Golgi apparatus?
12. Describe the features of a eukaryotic expression vector.
13. What criteria are used to decide if a particular recombinant protein should be produced in a yeast, insect, or mammalian cell system?
14. What are the advantages and disadvantages of the different classes of yeast vectors for producing a biotechnology product?
15. Describe some of the strategies that have been used to increase proper folding and secretion of recombinant proteins from yeast cells.
16. Discuss the salient features of a P. pastoris high-expression integrating vector system. How has P. pastoris been “humanized”?
17. What are baculoviruses?
18. Describe a strategy that can be used to insert a target gene into the baculovirus genome for expression in insect cells.
19. Describe the main features of an extrachromosomal mammalian-cell expression vector.
20. Why are yields of recombinant proteins produced by mammalian cells in large bioreactors generally low? How can yields be improved?
21. What is chromatin, and how does it affect gene expression? Describe some of the strategies that have been developed to increase expression levels of a target gene that is integrated into a chromosome of a eukaryotic host cell.
22. You have produced a recombinant enzyme for an industrial application but have found that the protein is unstable at moderately high temperatures. Describe how you would increase the stability of the enzyme. Assume that you have determined the DNA sequence of the gene encoding the enzyme and the structure of the enzyme.
23. How can unusual amino acids be incorporated into proteins, thereby producing an altered form of the target protein?
24. You have produced a recombinant enzyme for an industrial application but have found that the catalytic activity is low. Describe how you would increase the activity of the enzyme if the DNA sequence of the gene was known but the structure of the enzyme was not determined.
25. Outline two ways in which DNA shuffling may be used to generate hybrid genes.
26. How would you engineer streptokinase so that it was less sensitive to proteolytic digestion?
27. How can the gene(s) encoding a Fab fragment of a monoclonal antibody be modified so that the specificity of the antibody is altered?