Bacterial Pathogenesis - Brenda A. Wilson
IN THIS CHAPTER
Importance of the Normal Resident Microbial Populations (Microbiota) of the Human Body
Characterization of the Body’s Microbiota
Taking a Microbial Census by Using Microbial rRNA Gene Sequence Analysis
Characterizing Microbiomes by Using Metagenomic Analysis
Overview of the Human Microbiota
Microbiota of the Small Intestine and Colon
Microbiota of the Vaginal Tract
The Other Microbiota: The Forgotten Eukaryotes
Solving Problems in Bacterial Pathogenesis
CHAPTER 5
The Microbiota of the Human Body
MICROBIOMES AND BEYOND
The ancient Greek philosopher Protagoras of Abdera (ca. 490–420 BCE) famously declared that “Of all things, the measure is Man,” but to the microbes that live in or on us we are more like the proverbial “free lunch.” Yet the microbes colonizing our bodies from shortly after our birth to our death do “pay rent” in various ways. They protect us from disease-causing microbes and contribute to our nutrition and healthy immune status. Our bodies are adapted not only to tolerate these resident microbes, but also to encourage their presence. Indeed, getting in touch with our microbial side is an important part of understanding what it is to be human.
Importance of the Normal Resident Microbial Populations (Microbiota) of the Human Body
A healthy human body harbors more than 10 times as many microbial cells as human cells. These microbes, which include bacteria, fungi, and archaea, are collectively known as the microbiota. Since they do not take up as much space as human cells, their bulk is less than that of the human cells, but their various manifestations can be just as noticeable. For example, the odor of sweat comes from bacterial metabolism of compounds in skin secretions. Bacteria form the film that develops on unbrushed teeth and the plaque that we pay dental hygienists to remove. Even the flatus that we embarrassingly emit is an odiferous indication of the activities of bacteria in the intestine.
Although these particular manifestations of the presence of bacteria are viewed by most as unpleasant, eliminating bacteria from our bodies, even if it were possible, would be a very bad idea. These tiny freeloaders play many beneficial roles in our health. Many members of our resident microbiotas provide us nutrients by degrading foodstuffs that remain undigested by our own systems, such as complex carbohydrates, and by synthesizing and excreting essential vitamins, such as vitamin K, vitamin B12, and other B vitamins.
Normal microbiota also protect us from pathogens by competing for nutrients and occupying potential attachment sites. Indeed, the importance of these resident microbiota becomes particularly evident when they are disrupted or eliminated. For example, a common side effect of taking antibiotics is the development of mild diarrhea as the composition of the microbiota of the colon is disturbed. In more extreme cases, toxin-producing bacteria, such as Clostridium difficile, that are normally present in very low numbers can grow out of control, leading to severe damage of the colon lining and even death. Women who have taken antibiotics that affect the balance of the normal vaginal microbiota often develop vaginitis caused by overgrowth of yeast or vaginosis caused by overgrowth of certain Gram-negative bacteria. Under normal conditions, the vagina is colonized predominantly by Gram-positive Lactobacillus species. When the numbers of these protective lactobacilli decrease, yeasts and Gram-negative bacteria can overgrow and produce toxic substances or degradative enzymes that further kill members of the normal microbiota, cause damage to the host mucosa, and ultimately cause disease symptoms, such as foul odor or discharge.
Recent experimental evidence indicates that indigenous bacteria play a crucial inductive role in intestinal development during early postnatal life, particularly in the caecum and Peyer’s patches, as well as in development and maturation of the innate and adaptive immune systems. Colonization by commensal microbiota stimulates the production of cross-reactive antibodies, which are secreted into the lumen of the gut where they prevent infection by related pathogens. Experimental evidence has demonstrated that animals that are born in aseptic conditions and raised in a sterile environment without exposure to microorganisms (germfree animals) have severely underdeveloped immune systems and require substantially more food to maintain the same body weight as others that are raised conventionally.
The notion that maintaining beneficial or “good” bacteria in the intestinal tract is conducive to good health has spawned the billion-dollar probiotics industry. Probiotics are preparations of live bacteria, usually freeze-dried into pellets or added to foodstuffs such as yogurt, that are ingested intentionally to bolster the normal population of “good” colonic bacteria or are included in douches to promote a healthy vaginal microbiota. Most of the preparations contain species of the genera Lactobacillus or Bifidobacterium that are common to a particular body site. While the particular strains commonly used in commercial preparations are considered safe, they often do not colonize the host well and so must be taken frequently to maintain the bacteria in the host environment. An alternative approach to maintaining a healthy microbiota that is gaining popularity is prebiotics, which are substances such as fructo-oligosaccharides or nondigestible fiber or “roughage” derived from plants that can alter the composition of the gut microbiota by fostering the growth of “good” bacteria in the large intestine.
Claims for the probiotics currently on the market tend to be vague and very general, but credible scientific evidence is mounting that these preparations may provide some long-lasting health benefits. The idea that probiotics or prebiotics could help patients whose intestinal or vaginal bacterial populations have been disrupted by antibiotics or other factors has attracted new interest for biomedical applications. The hunt for new, targeted therapies that restore optimal balance within microbial communities has already yielded alternative clinical treatment strategies, such as strain supplementation mentioned earlier and prophylactic fecal transplantation to prevent disease from overgrowth of C. difficile. New information on the composition and properties of the normal microbiota of our bodies may further improve the design of effective probiotics.
In a related area, researchers are beginning to exploit our microbiota as a powerful diagnostic tool for assessing health status. For instance, accurate diagnosis of changes in the microbiota might help identify patients who are at higher risk for certain diseases, particularly those diseases whose manifestation correlates with a perturbed microbiota. Ideally, intervention strategies could be initiated sooner and with greater certainty of outcome in these individuals. Understanding the role of our microbiota in health and disease is already having a tremendous impact in the area of precision medicine, an emerging personalized and holistic medicinal approach to disease prevention and treatment that takes into account environmental, lifestyle, life history, and genetic variability of each individual.
Figuring out how to maintain bacterial populations that are conducive to health and then finding ways to capitalize on that knowledge to promote a healthy microbiota is not a trivial undertaking. The microbial populations found in the mouth, intestine, and vagina are complex, not only in terms of the hundreds of different species at each site, but also in terms of the diversity of the metabolic requirements and outputs of those species. Until recently the only way to characterize the microbiota of a particular body site was microscopic visualization and/or laboratory cultivation using various growth media. This approach was not only time-consuming and expensive, but also unreliable because of the poor plating and growth efficiencies of the organisms that are more difficult to cultivate, which often comprise the majority of the bacteria. Additionally, many bacterial species cannot even be cultured in the laboratory. As a result of these challenges, microscopy or culture-based approaches are simply unable to address important questions regarding microbiota makeup, spatial and temporal dynamics, and contribution to health and disease. In order to tackle these unknowns, new technologies and approaches were required.
Characterization of the Body’s Microbiota
The advent of culture-independent, nucleic acid-based methods for characterizing bacterial populations revolutionized how researchers acquired information about the composition and diversity of bacterial communities and revitalized research interest in the mammalian microbiota. A critical advance over the past decade that changed this picture even more was the development of mega-scale, high-throughput, rapid, and cost-effective DNA sequencing technologies and the accompanying analytical and bioinformatics tools needed to manipulate, interpret, integrate, and store these large data sets. Importantly, the dramatic reduction in sequencing and computational costs in the last few years has moved the field forward at an astounding pace. Not only is it possible to quickly sequence genomes from isolated, individual microbes (genomics), but now the entire assemblage of genomes within complex microbial communities (microbiomes) can also be determined with relative ease.
Parallel to these remarkable advances in DNA sequencing and analysis, other momentous advances have occurred: in RNA sequencing and gene expression technologies (transcriptomics) for the assessment of which genes are expressed under particular conditions; large-scale separation, identification, and analytical tools for assessment of protein expression and function (proteomics); multiplex antibody, cytokine, and chemokine immunologic profiling for assessment of the host immune response; and lipid, sugar, and other metabolite profiling (metabolomics) for the assessment of cellular processes of individual microbial cells as well as those of the entire microbial community and the host. Indeed, many researchers are now combining these various technologies into multivariate holistic approaches (multi-omics) that enable overall functional assessment of the microbiota in the context of the host environment.
Before delving into the microbial populations of different parts of the body, it is worth reviewing some of the powerful new culture-independent approaches and analytical tools that are available for exploring the scope, depth, and variety of microbes that comprise mammalian microbiotas.
Taking a Microbial Census by Using Microbial rRNA Gene Sequence Analysis
Some of the first questions that arise when scientists are seeking to characterize a complex microbial population are: what microbial species and how many of each are present, how much variation in composition is there from person to person and site to site, and how does the composition change with conditions and over time? To answer these questions, it is first necessary to identify the species to which the microbes belong and to determine their phylogenetic relationship (or evolutionary similarity) to each other. Although most studies surveying the microbiota to date have focused on the bacterial content of the community, there are some researchers who are beginning to explore other microbes, such as archaea, fungi, protozoa, and viruses (including bacteriophage). For now, we will likewise focus on characterization of the bacterial content of microbiotas, but will return to the other microbes later in the chapter.
16S rRNA Gene-Based Taxonomical Identification of Bacteria. To complete a census of the species present in a bacterial community, researchers must first perform sequence analysis of all, or at least the more abundant, species present in a sample. For this, they need to choose a gene that is common to all bacteria of interest. The most widely used approach to determining the bacterial content of a community is to isolate the total genomic DNA from the microbial population and then employ polymerase chain reaction (PCR) to specifically amplify the bacterial 16S rRNA genes (Figure 5-1), which are then sequenced. The use of 16S rRNA genes is advantageous because these genes are large enough (1,542 nucleotides for the Escherichia coli gene) to contain adequate sequence information for identification and discrimination among close relatives but small enough to be sequenced easily. The 16S rRNA gene, which is present in all bacteria, is a mosaic of regions (Figure 5-1A), some that are highly conserved among all bacterial species and some that are less conserved and consequently contain sequence signatures for different bacterial species, acquired through slow accumulation of mutations over time.
Figure 5-1. Detection of bacteria in a clinical specimen based on 16S rRNA gene amplification by PCR. The 16S rRNA gene is used as the standard for bacterial taxonomic identification and phylogenetic relationship studies because it is highly conserved among different taxa of bacteria. (A) The 16S rRNA gene (∼1,542 nucleotides in the Escherichia coli gene shown here) has highly conserved regions of sequence that can serve as primer binding sites for PCR amplification, but also has hypervariable regions (labeled V1 through V9) that can be used as signatures for distinguishing among different bacterial taxa and establishing phylogenetic relationships. (B) For detection of bacteria in a clinical sample, PCR primers (solid dark bars) recognize conserved segments of DNA on either side of the variable region to be amplified. For the PCR reaction, a thermostable DNA polymerase that exhibits maximal catalytic activity at 75–80°C and possesses 3′ to 5′ exonuclease activity that reduces incorporation of the wrong nucleotide is used, such as the Pfu or Vent polymerase. The amplified segment (amplicon) can then be sequenced and compared to rRNA databases of known bacteria for taxonomic identification.
The basic procedure is relatively simple. DNA primers that recognize highly conserved regions at the beginnings and ends of the bacterial 16S rRNA genes are used to amplify most of the gene, including the variable regions containing the identification signatures, by PCR using a thermostable, high-fidelity DNA polymerase, such as Vent or Pfu (Figure 5-1B). The resulting PCR products, called amplicons, are sequenced directly using new DNA sequencing technologies, described in detail later in the chapter. Using bioinformatics (computer software programs capable of handling and analyzing massive amounts of data), the sequences of the rRNA genes present in the original sample (called output sequence reads) are compared to those available in the ever-growing, publicly available DNA sequence databases (Box 5-1) to identify the nearest bacterial relatives and provide an immediate identification of the taxon (plural taxa; group of one or more populations of related organisms) from which the sequence originated. It is now possible to identify an unknown bacterial isolate within 24 hours by this approach, and with automation, high-capacity supercomputers, and current bioinformatics tools, it can happen even sooner.
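The amplification step described above can be mimicked on a computer ("in silico PCR"). The following is a minimal sketch under stated assumptions, not a laboratory protocol: the template and both primers are short made-up toy sequences, and the function names (`in_silico_pcr`, `primer_pattern`) are hypothetical. It shows how primers that match conserved flanking regions bracket a variable region, and how a degenerate primer base (here `M`, meaning A or C) tolerates variation among taxa.

```python
import re

# IUPAC nucleotide codes written as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "K": "[GT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R", "N": "N",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primer_pattern(primer):
    """Translate a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon (primer sites included), or None if a primer fails to bind."""
    fwd = re.search(primer_pattern(forward), template)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so on the template
    # strand we look for its reverse complement downstream of the forward primer.
    downstream = template[fwd.end():]
    rev = re.search(primer_pattern(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]


# Toy template (invented for illustration): flank + conserved site +
# variable region + conserved site + flank.
TEMPLATE = "TTGACC" + "AGAGTTTGATCC" + "GGCTAACGTTAGC" + "AAGTCGTAACAAGG" + "CCATTG"
FORWARD = "AGAGTTTGATCM"    # hypothetical forward primer with one degenerate base
REVERSE = "CCTTGTTACGACTT"  # hypothetical reverse primer

amplicon = in_silico_pcr(TEMPLATE, FORWARD, REVERSE)
print(amplicon)
```

In a real analysis the amplicon, rather than being printed, would be compared against an rRNA sequence database to find its nearest relatives.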
Box 5-1.
Data, Data, Data—What To Do with All That Data?
How does one go about storing and sorting through the massive amounts of sequencing data and information that has been generated over the years? Because of the critical need for researchers to have access to the data and be able to readily use it, a number of centralized, publicly available databases have been formed around the world. These databases, most of which are Web-based and freely available online to the public, consist of libraries of life sciences information, DNA sequencing data, protein structure data, gene expression data, and other computational or scientific data from genomics, transcriptomics, proteomics, metabolomics, and phylogenetics. Because of the need for compiling and analyzing these massive amounts of data and information from various sources, an entirely new field of bioinformatics emerged that involves design, development, management, utilization, and maintenance of these life sciences databases. Databases have become an important tool and resource for scientists studying complex biological systems. Whenever a researcher obtains or publishes a nucleotide sequence or other data in a scientific journal, the researcher is required to deposit that sequence and/or information in one of the databases, and that sequence receives an accession number, which is a tracking number that helps the databases maintain and cross-reference the information.
The largest primary nucleotide sequence databases are the three members of the International Nucleotide Sequence Database Collaboration (INSDC): GenBank (National Center for Biotechnology Information (NCBI)), the United States’ centralized library of various biological data, including nucleotide sequences; EMBL ENA (European Molecular Biology Laboratory European Nucleotide Archive), Europe’s library of nucleotide sequence data; and DDBJ (DNA Data Bank of Japan), Japan’s nucleotide database. The major protein databases include UniProtKB (Universal Protein Resource Knowledgebase), a database that provides protein translations of nucleotide sequences from the nucleotide sequence databases; Swiss-Prot (Swiss Institute of Bioinformatics), a protein sequence database; and RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank), a protein structure model database.
There are public genome databases that collect libraries of genome sequences and provide annotation (assigning identification and possibly function to the genes), curation (literature citations supporting the annotation), and analysis tools to aid researchers in comparative genomics studies. For example, JGI Genomes (Department of Energy Joint Genome Institute) is a database for many eukaryotic and microbial genomes, the NMPDR (National Microbial Pathogen Data Resource) is a curated database of annotated genomic data for a number of bacterial pathogens, and the SEED is a database developed to annotate and curate 1,000 genomes using a subsystems approach based on comparative analysis of sets of genes with related functional roles. Some databases, such as KEGG Orthology (Kyoto Encyclopedia of Genes and Genomes), COGs (Clusters of Orthologous Groups), eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups), and Pfam (Protein Families and Domains), map DNA and RNA sequence reads and functional information into evolutionarily related families of similar biological activities, metabolic pathways, and/or protein structure and function. There are also databases that integrate information from multiple databases. For example, Entrez is the integrated search and retrieval system used by NCBI for assembling data from major life sciences databases, including literature sources (such as PubMed), nucleotide and protein sequences, protein structure, taxonomy, genome, expression, chemical, and other databases, and it makes the resulting combined information available to the public through a single platform (URL: http://www.ncbi.nlm.nih.gov). Similarly, UniProt is a European database that comprises four core databases for protein sequence and function.
One of the greatest challenges with having so much data and information available is that it is difficult for the databases to verify the input data. While some database resources try to maintain oversight, it is often left to the researchers who deposit the data to annotate and curate their data. This is not always a reliable way to ensure that the data are correct, so the end user must also be wary and take care not to use false data. Often, what happens is the researchers deposit large quantities of sequencing data, for which no annotation or curation has occurred. This is becoming ever more prevalent with the massive metagenomic sequencing efforts that are currently underway, such that the number of 16S rRNA gene sequences of uncultured microorganisms has already far surpassed the number of cultured microorganisms. Defining taxonomic thresholds for separating cultured bacteria and archaea based on 16S rRNA gene sequence comparison is already challenging in terms of the diversity observed, even when nearly the entire gene is used. Taxonomic classification of uncultured bacteria and archaea, particularly using information obtained from only partial gene sequences (short reads) typically obtained through large-scale metasequencing platforms, is currently one of the most fundamental problems in microbiology.
To deal with this issue of sorting microorganisms into defined taxa at the species, genus, or higher taxonomic ranks using strictly molecular genetic criteria, some databases have the capability of matching sequences from a sequence library obtained from known, well-characterized bacteria or archaea (so-called type strains), which allows one to link taxonomy with phylogeny. One such database is the Ribosomal Database Project (RDP, URL: http://rdp.cme.msu.edu), which provides online data analysis, alignment, and annotation of bacterial and archaeal small-subunit 16S rRNA gene sequences and fungal 18S rRNA gene sequences from a sequence library of only type strains or from the entire collection of sequences regardless of annotation (type and non-type strains). In addition, the RDP provides alignments for sequence comparisons and phylogenetic analysis that incorporate information from the conserved secondary structure of 16S rRNAs, which enables improved comparisons of short partial sequences and handles some artifacts that might arise from large-scale sequencing.
But the best way to experience the amazing power of bioinformatics is to try it for yourself. Go to the Entrez site or another database and type the name of your favorite protein. You will be amazed at the depth of information that is available about this protein: which species produce it; phylogenetic relationships of the protein in different species; its structure (often determined by different methods and perhaps even bound to ligands); the possible functions of its domains and how these domains relate to similar domains in other proteins; how its expression is regulated at the gene and activity levels; where it fits in metabolism or cellular processes; signal transduction pathways that impinge on it; and on and on. The total amount of new biological information may seem daunting, but you can best appreciate the new depth of the current biological revolution by plunging in and looking for yourself. Besides, the structures and relationships are truly beautiful, and it is all free!
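For readers who would rather script such a search than type it into a Web form, the sketch below builds a query URL for NCBI's E-utilities `esearch` service, the programmatic interface to Entrez. It is a hedged illustration only: the helper name `build_entrez_search_url`, the example search term, and the choice of the `protein` database are assumptions made for this example, and the code only constructs the URL rather than contacting the server.

```python
from urllib.parse import urlencode

# Base address of the E-utilities "esearch" service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_search_url(term, database="protein", max_records=5):
    """Build (but do not send) an esearch query for the given search term."""
    params = {
        "db": database,         # which Entrez database to search
        "term": term,           # free-text query, as typed into the search box
        "retmax": max_records,  # maximum number of record identifiers to return
        "retmode": "json",      # ask for a JSON-formatted reply
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example search term chosen purely for illustration.
url = build_entrez_search_url("cholera toxin")
print(url)
```

Pasting the printed URL into a browser should return a list of record identifiers matching the term, which can then be looked up individually.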
The realization that sequencing the 16S rRNA genes can be used to rapidly and accurately identify bacterial strains introduced a new era in bacterial detection and identification. Because the primers recognize conserved regions of the 16S rRNA gene, which are universal in bacteria, this approach can be used to rapidly identify bacteria that are not amenable to cultivation. The revolution first hit in environmental microbiology, for which nothing equivalent to the detailed identification protocols of clinical microbiology existed and the vast majority of the bacteria could not be cultured using conventional media. One of the first successes of this approach in clinical microbiology was the identification of the bacterium that causes a rare form of intestinal disease called Whipple’s disease. A bacterium-like form could be seen in tissues of infected people but attempts at cultivation had been unsuccessful. Finally, using this technique, the Gram-positive bacterium associated with Whipple’s disease was identified as Tropheryma whipplei.
The 16S rRNA gene sequence profile of the microbial community present in a sample can be represented as a phylogenetic tree, such as that illustrated in Figure 5-2, which shows the evolutionary relationship of the bacteria to each other based on sequence similarity of the reads. Each branch point (or node) is a taxonomic unit that represents the most recent common ancestor of the descendants. The lengths of the branches represent estimated sequence divergence (relationship distances) between taxa, from which estimates of evolutionary time can be inferred. An operational taxonomic unit (OTU) is a term used to group closely related organisms based on their sequence similarity of a specific taxonomic gene, such as the 16S rRNA gene. Usually when analyzing microbiotas, the researcher sets a threshold sequence similarity of 97%, 98%, or 99% for clustering sequences into OTUs, but this threshold can vary, and the resulting clusters may be influenced by errors in DNA sequencing.
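To make the similarity threshold concrete, here is a minimal sketch of greedy OTU clustering. It rests on simplifying assumptions: the reads are invented, already aligned, and of equal length, identity is simply the fraction of matching positions, and each read joins the first OTU whose founding sequence it matches at or above the threshold. Real clustering software is considerably more sophisticated; the function names are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned, equal-length reads."""
    if len(seq_a) != len(seq_b):
        raise ValueError("reads must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def cluster_otus(reads, threshold=0.97):
    """Greedily assign each read to the first OTU whose centroid it resembles."""
    otus = []  # each OTU is a (centroid, members) pair
    for read in reads:
        for centroid, members in otus:
            if identity(read, centroid) >= threshold:
                members.append(read)
                break
        else:
            # No existing centroid is similar enough, so this read founds a new OTU.
            otus.append((read, [read]))
    return otus


def mutate(seq, positions):
    """Toy helper: substitute a different base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for pos in positions:
        bases[pos] = swap[bases[pos]]
    return "".join(bases)


# Three invented 100-nucleotide reads: a reference, a 98%-identical variant,
# and a 95%-identical variant.
reference = "ACGT" * 25
near_variant = mutate(reference, [10, 50])
far_variant = mutate(reference, [0, 20, 40, 60, 80])
reads = [reference, near_variant, far_variant]

print(len(cluster_otus(reads, threshold=0.97)))  # near variant merges with the reference
print(len(cluster_otus(reads, threshold=0.99)))  # stricter cutoff splits it off
```

The same three reads yield a different number of OTUs at the two thresholds, which is why the chosen cutoff, and any sequencing errors that lower apparent identity, affect the census.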
Figure 5-2. Phylogenetic trees to show relationships among microbial communities. Phylogenetic trees are used to illustrate ways rRNA gene sequence data can be displayed to show inferred evolutionary relationships among microbial communities based upon sequence similarities of the microbes with each other. (A) Shown are the steps used to generate a phylogenetic tree based on PCR amplification and sequencing of the nearly full-length 16S rRNA gene sequences (8–1,492) using the commonly used bacterial primer pair (27F + 1492R). (B) The phylogenetic tree shown here displays the phylogenetic relationships among the bacteria found in the vaginas of healthy women. The scale bar represents 0.02 nucleotide substitutions per site in the 16S rRNA gene sequences. (C) Shown is an example of a dendrogram (tree) of the phylogenetic relationships of the bacterial communities from seven different human vaginal microbiota samples, based on their 16S rRNA gene sequence profiles (similar to the one shown in panel B). The lines denote the phylogenetic distance between each of the samples, which is a measure of the relationship or similarity of one sample to the other. For example, samples #4 and #5 are about 10% different from each other (the lines, or branches of the tree, converge at around 5% on the bar index at the bottom), whereas samples #6 and #7 are about 40% different from all of the other samples (#1–#5).
Another common method for depicting phylogenetic relationships or similarity (in terms of distance from each other) among microbial profiles is to use mathematical ordination methods, which are multivariate algorithms in which species units (taxa) or distance profiles of communities are clustered (or ordered) along gradients. One such method, called principal component analysis (PCA), generates a PCA plot from a microbial community data matrix, in which rows are taxa (or relative abundance or similarity distance) and columns are samples (Figure 5-3A). In a PCA plot, the observed data are mathematically transformed into a coordinate system (x,y-plot or x,y,z-plot), such that the data are projected to fit within the plot, where each axis (coordinate) represents a variable called a principal component (PC). The x axis (PC1) represents the component with the greatest variance (i.e., has the factor 1 that accounts for most of the variability in the data) and the orthogonal (perpendicular) y axis (PC2) represents the component with the second greatest variance (i.e., has another factor 2 that accounts for the next highest variability in the data). As for a phylogenetic tree, the closer the points are clustered together, the more similar the sample profiles are to each other (Figure 5-3B).
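The PCA transformation just described can be sketched in a few dozen lines. This is an illustrative implementation under stated assumptions, not the statistical software used for Figure 5-3: the abundance matrix is invented (rows are taxa, columns are samples), and the principal components are found by simple power iteration on the covariance matrix rather than by a numerical library.

```python
import math


def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    size = len(matrix)
    # A non-uniform start is used because relative abundances sum to 1 per sample,
    # which makes the all-ones direction carry no variance at all.
    vec = [float(i + 1) for i in range(size)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    val = sum(vec[i] * matrix[i][j] * vec[j] for i in range(size) for j in range(size))
    return vec, val


def pca_2d(abundance):
    """Project samples (columns of a taxa-by-samples matrix) onto PC1 and PC2."""
    samples = [list(col) for col in zip(*abundance)]  # one row per sample
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centered = [[s[j] - means[j] for j in range(p)] for s in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    components, variances = [], []
    for _ in range(2):
        vec, val = top_eigenpair(cov)
        components.append(vec)
        variances.append(val)
        # Deflation: remove the component just found so the next one can emerge.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    coords = [[sum(r[j] * c[j] for j in range(p)) for c in components]
              for r in centered]
    return coords, variances


# Invented relative abundances: 3 taxa (rows) in 4 samples (columns).
abundance = [
    [0.90, 0.85, 0.10, 0.15],  # taxon 1 dominates samples 1 and 2
    [0.05, 0.10, 0.60, 0.55],  # taxon 2 dominates samples 3 and 4
    [0.05, 0.05, 0.30, 0.30],  # taxon 3
]
coords, variances = pca_2d(abundance)
for label, (pc1, pc2) in zip("1234", coords):
    print(f"sample {label}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
```

In this toy data set, samples 1 and 2 land close together on the plot and far from samples 3 and 4, and nearly all of the variance lies along PC1.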
Figure 5-3. Principal component analysis to show relationships among microbial communities. A principal component analysis (PCA) plot is a tool for identifying patterns in data by highlighting their similarities and differences. The PCA plot is generated by a statistical program that uses orthogonal (perpendicular) transformation to convert the observed data (which fall within the area shown) into an ordinate plot. The first principal component (PC1) accounts for the largest variation in the data, which is attributed to factor 1, and the second principal component (PC2) accounts for the second-largest variation in the data, which is attributed to factor 2. Shown is a PCA plot (A) and the corresponding radial phylogenetic tree (a phylogenetic relationship tree that does not make assumptions about ancestry, in contrast to the trees shown in Figure 5-2) (B) based on 16S rRNA sequence profiles depicting the relationship among the vaginal microbiotas of humans and nonhuman primates. Vaginal samples are color-coded to match their host species. Data from Yildirim S, Yeoman CJ, Janga SC, Thomas SM, Ho M, Leigh SR; Primate Microbiome Consortium, White BA, Wilson BA, Stumpf RM. 2014. ISME J 8:2431–2444.
In this way, 16S rRNA gene-based profiling can be used to characterize the content of complex microbial populations, such as those found in the colon or vagina, and enables researchers to compare the relative compositions of microbial populations within an individual and among multiple individuals. Microbiota profiles from individuals who are healthy should resemble one another (i.e., be more closely related in terms of composition), whereas profiles from individuals suffering from a particular type of infection or disease should likewise resemble each other, but be different from the profiles of healthy individuals. Consequently, comparative analysis can serve to distinguish microbiota profiles among individuals and provide some diagnostic assessment of an individual’s health status.
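One simple way to put a number on how much two microbiota profiles resemble each other is a dissimilarity index. The sketch below uses the Bray-Curtis dissimilarity, a measure widely used in ecology; its use here is an assumption made for illustration and is not a method named in the text above. The three profiles are invented relative abundances.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: 0 means identical profiles, 1 means no shared taxa."""
    taxa = set(profile_a) | set(profile_b)
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in taxa)
    total = sum(profile_a.values()) + sum(profile_b.values())
    return 1.0 - 2.0 * shared / total


# Invented relative-abundance profiles (fractions of reads assigned to each genus).
healthy_1 = {"Lactobacillus": 0.90, "Gardnerella": 0.05, "Streptococcus": 0.05}
healthy_2 = {"Lactobacillus": 0.85, "Gardnerella": 0.10, "Streptococcus": 0.05}
disturbed = {"Lactobacillus": 0.10, "Gardnerella": 0.60, "Prevotella": 0.30}

print(round(bray_curtis(healthy_1, healthy_2), 2))  # small: similar profiles
print(round(bray_curtis(healthy_1, disturbed), 2))  # large: dissimilar profiles
```

A matrix of such pairwise values is the kind of distance profile from which dendrograms and ordination plots are built.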
Cultivation-based studies took many years to complete a census for the human vaginal tract and came to the conclusion that Gram-positive Lactobacillus species were the dominant bacteria (present at over 90%) in most healthy women. It was believed that this Lactobacillus-dominant composition only changed during overt infection or symptomatic disease. The cultivation-independent studies using 16S rRNA gene-based microbial content analysis of the human vaginal tract showed no big surprises (Figure 5-2). Lactobacillus species were still the numerically dominant bacteria present in the population (at 80–90%), with the minor bacterial components being mostly other Gram-positive Firmicutes, such as Staphylococcus and Streptococcus species, Gardnerella vaginalis, and a few Gram-negative Proteobacteria, such as E. coli and Pseudomonas species. However, the phylogenetic DNA-based analysis did reveal a number of species-level differences in Lactobacillus content from the outcomes of cultivation-based studies, as well as considerable person-to-person, species-level variation in the non-Lactobacillus bacteria present. One finding that challenged some previous assumptions about what constitutes vaginal health was that G. vaginalis, which previously was thought to be associated only with overgrowth during symptomatic bacterial vaginosis (BV), is actually a common constituent of the normal microbiota of asymptomatic, apparently healthy women.
One question that a researcher might ask in developing an animal model of a disease that manifests with a shift in the normal microbiota composition is whether animals, such as the laboratory rodents that are often used as models for human disease, have a microbiota that is similar to that of humans. And, if not, would another animal such as a nonhuman primate serve as a more appropriate animal model? To address this issue, a researcher might compare the 16S rRNA gene profiles of the microbiota at particular body sites in potential animal models. An example that may help you appreciate this decision process is provided by a study to determine whether baboons could serve as an appropriate animal model for studying the microbiota of the human vaginal tract.
Why baboons? The reason for choosing baboons is that anatomically, hormonally, and reproductively, the baboon and human uteruses and vaginas resemble each other closely, and both are clearly different from those of rodents, such that mice or rats might be considered poor models for human vaginal disease. Baboons have been widely used as an animal model in gynecological and reproductive studies of the female genital tract. Topics of these studies have ranged from endometriosis to the efficacy of birth control methods to menopause. Baboons are easier to house and more accessible to researchers than chimpanzees, another candidate for an animal model. Since the captive animals have the same diet and the same environment, this study also provided a chance to ask whether the considerable individual-to-individual differences seen in the human subjects were due to differences in environment, genetics, or other factors. However, in contrast to the microbiota of the human and rodent vaginal tracts, no culture-based analysis of the baboon microbiota had ever been done previously.
The 16S rRNA gene analysis of the baboon microbiota yielded a totally unexpected finding: the microbiota of the baboon vaginal tract was quite different from that of the human vaginal tract. Whereas Lactobacillus species dominate the human vaginal tract with minor components of other bacteria, species of Gram-positive bacteria (Firmicutes) other than Lactobacillus dominated the microbiota of the baboon vagina: mostly Clostridia, but also Gram-negative Fusobacteria and members of the phylum Bacteroidetes. This difference is illustrated in the clustering analysis shown in Figure 5-4. The difference between the composition of the human and baboon microbiota is striking, particularly in view of the fact that baboons seem to be so closely related to humans. Most of the bacterial species found in the baboon vaginal tract have representatives related to those isolated from humans, but largely those isolated from the human mouth and colon rather than the vaginal tract. Even within these groups, however, the human sequences clustered independently from the baboon sequences, indicating that, in many cases, the baboon sequences were not closely related to the human sequences and might represent new genera.
Figure 5-4. A radial phylogenetic tree showing the 16S rRNA gene sequence profiles of human and baboon vaginal microbes clustering to six major taxonomic groups. This radial tree shows relationships based on near-complete 16S rRNA gene sequences from the vaginal microbial community of captive baboons (red) and published human sequences (blue). Results show that the vaginal microbiota were clustered to six major taxonomic groups. Although Gram-positive bacteria dominate the vaginal microbiota of baboons, as they do in humans, there are major differences between the vaginal microbiotas of humans and baboons. Unlike humans, Lactobacillus species did not dominate the baboon microbiota. Even within the same taxa of bacteria, the baboon sequences clustered separately from the human sequences. Reprinted from Rivera AJ, Frank JA, Stumpf R, Salyers AA, Wilson BA, Olsen GJ, Leigh S. 2011. Am J Primatol 73:119–129, with permission.
One question that immediately comes to mind considering these results is whether the vaginal microbiotas of other nonhuman primates besides baboons are also different from those of humans. Indeed, a comparative 16S rRNA gene-based analysis of vaginal bacteria clearly demonstrated host species-specific vaginal microbial communities among humans and nonhuman primates (see Figure 5-3). Humans were distinct from other primates not only in microbial composition and diversity, but also because only humans possessed Lactobacillus-dominant vaginal microbiota. The reason for this interesting difference is still under investigation.
While phylogenetic relationship analysis based on 16S rRNA gene sequences has some clear advantages and can provide valuable insights regarding the microbiota composition, there are important limitations. In practice, given the complexity and diversity of the populations found in most parts of the body, a 16S rRNA gene analysis provides at best a representation of the most abundant genera and species in a particular site. This information, nevertheless, is extremely valuable because it narrows down the possible groups of bacteria in the population and can guide cultivation efforts. For example, the possible presence of anaerobic bacteria such as Prevotella species means that anaerobic conditions should be included in any attempt to cultivate members of the dominant groups.
A more serious limitation of this approach is that the data from a 16S rRNA gene analysis are only semi-quantitative. Part of the problem is that endpoint PCR (see Figure 5-1B) is not strictly quantitative. This is due to the fact that some sequences seem to be amplified more efficiently than others (often referred to as PCR bias). Experience has shown that even some major members of the population may be misrepresented due to the absence of a strict quantitative representation of the microbiota. Nonetheless, the analysis gives an idea of what the leading members of the population are and provides a general assessment of their relative abundance.
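The way a small difference in amplification efficiency compounds over many cycles can be illustrated with a simple model. The sketch below assumes idealized exponential amplification, in which each cycle multiplies the copy number by (1 + E) for a fixed per-cycle efficiency E. The taxon labels, starting copy numbers, efficiencies, and cycle count are hypothetical values chosen only for illustration.

```python
def amplified_copies(start_copies, efficiency, cycles):
    """Idealized exponential PCR: each cycle multiplies copies by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def apparent_fractions(templates, cycles):
    """Return each template's share of the final product pool.

    templates maps a label to (start_copies, per-cycle efficiency).
    """
    final = {name: amplified_copies(n0, eff, cycles)
             for name, (n0, eff) in templates.items()}
    total = sum(final.values())
    return {name: copies / total for name, copies in final.items()}


# Hypothetical community: two taxa present at equal starting abundance,
# but the 16S rRNA gene of taxon_B amplifies slightly less efficiently.
community = {"taxon_A": (1000, 0.95), "taxon_B": (1000, 0.85)}
fractions = apparent_fractions(community, cycles=30)

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of endpoint product")
```

Under these assumed values, taxon_A makes up more than 80% of the endpoint product even though both taxa started at equal abundance, which is why endpoint PCR profiles are best read as semi-quantitative.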
Quantitative real-time PCR (qPCR). Once the members of a microbial community have been identified, the relative representation or abundance of those bacterial species can be determined by another method, called quantitative real-time PCR (qPCR). In one common variation of qPCR (Figure 5-5), genomic DNA from the microbial community is prepared and used directly as the template in PCR amplification reactions containing primer pairs that anneal specifically to the 16S rRNA genes of the bacterial species of interest in the population. The course of the PCR reaction is followed by an increase in fluorescence caused by intercalation of a dye, such as SYBR green, into the double-stranded PCR products. Alternatively, sequence-specific DNA probes labeled with the fluorescent dye at one end and a fluorescence quencher moiety at the other end can be used to detect the DNA of interest after hybridization of the labeled probe with its complementary sequence. The quencher is removed by the 5′ to 3′ exonuclease activity of the thermostable DNA polymerase, which allows for detection of fluorescence from the probe bound to its complementary DNA. Newer probe designs allow for enhanced specificity and sensitivity of detection, and when coupled with different fluorescent labels enable simultaneous detection of multiple target DNAs (multiplexing).
Figure 5-5. qPCR used to quantify specific bacteria in complex samples. Unlike conventional PCR, quantitative real-time PCR (qPCR) allows for detection of the target DNA in real time during the amplification process by measuring the fluorescence intensity above a given threshold. (A) Unbound fluorescent dye (typically SYBR green) does not fluoresce in the presence of ssDNA template, but does fluoresce when bound to newly formed dsDNA during the amplification process. (B) To eliminate fluorescence due to nonspecific binding of fluorescent dye, fluorescently labeled probes complementary to the target DNA template have quenchers attached that prevent fluorescence emission until the probes are degraded by the 5′ to 3′ exonuclease activity of the polymerase during the PCR amplification reaction. (C) Newer probe designs include fluorescence quenchers attached to the probe. Shown are two examples of such fluorescent dye-quencher probe designs: one as a single probe with stem-loop structures to keep the fluorescent dye and quencher together and the other one as a duplex with one strand having the fluorescent dye and the other complementary strand having the quencher. The quenchers are removed during the second amplification cycle, such that the fluorescently labeled probe sequence can then hybridize to the complementary sequence on the newly synthesized target sequence. The blocker prevents the polymerase from extending the PCR primer. The fluorescence detected is directly proportional to the amount of target DNA present in the sample. (D) For qPCR analysis, the fluorescence is plotted versus the number of PCR thermocycles on a logarithmic scale, where the number of cycles at which the fluorescence is detected above the threshold is referred to as the threshold cycle (Ct), which is inversely related to the amount of target DNA template in the sample.
For quantification, the qPCR procedure determines the kinetics of the increase in fluorescence intensity after each round of PCR amplification (this is the “real-time” aspect of the method) and relates these kinetics to a parameter called the threshold cycle number (Ct), which is inversely related to the logarithm of the starting concentration of the template DNA: the more template present at the start, the fewer cycles are needed to cross the threshold. Computer software analysis programs are then used to calculate the relative concentrations of each 16S rRNA gene, which are proportional to the relative numbers of bacteria of each species in the starting microbial community. Relative quantification of target DNA can be determined by comparing the Ct obtained with that of an internal reference bacterium. Absolute quantification of a target bacterium in a sample can be determined by comparing the Ct obtained with a calibration curve generated from serial tenfold dilutions of a known concentration of the target bacterium.
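The two quantification strategies can be sketched in a few lines of code. The example below is a minimal illustration, not the algorithm of any particular instrument software: it fits a straight line of Ct against log10 of the template amount for a tenfold dilution series, then reads an unknown sample off that line. The dilution amounts and Ct values are hypothetical, and the relative calculation assumes perfect doubling of product in every cycle.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Absolute quantification: invert the standard curve for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


def relative_quantity(ct_target, ct_reference):
    """Relative quantification, assuming product doubles every cycle (an idealization)."""
    return 2.0 ** (ct_reference - ct_target)


# Hypothetical standards: serial tenfold dilutions of a known template amount.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3]
standard_ct = [14.1, 17.4, 20.8, 24.1, 27.5]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = copies_from_ct(22.0, slope, intercept)

print(f"slope = {slope:.2f} cycles per tenfold change in template")
print(f"unknown sample: about {unknown_copies:.0f} template copies")
print(f"target vs. reference: {relative_quantity(20.0, 23.0):.0f}-fold")
```

The negative slope reflects the inverse relationship described above: each tenfold increase in starting template lowers Ct by a little over three cycles, because about 3.3 doublings make a tenfold increase.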
Ultra-High-Throughput, Massively Parallel DNA Sequencing. A revolution has occurred within the last decade that has had a lasting and profound impact on profiling of microbial communities and on microbial functional genetics in general. Technological advances in ultra-high-throughput, massively parallel sequencing have taken place, which allow the simultaneous determination of multiple millions of base pairs of DNA sequences in single reaction runs (hence the term “massively parallel”) at very reasonable costs. New bioinformatic methods now also allow the rapid and accurate assembly of these sequences into large regions of overlapping sequences (called contigs) that can be mapped into a complete genome.
At this writing, Illumina sequencing technology is the prevailing platform for massively parallel sequencing (Figure 5-6A). Despite the limitation of generating rather short read lengths of 50–300 contiguous bases, this method is extremely powerful and inexpensive in terms of capacity and time per sequencing run. At the time of this writing, the Illumina chemistry yields robust sequence determinations, even of long stretches of repeated bases, in as little as 6 hours for data outputs of 0.5–15 gigabases (1–25 million reads) to a few days for outputs of 7.5–35 GB (100–200 million reads). Meanwhile, improvements in DNA sequencing reagents and read detection technologies (Figure 5-6B) are appearing at an astounding pace. Modification to the original Illumina process for library preparation now enables paired-end sequencing (Figure 5-6C), which significantly improves alignment of reads to produce longer high-quality contigs that can be used for genome assembly.
Figure 5-6. Illumina method of massively parallel DNA sequence determination. (A) Shown are the steps in the Illumina platform. (B) Alternative sequence readout can be performed using the less expensive Ion Torrent technology, which detects protons (H+) released during the nucleotide incorporation reaction step as a change in pH for each H+ released during the cycle. (C) Paired-end sequencing entails sequencing both ends of a DNA fragment of defined length (which can be longer than the sequence read), allowing improved sequence alignment of reads during genome assembly. With a small modification of the library preparation process, it is possible to read both the forward and reverse template strands of each cluster (spot), which provides positional information and improved alignment of the paired-end reads with a reference sequence during assembly. Adapted from copyrighted publications from Illumina, with permission.
An alternative sequencing platform that generates much longer sequence reads and is ideal for de novo genome sequencing is the single-molecule, real-time (SMRT) sequencing platform developed by Pacific BioSciences (PacBio). The PacBio SMRT system takes advantage of the intrinsic speed, fidelity (3′ to 5′ exonuclease activity), and importantly the high processivity of a single molecule of a special strand-displacing DNA polymerase, originally obtained from Bacillus subtilis phage phi29, to generate very long reads from a single molecule of DNA template (Figure 5-7). At this writing, each SMRT cell reaction run yields up to 5 GB of sequencing data in about 12 hours. Current chemistry and library preparation advances have pushed the limits of sequence read lengths up to 60 kB (with average read lengths of 10–20 kB), which can yield large overlapping contigs of up to 15 MB in length, enabling the complete assembly of most microbial genomes, including plasmids, without gaps.
Figure 5-7. The PacBio SMRT sequencing platform. The Pacific Biosciences (PacBio) single-molecule, real-time (SMRT) sequencing technology uses a single highly processive, strand-displacing DNA polymerase that is immobilized to the base of a matrix in a fluorescence-detection well. (A) Each SMRT reaction cell contains 150,000 zero-mode waveguides (ZMWs), which are nanophotonic confinement wells that are illuminated from below and have a detection volume of 20 zeptoliters (10⁻²¹ liters). The technology uses ZMWs to observe the base incorporations of an anchored polymerase molecule, which allows light to illuminate only the bottom of the well, in which a single DNA polymerase plus DNA template complex is immobilized. (B) The current version of the SMRT technology uses SMRTbell template preparation to generate a circularized, double-stranded DNA template from appropriately sized fragments of the genomic DNA, generated either by random shearing or by amplification of target regions of interest. Universal hairpin adaptors are then ligated onto the ends of the DNA fragment to generate the SMRTbell library. These hairpin dimer sequences are removed at the end of the library preparation protocol. The sequencing primer is then annealed to the SMRTbell template, followed by binding of the DNA polymerase, to form a complex, which is then immobilized to the bottom of the ZMW well. (C) In each ZMW well, a single molecule of the DNA polymerase sequences a single nucleotide at a time of a single molecule of DNA template. Phospholinked nucleotides, each labeled with a different fluorescently linked dye, are introduced into the ZMW well. As a nucleotide binds the polymerase-DNA complex in the ZMW well, the fluorescently labeled phospholinked nucleotide emits a light pulse that is detected.
When the phosphodiester bond is cleaved during the DNA elongation reaction, the fluorescent dye is released and diffuses away from the ZMW detection zone and the light emitted by that nucleotide is no longer detected. The fluorescent read output from incorporation of the four different fluorescent dyes is captured for the entire SMRT cell over time and processed by a computer into sequence reads (one read per ZMW well). (D) With the latest chemical reagent kits, SMRT technology can provide real-time sequence reads of 5–60 kB in length. Adapted from copyrighted publications from PacBio, with permission.
Currently, the Illumina and PacBio sequencing platforms are often used in combination to determine the complete genome sequence of a new bacterial isolate. If a complete genome is already available for a closely related bacterium such that that genome can be used as a template, then sequence assembly of overlapping contigs from a new bacterial isolate can readily be completed without any or only a few gaps remaining to be filled. For this, the shorter reads obtained by the Illumina platform are sufficient. For de novo genome sequencing (where no existing complete genome is available for a related bacterium), the longer reads of the PacBio platform provide a scaffold and high-quality draft sequence that can be rapidly polished by the high accuracy and coverage (i.e., the large number of overlapping sequence reads used to accurately call a sequence) provided by the Illumina sequencing. As these DNA sequencing methods develop, it will become increasingly quick and cost-effective to not only take the census of bacterial species in complex microbial communities, but also to sequence parts or entire genomes directly from the mixture of genomic DNA isolated from these populations.
To understand the power of current whole-genome sequencing technologies, let us consider a common, recurring problem in bacterial genetics. Interesting mutations often arise spontaneously in bacteria whose complete genomes have already been determined. Researchers are often interested in knowing what these mutations are, particularly if the bacterium is a pathogen. Classical bacterial genetics provides exceedingly clever ways to map mutations so that they can be located by conventional sequencing of a limited region of the chromosome of the mutant strain. However, these classical methods are often time-consuming and are far from foolproof, especially in bacterial species lacking powerful genetic systems, such as many bacterial pathogens.
For example, the Illumina sequencing technology is routinely used to locate mutations in bacteria, such as Streptococcus pneumoniae, whose genome contains about 2.2 million base pairs. In this case, chromosomal DNA isolated from mutant strains is sheared into random fragments of about 500 base pairs. Adaptors required for hybridization during the sequencing method are ligated to the ends of the DNA fragments, and the resulting products are amplified by PCR to give random libraries of genomic fragments from each mutant. The adaptors used for each mutant genome have slightly different sequences (“barcodes”), so that DNA sequences from more than one mutant can be sequenced simultaneously and later distinguished. The resulting barcoded libraries are mixed and subjected to Illumina sequencing. Currently, with the highest-capacity instruments, such a sequencing run yields about 120 billion bases of sequence comprised of approximately 400 million reads with lengths of up to 300 bases! Thus, the base-pair coverage of a single S. pneumoniae genome in a reaction lane would be about 27,000-fold in this example. This extraordinarily high level of coverage means that barcoded, fragmented genomic libraries from 27 different mutants could be sequenced simultaneously at 1,000-fold coverage in a single sequencing run in less than a week. Subsequent bioinformatic sequence comparisons with the known genome sequence of S. pneumoniae could then reveal the locations of single-nucleotide point mutations, small and large deletions, or even regions of chromosomal duplication that contribute to the mutant phenotypes. And what is the cost? At this writing, the cost of the sequence determination and bioinformatic analyses per mutant genome is far less than the cost of classical genetic approaches, which previously required weeks to months to obtain and, if they worked, would yield far less information compared to less than one week for the large-scale sequencing analysis.
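The coverage figures in this example follow from simple arithmetic: fold coverage is the total number of bases sequenced divided by the genome size. The sketch below reproduces the numbers using the read count, read length, and genome size given above. The text does not state how many lanes the run is divided into; the value of two lanes used here is an assumption, chosen because it reproduces the quoted figure of about 27,000-fold per lane.

```python
def fold_coverage(total_bases, genome_size_bp):
    """Average number of times each genome position is covered by sequenced bases."""
    return total_bases / genome_size_bp


reads_per_run = 400_000_000        # approximate reads per run, from the text
read_length = 300                  # maximum read length in bases, from the text
genome_size_bp = 2_200_000         # S. pneumoniae genome size, from the text
lanes_per_run = 2                  # ASSUMPTION: not stated in the text
target_coverage = 1000             # desired fold coverage per mutant genome

run_bases = reads_per_run * read_length
run_coverage = fold_coverage(run_bases, genome_size_bp)
lane_coverage = run_coverage / lanes_per_run
mutants_per_lane = int(lane_coverage // target_coverage)

print(f"bases per run:       {run_bases:,}")
print(f"coverage per run:    {run_coverage:,.0f}-fold")
print(f"coverage per lane:   {lane_coverage:,.0f}-fold")
print(f"mutants per lane at {target_coverage}-fold: {mutants_per_lane}")
```

Because the calculation uses the maximum read length, it gives an upper bound; shorter reads or reads discarded during quality filtering would lower the coverage actually achieved.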
16S rRNA Gene-Based Profiling of Complex Microbial Communities. Scientists are often interested in understanding the effect of certain conditions or factors over time on composition of the microbial community. For instance, a researcher might be interested in the effect of diet, hormones, or age on the composition of the microbiota, or how antibiotic therapy impacts the microbiota. To accomplish this, it is necessary to collect at any given time a sample of the microbes that is representative of the whole community (a profile) such that changes in abundance or diversity can be monitored.
One profiling method involves DNA microarray chips, called phylochips, which are comprised of tens of thousands of oligonucleotide-containing spots, each corresponding to a set of taxa-defining sequences, including rRNA gene regions and other unique gene sequences that can distinguish among the various microbial species (taxa or “phylotypes”) present in the sample of interest (Figure 5-8). To design appropriate phylochips, the microbial species present in environmental niches, including different areas of the human body, must first be identified by rRNA gene sequencing methods from a representative sample. Oligonucleotide probes that will hybridize to the rRNA genes from each species are then printed onto the microarray. These phylochips can be used to monitor shifts in microbial community compositions in environmental and clinical samples. The greatest advantage of this phylochip technology is that all microorganisms of interest in an entire community, not just bacteria, can be detected in a single assay by multiple probes to give reliable taxonomic information.
Figure 5-8. Phylochips for microbial-community profiling. Current phylochip platforms are a microarray-based method that taxonomically identifies microbes in a given sample through hybridization of the rRNA gene in that sample to all nine of the variable regions of the 16S rRNA gene. (A) General procedure for making a phylochip. Complete genomes are retrieved from the National Center for Biotechnology Information microbial database and a phylogenetic relationship tree is generated. Genome comparison of bacteria within each clade (group of related bacteria) is used to identify core gene sequences that are shared by most members of the clade. After sequence alignment with all other bacterial and archaeal genomes, core gene sequences that are unique to each target clade (species) are identified, and DNA probes are designed against up to 10 of the unique genes for each target species. (B) Example of how a phylochip could be used to distinguish microbes at the genus and species levels in a sample. Positive and negative controls are included to ensure that the hybridization steps worked and that there is no background detection, respectively. Some spots are probes targeted toward distinguishing specific microbes at the genus level (nine spots with probes for each of the V1–V9 variable regions of the 16S rRNA genes), while other spots are probes targeted toward distinguishing specific microbes at the species or subspecies level (spots corresponding to unique genes found only in particular clades or subspecies). Custom probes (upper left corner of figure) can be made for identifying species with particular genes present (e.g., virulence factors). (C) Heat map showing the intensities for individual probes (red) and total species probes (green) in the bacterial sample. A plot of the area under the curve (AUC) shows the relative abundances of each of the bacterial species in the sample.
The current limitation of these phylochips is that they first require the availability of the rRNA gene and other key distinguishing gene sequences for identification of each of the microbes expected to be present in the samples. But the increasing number of powerful high-throughput sequencing technologies currently available at ever more affordable costs, as well as the vast number of microbial sequences already deposited into the DNA databases, makes the design of phylochips for various microbial and clinical ecosystems increasingly feasible. Indeed, the breathtaking advances in massively parallel DNA sequencing methods described previously now allow cost-effective studies of the changing dynamics of complex microbial communities over time and under different conditions by direct sequencing alone.
Comparative 16S rRNA Gene-Based Profiling of Complex Microbial Communities. Current massively parallel sequencing technologies allow researchers to sequence millions of 16S rRNA genes per sample and multiple samples simultaneously, enabling microbial profiles for multiple samples to be rapidly generated and compared. Because of the depth of sequencing obtained from these large numbers of reads, these profiling methods enable researchers to rapidly assess differences in patterns between complex microbial communities, thereby gaining insight into the diversity or dynamics of the microbial communities.
It should be noted, however, that only nearly complete 16S rRNA sequences provide accurate measures of taxonomic identity and diversity within a microbial community. Unfortunately, while current sequencing technologies have made microbial community profiling routine and affordable, this has come at the expense of read length: sequence reads typically cover only a portion (usually 250–600 nucleotides) of the ∼1.5-kB 16S rRNA gene, necessitating that researchers choose shorter regions, such as V1–V3, V3–V4, V4–V5, or V3–V5 (see Figure 5-1A), for sequence analysis. Consequently, this compromises the resolution of the taxonomic identification and phylogenetic relationship determinations and should be taken into account when interpreting microbial sequencing data.
Despite the previously mentioned limitations, profiling of microbiotas based on shorter variable regions of the 16S rRNA gene using current sequencing technologies has proven invaluable in providing insights into the role of microbiota in health and disease and how our microbiotas change over time and under different conditions. For instance, such profiling analysis using species-level classification has been used to compare the vaginal microbiota of women of African-American versus European ancestry (Figure 5-9). The results revealed significant differences in the microbiota composition of these two groups of women. The profiles confirmed that women of European ancestry are more likely to have a Lactobacillus-dominated vaginal microbiota, whereas African-American women are more likely to have a more diverse vaginal microbiota, similar to a composition often found in women displaying symptoms of BV. Even at the species level, significant differences were observed within the Lactobacillus genus, with women of European ancestry more likely to harbor L. crispatus, L. jensenii, and L. gasseri. These findings supported other studies indicating that African-American women are more prone to having diseases that are associated with BV, such as increased risk of sexually transmitted diseases and adverse pregnancy outcomes. Researchers are now using this information to understand the factors that dictate these differences and how to develop treatment options that take these differences into account.
Figure 5-9. Comparative 16S rRNA gene-based profiling of bacterial communities from women of African-American versus European ancestry. Relative abundance profiles of vaginal bacteria using species-level classification from (A) 960 African-American women and (B) 330 women of European ancestry enrolled in the Vaginal Human Microbiome Project. Each vertical line represents the bacterial composition profile from one vaginal sample, where the profiles are grouped by the dominant species into different profile types. The colors correlate to different bacterial species: yellow = Lactobacillus crispatus; green = Lactobacillus jensenii; pink = Lactobacillus gasseri; light blue = Lactobacillus iners; red = Gardnerella vaginalis; orange = uncultivated clostridial species associated with bacterial vaginosis type 1 (BVAB1); dark blue = Prevotella species. The black bars underneath the plot indicate microbial profiles from women diagnosed with bacterial vaginosis (BV). Reprinted from Fettweis JM, Brooks JP, Serrano MG, Sheth NU, Girerd PH, Edwards DJ, Strauss JF 3rd; Vaginal Microbiome Consortium, Jefferson KK, Buck GA. 2014. Microbiology 160(pt10):2272–2282, with permission.
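The kind of profile shown in such a figure can be derived from per-sample read counts in a few steps: convert counts to relative abundances, group samples by their dominant species, and summarize diversity. The sketch below is a minimal illustration under stated assumptions. The sample names and read counts are hypothetical, and the Shannon index is used as one common diversity measure, not necessarily the one used in the cited study.

```python
import math
from collections import defaultdict


def relative_abundance(read_counts):
    """Convert raw read counts per taxon into fractions of the sample total."""
    total = sum(read_counts.values())
    return {taxon: n / total for taxon, n in read_counts.items()}


def dominant_taxon(read_counts):
    """Return the taxon with the most reads in the sample."""
    return max(read_counts, key=read_counts.get)


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero abundance."""
    fractions = relative_abundance(read_counts).values()
    return -sum(p * math.log(p) for p in fractions if p > 0)


# Hypothetical 16S rRNA read counts assigned to species in three samples.
samples = {
    "sample_1": {"Lactobacillus crispatus": 9200, "Lactobacillus iners": 500,
                 "Gardnerella vaginalis": 300},
    "sample_2": {"Lactobacillus crispatus": 8800, "Lactobacillus jensenii": 1200},
    "sample_3": {"Gardnerella vaginalis": 4000, "BVAB1": 3500,
                 "Prevotella species": 2000, "Lactobacillus iners": 500},
}

# Group samples into profile types by their dominant species.
profile_types = defaultdict(list)
for name, counts in samples.items():
    profile_types[dominant_taxon(counts)].append(name)

for taxon, members in profile_types.items():
    print(f"{taxon}-dominated: {members}")
for name, counts in samples.items():
    print(f"{name}: Shannon index = {shannon_index(counts):.2f}")
```

In this toy data set, the sample without a dominant Lactobacillus species has the highest Shannon index, mirroring the more diverse profiles described in the text.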
Tracking Microbes Through Space and Time by Multilocus Sequence Typing (MLST) and Whole-Genome-Sequencing. We have considered the problem of identifying different bacterial species in complex mixtures taken from the environment or from sites in the human body. But scientists and epidemiologists are often faced with the problem of distinguishing different isolates or variants of a single bacterial species. Suppose that there is an outbreak of Staphylococcus aureus in a hospital, and you need to trace how this particular strain got into and around the hospital. Was it carried by a hospital worker or by a family member of a patient? Or suppose there is an outbreak of food poisoning caused by Listeria monocytogenes, which can contaminate food-processing equipment and ready-to-eat meat products. You want to find out how this strain of L. monocytogenes entered the food chain and whether it is similar to strains that caused previous outbreaks. The rRNA genes of all isolates of S. aureus have the same sequence, because the rRNA genes change very slowly within a given species. Likewise, the DNA sequences of the rRNA genes are nearly identical for different variants and isolates of L. monocytogenes.
Clearly, 16S rRNA gene-based strain typing will not work to trace infections such as these. But unlike rRNA genes, the DNA sequences of genes that encode housekeeping enzymes and virulence factors do change with time within a species. That is, isolates of S. aureus or L. monocytogenes from different locations or sources accumulate slight differences in the sequences of their housekeeping and virulence factor genes over time. This drift arises partly because the genetic code is degenerate. Recall that more than one codon can specify the same amino acid (e.g., six codons specify leucine), and much of this redundancy occurs at the third positions of codons. Therefore, the DNA sequences of housekeeping genes can show variations in different isolates of a given bacterial species and still specify the same amino acids in the enzyme product. Drift in virulence factor genes involves amino acid changes more often than drift in housekeeping genes does, because the virulence factor genes are subjected to strong selective pressures during infection.
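The degeneracy of the genetic code can be made concrete with a short example. The sketch below builds the standard codon table, counts the codons that specify leucine, and translates two DNA fragments that differ only at third codon positions. The two "allele" sequences are invented for illustration and are not taken from any real gene.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def differing_positions(seq_a, seq_b):
    """Return zero-based positions at which two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
print(f"{len(leucine_codons)} codons specify leucine: {leucine_codons}")

# Hypothetical alleles of a housekeeping gene fragment from two isolates.
allele_1 = "CTTGCTAAAGGT"
allele_2 = "CTGGCCAAGGGA"

positions = differing_positions(allele_1, allele_2)
print(f"nucleotide differences at positions: {positions}")
print(f"all at third codon positions: {all(p % 3 == 2 for p in positions)}")
print(f"protein 1: {translate(allele_1)}  protein 2: {translate(allele_2)}")
```

The two fragments differ at four of twelve nucleotides yet encode the same four amino acids, which is the kind of silent variation that sequence typing exploits.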
This variation within coding sequences is the basis of multilocus sequence typing (MLST), but rather than relying on genetic drift in just one housekeeping or virulence gene, the DNA sequences of regions in multiple (usually seven or more) genes are analyzed for each isolate. For each of the genes selected for a particular bacterial species, the different sequences are assigned as alleles. MLST analysis using multiple genes is easy and has become cost-effective recently. Bioinformatics is used to identify regions of seven or more housekeeping or virulence genes that show variations in a given bacterial species. Pairs of PCR primers are designed to amplify and sequence about 500 base pairs from each of these variable regions. The sequences of these multiple loci can then be compared among samples or with other isolates in databases to trace the relatedness of the different isolates locally and worldwide. This analysis can also take into account combinations of alleles in bacterial species that exchange genetic material and undergo horizontal gene transfer frequently.
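The allele-assignment step of MLST can be sketched as follows. Each distinct sequence observed at a locus receives an allele number, and the combination of allele numbers across loci forms the isolate's profile; isolates with identical profiles belong to the same sequence type. This is a minimal illustration under stated assumptions: the isolate names and locus names are hypothetical, only three loci are used instead of the usual seven or more, the sequences are shortened to 12 bases instead of about 500, and allele numbers are assigned in order of first appearance instead of being looked up in a curated database.

```python
from collections import defaultdict


def assign_profiles(isolates):
    """Map each isolate to a tuple of allele numbers, one per locus.

    isolates maps an isolate name to {locus name: DNA sequence}.
    Allele numbers are assigned per locus in order of first appearance.
    """
    alleles_by_locus = {}   # locus -> {sequence: allele number}
    profiles = {}
    for isolate, loci in isolates.items():
        profile = []
        for locus in sorted(loci):
            known = alleles_by_locus.setdefault(locus, {})
            number = known.setdefault(loci[locus], len(known) + 1)
            profile.append(number)
        profiles[isolate] = tuple(profile)
    return profiles


def group_by_profile(profiles):
    """Group isolates that share an identical allelic profile."""
    groups = defaultdict(list)
    for isolate, profile in profiles.items():
        groups[profile].append(isolate)
    return dict(groups)


# Hypothetical isolates from a hospital outbreak investigation.
shared = {"locusA": "ATGGCTAAAGGT", "locusB": "TTGACCGATCTG", "locusC": "GGCATTCCGGAA"}
isolates = {
    "patient_1": dict(shared),
    "patient_2": dict(shared),
    "nurse_station": dict(shared),
    "visitor": {"locusA": "ATGGCCAAAGGT", "locusB": "TTGACCGATCTG",
                "locusC": "GGCATCCCGGAA"},
}

profiles = assign_profiles(isolates)
groups = group_by_profile(profiles)
for profile, members in groups.items():
    print(f"allelic profile {profile}: {members}")
```

In this toy data set, the two patient isolates and the environmental isolate share one profile, whereas the visitor's isolate differs at two of the three loci and would therefore be assigned to a different sequence type.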
Going back to our examples, samples collected from staff, visitors, patients, and locations in the hospital could be cultured to identify those that contain S. aureus, which is a common commensal bacterium that is easily identified on growth medium. The different isolates of S. aureus would be subjected to MLST analysis. Progeny or clonal isolates will have the same DNA sequences in most of the multiple loci, whereas strains from a different source may have loci with sequence variations. The resulting profile would indicate whether patients are infected with the same strain of S. aureus and where this strain may have arisen in the hospital. Similarly, MLST analysis can trace the sources of L. monocytogenes through the chain of food preparation of the outbreak mentioned earlier as well as previous outbreaks and thereby identify the root of the contamination problem.
MLST analysis does not distinguish whether multiple sequence differences observed between alleles are the result of multiple point mutations or of a single recombination event. Consequently, MLST analysis does not assign a higher similarity value to sequences differing by a single nucleotide compared to sequences with multiple nucleotide differences. To determine phylogenetic relationships of closely related species with high clonal evolution, multilocus sequence analysis (MLSA) is used, in which the selected housekeeping and virulence gene sequences are first concatenated (i.e., virtually linked in tandem to each other) before performing comparative phylogenetic analysis. This process provides greater discriminatory power for determining phylogenetic relationships.
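The difference in resolution between the two approaches can be shown with a toy comparison. In the sketch below, the MLST-style score counts only how many loci carry a different allele, whereas the MLSA-style score concatenates the loci and counts individual nucleotide differences. The isolates, locus names, and sequences are hypothetical, the sequences are assumed to be already aligned and of equal length, and a simple count of mismatches stands in for the phylogenetic methods used in practice.

```python
def allele_mismatches(isolate_a, isolate_b):
    """MLST-style comparison: number of loci whose sequences are not identical."""
    return sum(1 for locus in isolate_a if isolate_a[locus] != isolate_b[locus])


def concatenate(isolate):
    """MLSA-style step: join the locus sequences in a fixed (sorted) locus order."""
    return "".join(isolate[locus] for locus in sorted(isolate))


def nucleotide_differences(isolate_a, isolate_b):
    """Count mismatched positions between the concatenated, pre-aligned sequences."""
    seq_a, seq_b = concatenate(isolate_a), concatenate(isolate_b)
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical isolates: both variants differ from the reference at one locus only,
# but by different numbers of nucleotides.
reference = {"locusA": "ATGGCTAAAGGT", "locusB": "TTGACCGATCTG", "locusC": "GGCATTCCGGAA"}
near_variant = dict(reference, locusB="TTGACCGATCTA")   # one nucleotide changed
far_variant = dict(reference, locusB="TTAACTGATCTA")    # three nucleotides changed

for name, variant in [("near_variant", near_variant), ("far_variant", far_variant)]:
    print(f"{name}: {allele_mismatches(reference, variant)} allele mismatch(es), "
          f"{nucleotide_differences(reference, variant)} nucleotide difference(s)")
```

Both variants score identically by allele counting (one mismatched locus each), but the concatenated comparison separates them, which illustrates the added discriminatory power described above.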
MLSA analysis has recently been greatly extended by using whole-genome sequence determinations, rather than sequences of a limited number of housekeeping or virulence genes. MLSA by whole-genome sequencing provides higher resolution for differentiating bacterial isolates and for tracking the evolution and spread of virulent and/or antibiotic-resistant bacteria. As an example of the impact that whole-genome sequencing has had on MLSA, let us consider carbapenem-resistant Klebsiella pneumoniae, which has emerged as a serious clinical problem in hospitals. Previously, the vast majority of resistant clinical isolates had been genetically characterized by MLST as a single multilocus type (designated as ST258), leading to the hypothesis that a single clone of ST258 was responsible for the global spread of these multidrug-resistant bacteria. However, whole-genome sequencing and phylogenetic analysis of a large number of ST258 clinical isolates from diverse geographic locations revealed unexpected diversity among the isolates, including divergence into two distinct genetic clades due to a 215-kB region that appears to be a hotspot for DNA recombination events. This region also contains genes involved in capsule polysaccharide biosynthesis that likely contributed to the observed serological variation. In another example, phylogenetic relationship analysis based on whole-genome sequencing of 132 clinical isolates from around the world was used to track the recent evolutionary history of the dysentery (bloody diarrhea) pathogen Shigella sonnei from Europe to other parts of the world (Figure 5-10).
Figure 5-10. Phylogenetic relationship analysis using comparative whole-genome sequence profiling to track the evolution and dissemination of the dysentery pathogen Shigella sonnei. The dysentery pathogen Shigella sonnei was once predominant in developed countries, but it is now emerging as a major problem in developing countries. Whole-genome sequencing of 132 globally distributed clinical isolates, followed by phylogenetic analysis, showed that the current S. sonnei strains descended from a common ancestor in Europe less than 500 years ago. The results also showed that by the late 19th century, S. sonnei had diverged into four distinct lineages with strong regional clustering. The heat map shows the distribution of genes associated with antibiotic resistance. Known antibiotic-resistance mutations in the gene encoding DNA gyrase, GyrA, are indicated by color. Probable multidrug-resistance (MDR) gene acquisition events are boxed in red. Geographically localized clonal expansions are highlighted with their median estimated divergence dates. Reprinted from Holt KE, Baker S, Weill FX, Holmes EC, Kitchen A, Yu J, Sangal V, Brown DJ, Coia JE, Kim DW, Choi SY, Kim SH, da Silveira WD, Pickard DJ, Farrar JJ, Parkhill J, Dougan G, Thomson NR. 2012. Nat Genet 44:1056–1059, with permission.
Characterizing Microbiomes by Using Metagenomic Analysis
The various rRNA gene sequencing approaches mentioned previously only give information regarding what types of microbes are present in a community. A limitation of this type of approach is its failure to generate functional genomic information for deciphering the metabolic contributions of the microbes present in an ecosystem. For example, it does not provide direct information about which microbes might have a symbiotic relationship by producing and secreting metabolic products that could cross-feed other microbial species or the host. The current state of mega-scale DNA sequencing technology has spurred interest in a more ambitious approach to analyzing the body’s microbiota: determining the sequences of all of the individual microbial genomes of the body (the microbiomes). The goal of this approach, called metagenomic analysis, is to go beyond the question of cataloging what organisms are present (i.e., the census question) to the question of what the metabolic and physiologic potential of the microbiota is (i.e., the metabolic genes and pathways that are present).
But what can be done about species whose genomes are incomplete or not in the database at all? To answer this question, the next stage of metagenomic analysis involves isolation and mega-scale sequencing of all genomic DNA from an entire mixed microbial population (called the metagenome) so as to harvest the remarkable and vast diversity present. In fact, the advances in robotics, ultra-high-throughput, massively parallel sequencing, and bioinformatics assembly technologies are already leading to determination of complete genomes of microbes directly (without cultivation or cloning of individual isolates) from the mixed genomic DNA samples isolated from complex microbiota communities. For instance, metagenomic analysis can be applied to profiling strain-level variation in microbial communities (Figure 5-11).
Figure 5-11. Metagenomic profiling of strain-level variation in microbial communities. (A) Mapping paired-end sequencing reads to microbial reference genomes reveals not only the genomes that are present in a community, but also differences between the isolates of particular species and the reference isolate. In this example, most positions have 4x coverage, represented by four sequencing reads mapped to each position in the reference genome sequences from bacterial species A and B. Gene deletion events can be detected with relatively low coverage of the reference genome; no reads from the sample map to deleted genes (in orange). Higher sequencing coverage of the genomes facilitates differentiating between sequencing error and true nucleotide-level strain variation. Such variation includes fixed differences (in which the sample is consistently different from the reference at some site) and single nucleotide polymorphisms (SNPs; in which a site occurs in two or more states in the sample). Sequence reads that do not map together (blue reads from individual 1 and red reads from individual 2) indicate additional community variation, including the insertion of genomic material not found in the reference genome by mechanisms such as horizontal gene transfer (HGT). (B) Mapping reads to reference genomes can reveal patterns of gene presence or absence, which is a form of strain variation. Here, two individuals sampled at two time points (t = 0 and t = 1 year) are distinguished by the presence or absence of genes in species A. The blue genes are stably present in individual 1 and stably absent in individual 2, whereas the red genes are stably present in individual 2 and stably absent in individual 1. Adapted from Franzosa EA, Hsu T, Sirota-Madi A, Shafquat A, Abu-Ali G, Morgan XC, Huttenhower C. 2015. Nat Rev Microbiol 13(6):360–372, with permission.
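The read-mapping logic described in panel A can be sketched in a few lines of Python. This is a simplified illustration, not a real pipeline: the genome length, gene coordinates, and read positions are invented, reads are assumed to have already been aligned to the reference, and an actual analysis would require statistical thresholds rather than a strict zero-coverage rule.

```python
def coverage(genome_length, reads):
    """Per-position read depth. Reads are (start, end) intervals, 0-based, end exclusive."""
    depth = [0] * genome_length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def uncovered_genes(genes, depth):
    """Names of genes to which no read maps at any position (candidate deletions)."""
    return [
        name
        for name, (start, end) in genes.items()
        if all(d == 0 for d in depth[start:end])
    ]


# Hypothetical 30-bp reference carrying three genes; the sample lacks gene2.
genome_length = 30
genes = {"gene1": (0, 10), "gene2": (10, 20), "gene3": (20, 30)}
reads = [
    (0, 8), (2, 10), (1, 9), (0, 10),          # reads mapping to gene1
    (20, 28), (22, 30), (21, 29), (20, 30),    # reads mapping to gene3
]

depth = coverage(genome_length, reads)
deleted = uncovered_genes(genes, depth)
print("depth at position 5:", depth[5])
print("candidate deleted genes:", deleted)
```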
Assembling individual genome sequences from many thousands of sequences is still challenging for existing bioinformatics programs, but here again, rapid advances are being made in the analysis of the huge volume of new sequence data that is emerging, and these advances are beginning to make what seems unimaginable today feasible tomorrow (in biotechnology, as in most modern scientific endeavors, “impossible” just means it is “not possible yet”). Interpretation of a metagenomic analysis is much more complex than 16S rRNA analysis, or even a whole genome assembly and analysis, and details are beyond the scope of this textbook, but for the adventurous, several examples of recent metagenomic analyses of human microbiotas are provided in the suggested readings. What these analyses allow us to do is have a glance at the types of biosynthetic and metabolic pathways that the microbes in any given population might have at their disposal to utilize.
To help tackle the daunting task of defining our microbiomes, the National Institutes of Health launched the Human Microbiome Project (HMP) in 2007 with the objective of sequencing the collective genomes of all of the microbes (bacteria, archaea, fungi, protozoa, and viruses) that comprise the microbiotas of five targeted sites in the human body: the skin, mouth, nasal passages/lungs, vagina, and gut. The goal of the HMP was to lay the groundwork for establishing what constitutes a healthy microbiota so as to understand how changes in microbial composition, in terms of diversity and richness of content, affect health and disease. A large part of the HMP was to establish a repository of reference microbial genome sequences that could be used for facilitating the interrogation of microbiomes.
Metagenomic analysis starts with the census information indicating which species are present in microbial populations, but then adds information about metabolic function gleaned from thousands of complete bacterial genomes that are already deposited in genome databases, largely established through the HMP effort (see Box 5-1). These resources can provide a vast amount of information about the metabolic potential of a particular microbiome through comparison of the metabolic pathway sequences present in that mixed bacterial population with those from annotated genomes of taxonomically related bacteria that have been collected through the HMP. Since most of the already completed genomes were sequenced from a single clone of that microbe, most of the existing metabolic pathways and gene functions have already been ascertained, either directly from biochemical analyses or by inference based on analogy to similar genes in other microbes.
Beyond the Metagenome
A shared limitation of the 16S rRNA gene and metagenomic analysis approaches is that they do not provide information about which genes are being expressed at any given time or what the functional activity of the community is under any particular condition. Since many bacteria that normally reside in or on the body may be expelled into the environment after traveling through the stomach and intestinal tract from their site of origin, it stands to reason that the site they normally occupy does not explain the presence of all of their genes. They have to endure stresses independent of the environment of the site in which they usually reside. Thus, only a subset of genes is likely to be expressed at any one time or at any particular site. Moreover, even within the same site, changes in conditions, such as changes in diet or hormonal levels, may cause increases or decreases in expression of certain sets of genes. Work is now underway to include additional information about the functional status of the microbial community by combining other types of functional information about gene, RNA, and protein expression levels, activity status, and flux of metabolite content.
RNA-Seq Profiling (Transcriptomics). Gene expression in complex populations can be measured using techniques that detect and quantitate mRNA levels, including qPCR (see Figure 5-5) and RNA-seq technology (Figure 5-12). The Illumina RNA-seq paired-end sequencing platform has become the method of choice for interrogating the abundance and diversity of RNA transcripts (transcriptomics). The first step of RNA-seq is to extract and purify total RNA from the bacterial samples that will be compared. Different RNA purification methods are used for mRNA and small RNAs (sRNA) of less than 100 nucleotides. Highly abundant rRNAs and tRNAs are removed, and the mRNA samples are physically fragmented into smaller pieces. Reverse transcription is performed to convert the RNA fragments into cDNA, and the resulting cDNA is ligated to adapters, where barcode sequences mark cDNAs from different RNA samples. The cDNA + adapters are then subjected to library amplification and Illumina sequencing, as illustrated in Figure 5-6.
Figure 5-12. RNA-seq technology. The Illumina RNA-seq platform enables global profiling of the transcriptional responses of all genes in individual cells or tissues at considerable depth of coverage under multiple conditions or over time. (A) Steps in an RNA-seq experiment using Illumina sequencing described in the text and Figure 5-6. (B) Outcomes of aligning multiple separate, short (150–300 nucleotides) sequencing reads, which correspond to overlapping segments of mRNA or sRNA. Left, if a reference genome is not available for a bacterium, the separate overlapping reads can be aligned to show the length of a transcript. The transcript can then be analyzed for open reading frames (ORFs) or other features. Middle, if a reference genome is available, the aligned fragments can be compared to genomic features, such as ORFs, intercistronic regions, or predicted promoters and transcription terminators. Right, the number of reads for each nucleotide base in a transcript is proportional to the starting amount of mRNA or sRNA present in the sample. Hence, following normalization, relative transcript amounts can be determined in different bacterial strains or samples, such as in a wild-type bacterial strain compared to a mutant strain or a wild-type strain grown under an unstressed versus a stressed condition. See the text for additional details. (C) The number of nucleotide base reads (blue and red regions) is often constant across regions that are cotranscribed. Therefore, the read patterns across genes indicate monocistronic versus multicistronic operons, and also the presence of any sRNA, putative regulatory RNAs, or antisense RNAs. See the text for additional details. (D) An example of changes in relative transcript amounts determined by RNA-seq. An Escherichia coli strain was grown in medium lacking or containing a glucose analogue, α-methylglucoside (αMG). 
RNA-seq analysis showed that the relative transcript amounts of seven metabolic genes increased (left) and three metabolic genes decreased (right) in the bacteria treated with αMG compared to the untreated control. Independent quantitative reverse-transcription polymerase chain reaction (qRT-PCR) experiments confirmed the trend in expression changes detected by RNA-seq. Panel D adapted from McClure R, Balasubramanian D, Sun Y, Bobrovskyy M, Sumby P, Genco CA, Vanderpool CK, Tjaden B. 2013. Nucleic Acids Res 41(14):e140, with permission.
The output of the Illumina sequencing is millions of short-sequence reads of about 150–300 nucleotides that correspond to regions of transcribed mRNA and sRNA molecules in the original samples. Multiple samples can be sequenced simultaneously in each Illumina sequencing run, and sequence reads can be sorted by the unique barcode sequences used in the adapters ligated to cDNAs of each sample. Because mRNA is randomly fragmented in this procedure, a series of overlapping reads covering the length of each mRNA emerges, and each nucleotide base in the mRNA is determined multiple times in separate reads. The total number of reads of each nucleotide base is referred to as coverage, and for most mRNA molecules the coverage of each nucleotide base is approximately equal. However, keep in mind that the output of a single RNA-seq experiment is all of the separate mRNAs or sRNAs expressed in a bacterium.
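Sorting pooled reads by barcode can be sketched as follows. The barcode sequences, sample names, and reads are hypothetical, and the sketch assumes for simplicity that the barcode sits at the start of each read; on real instruments the barcode is often read separately, and dedicated demultiplexing software also tolerates sequencing errors in the barcode.

```python
from collections import defaultdict

# Hypothetical 4-nt barcodes; real adapter barcodes are longer and kit specific.
BARCODES = {"ACGT": "wild_type", "TGCA": "mutant"}
BARCODE_LEN = 4


def demultiplex(reads):
    """Group reads by their leading barcode and strip the barcode from assigned reads."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LEN], "unassigned")
        insert = read[BARCODE_LEN:] if sample != "unassigned" else read
        by_sample[sample].append(insert)
    return dict(by_sample)


pooled_reads = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTTTTCCC", "NNNNAAAA"]
sorted_reads = demultiplex(pooled_reads)
for sample, inserts in sorted_reads.items():
    print(sample, inserts)
```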
Because of the immense volume of data acquired, computer bioinformatic analyses are required to align and display the separate mRNAs and sRNAs from RNA-seq experiments. From the separate reads, three kinds of analyses are possible (Figure 5-12B). If a genomic DNA reference sequence is unknown, the overlapping reads can be aligned to indicate the length and sequence of individual mRNA molecules, which may contain open reading frames (Figure 5-12B, left). If a reference genomic DNA sequence is known, then the RNA-seq data can be compared to annotated reading frames and intercistronic regions to indicate operon arrangements (Figure 5-12B, middle). These kinds of analyses also provide identification of the locations of monocistronic and multicistronic operons, likely sRNA and regulatory RNAs that do not seem to encode proteins, and regions of antisense transcription (Figure 5-12C).
Probably the biggest application of RNA-seq is quantitation of the relative amounts of gene transcripts in a wild-type bacterium compared to that of a mutant or to that of the wild-type bacterium subjected to a stress condition (Figure 5-12B, right). The basis of this method is that the amount of cDNA synthesized and sequenced using the RNA-seq technology is proportional to the amount of mRNA or sRNA in the initial samples (Figure 5-12A). That is, the number of nucleotide base reads for each mRNA or sRNA is proportional to the amount of mRNA and sRNA in the samples. More reads across a gene or operon relative to a wild-type standard or to control conditions indicate an increased transcript amount, whereas fewer reads indicate less relative transcription.
In quantitation experiments, total RNA is extracted from the two strains whose transcriptomes are to be compared (e.g., a wild-type versus mutant strain), prepared for RNA-seq with two sets of barcodes, and then subjected to Illumina sequencing. Normalization between samples is performed by summing the number of reads for each gene and dividing by the gene length and the total number of reads in that sample. The relative change in transcript amount can then be calculated and compared for each gene in the bacterium. An example is shown in Figure 5-12D of the application of RNA-seq to determine the genes in E. coli whose relative transcript amounts increase or decrease in response to the glucose analogue, α-methylglucoside (αMG). As a final step in this transcriptome analysis, relative changes in transcript amounts detected by RNA-seq were confirmed by the independent method of quantitative reverse-transcription PCR (qRT-PCR). RNA-seq has largely replaced tiled microarrays for transcriptome analyses in bacteria and other organisms.
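The normalization described above, in which read counts are divided by gene length and by the total number of reads in the sample, can be sketched as shown below. Scaling to reads per kilobase per million reads is one common convention and is assumed here; the gene names, lengths, and counts are invented, and dedicated statistical packages use more elaborate normalization and significance testing.

```python
import math


def normalize(read_counts, gene_lengths):
    """Reads per kilobase of gene per million reads in the sample (an assumed scaling)."""
    total_reads = sum(read_counts.values())
    return {
        gene: count / (gene_lengths[gene] / 1_000) / (total_reads / 1_000_000)
        for gene, count in read_counts.items()
    }


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of normalized values; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
    }


# Hypothetical gene lengths (bp) and raw read counts for two samples.
gene_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
control_counts = {"geneA": 100, "geneB": 200, "geneC": 700}
treated_counts = {"geneA": 400, "geneB": 200, "geneC": 1400}

control_norm = normalize(control_counts, gene_lengths)
treated_norm = normalize(treated_counts, gene_lengths)
changes = log2_fold_change(treated_norm, control_norm, pseudocount=0.0)
for gene, lfc in changes.items():
    print(f"{gene}: log2 fold change = {lfc:+.2f}")
```

Note that geneB has the same raw read count in both toy samples, yet its normalized amount is halved because the treated sample contains twice as many reads overall. This is why the division by total reads matters when samples are compared.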
Coupling of RNA-seq results from total mRNA of a microbial community (metatranscriptomics) with the total genomic DNA content (metagenomics) enables an estimation of which metabolic pathways and protein functions are expressed in a particular sample under certain conditions. In this approach, relative mRNA transcript amounts are determined by the RNA-seq methods described previously. This approach can provide an estimation of the metabolic potential and physiological makeup of that particular microbiota. For example, one study comparing the gut metagenome (i.e., the predicted metabolic genes present) with the corresponding metatranscriptome (i.e., the relative transcription levels of the metabolic genes present) revealed that certain pathways, such as sporulation and amino acid biosynthesis, were consistently under-expressed, whereas other pathways, such as ribosome biogenesis, stress response, and methanogenesis, were consistently overexpressed in relation to their DNA abundance. These findings are consistent with the presumed roles of these pathways under the conditions in the gut. For instance, you would expect an abundance of amino acids in the gut from the digestion of foodstuffs so that bacteria would not need to make their own, and thus amino acid biosynthetic genes would be downregulated. On the other hand, the rich nutrient environment of the gut would encourage bacterial growth, such that there would be a lot of protein synthesis activity and ribosomal subunit genes would be strongly upregulated.
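A comparison of this kind can be sketched as a ratio of relative RNA abundance to relative DNA abundance for each pathway. The read counts below are invented purely to mirror the direction of the trends described above; they are not data from the study, and the ratio shown is only one simple way such a comparison might be expressed.

```python
def relative_abundance(counts):
    """Convert raw counts to fractions of the sample total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def expression_ratio(dna_counts, rna_counts):
    """RNA fraction divided by DNA fraction; values above 1 suggest overexpression
    relative to gene abundance, values below 1 suggest underexpression."""
    dna = relative_abundance(dna_counts)
    rna = relative_abundance(rna_counts)
    return {p: rna.get(p, 0.0) / dna[p] for p in dna if dna[p] > 0}


# Hypothetical read counts assigned to pathways (illustrative values only).
metagenome = {
    "sporulation": 300,
    "amino_acid_biosynthesis": 400,
    "ribosome_biogenesis": 200,
    "methanogenesis": 100,
}
metatranscriptome = {
    "sporulation": 30,
    "amino_acid_biosynthesis": 200,
    "ribosome_biogenesis": 600,
    "methanogenesis": 170,
}

ratios = expression_ratio(metagenome, metatranscriptome)
for pathway, ratio in ratios.items():
    print(f"{pathway}: RNA/DNA ratio = {ratio:.2f}")
```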
Proteomics and Metabolomics—the Emergence of Multi-Omics. Even the advances in metagenomics and metatranscriptomics may not be enough to fully characterize the functional activity of microbial communities. Two genes in a microbial population may encode the same type of enzyme or transport protein. But different enzymes and different transport proteins have different affinities for their substrates and different levels of activity. Additionally, there may be regulatory factors that impinge on the observed functional activity. Thus, a gene that is expressed at a higher level than another gene may not be more important metabolically or physiologically. To address this need for functional understanding at a population level, other measures of functional potential or activity of the microbiota are necessary.
Measuring the relative abundance of protein and metabolite (small molecule) amounts provides another indicator of the functional activity of a microbial community. Two approaches are now taking off that enable the assessment of proteins and metabolites in biological samples. Proteomics allows assessment of relative protein amounts and stabilities and functional states (activation, inactivation, or modification) of the protein content of samples (Figure 5-13). Metabolomics allows the determination of the composition of all of the metabolites and other small molecules (usually any molecule with a molecular mass less than 1,500 Da) present in a sample (Figure 5-14). Both approaches used to determine protein or metabolite content and abundance depend on chromatographic separation methods, usually high-performance liquid chromatography (HPLC) or gas chromatography (GC), followed by detection, structural determination, identification, and quantification of the contents using analytical techniques, such as mass spectrometry (MS), nuclear magnetic resonance (NMR) spectroscopy, or some other type of spectroscopy.
Figure 5-13. Overview of proteomics technology. Proteomics provides a comprehensive assessment of the protein identity and relative abundance in biological samples. (A) In the simplest identification, proteins in extracts of a bacterial culture are first fractionated by liquid chromatography (LC) and SDS polyacrylamide gel electrophoresis. Individual bands are cut out from gels, and the proteins are digested into peptides with trypsin protease. The resulting peptides are fractionated by high-performance liquid chromatography (HPLC) that is connected in-line to a mass spectrometer detector. The mass/charge (m/z) ratio is then determined for the peptides in the HPLC fraction that can be detected by mass spectrometry. Computer programs are then used to generate in silico predictions of the m/z ratios of all of the tryptic peptides that correspond to proteins predicted from the bacterial genome. The computer then matches the predictions with the peptide profile to identify the unknown protein. The identity of tryptic peptides can be further confirmed by tandem mass spectrometry (MS-MS) analysis, in which the peptides are fragmented to allow further structural determination of the peptides. With the newest ultra-performance LC systems and mass spectrometer detectors, this analysis has been expanded to complex mixtures of proteins in crude extracts. The crude protein mixtures are digested by trypsin and analyzed by liquid chromatography-mass spectrometry. The presence of as many as 700 proteins can be demonstrated by this method. However, these methods are only semiquantitative for determining the relative abundance of proteins in samples. Nevertheless, protein interaction or regulatory networks can be built for various purposes, such as signaling pathways, biosynthetic pathways, or protein-protein interaction pathways from these proteomic approaches. (B) iTRAQ method to quantitate relative protein amounts in different samples. 
As with RNA-seq, a major goal of proteomics is often to determine how the relative amounts of proteins change during bacterial growth in a standard unstressed condition compared with stressed conditions or in a wild-type compared to a mutant strain. An ingenious method, called iTRAQ, has been developed for this purpose. In an iTRAQ experiment, protein profiles from as many as eight separate cultures or conditions can be compared. The separate extracts are denatured and then digested to completion with trypsin. Each extract is reacted with one of the eight available iTRAQ labels, which do not change the relative m/z ratio for any given tryptic peptide. However, when each iTRAQ-labeled peptide is fragmented by MS-2, the peptides from the different samples end up with slightly different m/z values that allow resolution and quantitation of the iTRAQ-labeled fragments. The relative amount of each of the different iTRAQ-labeled peptide fragments is directly proportional to the amount of each protein that contains the peptide in the original samples. Adapted from PTM Biolabs (http://ptm-biolab.com/itraq-proteomics), with permission.
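The in silico prediction and matching step described in panel A can be sketched in Python. The sketch assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), ignores missed cleavages and peptide modifications, and uses rounded standard monoisotopic residue masses. The protein sequence and the observed peak list are invented.

```python
import re

# Rounded monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N and C termini
PROTON = 1.00728   # added once per positive charge


def tryptic_digest(protein):
    """Cleave after K or R except when followed by P; no missed cleavages."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def peptide_mz(peptide, charge=1):
    """Predicted m/z of a peptide ion carrying the given number of protons."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def match_peaks(observed_mz, protein, tolerance=0.01):
    """Predicted tryptic peptides whose m/z falls within tolerance of an observed peak."""
    predicted = {pep: peptide_mz(pep) for pep in tryptic_digest(protein)}
    return {
        pep: mz
        for pep, mz in predicted.items()
        if any(abs(mz - obs) <= tolerance for obs in observed_mz)
    }


# Hypothetical protein and hypothetical observed peaks.
protein = "GASKPLRGGK"
peptides = tryptic_digest(protein)
observed = [peptide_mz("GGK") + 0.002, 500.0]
matches = match_peaks(observed, protein)
print("tryptic peptides:", peptides)
print("matched peptides:", list(matches))
```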
Figure 5-14. Overview of metabolomics technology. Metabolomics technology enables a comprehensive analysis of the content and relative abundance of low molecular weight (5–1,500 Da) molecules or metabolites within a sample, which provides a better assessment of which metabolic pathways are active and what the relative flux through those pathways is. (A) The samples are processed using solvent extraction methods with various polarities that are specific for a particular type of molecule, such as lipids, sugars, or small organic molecules. (B) The extracted compounds are then separated from each other via chromatographic methods such as high-performance liquid chromatography or gas chromatography either directly or after chemical derivatization (to enhance solubility). This is followed by structural analysis using detection technologies, such as mass spectrometry or nuclear magnetic resonance spectroscopy. The data are then processed and analyzed using bioinformatics tools and comparison with existing data from known compounds in available databases to determine the structure and identity of the components and their relative abundance in the sample. The metabolites identified are then mapped onto cellular signaling and biosynthetic pathways. A comprehensive list of metabolomics databases and resources can be found at the website for the Metabolomics Society (http://metabolomicssociety.org).
Currently, comprehensive proteomic and metabolic profiling is sometimes a daunting undertaking, even in monocultures of one kind of bacterium. Each of the previously mentioned approaches adds an extra level of technological and computational scale and complexity to the profile analyses, which makes applications toward characterization of populations of bacteria in the microbiota very challenging. For example, while proteomics is one of the fastest growing areas of research today, the sheer number of protein variants possible for each gene product and its homologs in many bacteria from multiple phyla places a considerable burden on the bioinformatics required to appropriately annotate and classify the proteins evolutionarily based on sequence similarity.
Proteomic analysis is highly valuable in establishing whether a protein encoded by a gene is expressed at any given time. More sophisticated proteomic approaches (Figure 5-13) can determine relative amounts of proteins in a bacterium or population; therefore, proteomics can provide information regarding whether certain proteins have the potential to be functionally active in cells. However, this does not necessarily mean that the biochemical and cellular functions of different variants have been established to be the same in all bacteria that harbor a homolog of a given gene, nor does it definitively establish the function of the expressed protein or proteins encoded by that gene. Moreover, the presence of a protein does not necessarily mean that it is functionally active, since detection alone does not provide any information about whether its activity has been modulated through interactions with or modifications by other proteins or ligands in regulatory signaling networks or biosynthetic pathways.
Metabolomics also presents experimental challenges. It is not possible with today’s technologies to analyze all of the metabolites in a given sample by any single analytical method. In any given biological sample there is a wide range of primary and secondary metabolites involved in essential and nonessential metabolic and signaling pathways, including peptides, oligonucleotides, sugars, nucleic acids, ketones, aldehydes, alcohols, amino acids, amines, lipids, steroids, alkaloids, and other endogenously generated small molecules. In addition, numerous xenobiotics, such as drugs and antibiotics, are also often present in varying amounts in microbiota samples. All of these molecules have very different chemical and structural properties, including different functional features, hydrophobicity, acidity, redox potential, and reactivity. Consequently, multiple and different separation and detection methods must be applied in combination to comprehensively identify and quantify the enormously diverse metabolites in a single bacterium or combinations of bacteria.
Moving to the next level of complexity, a few intrepid researchers have begun to explore the application of these genomic and functional approaches toward the characterization of microbiota within the context of the host response and the interplay of host-microbe interactions. Importantly, an integrated multi-omics approach is critical for interpretation of the data obtained from each of the separate approaches to gain deeper biological insights. For instance, inclusion of gene copy number in a sample (obtained from genomic data) can be used to normalize the functional activity observed for a particular set of expressed genes with similar enzyme activities (obtained from transcriptomic, proteomic, or metabolomic data). A multi-omics approach also allows the researcher to confirm conclusions or hypotheses made from one set of data, such as the presence of a gene that has weak homology to a known gene in a known biosynthetic pathway that makes a particular metabolite (obtained from comparison of metagenomic data with databases) with correlative functional data obtained through other methods, such as the presence of the expressed protein (obtained from proteomic data) and/or the presence of the predicted metabolite (obtained from metabolomic data). Again, a comprehensive discussion of this complex topic is beyond the scope of this textbook, but we provide a few examples later and in the selected readings for those who are curious.
Overview of the Human Microbiota
A human fetus is devoid of microorganisms. Passage through the vaginal tract during birth begins the colonization process. Shortly after birth, the microbiota profiles of vaginally delivered infants resemble those of their mother’s vagina, while the microbiota profiles of infants delivered via Cesarean section more closely resemble those found on the mother’s skin (Figure 5-15). This process is further influenced by exposure to early environmental factors and continues as the infant grows. The final microbiota composition is not achieved until the child is about 2 or 3 years old. Once the microbiota of an area assumes its stabilized, steady-state form, different areas of the body harbor very different microbial populations. Even within a single site, such as the mouth, different areas may contain different sets of microbes (Figure 5-16). This diversity is not surprising in view of differences between conditions that microbes encounter in any particular site. Nevertheless, despite these site-to-site differences within an individual, microbial community compositions at particular sites are generally more similar among cohabiting family members, less similar among distant relatives, and vary considerably among unrelated individuals.
Figure 5-15. Birth delivery mode influences the microbiomes of newborn infants. Shown are pie charts of the genus-level bacterial compositions derived from 16S rRNA gene sequence analysis at different body sites for a group of mothers and their babies shortly after birth, grouped according to the delivery method of the babies: vaginal versus Cesarean section (C-section). Adapted from Reid G, Younes JA, Van der Mei HC, Gloor GB, Knight R, Busscher HJ. 2011. Nature Rev Microbiol 9(1):27–38, with permission; based on data from Dominguez-Bello MG, Costello EK, Contreras M, Magris M, Hidalgo G, Fierer N, Knight R. 2010. Proc Natl Acad Sci USA 107(26):11971–11975.
Figure 5-16. Phylogenetic relationships of microbiome profiles among human body sites. Phylogenetic comparison of 16S rRNA gene-based bacterial community profiles among different human body sites revealed strong clustering by body site, meaning community compositions varied significantly less within a particular body site than between sites. Each point in the principal-component analysis (PCA) plot in (A) and the dendrogram (tree) in (B) corresponds to the profile of a sample, colored according to particular body site. In these clustering analyses, 16S rRNA gene-based bacterial community profiles that cluster closer together are more similar to each other. Adapted from Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R. 2009. Science 326(5960):1694–1697, with permission.
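The idea that profiles cluster by body site can be illustrated by comparing pairwise dissimilarities within and between sites. The sketch below uses the Bray-Curtis dissimilarity as a simple stand-in; it is not necessarily the distance measure used in the cited study, and the genus-level counts are invented.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count profiles (0 = identical, 1 = no overlap)."""
    taxa = set(u) | set(v)
    numerator = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    denominator = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return numerator / denominator


# Hypothetical genus-level read counts for four samples (illustrative values only).
samples = {
    "gut_1": {"Bacteroides": 60, "Faecalibacterium": 35, "Staphylococcus": 5},
    "gut_2": {"Bacteroides": 50, "Faecalibacterium": 45, "Staphylococcus": 5},
    "skin_1": {"Staphylococcus": 55, "Corynebacterium": 40, "Bacteroides": 5},
    "skin_2": {"Staphylococcus": 45, "Corynebacterium": 50, "Bacteroides": 5},
}

within_gut = bray_curtis(samples["gut_1"], samples["gut_2"])
within_skin = bray_curtis(samples["skin_1"], samples["skin_2"])
between_sites = bray_curtis(samples["gut_1"], samples["skin_1"])
print(f"within gut: {within_gut:.2f}")
print(f"within skin: {within_skin:.2f}")
print(f"gut versus skin: {between_sites:.2f}")
```

In this toy example the within-site dissimilarities are much smaller than the between-site dissimilarity, which is the pattern that appears as tight clusters in a principal-component plot or dendrogram.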
There are, however, some features of the microbiota that are common to all sites of the human body colonized by microbes. First, the numerically predominant microbes are bacteria. Archaea, fungi, and viruses (including bacteriophage) are also present, but archaea and fungi are in much lower numbers. Second, the majority of the bacteria are Gram-positive bacteria. In fact, some of the microbial population shifts that are associated with such conditions as periodontal disease and bacterial vaginosis (BV) involve a change from a predominantly Gram-positive to a predominantly Gram-negative bacterial population. Why the Gram-positive bacteria dominate the human microbiota is not clear. Finally, although the microbiota of different areas of the body is usually protective, many members of the microbiota can cause serious infections, even in healthy individuals, if they manage to breach the protective epithelial or mucosal layers and enter normally sterile areas of the body such as blood and tissue.
Skin Microbiota
The surface of the skin is a dry, slightly acidic, and aerobic environment. Early culture-based studies determined that Gram-positive staphylococci, such as Staphylococcus epidermidis, are the predominant bacteria occupying this site, although it is not uncommon to have transient colonization of the skin by soil bacteria or bacteria from other parts of the body. Skin bacteria like S. epidermidis were long assumed to be unable to cause disease. Finding them in a blood specimen, for example, was taken as proof that a careless health care worker or technician had contaminated the specimen. Although the surface of the skin is aerobic, pores and hair follicles can be anoxic enough to support the growth of anaerobic bacteria, such as Propionibacterium acnes (see chapter 2). Today’s view of the skin microbiota is completely different. Comparison of 16S rRNA gene-based community profiles from skin at different locations showed not only high variability from person to person (interpersonal variation), but also high variability within an individual (intrapersonal variation) (Figure 5-16). While Propionibacteriaceae dominate the microbiomes of head regions (60–80%) and arms (20–40%), the torso and legs are dominated by Staphylococcus and Corynebacterium species. Skin sites also vary considerably in the diversity of the minor bacterial species present. Indeed, it has been proposed that skin bacterial community profiles could serve as a valuable aid in forensic identification.
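The statement that community profiles which cluster closer together are more similar rests on a pairwise measure of dissimilarity between samples. The following is a minimal sketch of that idea, not the method of the cited studies: it uses the Bray-Curtis dissimilarity (chosen here only for simplicity) on made-up relative abundances. The sample names, the abundance values, and the helper functions are all hypothetical.

```python
from itertools import combinations

def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles.

    p and q map taxon name -> relative abundance. Returns 0.0 for
    identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0

# Hypothetical profiles keyed by (person, body site); values are invented
# for illustration and are not data from any study.
samples = {
    ("person_A", "forehead"): {"Propionibacteriaceae": 0.70, "Staphylococcus": 0.20, "Corynebacterium": 0.10},
    ("person_B", "forehead"): {"Propionibacteriaceae": 0.60, "Staphylococcus": 0.25, "Corynebacterium": 0.15},
    ("person_A", "leg"): {"Propionibacteriaceae": 0.10, "Staphylococcus": 0.50, "Corynebacterium": 0.40},
    ("person_B", "leg"): {"Propionibacteriaceae": 0.15, "Staphylococcus": 0.45, "Corynebacterium": 0.40},
}

def mean_dissimilarity(samples, same_site):
    """Average dissimilarity over sample pairs from the same (or different) body site."""
    values = [
        bray_curtis(p, q)
        for (key1, p), (key2, q) in combinations(samples.items(), 2)
        if (key1[1] == key2[1]) == same_site
    ]
    return sum(values) / len(values)

within_site = mean_dissimilarity(samples, same_site=True)
between_site = mean_dissimilarity(samples, same_site=False)
print(f"mean within-site dissimilarity:  {within_site:.3f}")
print(f"mean between-site dissimilarity: {between_site:.3f}")
```

With these invented numbers the within-site average is smaller than the between-site average, which is the pattern that clustering by body site describes. Real analyses involve many more taxa and samples and often use phylogeny-aware distance measures.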
Oropharyngeal Microbiota
The most common source of S. aureus, a major cause of hospital-acquired and community-acquired infections, is the human nose, where it is most abundant in the upper part of the nasal cavity. At any particular time, about one-third of the human population harbors S. aureus. The nasopharynx is also home to S. pneumoniae, another human commensal bacterium that can cause a variety of serious invasive diseases elsewhere in the body. About 25% and 40% of healthy adults and children, respectively, carry S. pneumoniae at any given time. Lower in the nose, not surprisingly, are bacteria that are also found on the skin. Several large-scale efforts are currently underway to understand the composition of the microbiota and the dynamics of colonization of the nose.
The microbiota of the mouth and throat is fairly well characterized, largely due to the involvement of the oral microbiota in periodontal disease, a gum disease that is a major cause of tooth loss in adults. The microbiota of the healthy mouth consists largely of facultative Gram-positive bacteria, mainly streptococci such as Streptococcus mutans and Streptococcus salivarius, which are not completely innocuous. These bacteria ferment sucrose to lactic acid, which in turn contributes to the development of dental caries. Utilization of sucrose also results in production of the polysaccharide dextran, which binds bacteria together and allows plaque to form. In periodontal disease, this Gram-positive microbiota shifts to a Gram-negative anaerobic microbiota in the area of the gums. The space between the gums and the upper portion of the tooth surface is called the periodontal pocket. It is a fairly anoxic area and so is able to support the growth of obligate anaerobes, such as Porphyromonas gingivalis, Prevotella spp., Tannerella forsythia, and Treponema denticola. These species produce proteases and other tissue-degrading enzymes, and this may be a major cause of the inflammation that characterizes the disease. As we will discuss at length in chapter 6, P. gingivalis appears to act as a “keystone” or “trigger” pathogen by altering the normally benign composition of the commensal microbiota, leading to microbial dysbiosis and increased inflammation and periodontal disease.
More recently, scientists have found that bacteria involved in periodontal disease, such as the Gram-negative anaerobe Fusobacterium nucleatum, are responsible for some cases of preterm birth. The hypothesis is that the bacteria enter the bloodstream through the inflamed gum tissue and lodge in the placenta. The resulting inflammation causes the fetus to be delivered prematurely. Similarly, others have suggested that oral bacteria associated with gingivitis might enter the bloodstream, causing inflammation in the blood vessels and thus leading to heart disease. In fact, it is now established that some oral bacteria, such as Streptococcus sanguinis and Streptococcus pyogenes, are direct causes of endocarditis. These examples illustrate the new thinking about connections between alterations in the normal microbiota and the impact on diseases in other areas of the body. More examples can be found in the suggested readings at the end of this chapter.
Microbiota of the Small Intestine and Colon
The small intestine is characterized by the fast flow of contents (see chapter 2). The fast flow helps wash bacteria out of the site and, throughout most of the small intestine, makes it difficult for bacteria to adhere to the mucosa and thereby remain in the site. The human microbiota of the small intestine is poorly characterized, largely because it is difficult to obtain samples from that area. Samples are usually obtained through a swallowed tube that works its way into the small intestine. Although such tubes have been used, from a microbiological perspective they have a significant deficiency; namely, they sample the luminal but not the adherent microbiota. In mice, where such studies have been feasible, there is an adherent microbiota of the small intestine that consists mostly of Clostridium species.
The lower intestine or colon, by contrast, has a much slower flow of contents (see chapter 2) and much higher concentrations of adhering and colonizing bacteria (Figure 5-17). Bacteria make up about a third of the contents of the human colon. These microbes have a complex relationship with us. We provide them with ingested food and fermentable substances like mucins, while the bacteria further degrade complex, undigested foodstuffs into substances that contribute to our nutrition. The human small intestine can absorb small molecules such as mono- and disaccharides, and can digest soluble starch, but most of the complex polysaccharides in the human diet, such as cellulose, xylan, and less soluble starch, pass through to the colon. Colonic bacteria ferment carbohydrates, including these polysaccharides, to produce CO2, H2, and short-chain fatty acids (SCFAs; acetate, propionate, and butyrate), which are absorbed by intestinal cells and used as a source of carbon and energy, providing as much as 8–10% of human nutrition. Colonic bacteria also ferment host-produced polysaccharides, such as mucopolysaccharides and mucins.
Figure 5-17. A schematic view of activities of the colonic microbiota. In the small intestine, concentrations of bacteria are low because of the fast flow of contents, and human intestinal enzymes mediate most of the digestion. In the colon, concentrations of bacteria are so high that bacteria account for about 30% of the volume of the contents. In this site, the colonic bacteria ferment polysaccharides from the human diet (plant polysaccharides or dietary fiber), host-derived polysaccharides (mucins and mucopolysaccharides), and diet-derived proteins (those not digested by pancreatic enzymes in the small intestine). The resulting short-chain fatty acids (SCFAs; 85–95% of SCFAs produced in the colon are acetic acid, propionic acid, and butyric acid) and amino acids are absorbed by the human body and used as sources of carbon and energy.
Colonic bacteria not only break down food material into simple sugars, amino acids, and lipids (catabolism), but also synthesize other nutrients and vitamins (anabolism) that our bodies need. Indeed, emerging experimental evidence points to a number of important functional roles that microbiota play in providing metabolites and host-microbe interactions that stimulate multiple endocrine, neurocrine, and immunologic signals to the brain and vice versa. These bidirectional interactions have been dubbed the microbiota-gut-brain axis. Recent studies have shown that modulating the gut microbiota impacts emotional behavior, including anxiety, depression, and pain, supporting the therapeutic potential of appropriate probiotics, prebiotics, and diet as interventions for psychiatric and neuroimmune disorders.
The colon is an anoxic environment, so it is not surprising that the numerically predominant colonic bacteria are obligate anaerobes. The numerically predominant anaerobes include the phyla Bacteroidetes (Gram-negative Bacteroides species), Firmicutes (Gram-positive bacteria, including Lactobacillus and Clostridium species), and Actinobacteria (Corynebacterium and Mycobacterium species). Facultative bacteria such as E. coli and Enterococcus species are also present in lower numbers. Bacteroides species are the ones that are thought to play a major role in fermenting the dietary polysaccharides that our bodies cannot digest. Sequence analyses of the gut bacterial communities have shown that obese individuals have fewer Bacteroidetes (5%) and more Firmicutes (85%) than lean individuals (25% Bacteroidetes, 75% Firmicutes). Understanding colonic fermentation, together with 16S rRNA gene analyses of the microbiota of the colons of obese humans or mice compared to those of nonobese individuals, has fueled the notion that obesity might be caused in part by the composition of the colonic microbiota (Box 5-2), which could be treated through manipulation of the microbial contents of the gut.
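The percentages quoted above are often summarized as a single Firmicutes-to-Bacteroidetes ratio. As a small worked example, the sketch below computes that ratio from the figures given in the text (5% and 85% for obese individuals, 25% and 75% for lean individuals). The function name and the dictionary layout are illustrative assumptions, not part of any published method.

```python
def firmicutes_to_bacteroidetes(firmicutes_pct, bacteroidetes_pct):
    """Ratio of Firmicutes to Bacteroidetes from percentage abundances."""
    if bacteroidetes_pct <= 0:
        raise ValueError("Bacteroidetes percentage must be positive")
    return firmicutes_pct / bacteroidetes_pct

# Percentages as quoted in the text; the layout of this table is hypothetical.
profiles = {
    "obese": {"Firmicutes": 85.0, "Bacteroidetes": 5.0},
    "lean": {"Firmicutes": 75.0, "Bacteroidetes": 25.0},
}

ratios = {
    group: firmicutes_to_bacteroidetes(p["Firmicutes"], p["Bacteroidetes"])
    for group, p in profiles.items()
}

for group, ratio in ratios.items():
    print(f"{group}: Firmicutes/Bacteroidetes = {ratio:.1f}")
```

On these figures the ratio is 17 for the obese profile and 3 for the lean profile. A ratio compresses a community profile into one number, so it shows an association between the two groups and nothing about cause.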
Box 5-2.
We Are What We Eat, or Rather What Our Microbiota Eats
Conventional wisdom has it that obesity is a result of genetics, lack of exercise, or a poor diet. But what if your intestinal bacterial population also contributes? An early study that used 16S rRNA gene profiling found that the microbiota of obese mice and humans differed from that of lean mice and humans. Moreover, when germfree mice (mice lacking any intestinal bacteria) were colonized with an “obese microbiota,” those mice gained more fat than germfree mice colonized with a “lean microbiota.” The main difference was the ratio of the two numerically predominant bacterial groups, the Bacteroidetes (Bacteroides species) and the Firmicutes (Gram-positive obligate anaerobes), in which a higher proportion of Bacteroidetes was associated with leanness.
The hypothesis that the composition of the colonic microbiota is associated with obesity is, as you might imagine, quite controversial, especially among those committed to theories that give precedence to exercise or diet. Also, association of obesity with a more active colonic fermentation seems to run counter to the widely held view that high-fiber diets are beneficial, because such diets would be expected to increase colonic fermentation: fiber is primarily composed of polysaccharides that are fermentable by colonic bacteria. The efficiency of the fermentation may be a factor. If so, the prediction from the obesity studies would be that the Firmicutes are more efficient fermenters than the Bacteroidetes. Since virtually nothing is known about the Gram-positive anaerobes and their carbon sources, this is difficult to assess. Another possibility is that some fermenters take a lower energy toll in the form of stimulating mucosal cell turnover.
A good feature of the hypothesis regarding a connection between obesity and the microbiota composition is that it may prompt more studies of the metabolic activities of Gram-positive anaerobes. Moreover, it illustrates the fact that the 16S rRNA gene approach, and even the metagenomics approach, may be a good start for addressing these questions, but that work on better understanding bacterial physiology will be critical.
The idea that we might be able to combat obesity by manipulating our microbiota has received some strong experimental support. To study the impact of microbiota composition on obesity, germfree mice were inoculated with microbiota from obese or lean human twins and fed a low-fat, high-fiber diet. As illustrated in panel A of the figure, mice that received a microbiota from the obese twin (red) became obese, while mice that received a microbiota from the lean twin (blue) did not become obese. When the two groups of mice were housed together and fed a low-fat, high-fiber diet (as illustrated in panel B of the figure), microbes (mostly Bacteroides species) were transmitted from the mice with a lean-promoting microbiota to the mice with an obesity-promoting microbiota, such that none of the mice became obese. On the other hand, when the two groups were fed a high-fat, low-fiber diet, no transmission occurred and the mice with an obesity-promoting microbiota became obese.
Sources:
Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. 2006. An obesity-associated gut microbiota with increased capacity for energy harvest. Nature 444:1027–1031.
Ridaura VK, Faith JJ, Rey FE, Cheng J, Duncan AE, Kau AL, Griffin NW, Lombard V, Henrissat B, Bain JR, Muehlbauer MJ, Ilkayeva O, Semenkovich CF, Funai K, Hayashi DK, Lyle BJ, Martini MC, Ursell LK, Clemente JC, Van Treuren W, Walters WA, Knight R, Newgard CB, Heath AC, Gordon JI. 2013. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science 341:1079.
The products of fermentation (acetate, CO2, and H2) are also used as carbon and energy sources by minor populations of methanogenic archaea. These methane producers convert H2 and CO2 to methane, which is absorbed, though not used, by the human body and is expelled in breath and flatus. About one-fifth of people tested have enough methane produced in their colons to be easily detectable in their breath by gas-liquid chromatography. Sulfate-reducing bacteria also reside in the colon, using H2 to help convert sulfate to H2S. The main source of the sulfate that they reduce is not clear, but some of it probably comes as a byproduct of fermentation of human-produced sulfated glycoproteins and sulfated polysaccharides, such as mucopolysaccharides and mucin. Sulfides produced by the sulfate reducers are responsible in part for the odor of feces and are thought to be a risk factor for colon cancer.
In judging the energy balance, it is worth realizing that intestinal microbes also take a toll from us; they stimulate the immune system and the turnover of intestinal mucosal cells. The constant sloughing of intestinal mucosal cells is a very effective defense that prevents bacteria that have attached to the mucosal cells from staying in the site long enough to invade. Similarly, the intestinal immune system is an important defense. But these activities require an output of proteins and energy by the human body. Overall, however, the energy balance seems to be in our favor, unless you count the effort we put into cultivating and obtaining the foods we eat that ultimately feed the colonic bacteria.
Many colonic bacteria, such as Bacteroides species and numerically minor populations like E. coli, Enterococcus, and some Clostridium species, are capable of causing serious infections if they escape from the colon as a result of surgery or some other trauma and get into the bloodstream and tissues. How could obligate anaerobes like Bacteroides cause infection in the human body, which would seem to be an aerobic environment that could kill them? Bacteroides prefer to lodge in regions of prior tissue damage. Disruption of the blood supply to such areas causes them to become anoxic, and thus fertile ground for an anaerobic infection. Moreover, blood itself is actually a hypoxic environment that is low in free oxygen. These normal colonic inhabitants are increasingly becoming resistant to many antibiotics.
Bacteria in the intestine interact with each other metabolically in the sense that methanogens and sulfate reducers use the end products of the polysaccharide fermenters, but they also interact with each other genetically by exchanging DNA. There is an old idea called the reservoir hypothesis that frames this interaction in terms of the transfer of antibiotic resistance and virulence genes (more on this in chapter 7). Briefly, colonic bacteria exchange DNA with each other. They may also exchange DNA with swallowed bacteria that are present only transiently in the colon as they pass through and are expelled into the environment. In this view, colonic bacteria act as reservoirs of antibiotic resistance and virulence genes in the sense that they are present in high numbers in an area in which other bacteria are transiently present for 24–48 hours, which is more than enough time for DNA transfers to occur. Only recently has it been possible to test this hypothesis by using molecular sequencing methods to follow the host-to-host transmission and movement of particular genes through and within the human colon. In fact, evidence is mounting that gene transfers via conjugative elements like plasmids and conjugative transposons occur frequently in the colon. These transfers occur between different species and genera, including between colonic bacteria and bacteria from different sites.
The reservoir hypothesis has loomed large in the debate over possible adverse consequences of the overuse of antibiotics, particularly on the farm. Antibiotic-resistant bacteria arise on the farm due to selection by the use of antibiotics as feed additives or to prevent infections in crowded populations of animals. The problem is that these antibiotic-resistant bacteria are already moving through the food supply and into the human intestinal tract, where the resistance genes could be transferred to bacteria permanently or temporarily residing in the human colon. These concerns regarding the increased spread of antibiotic resistance are compounded by the recent realization of the importance of a normal microbiota in maintaining health and by the fact that antibiotic treatment leads to loss of not just the disease-causing pathogen, but also the beneficial bacteria that maintain a healthy microbiota. Perturbation of the normal healthy microbiota (dysbiosis) has been correlated with inflammatory and autoimmune diseases, such as inflammatory bowel disease (IBD) and Crohn’s disease, and with increased susceptibility to other types of infections by opportunistic pathogens such as C. difficile.
Microbiota of the Vaginal Tract
The microbiota of the female vaginal tract has already been introduced in the example about the application of DNA-based analysis of complex microbial populations, but that description did not explore the special features of the site that presumably explain the composition of the microbiota of the vagina. The vagina is a complex site. The vaginal tract experiences considerable periodic hormonal changes associated with the menstrual cycle. Although mucin secretions are constantly bathing the vaginal mucosa, the flow of fluids is not nearly as fast as that in the intestinal tract, except during menstruation. Most of the time, the bacteria are loosely or strongly associated with the vaginal mucosa.
The microbiota of the vagina of most healthy humans consists mainly of Gram-positive Lactobacillus species. These lactobacilli are fermentative bacteria that produce mainly lactic acid and probably contribute to the normally low pH (less than 5) of the vaginal tract, which generally acts as a powerful protective barrier against colonization by many disease-causing bacteria. Unfortunately, many other pathogenic bacteria can survive and multiply at pH 5. Another possible contribution of lactobacilli to protecting the vaginal tract against disease-causing bacteria is that some of them produce hydrogen peroxide, which is toxic to many microbes, not just bacteria. There is no question that women who take antibiotics that kill or inhibit the growth of lactobacilli often develop yeast infections (vaginitis) or bacterial dysbiosis (vaginosis). The availability of new molecular techniques has now allowed in-depth analysis of the vaginal microbiota and interrogation of its roles in vaginal health and reproduction.
BV was once considered to be a minor disease, characterized by slight discomfort, mild inflammation, a fishy odor, and an unhealthy vaginal microbiota that lacked lactobacilli. More recently, BV has been shown to be associated with a number of disorders, including increased risk of premature labor and birth, although the mechanism of this connection is still unclear, as well as increased susceptibility to infection with sexually transmitted pathogens, such as HIV, Neisseria gonorrhoeae, or Chlamydia trachomatis.
In contrast to diseases caused by a single pathogen, BV arises from a shift in the microbiota from a population dominated by lactobacilli to a predominantly Gram-negative one, principally G. vaginalis. In fact, a woman with a significant concentration of Gram-negative bacteria like G. vaginalis had previously been assumed by physicians to be exhibiting disease. As mentioned before, DNA-based analyses of a large number of apparently healthy women revealed a surprising finding: many of them carried high concentrations of G. vaginalis (see Figure 5-18). And, interestingly, while about 70% of the healthy women had predominantly lactobacilli, about 30% had very few or no lactobacilli.
Figure 5-18. Temporal and interindividual dynamics of vaginal bacterial communities. Shown are the 16S rRNA gene-based bacterial community composition profiles of eight different healthy women sampled twice weekly over the course of 16 weeks. Colors indicate the relative abundance of each taxonomical group (phylotype) present in each sample. Red bars along the bottom of the plot indicate dates of menstruation. Reprinted from Gajer P, Brotman RM, Bai G, Sakamoto J, Schütte UM, Zhong X, Koenig SS, Fu L, Ma ZS, Zhou X, Abdo Z, Forney LJ, Ravel J. 2012. Sci Transl Med 4(132):132ra52, with permission.
More detailed metagenomic and multi-omic analyses have already demonstrated not just considerable person-to-person variation but variations within a person over the course of the menstrual cycle (Figure 5-18), as well as among different sites in the vaginal tract and with other factors such as hormonal status, sexual activity, age, and pregnancy. Clearly, the microbiota of the vaginal tract is turning out to have as complex a population as that of the other body sites. And, once again, an in-depth discussion of these fascinating topics is beyond the scope of this textbook, so we have included a few suggested readings at the end of this chapter.
The Other Microbiota: The Forgotten Eukaryotes
The content of this chapter has so far focused largely on bacteria (with a brief foray into the archaea of the gut), so it is appropriate to end with a brief description of microbes that have been largely ignored in most studies, but which nevertheless have an impact on the bacterial communities and our immune system: the eukaryotic microbes and viruses. This is particularly true of recent examinations of the microbiotas of the human colon, mouth, and vagina, but is also relevant to the microbiotas of other areas.
Recent metagenomic studies indicate that fungi and bacteriophage play a significant role in human health, but the fungal and viral components of the oral, vaginal, and colonic microbiota (the mycobiome and virome, respectively) remain relatively understudied by DNA-based analyses. Several challenges have hampered progress in this area. Fungi are normally found as minor components of the healthy microbiota (overgrowth is usually associated with disease). Many fungi are difficult to cultivate or are uncultivable. It is also often difficult to lyse and extract genomic DNA from many fungi. To capture the mycobiome, recent metagenomic sequencing efforts have used the internal transcribed spacer (ITS) region of the rRNA genes as a basis for comparative sequencing analysis, analogous to 16S rRNA gene sequence analysis in bacteria.
Like the mycobiome, the virome (viruses and phage) has also been understudied. Again, there are a number of key challenges that need to be overcome for this field to move forward. Despite the fact that viruses are relatively abundant in numbers, they only make up a small portion of the total DNA or RNA in a sample. Virions are difficult to isolate, purify, and study, particularly in terms of finding suitable host cells for viral propagation. From those sequencing studies that have been undertaken, it appears that the bulk of the sequences are completely novel, with few, if any, homologs in the databases. This makes sequence annotation and comparative genomic analyses difficult to perform and, more significantly, it makes viral classification, evolution, and characterization within complex mixtures a daunting task. Nevertheless, many valiant efforts are currently underway.
In developing countries especially, there is another important eukaryotic component of the colonic microbiota that consists of protozoa and helminths (e.g., tapeworms). We tend to think of protozoa and helminths as pathogens, but this picture may not be entirely correct. Many people who carry these microbes are not sick. In fact, the eukaryotic component of the microbiota has been a fact of human life for millions of years. In areas of the world where such eukaryotic microbes are endemic, a majority of the population maintains them without any adverse effects. Only during the past century, and only in more developed parts of the world, has the eukaryotic component of the microbiota been virtually eliminated due to clean water, better hygiene, and a high-quality food supply.
There is some evidence that the abrupt (in evolutionary terms) loss of the eukaryotic component of the microbiota by people who live in developed countries may have contributed to a number of adverse effects, such as allergies and IBD. One current hypothesis is that helminths, and possibly protozoa as well, stimulate the arm of the immune system that consists of IgE, eosinophils, mast cells, and other cell types that are negatively associated with allergies and IBD. Support for this hypothesis arises from the fact that early stimulation of the gastrointestinal immune system by eukaryotic microbes allows this part of the immune response to develop normally. Conversely, failure to experience this type of stimulation may, in some people, predispose them to disease caused by overstimulation of the inflammatory response. The impact of the loss of these eukaryotic microbes on the bacterial microbiota has not yet been explored thoroughly, but efforts are now moving in this direction.
SELECTED READINGS
Arnold JW, Roach J, Azcarate-Peril MA. 2016. Emerging technologies for gut microbiome research. Trends Microbiol 24:887–901.
Bäckhed F, Ley RE, Sonnenburg JL, Peterson DA, Gordon JI. 2005. Host-bacterial mutualism in the human intestine. Science 307:1915–1920.
Caulfield T, Evans J, McGuire A, McCabe C, Bubela T, Cook-Deegan R, Fishman J, Hogarth S, Miller FA, Ravitsky V, Biesecker B, Borry P, Cho MK, Carroll JC, Etchegary H, Joly Y, Kato K, Lee SS, Rothenberg K, Sankar P, Szego MJ, Ossorio P, Pullman D, Rousseau F, Ungar WJ, Wilson B. 2013. Reflections on the cost of “low-cost” whole genome sequencing: framing the health policy debate. PLoS Biol 11:e1001699.
Chen Y, Zhang W, Knabel SJ. 2005. Multi-virulence-locus sequence typing clarifies epidemiology of recent listeriosis outbreaks in the United States. J Clin Microbiol 43:5291–5294.
Deleo FR, Chen L, Porcella SF, Martens CA, Kobayashi SD, Porter AR, Chavda KD, Jacobs MR, Mathema B, Olsen RJ, Bonomo RA, Musser JM, Kreiswirth BN. 2014. Molecular dissection of the evolution of carbapenem-resistant multilocus sequence type 258 Klebsiella pneumoniae. Proc Natl Acad Sci USA 111:4988–4993.
Franzosa EA, Hsu T, Sirota-Madi A, Shafquat A, Abu-Ali G, Morgan XC, Huttenhower C. 2015. Sequencing and beyond: integrating molecular ‘omics’ for microbial community profiling. Nat Rev Microbiol 13:360–372.
Franzosa EA, Morgan XC, Segata N, Waldron L, Reyes J, Earl AM, Giannoukos G, Boylan MR, Ciulla D, Gevers D, Izard J, Garrett WS, Chan AT, Huttenhower C. 2014. Relating the metatranscriptome and metagenome of the human gut. Proc Natl Acad Sci USA 111:E2329–E2338.
Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312:1355–1359.
Greaves J, Roboz J. 2014. Mass Spectrometry for the Novice. CRC Press, Boca Raton, FL.
Holt KE, Baker S, Weill FX, Holmes EC, Kitchen A, Yu J, Sangal V, Brown DJ, Coia JE, Kim DW, Choi SY, Kim SH, da Silveira WD, Pickard DJ, Farrar JJ, Parkhill J, Dougan G, Thomson NR. 2012. Shigella sonnei genome sequencing and phylogenetic analysis indicate recent global dissemination from Europe. Nat Genet 44:1056–1059.
Huttenhower C, et al, Human Microbiome Project Consortium. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214.
Kong HH, Morris A. 2017. The emerging importance and challenges of the human mycobiome. Virulence 19:1–3.
Ley RE, Peterson DA, Gordon JI. 2006. Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell 124:837–848.
Mayer EA, Knight R, Mazmanian SK, Cryan JF, Tillisch K. 2014. Gut microbes and the brain: paradigm shift in neuroscience. J Neurosci 34:15490–15496.
McClure R, Balasubramanian D, Sun Y, Bobrovskyy M, Sumby P, Genco CA, Vanderpool CK, Tjaden B. 2013. Computational analysis of bacterial RNA-Seq data. Nucleic Acids Res 41:e140.
McDonald D, Ackermann G, Khailova L, Baird C, Heyland D, Kozar R, Lemieux M, Derenski K, King J, Vis-Kampen C, Knight R, Wischmeyer PE. 2016. Extreme dysbiosis of the microbiome in critical illness. MSphere 1:e00199–e16.
Nunn KL, Forney LJ. 2016. Unraveling the dynamics of the human vaginal microbiome. Yale J Biol Med 89:331–337.
Parker MT. 2016. An ecological framework of the human virome provides classification of current knowledge and identifies areas of forthcoming discovery. Yale J Biol Med 89:339–351.
Rawls JF, Mahowald MA, Ley RE, Gordon JI. 2006. Reciprocal gut microbiota transplants from zebrafish and mice to germ-free recipients reveal host habitat selection. Cell 127:423–433.
Reid G, Kim SO, Köhler GA. 2006. Selecting, testing and understanding probiotic microorganisms. FEMS Immunol Med Microbiol 46:149–157.
Rhoads A, Au KF. 2015. PacBio sequencing and its applications. Genomics Proteomics Bioinformatics 13:278–289.
Salyers AA, Gupta A, Wang Y. 2004. Human intestinal bacteria as reservoirs for antibiotic resistance genes. Trends Microbiol 12:412–416.
Samuel BS, Gordon JI. 2006. A humanized gnotobiotic mouse model of host-archaeal-bacterial mutualism. Proc Natl Acad Sci USA 103:10011–10016.
Stumpf RM, Wilson BA, Rivera A, Yildirim S, Yeoman CJ, Polk JD, White BA, Leigh SR. 2013. The primate vaginal microbiome: comparative context and implications for human health and disease. Am J Phys Anthropol 152(Suppl 57):119–134.
Ursell LK, Metcalf JL, Parfrey LW, Knight R. 2012. Defining the human microbiome. Nutr Rev 70(Suppl 1):S38–S44.
Wischmeyer PE, McDonald D, Knight R. 2016. Role of the microbiome, probiotics, and ‘dysbiosis therapy’ in critical illness. Curr Opin Crit Care 22:347–353.
Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer K-H, Whitman WB, Euzéby J, Amann R, Rosselló-Móra R. 2014. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 12:635–645.
Yeoman CJ, Thomas SM, Miller ME, Ulanov AV, Torralba M, Lucas S, Gillis M, Cregger M, Gomez A, Ho M, Leigh SR, Stumpf R, Creedon DJ, Smith MA, Weisbaum JS, Nelson KE, Wilson BA, White BA. 2013. A multi-omic systems-based approach reveals metabolic markers of bacterial vaginosis and insight into the disease. PLoS One 8:e56111.
Yildirim S, Yeoman CJ, Janga SC, Thomas SM, Ho M, Leigh SR, White BA, Wilson BA, Stumpf RM, Primate Microbiome Consortium. 2014. Primate vaginal microbiomes exhibit species specificity without universal Lactobacillus dominance. ISME J 8:2431–2444.
Questions
1. How does a DNA-based analysis, such as the 16S rRNA gene analysis, differ from a cultivation-based analysis? In this chapter, the DNA-based approaches have been emphasized. What are some advantages of the cultivation-based approach?
2. More and more research groups are seeking to show that changes in the microbiota of a particular site are involved in diseases such as periodontal disease, inflammatory bowel disease, and premature birth. Critics object that showing an association is not the same as demonstrating cause and effect. In the case of the obesity study, scientists tried to demonstrate cause and effect by inoculating germfree mice with different variations of the microbiota. Clearly, this would not be possible in humans. How might you prove cause and effect in humans?
3. Infants in the first years of life are often more susceptible to certain bacterial infections than older children. How can you explain this? What function of the microbiota does this illustrate?
4. Members of the microbiota cause some quite serious diseases. How could a bacterium that normally lives in a beneficial or neutral association with its human host cause serious disease?
5. Metabolic interactions between members of the microbiota are attracting more attention because two microbes working together can increase the effectiveness of a reaction catalyzed by one of them. Consider an association between a polysaccharide-fermenting microbe and a methanogen in the colon. Consider also that, for a bacterium such as a polysaccharide fermenter, the energy yield of a reaction depends on the ratio of substrates to end products. Can you explain why a polysaccharide fermenter and a methanogen might team up in the colon?
6. This chapter asserts that scientists now believe that DNA is being transferred by conjugation across species and genus lines in the colon. Suppose you found the same type of antibiotic resistance gene in members of two different genera. What criteria might you use to show that the gene was transferred horizontally? What evidence would lead you to suspect that the gene was transferred by conjugation?
7. Conventional wisdom asserts that there are no methanogens in the vaginal tract. If they were present, they would probably be present at low levels. How would you use the 16S rRNA approach to find them? What modification of the approach used to find bacterial sequences would you have to make?
8. PCR combined with sequencing can provide a quick identification of bacteria. What are the limitations of this approach?
9. Resident microbiotas provide protection from colonization by some pathogenic bacteria in certain parts of the body. Describe regions of the body where normal microbiotas are protective and how they accomplish this protection.
For each of the following, choose the most correct answer:
10. The “normal” microbiota of humans is
a. constantly changing.
b. commensal.
c. the same in every individual.
d. parasitic.
11. Removing most of the microorganisms from the intestinal tract would
a. inhibit infections.
b. have no effect on health.
c. lead to diarrhea and maybe other diseases.
d. improve digestion.
12. Which of the following statements about normal microbiotas is not true?
a. Our bodies begin to acquire a normal microbiota within a week of birth.
b. Microbial communities, once established, remain constant throughout one’s lifetime.
c. At the cellular level, our bodies are approximately 50% bacteria.
d. Normal microbiotas can cause disease in humans.
13. Clostridium difficile is an example of
a. how antibiotic use can cause a disease by disturbing the gut microbiota.
b. how a shift in the bacterial microbiota of the gums can cause disease.
c. how a shift in the eukaryotic microbiota of the vaginal tract can cause disease.
d. a bacterium that is capable of surviving in the human stomach.
e. a bacterium that is normally carried in the human nose.
14. Helicobacter pylori is
a. a major component of the microbiota of the human small intestine.
b. a cause of ulcers in the human stomach.
c. responsible for most of the lactic acid production in the stomach.
d. a bacterium that allows yeast to overgrow after antibiotic treatment.
e. associated with gum disease and bad breath when a microbiota shift occurs.
15. The reservoir hypothesis refers to the
a. effect of antibiotics on the composition of the oral microbiota.
b. transfer of genes among bacteria in the human colon.
c. shifts in bacterial populations that are associated with diet.
d. differences in the prokaryotic-to-eukaryotic ratio.
e. presence of Gram-positive bacteria in the vaginal tract.
16. A normal microbiota in the duodenum is
a. similar to the microbiota in the stomach.
b. similar to the microbiota in the colon.
c. tolerant to acidic environments.
d. both a and c.
e. both b and c.
17. In the absence of the full complement of the normal microbiota, due to orally taken antibiotics, opportunistic microorganisms such as _______ can become established and cause disease.
a. Clostridium difficile
b. Proteus mirabilis
c. Staphylococcus aureus
d. all of the above
e. none of the above
Solving Problems in Bacterial Pathogenesis
1. You are a researcher working for the U.S. Department of Agriculture. After two years of effort, you have isolated in pure culture a new, highly virulent bacterium from duck feces that is responsible for several major outbreaks of deaths among mammalian wildlife exposed to contaminated pond water in the South. Based on 16S rRNA gene sequence comparison, you have determined that this new bacterium is distantly related to the Gram-negative bacterium Vibrio cholerae, and you have named the new species Vibrio birdsii. You find that ducks are apparently unaffected by V. birdsii. You suspect that V. birdsii may be part of the normal microbiota of ducks. To test this hypothesis, you set up an experiment to examine the host response to V. birdsii in germfree ducks. The results are summarized in the graph shown in Figure 1. Provide a detailed explanation and interpretation of the results. Do the results support the hypothesis? Provide a rationale for your answer.
Figure 1. Results.
2. Dental plaque is a biofilm consisting of a complex community of over 700 different bacterial species. Epidemiological evidence suggests that a population shift toward certain Gram-negative anaerobes is responsible for the initiation and progression of periodontal diseases. Tannerella forsythia, a Gram-negative, filamentous, nonmotile, anaerobic bacterium, is implicated in advanced forms of periodontal disease in humans and is strongly associated with cases of severe periodontitis. It is found co-aggregated in periodontal pockets with other putative periodontal pathogens, such as Porphyromonas gingivalis and Fusobacterium nucleatum. In a mouse infection model, in which the bacterium is inoculated under the gums of mice and loss of dental bone is then measured, infection with T. forsythia induces alveolar bone resorption. Considering that periodontal disease might be a community shift-type disease with multiple microbial participants, how might you use microbial community profiling methods to demonstrate the importance of the microbial community composition in contributing to the onset and maintenance of the diseased state? Design the experiment first without DNA sequencing approaches and then with them. Be sure to provide your rationale for each choice of method. From your results, how could you distinguish between a model of disease caused by a microbial community shift involving multiple microbes and one involving a single pathogen, such as T. forsythia or P. gingivalis, or a combination of T. forsythia and P. gingivalis?