Cell Biology, Stephen R. Bolsover
IN DEPTH 4.2 GENOME PROJECTS
The publication in 1996 of the sequence of the genome of the single‐celled yeast S. cerevisiae was a milestone in biology. Not only did scientists have before them the complete genetic blueprint of a eukaryotic organism, but the technology for obtaining and curating huge amounts of genetic data was established. The genomes of other simple organisms such as the tiny nematode worm Caenorhabditis elegans, with just 959 body cells, and the fruit fly D. melanogaster, were published soon after, followed by more complex organisms such as the mouse and, of course, humans. Today, the sequence of the genomes of nearly 60 000 organisms, including 15 000 eukaryotic species, has been determined. Genomes from every branch of the tree of life are now available for study, including the platypus, our most distant mammalian relative, and both the nuclear and mitochondrial genomes of the Neanderthal, the hominid most closely related to present‐day humans.
Sophisticated databases have been created to store and analyze base sequence information from the various genome projects. Computer programs search the data for exon sequences and compare the sequence of one genome with that of another. In this way, sequences encoding related proteins (proteins that share stretches of similar amino acids) can be identified. Genome data from patients can be used to identify mutations and inform clinical decision‐making. Important programs that can easily be accessed through the internet include BLASTN, which compares a nucleotide sequence with the sequences stored in a nucleotide database, and BLASTP, which compares an amino acid sequence with protein sequence databases. Programs such as Clustal, MAFFT, MUSCLE, and T‐Coffee can be used to compare multiple DNA or multiple protein sequences simultaneously. 3D‐Coffee is a version of T‐Coffee that can combine data from sequence and protein structure databases in the analysis.
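At the heart of a BLAST search is local alignment: finding the highest-scoring stretch of similarity between two sequences. BLASTN and BLASTP reach that answer quickly with heuristics (short seed matches that are then extended), so the sketch below is not BLAST's algorithm. It swaps in the exhaustive Smith-Waterman method, which computes the best local alignment exactly. The function name and the scores (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST defaults; real protein searches score residue pairs with substitution matrices such as BLOSUM62.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of sequences a and b (Smith-Waterman).

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of any local alignment ending at a[i-1] and b[j-1];
    # the floor of 0 lets an alignment start afresh at any position.
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align two residues
                H[i - 1][j] + gap,                                  # gap in b
                H[i][j - 1] + gap,                                  # gap in a
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the highest-scoring cell until the score falls to 0.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

The two example sequences differ at their ends but share the core ACGT, and local alignment reports only that shared region. Filling the whole score table takes time proportional to the product of the two sequence lengths, which is why database-scale tools such as BLAST rely on faster heuristics.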
The Human Genome Project, completed in 2003, was a 13‐year international effort that was described at the time as the biological equivalent of putting a man on the moon. As more and more genomes were sequenced, the technology became quicker and, more importantly, cheaper. Using Next Generation Sequencing (NGS) technologies, it is currently possible to have our genomes sequenced at a cost that ranges between a few hundred and a thousand dollars per person. As an increasing number of us have our genomes sequenced, this inexpensive but informative resource is bringing personalized medicine closer to our everyday lives. Soon, clinicians will routinely tailor treatment for a wide range of diseases to our own unique genetic makeup.
In 2012 the 100 000 Genomes Project was set in motion through a collaboration between scientists and the government of the United Kingdom. Under the direction of Genomics England, its remit was to sequence 100 000 complete genomes from NHS (National Health Service) patients. The aim of this large‐scale project was to analyze DNA from patients with cancer or a rare disorder, to understand the causes of their condition and to inform the best treatments. For cancer patients, the genome sequences of tumor and normal tissue were compared. For patients with a rare disease, the genomes of two relatives were also sequenced. In December 2018 the project met its target of sequencing 100 000 genomes, a remarkable demonstration of progress in DNA sequencing technology and analysis. To date, the project has generated over 21 petabytes of genome data and is already delivering valuable insights into how DNA sequences inform an individual's medical condition. In response to the COVID‐19 pandemic, genome sequencing of both patients and SARS‐CoV‐2 samples has provided information both on how an individual's genome influences their susceptibility to COVID‐19 and on the spread of new variants through the population.
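Comparing tumor DNA with normal DNA from the same patient separates changes acquired by the cancer (somatic mutations) from variants the patient was born with, because any base present in both tissues cannot be a tumor-specific change. The sketch below shows only that comparison step, under strong simplifying assumptions: the two sequences are already aligned to the same coordinates, have equal length, and are free of sequencing errors. Real pipelines align millions of short reads to a reference genome and use dedicated variant callers. The function name and the example sequences are hypothetical.

```python
def somatic_differences(normal: str, tumor: str) -> list[tuple[int, str, str]]:
    """Return (position, normal_base, tumor_base) for every site where the
    tumor sequence differs from the patient's normal-tissue sequence.

    Toy model: assumes the two sequences are already aligned to the same
    coordinates, have equal length, and contain no sequencing errors.
    Positions are 0-based.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, n, t)
            for pos, (n, t) in enumerate(zip(normal, tumor))
            if n != t]


# Hypothetical example: one G -> A change present only in the tumor.
print(somatic_differences("ATGCGTAC", "ATGCATAC"))  # [(4, 'G', 'A')]
```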
Answer to thought question: Guanine cannot base pair with uracil, so there is certainly a mismatch in the DNA. However, mismatch repair cannot correct the error, because the mutation has occurred in a mature chromosome in which both DNA strands are methylated, so the repair system cannot tell which strand carries the original sequence.
When the bacterium replicates its DNA, the strands will separate and each will act as a template for the synthesis of a new strand. The unmodified strand, 5′ TGAA 3′, will have the matching strand 3′ ACTT 5′ synthesized on it, so the resulting chromosome will be unmutated, as will those of its own daughters.
However, the modified strand, 3′ AUTT 5′, will have the matching strand 5′ TAAA 3′ synthesized on it, since adenine base pairs with uracil. The daughter cell that inherits this chromosome, assuming that it is still infected with PBS2 and therefore allows the uracil to remain, will now have a chromosome with the structure
5′ TAAA 3′
3′ AUTT 5′
with each base pair now nicely hydrogen bonded to its partner. Only the presence of uracil in the DNA molecule betrays the fact that a deamination event has occurred.
When this cell replicates its DNA, the strands will separate and each will act as a template for the synthesis of a new strand. The lower strand, 3′ AUTT 5′, will, as before, have the matching strand 5′ TAAA 3′ synthesized on it. The upper strand, 5′ TAAA 3′, will generate a chromosome with the structure
5′ TAAA 3′
3′ ATTT 5′
and the daughter cell that inherits this chromosome, and all its daughters in turn, will carry a mutation (the original G·C base pair has become an A·T base pair) that can no longer be easily identified as arising from a deamination.
This answer neglects the fact that a bacterium infected by a bacteriophage will likely die without generating any daughters.
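The two rounds of replication described in this answer can be traced with a short script. It is a minimal sketch under the same assumptions as the answer: the uracil is not removed (as in a cell infected with PBS2), mismatch repair does not act, and DNA polymerase inserts only A, C, G, and T, never U. Each duplex is written as in the text, with the upper strand read 5′ to 3′ and the lower strand read 3′ to 5′, so the bases at the same position pair with each other. The function and variable names are hypothetical.

```python
# Each duplex is (top, bottom): top read 5'->3' left to right, bottom read
# 3'->5', so top[i] pairs with bottom[i], as in the text.

# Base that DNA polymerase inserts opposite each template base.
# Uracil in the template directs insertion of adenine.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}

# Correctly hydrogen-bonded pairs (A.U counts as a proper pair).
VALID_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
               ("A", "U"), ("U", "A")}


def new_strand(template: str) -> str:
    """Strand synthesized on `template` (assumes no dUTP is incorporated)."""
    return "".join(PARTNER[base] for base in template)


def replicate(top: str, bottom: str) -> list[tuple[str, str]]:
    """Semiconservative replication with no repair: each parental strand
    keeps its place and receives a newly synthesized partner."""
    return [(top, new_strand(top)), (new_strand(bottom), bottom)]


def mismatches(top: str, bottom: str) -> list[int]:
    """0-based positions where the two strands are not correctly paired."""
    return [i for i, pair in enumerate(zip(top, bottom))
            if pair not in VALID_PAIRS]


damaged = ("TGAA", "AUTT")  # C in the lower strand deaminated to U
print("mismatched positions:", mismatches(*damaged))  # [1]: G opposite U

generation_1 = replicate(*damaged)
print("generation 1:", generation_1)  # [('TGAA', 'ACTT'), ('TAAA', 'AUTT')]

# Follow the daughter that inherited the uracil-containing strand.
generation_2 = replicate(*generation_1[1])
print("generation 2:", generation_2)  # [('TAAA', 'ATTT'), ('TAAA', 'AUTT')]
```

The duplex ('TAAA', 'ATTT') in the second generation contains no uracil and no mismatch, so nothing in it reveals that its A·T pair was once G·C.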