Genetic Analysis of Complex Disease

Introduction

Disease gene discovery in humans has a long history, predating even the identification of DNA as the genetic molecule (Watson and Crick 1953) and the determination of the number of human chromosomes (Ford and Hamerton 1956; Tjio and Levan 1956). In fact, as early as the 1930s some simple statistical methods for the analysis of genetic data had been developed (Bernstein 1931; Fisher 1935a,b). However, these methods were severely limited in their application (basic concepts of genetics are covered in Chapter 2). Not only were genetic markers lacking (the ABO blood type was one of the few that had been described), but the methods were also restricted to small pedigrees of two to three generations. All calculations were performed by hand, making analysis laborious.

Two hurdles had to be overcome before human disease gene discovery could become routine. First, appropriate statistical methods were lacking, as were ways of automating the calculations. Second, sufficient genetic markers to cover the human genome needed to be identified. Morton (1955), building on the work of Haldane and Smith (1947) and Wald (1947), described the use of maximum likelihood approaches in a sequential test for linkage between two loci. He used the term “LOD score” (for logarithm of the odds of linkage) for his test. This score is the basis for most modern genetic linkage analyses and represents a milestone in human disease gene discovery. However, the complex calculations had to be done by hand, severely limiting the use of this approach. Elston and Stewart (1971) described a general approach for calculating the likelihood of any non‐consanguineous pedigree. This algorithm was extended by Lange and Elston (1975) to include pedigrees of arbitrary complexity. In the same period, the first general‐purpose computer program for linkage in humans, LIPED (Ott 1974), was described. Thus, the first of the two major hurdles was overcome.
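To make the LOD score concrete, the following is a minimal sketch under a strong simplifying assumption: all meioses are phase‐known and fully informative, so each can simply be counted as recombinant or non‐recombinant. It is not the Elston and Stewart pedigree likelihood, and the function names `lod_score` and `max_lod` are hypothetical, chosen for illustration only. Under this assumption the score at recombination fraction θ is the base‐10 logarithm of the likelihood at θ divided by the likelihood at θ = 0.5 (no linkage).

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (an illustrative
    simplification): L(theta) = theta^k * (1 - theta)^(n - k),
    compared with L(0.5) = 0.5^n.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    non_recombinants = meioses - recombinants

    # log10 likelihood under linkage at theta
    log_l_theta = 0.0
    if recombinants > 0:
        if theta == 0.0:
            # Any observed recombinant is impossible when theta = 0.
            return -math.inf
        log_l_theta += recombinants * math.log10(theta)
    if non_recombinants > 0:
        log_l_theta += non_recombinants * math.log10(1.0 - theta)

    # log10 likelihood under the null hypothesis of no linkage
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in [0, 0.5]; returns (best_theta, best_lod)."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)


if __name__ == "__main__":
    # Hypothetical counts: 2 recombinants observed in 10 meioses.
    theta_hat, lod = max_lod(2, 10)
    print(f"theta_hat={theta_hat:.3f}, max LOD={lod:.3f}")
```

With the hypothetical counts above, the score peaks at θ = 0.2, the observed proportion of recombinants, and the score at θ = 0.5 is zero by construction. The grid search stands in for the likelihood maximization; real pedigree data with unknown phase or missing genotypes require the full pedigree likelihood described in the text.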

By the mid‐1970s there were 40–50 red cell antigen and serum protein polymorphisms available as genetic markers. A few markers could be arranged into initial linkage groups, but these markers covered only approximately 5–15% of the human genome. In addition to this limited coverage, genotyping these polymorphisms was labor intensive, time consuming, and often quite technically demanding. This remaining hurdle was crossed with the description of restriction fragment length polymorphisms (RFLPs) by Botstein et al. (1980). Not only were these markers easier to genotype in a standard manner, but they were frequent in the genome, covering the remaining 85–95% of the genome for the first time.

With these tools in place, the field of human disease gene discovery blossomed. The first successful disease gene linkage using RFLPs was reported (Gusella et al. 1983), localizing the Huntington disease gene to chromosome 4p. This discovery marked the beginning of disease gene identification through the positional cloning approach. Early successes using positional cloning were for diseases inherited in Mendelian fashion: autosomal dominant, autosomal recessive, or X‐linked. Although confounding factors such as genetic heterogeneity, variable penetrance, and phenocopies might exist for single‐gene or Mendelian traits, it is generally possible with a known genetic model to determine the best and most efficient approach to identifying the responsible gene. The success of these tools is apparent since by mid‐2017 over 3350 single‐gene disorders had at least one causative genetic variant identified (OMIM, accessed May 2017 at http://omim.org).

However, the inheritance patterns for traits such as the common form of Alzheimer’s disease, multiple sclerosis, and non‐insulin‐dependent diabetes (to name a few) do not fit any simple genetic explanation, making it far more difficult to determine the best approach to identifying the unknown underlying effect. In addition to the confounding factors involved in single‐gene disorders, such as genetic heterogeneity and phenocopies, gene–gene and gene–environment interactions must be considered when a complex trait is dissected. Moreover, the tools that enabled efficient mapping of Mendelian trait loci through positional cloning were not as effective in dissecting these more complex traits. New statistical tools, study designs, and genotyping technologies were needed to perform large‐scale analysis of the genetic factors underlying them. As these technologies were developed, a new approach to complex disease gene identification via genome‐wide association studies (GWAS) was enabled. The shift to this approach was predicted by a seminal perspective published by Risch and Merikangas (1996), in which they showed that large‐scale case–control analyses of complex traits would be a powerful and efficient method of identifying the underlying alleles, once genotyping technology allowed the cost‐effective determination of a dense map of genetic markers. The first GWAS was published in 2005 (Klein et al. 2005), identifying the association of variation in the CFH gene with age‐related macular degeneration. This finding was simultaneously confirmed using alternate study designs (Edwards et al. 2005; Haines et al. 2005), proving that GWAS worked and allowing this new era of complex disease genetics to begin in earnest.
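The case–control logic behind a GWAS can be illustrated for a single marker. The sketch below is an assumption‐laden simplification, not the method of any study cited above: it applies a Pearson chi‐square test with one degree of freedom to a 2×2 table of allele counts in cases and controls. The function name `allelic_association` and the counts in the example are hypothetical. A genome‐wide analysis would repeat a test of this kind at every marker and would need to account for the number of tests performed.

```python
import math


def allelic_association(case_alt: int, case_ref: int,
                        control_alt: int, control_ref: int):
    """Allelic case-control test for one marker.

    Returns (odds_ratio, chi2, p_value) from a Pearson chi-square test
    (1 degree of freedom) on the 2x2 table of allele counts.
    All four counts must be positive in this simplified sketch.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    if any(count <= 0 for row in table for count in row):
        raise ValueError("all allele counts must be positive")

    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    # Sum (observed - expected)^2 / expected over the four cells.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Odds of carrying the alternate allele in cases versus controls.
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts for illustration only.
    odds_ratio, chi2, p_value = allelic_association(60, 40, 40, 60)
    print(f"OR={odds_ratio:.2f}, chi2={chi2:.2f}, p={p_value:.4f}")
```

For the hypothetical counts shown, every expected cell count is 50, giving a chi‐square statistic of 8.0 and an odds ratio of 2.25. The standard library is used throughout so the example runs without additional packages; the closed‐form p‐value is valid only for the one‐degree‐of‐freedom case.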

With the dawn of the GWAS era, a corresponding shift in the prevailing hypotheses for these studies occurred. No longer were studies solely searching for one or a few rare mutations in a single gene that cause a rare and devastating disease. Studies of common complex diseases were searching for multiple alterations in one or more genes acting alone or in concert to increase or decrease the risk of developing a trait. Early GWAS tended to test the “common disease‐common variant” (CDCV) hypothesis: the risk for common diseases, across ethnic groups, arises from evolutionarily old variants that have had substantial time to spread throughout the human population. Many studies successfully identified thousands of variants associated with the risk of complex diseases. An interactive catalog of these variants is maintained by the National Human Genome Research Institute and the European Molecular Biology Laboratory at http://www.ebi.ac.uk/gwas. Despite these successes, many studies testing the CDCV hypothesis failed to explain all the heritable variation in the risk of the complex traits under study – a phenomenon termed “missing heritability” (Manolio et al. 2009). One explanation for this was that the effect of rare variants was not well studied by early GWAS – an alternative hypothesis termed the “common disease‐rare variant” (CDRV) hypothesis. This hypothesis suggests that risk of common complex diseases arises from a larger number of rare variants in one or more genes, perhaps occurring more recently.

Just as exploration of the CDCV hypothesis was enabled by GWAS approaches and high‐throughput genotyping technology, exploration of the CDRV hypothesis was enabled by advances in high‐throughput sequencing technology and accompanying statistical analysis methods. Initial screens of coding‐sequence variants in Mendelian traits via whole‐exome sequencing (WES) were published by Ng et al. (2009, 2010) and Choi et al. (2009), demonstrating that in some cases, disease gene mapping could skip the positional cloning strategy and proceed directly to evaluating segregation of mutations in families. This proof of principle has been used to justify the same approach for testing the CDRV hypothesis in complex traits, but the approach has met with mixed success. One successful example is the recent analysis of 50 000 individuals in the MyCode Community Health Initiative, which identified rare variants underlying cardiovascular traits and lipid levels (Dewey et al. 2016). The rapid and continuing decrease in whole‐genome sequencing (WGS) costs suggests that within a few years, it will be possible (and perhaps commonplace) to test the CDRV hypothesis using WGS in large sample sizes, essentially performing genome‐wide association for common and rare variants with direct genotype determination via sequencing.

Study design, laboratory methods, and analytic approaches differ by trait type (Mendelian or complex) and by the hypothesis being tested (rare disease‐rare variant [Mendelian positional cloning]; CDCV [GWAS]; CDRV [WES or WGS with individual‐variant or set‐based association]). These approaches are described in the following sections.
