Population Genetics - Matthew B. Hamilton
Gametic disequilibrium
We saw earlier in the chapter that Hardy–Weinberg could be extended to give expected genotype frequencies for two loci using the product rule. While this is accepted without question now, in the early days of population genetics, it was a challenge to explain. In 1902, Walter Sutton and Theodor Boveri advanced the chromosome theory of heredity. They observed cell division and hypothesized that the discrete bodies seen separating into sets at meiosis and mitosis contained hereditary material that was transmitted from parents to offspring. At the time, the concept of chromosomal inheritance presented a paradox. Mendel's second law says that gamete haplotypes (haploid genotypes) should appear in frequencies proportional to the product of allele frequencies. This prediction conflicted with the chromosome theory of heredity since there are not enough chromosomes to represent each hereditary trait.
To see the problem, take the example of Homo sapiens with a current estimate of about 20 000 protein coding genes in the nuclear genome. Humans, however, have only 23 pairs of chromosomes: a large number of loci but only a small number of chromosomes. So, if chromosomes are indeed hereditary molecules, many genes must be on the same chromosome (on average about 870 genes per chromosome for humans if there are 20 000 genes). This means that some genes are physically linked by being located on the same chromosome. The solution to the paradox is the process of recombination. Non‐sister chromatids of homologous chromosomes touch at random points during meiosis and exchange segments, a process known as crossing‐over (Figure 2.18).
Figure 2.18 A schematic diagram of the process of recombination between two loci, A and B. Two double‐stranded chromosomes (drawn in color and gray) exchange strands and form a Holliday structure. The cross over event can resolve into either of two recombinant chromosomes that generate new combinations of alleles at the two loci. The chance of a cross over event occurring generally increases as the distance between loci increases. Two loci are independent when the probability of recombination and non‐recombination are both equal to ½. Gene conversion, a double cross over event without exchange of flanking strands, is not shown.
Linkage of loci has the potential to impact multilocus genotype frequencies and violate Mendel's law of independent segregation, which assumes the absence of linkage. To generalize expectations for genotype frequencies for two (or more) loci requires a model that accounts explicitly for linkage by including the rate of recombination between loci. The effects of linkage and recombination are important determinants of whether or not expected genotype frequencies under independent segregation of two loci (Mendel's second law) are met. Autosomal linkage is the general case that will be used to develop expectations for genotype frequencies under linkage.
The frequency of a two‐locus gamete haplotype will depend on two factors: (i) allele frequencies and (ii) the amount of recombination between the two loci. We can begin to construct a model based on the recombination rate by asking what gametes are generated by the genotype A1A2B1B2. Throughout this section, loci are indicated by the letters, alleles at the loci by the numerical subscripts, and allele frequencies by p1 and p2 for locus A and q1 and q2 for locus B. The problem is easier to conceptualize if we draw the two‐locus genotype as being on two lines akin to chromosomal strands:
A1 | B1 |
A2 | B2 |
This shows a genotype as two haplotypes and reveals phase, the sets of alleles packaged together on the same chromosomal strand (in contrast to writing the genotype as A1A2B1B2, where phase would be unknown). Given this physical arrangement of the two loci, what are the gametes produced during meiosis with and without recombination events?
A1B1 and A2B2 | “Coupling” gametes: alleles on the same chromosome remain together (a term coined by Bateson and Punnett). |
A1B2 and A2B1 | “Repulsion” gametes: alleles on the same chromosome seem repulsed by each other and pair with alleles on the opposite strand (a term coined by Thomas Hunt Morgan). |
The recombination fraction, symbolized as c (or sometimes r), refers to the total frequency of gametes resulting from recombination events between two loci. Using c to express an arbitrary recombination fraction, let's build an expectation for the frequency of coupling and repulsion gametes. If c is the rate of recombination, then 1 − c is the rate of non‐recombination since the frequency of all gametes is 1, or 100%. Within each of these two categories of gametes (coupling and repulsion), two types of gametes are produced so the frequency of each gamete type is half that of the total frequency for the gamete category. We can also determine the expected frequencies of each gamete under random association of the alleles at the two loci based on Mendel's law of independent segregation.
Gamete | Expected frequency | Observed frequency | Note
---|---|---|---
A1B1 | p1q1 | g11 = (1 − c)/2 | 1 − c is the frequency of all coupling gametes.
A2B2 | p2q2 | g22 = (1 − c)/2 |
A1B2 | p1q2 | g12 = c/2 | c is the frequency of all recombinant gametes.
A2B1 | p2q1 | g21 = c/2 |
The recombination fraction, c, can be thought of as the probability that a recombination event will occur between two loci. With independent assortment, the coupling and repulsion gametes are in equal frequencies and c equals ½ (like the chances of getting heads when flipping a coin). Values of c less than ½ indicate that recombination is less likely than non‐recombination, so coupling gametes are more frequent. Values of c greater than ½ are possible and would indicate that recombinant gametes are more frequent than non‐recombinant gametes (although such a pattern would likely be due to a process such as natural selection eliminating coupling gametes from the population rather than recombination exclusively).
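The gamete frequencies in the table above can be tabulated with a minimal Python sketch. The function name `double_heterozygote_gametes` is a hypothetical helper introduced only for illustration, and the sketch assumes a double heterozygote with A1B1 and A2B2 on its two strands and a recombination fraction c between 0 and 1.

```python
def double_heterozygote_gametes(c):
    """Gamete frequencies produced by an A1B1/A2B2 double heterozygote.

    c is the recombination fraction between the two loci
    (assumed here to lie between 0 and 1).
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("recombination fraction must be between 0 and 1")
    return {
        "A1B1": (1 - c) / 2,  # non-recombinant (coupling) gamete
        "A2B2": (1 - c) / 2,  # non-recombinant (coupling) gamete
        "A1B2": c / 2,        # recombinant (repulsion) gamete
        "A2B1": c / 2,        # recombinant (repulsion) gamete
    }


# Illustrative values of c: complete linkage, tight linkage, independence.
for c in (0.0, 0.1, 0.5):
    print(c, double_heterozygote_gametes(c))
```

With c = 0.5 all four gametes come out at the same frequency, which is the independent assortment case described above.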
We can utilize observed gamete frequencies to develop a measure of the degree to which alleles are associated within gamete haplotypes. This quantity is called the gametic disequilibrium (or sometimes linkage disequilibrium) parameter and can be expressed by:

$$D = g_{11}g_{22} - g_{12}g_{21} \tag{2.27}$$
where gxy stands for a gamete frequency. D is the difference between the product of the coupling gamete frequencies and the product of the repulsion gamete frequencies. This makes intuitive sense: with independent assortment, the frequencies of the coupling and repulsion gamete types are identical and cancel out to give D = 0, or gametic equilibrium. Another way to think of the gametic disequilibrium parameter is as a measure of the difference between observed and expected gamete frequencies: g11 = p1q1 + D, g22 = p2q2 + D, g12 = p1q2 – D, and g21 = p2q1 – D (note that observed and expected gamete frequencies cannot be negative). In this sense, D measures the deviation of gamete frequencies from what is expected under independent assortment. Since D can be either positive or negative, both coupling and repulsion gametes can be in excess or deficit relative to the expectations of independent assortment.
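As a small numerical check of these relationships, the sketch below computes D as g11g22 − g12g21 and confirms that each observed gamete frequency equals its expected frequency plus or minus D. The gamete frequencies are hypothetical values chosen only for illustration, and `gametic_disequilibrium` is a helper name assumed here.

```python
def gametic_disequilibrium(g11, g12, g21, g22):
    """Return D = g11*g22 - g12*g21 for four gamete frequencies."""
    total = g11 + g12 + g21 + g22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    return g11 * g22 - g12 * g21


# Hypothetical gamete frequencies (not taken from the text).
g11, g12, g21, g22 = 0.4, 0.1, 0.2, 0.3
D = gametic_disequilibrium(g11, g12, g21, g22)

# Allele frequencies are recovered by summing over gametes.
p1 = g11 + g12          # frequency of allele A1
q1 = g11 + g21          # frequency of allele B1
p2, q2 = 1 - p1, 1 - q1

print(f"D = {D:.3f}")
print(f"g11 = {g11:.3f}, p1*q1 + D = {p1 * q1 + D:.3f}")
print(f"g22 = {g22:.3f}, p2*q2 + D = {p2 * q2 + D:.3f}")
print(f"g12 = {g12:.3f}, p1*q2 - D = {p1 * q2 - D:.3f}")
print(f"g21 = {g21:.3f}, p2*q1 - D = {p2 * q1 - D:.3f}")
```

In this example D is positive, so the coupling gametes are in excess and the repulsion gametes are in deficit relative to independent assortment.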
Different estimators of gametic disequilibrium have different strengths and weaknesses (see Hedrick 1987; Flint‐Garcia et al. 2003). The discussion here will focus on the classical parameter and estimator D to develop the conceptual basis of measuring gametic disequilibrium and to understand the genetic processes that cause it.
Gametic disequilibrium: An excess or deficit or absence of all possible combinations of alleles at a pair of loci in a sample of gametes or haplotypes.
Linkage: Co‐inheritance of loci caused by physical location on the same chromosome.
Recombination fraction: The proportion of “repulsion” or recombinant gametes produced by a double heterozygote genotype each generation.
Now that we have developed an estimator of gametic disequilibrium, it can be used to understand how allelic association at two loci changes over time, or its dynamic behavior. If a very large population without natural selection or mutation starts out with some level of gametic disequilibrium, what happens to D over time with recombination? Imagine a population with a given level of gametic disequilibrium at the present time ($D_{t=n}$). How much gametic disequilibrium was there a single generation before the present at generation n − 1? Recombination will produce c recombinant gametes each generation so that:

$$D_{t=n} = (1 - c)\,D_{t=n-1} \tag{2.28}$$
Since gametic disequilibrium decays by a factor of 1 − c each generation,

$$D_{t=n} = D_{t=0}\,\underbrace{(1 - c)(1 - c)\cdots(1 - c)}_{n\ \text{factors}} \tag{2.29}$$
We can predict the amount of gametic disequilibrium over time by using the amount of disequilibrium initially present ($D_{t=0}$) and multiplying it by (1 − c) raised to the power of the number of generations that have elapsed:

$$D_{t=n} = D_{t=0}\,(1 - c)^{n} \tag{2.30}$$
Figure 2.19 shows the decay of gametic disequilibrium over time using Eq. 2.30. Initially, there are only coupling gametes in the population and no repulsion gametes, giving a maximum amount of gametic disequilibrium. As c increases, the approach to gametic equilibrium (D = 0) is more rapid. Eq. 2.30 and Figure 2.19 both assume that there are no other processes acting to counter the mixing effect of recombination. Therefore, the steady state will always be gametic equilibrium (D = 0), which for the equal allele frequencies in Figure 2.19 means equal frequencies of all four gametes, with the recombination rate determining how rapidly gametic equilibrium is attained.
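The decay described here can be sketched numerically. The snippet below assumes the geometric decay rule in which D shrinks by a factor of 1 − c each generation, starting from D = 0.25 (only coupling gametes, each at frequency ½). The four recombination rates are illustrative choices, not necessarily the ones plotted in Figure 2.19, and `disequilibrium_after` is a hypothetical helper name.

```python
def disequilibrium_after(d0, c, generations):
    """Return D after some generations, assuming D decays by (1 - c) each one."""
    return d0 * (1 - c) ** generations


d0 = 0.25                          # only coupling gametes, each at frequency 1/2
rates = (0.5, 0.2, 0.05, 0.01)     # assumed recombination rates for illustration

print("gen  " + "  ".join(f"c={c:<5}" for c in rates))
for n in (0, 1, 5, 10, 50):
    row = "  ".join(f"{disequilibrium_after(d0, c, n):.5f}" for c in rates)
    print(f"{n:>3}  {row}")
```

Reading across a row shows how much more slowly tightly linked loci (small c) approach D = 0 than loosely linked ones.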
A hypothesis test that the observed level of gametic disequilibrium is significantly different than expected under random segregation can be carried out with:

$$\chi^2 = \frac{\hat{D}^2 N}{p_1 p_2 q_1 q_2} \tag{2.31}$$

where N is the total sample size of gametes, $\hat{D}$ is a gametic disequilibrium estimate, and p and q are the allele frequencies at two diallelic loci. The χ² value has 1 degree of freedom and can be compared with the critical value found in Table 2.5.
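A minimal sketch of this test follows. It assumes the commonly used form of the statistic for two diallelic loci, χ² = D̂²N / (p1p2q1q2), and the standard 5% critical value of 3.841 for 1 degree of freedom. The sample values and function names are hypothetical, and the p-value uses the fact that a χ² variable with 1 degree of freedom is a squared standard normal.

```python
import math

CRITICAL_VALUE_1DF_5PCT = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def disequilibrium_chi_square(d_hat, n_gametes, p1, q1):
    """Chi-square statistic for D, assuming chi2 = D^2 * N / (p1 p2 q1 q2)."""
    p2, q2 = 1 - p1, 1 - q1
    denominator = p1 * p2 * q1 * q2
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d_hat ** 2 * n_gametes / denominator


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical sample: 100 gametes, D estimated as 0.1, p1 = 0.5, q1 = 0.6.
chi2 = disequilibrium_chi_square(0.1, 100, 0.5, 0.6)
print(f"chi-square = {chi2:.3f}, p = {p_value_1df(chi2):.5f}")
print("reject gametic equilibrium:", chi2 > CRITICAL_VALUE_1DF_5PCT)
```

Because N multiplies the statistic, the same estimate of D can be non-significant in a small sample of gametes and highly significant in a large one.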
Figure 2.19 The decay of gametic disequilibrium (D) over time for four recombination rates. Initially, there are only coupling (P11 = P22 = ½) and no repulsion gametes (P12 = P21 = 0). Gametic disequilibrium decays as a function of time and the recombination rate ($D_{t=n} = D_{t=0}(1 - c)^{n}$) assuming a single large population, random mating, and no counteracting genetic processes. If all gametes were initially repulsion, gametic disequilibrium would initially equal −0.25 and decay to zero in an identical fashion.
Figure 2.20 A hypothetical partitioning of the contributions to the total population gametic disequilibrium (D) in a population caused by numerous population genetic processes. The finite sample of gametes or genotypes used to measure D can itself contribute to the disequilibrium observed, as can departure from Hardy–Weinberg expected genotype frequencies at single loci or within‐locus disequilibrium. The fractions of the total gametic disequilibrium attributable to each cause will vary depending on history of a population and the relative strengths of the multiple processes acting in a population.
One potential drawback of D in Eq. 2.27 is that its maximum value depends on the allele frequencies in the population. This can make interpreting an estimate of D or comparing estimates of D from different populations problematic. For example, it is possible that two populations have very strong association among alleles within gametes (e.g. no repulsion gametes), but the two populations differ in allele frequency so that the maximum value of D in each population is also different. If the alleles are not all at equal frequencies in a population, then the frequencies of the two coupling or the two repulsion gametes are also not equal. When D < 0, Dmax is the value of −p1q1 or −p2q2 that is closer to zero, whereas when D > 0, Dmax is the value of p1q2 or p2q1 that is closer to zero.
A way to avoid these problems is to express D as a proportion of its largest possible value:

$$D' = \frac{D}{D_{max}} \tag{2.32}$$
This gives a measure of gametic disequilibrium that is normalized by the maximum or minimum value D can assume given population allele frequencies. Even though a given value of D may seem small in the absolute, it may be large relative to Dmax given the population allele frequencies. A related and more commonly employed measure expresses disequilibrium between two loci as a correlation:

$$\rho = \frac{D}{\sqrt{p_1 p_2 q_1 q_2}} \tag{2.33}$$
where ρ (pronounced “roe”) takes the familiar and more easily interpreted range of −1 to +1 (the disequilibrium correlation is sometimes given as ρ² with 0 ≤ ρ² ≤ 1) (Lewontin 1988). Analogous to the fixation index, the two‐locus disequilibrium correlation can be understood as a measure of the correlation between the states of the two alleles found together in a two‐locus haplotype. When ρ = 0 there is no correlation between the alleles at two loci that are found paired in gametes or on the same chromosome – the allelic states are independent as expected under Mendel's second law. If ρ > 0 there is a positive correlation such that if one of the alleles at one locus is an A, for example, then the allele at the second locus will have a correlated state and might often be a B allele. When ρ < 0 there is a negative correlation between the states of two alleles in a haplotype, such as if A is infrequently paired with B.
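The normalized measures can be compared with a short sketch. It assumes the usual forms D′ = D/Dmax, with Dmax chosen by the sign of D as described above, and ρ = D/√(p1p2q1q2). The two populations are hypothetical: both lack repulsion gametes entirely but differ in allele frequencies. All function names are illustrative.

```python
import math


def d_max(d, p1, q1):
    """Bound on D given allele frequencies (a negative number when d < 0)."""
    p2, q2 = 1 - p1, 1 - q1
    if d < 0:
        return max(-p1 * q1, -p2 * q2)   # negative bound closer to zero
    return min(p1 * q2, p2 * q1)         # positive bound closer to zero


def d_prime(d, p1, q1):
    """D as a proportion of its bound; non-negative because the bound shares D's sign."""
    bound = d_max(d, p1, q1)
    if bound == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return d / bound


def rho(d, p1, q1):
    """Disequilibrium correlation, assuming rho = D / sqrt(p1 p2 q1 q2)."""
    p2, q2 = 1 - p1, 1 - q1
    denominator = math.sqrt(p1 * p2 * q1 * q2)
    if denominator == 0:
        raise ValueError("rho is undefined when a locus is monomorphic")
    return d / denominator


# Hypothetical populations as (g11, g12, g21, g22); neither has repulsion gametes.
populations = {
    "equal allele frequencies": (0.5, 0.0, 0.0, 0.5),
    "unequal allele frequencies": (0.9, 0.0, 0.0, 0.1),
}
for name, (g11, g12, g21, g22) in populations.items():
    d = g11 * g22 - g12 * g21
    p1, q1 = g11 + g12, g11 + g21
    print(f"{name}: D = {d:.3f}, D' = {d_prime(d, p1, q1):.3f}, "
          f"rho = {rho(d, p1, q1):.3f}")
```

The raw values of D differ between the two populations even though the association among alleles is complete in both, whereas D′ and ρ are essentially the same, which is the motivation for normalizing.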
Thus far, we have approached gametic disequilibrium by focusing on the frequency of four gamete haplotypes. A helpful complement is to consider the gametes made by all possible two locus genotypes as shown in Table 2.12. This table is somewhat like the table of parental matings and their offspring genotype frequencies we made to prove Hardy–Weinberg for one locus, except Table 2.12 predicts the frequencies of gametes that will make up the next generation rather than genotype frequencies in the next generation. Most genotypes produce recombinant gametes that are identical to non‐recombinant gametes (e.g. the A1B1/A1B2 genotype produces A1B1 and A1B2 coupling gametes and A1B1 and A1B2 repulsion gametes). Only two genotypes – both types of double heterozygotes – will produce recombinant gametes that are different than parental haplotypes. These are the only two places where c enters into the expressions for expected gamete frequencies because recombination does not change the gametes produced by the other eight two locus genotypes.
Table 2.12 Expected frequencies of gametes for two diallelic loci in a randomly mating population with a recombination rate between the two loci of c. The first eight genotypes have non‐recombinant and recombinant gametes that are identical. The last two genotypes produce novel recombinant gametes, requiring inclusion of the recombination rate to predict gamete frequencies. Summing down each column of the table gives the total frequency of each gamete in the next generation.
Parental mating | Expected frequency of mating | Next-generation A1B1 | Next-generation A2B2 | Next-generation A1B2 | Next-generation A2B1
---|---|---|---|---|---
A1B1/A1B1 | (p1q1)² | (p1q1)² | | |
A2B2/A2B2 | (p2q2)² | | (p2q2)² | |
A1B1/A1B2 | 2(p1q1)(p1q2) | (p1q1)(p1q2) | | (p1q1)(p1q2) |
A1B1/A2B1 | 2(p1q1)(p2q1) | (p1q1)(p2q1) | | | (p1q1)(p2q1)
A2B2/A1B2 | 2(p2q2)(p1q2) | | (p2q2)(p1q2) | (p2q2)(p1q2) |
A2B2/A2B1 | 2(p2q2)(p2q1) | | (p2q2)(p2q1) | | (p2q2)(p2q1)
A1B2/A1B2 | (p1q2)² | | | (p1q2)² |
A2B1/A2B1 | (p2q1)² | | | | (p2q1)²
A2B2/A1B1 | 2(p2q2)(p1q1) | (1−c)(p2q2)(p1q1) | (1−c)(p2q2)(p1q1) | c(p2q2)(p1q1) | c(p2q2)(p1q1)
A1B2/A2B1 | 2(p1q2)(p2q1) | c(p1q2)(p2q1) | c(p1q2)(p2q1) | (1−c)(p1q2)(p2q1) | (1−c)(p1q2)(p2q1)
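We can relate two locus Hardy–Weinberg expected genotype frequencies to the recombination rate and two locus disequilibrium if we sum the columns to determine the expected gamete frequencies with the possibility of recombination. Focus on the column for the gamete A1B1. Writing the current gamete frequencies as g11, g12, g21, and g22 in place of the table's allele frequency products, and the frequency of A1B1 in the next generation as $g_{11}'$, summing the five terms in that column gives

$$g_{11}' = g_{11}^{2} + g_{11}g_{12} + g_{11}g_{21} + (1 - c)\,g_{11}g_{22} + c\,g_{12}g_{21} \tag{2.34}$$

And expanding the two terms on the right gives

$$g_{11}' = g_{11}^{2} + g_{11}g_{12} + g_{11}g_{21} + g_{11}g_{22} - c\,g_{11}g_{22} + c\,g_{12}g_{21} \tag{2.35}$$

which can be rearranged by noticing the first four terms all contain g11 which can be factored out to give

$$g_{11}' = g_{11}(g_{11} + g_{12} + g_{21} + g_{22}) - c\,(g_{11}g_{22} - g_{12}g_{21}) \tag{2.36}$$

Recall that D = g11g22 − g12g21 and make the substitution to obtain

$$g_{11}' = g_{11}(g_{11} + g_{12} + g_{21} + g_{22}) - cD \tag{2.37}$$

Next, notice that (g11 + g12 + g21 + g22) is the sum of all gamete frequencies and equals one. Making that substitution, we obtain

$$g_{11}' = g_{11} - cD \tag{2.38}$$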
This final result shows that gamete frequencies in the second generation are a function of the gamete frequency we expect from multiplying the respective allele frequencies, increased or decreased by the product of the recombination rate and the amount of two locus disequilibrium. The expected frequency of the A1A1B1B1 genotype, for example, in the next generation is then (g11 − cD)², and it is not just a function of the product of the allele frequencies but also depends on the recombination rate and the amount of two locus disequilibrium. This is analogous to adjusting single locus H‐W expected genotype frequencies using F to account for one locus disequilibrium.
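The column sums of Table 2.12 can be checked numerically with a minimal sketch. It assumes random union of gametes, uses the current gamete frequencies g11, g12, g21, and g22 in place of the allele frequency products in the table, and sums each gamete's column term by term. The function name and the input values are hypothetical.

```python
def next_generation_gametes(g11, g12, g21, g22, c):
    """Sum each gamete column of the two-locus table for one generation.

    Only the two double heterozygotes (A1B1/A2B2 and A1B2/A2B1) contribute
    terms that involve the recombination fraction c.
    """
    coupling_het = g11 * g22    # half the frequency of the A1B1/A2B2 genotype
    repulsion_het = g12 * g21   # half the frequency of the A1B2/A2B1 genotype
    new_g11 = (g11 ** 2 + g11 * g12 + g11 * g21
               + (1 - c) * coupling_het + c * repulsion_het)
    new_g22 = (g22 ** 2 + g22 * g12 + g22 * g21
               + (1 - c) * coupling_het + c * repulsion_het)
    new_g12 = (g12 ** 2 + g12 * g11 + g12 * g22
               + c * coupling_het + (1 - c) * repulsion_het)
    new_g21 = (g21 ** 2 + g21 * g11 + g21 * g22
               + c * coupling_het + (1 - c) * repulsion_het)
    return new_g11, new_g12, new_g21, new_g22


# Hypothetical starting gamete frequencies and recombination fraction.
g11, g12, g21, g22, c = 0.4, 0.1, 0.2, 0.3, 0.2
D = g11 * g22 - g12 * g21
n11, n12, n21, n22 = next_generation_gametes(g11, g12, g21, g22, c)
new_D = n11 * n22 - n12 * n21

print(f"column sum for A1B1: {n11:.4f}   g11 - c*D: {g11 - c * D:.4f}")
print(f"new D: {new_D:.4f}   (1 - c)*D: {(1 - c) * D:.4f}")
```

The column sum agrees with g11 − cD, and the disequilibrium recomputed from the new gamete frequencies is (1 − c) times the old value, matching the per-generation decay described earlier in the section.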
It is helpful to keep in mind that the term linkage disequilibrium is widely employed in the literature and has deep historic roots (e.g. Lewontin 1964), even though it is an imprecise label that confounds a pattern (two locus haplotypes or genotypes departing from the frequencies expected by the product of frequencies of alleles) and a process. Linkage disequilibrium is a misnomer since physical linkage only dictates the rate at which allelic combinations approach independent assortment or equilibrium. Processes other than linkage are responsible for the production of deviations from independent assortment of alleles at multiple loci in gametes. Using terms like gametic disequilibrium or two‐locus disequilibrium reminds us that the deviation from random association of alleles at two loci is a pattern seen in gametes or haplotypes. Although linkage can certainly contribute to this pattern, so can many other population genetic processes. It is likely that several processes operating simultaneously produce the two‐locus disequilibrium observed in any population, as illustrated by the pie chart in Figure 2.20.
Gametic disequilibrium is a central concept in formulating predictions for multiple locus genotype and haplotype frequencies in populations. Observations of the amount of gametic disequilibrium present in populations can then be used to identify the fundamental population genetic processes operating in populations. Thus, gametic disequilibrium forms the basis for a wide range of hypotheses to explain multiple locus genotype and haplotype frequencies, with gametic equilibrium or Mendel's second law serving as the null hypothesis. The numerous processes that maintain or increase gametic disequilibrium include those discussed in more detail in the following sections.