Population Genetics - Matthew B. Hamilton
2.3 Why does Hardy–Weinberg work?
A proof of Hardy–Weinberg.
Hardy–Weinberg with more than two alleles.
The Hardy–Weinberg equation is one of the most basic expectations we have in population genetics. It is very likely that you were already familiar with the Hardy–Weinberg equation before you picked up this book. But where does Hardy–Weinberg actually come from? What is the logic behind it? Let's develop a simple proof that Hardy–Weinberg is actually true. This will also be our first real foray into the type of algebraic argument that much of population genetics is built on. Given that you start out knowing the conclusion of the Hardy–Weinberg tale, this gives you the opportunity to focus on the style in which it is told. Algebraic or quantitative arguments are a central part of the language and vocabulary of population genetics, so part of the task of learning population genetics is becoming accustomed to this mode of discourse.
We would like to prove that $p^2 + 2pq + q^2 = 1$ accurately predicts genotype frequencies given the values of allele frequencies. Let's start off by making some explicit assumptions to bound the problem. The assumptions, in no particular order, are:
1 mating is random (parents meet and mate according to their frequencies);
2 all parents have the same number of offspring (equivalent to no natural selection on fecundity);
3 all progeny are equally fit (equivalent to no natural selection on viability);
4 there is no mutation that could act to change an A to a or an a to A;
5 it is a single population that is very large;
6 there are two and only two mating types.
Now, let's define the variables we will need for a case with one locus that has two alleles (A and a).
N = population size of individuals (N diploid individuals have 2N alleles)
Allele frequencies:
p = frequency(A allele) = (total number of A alleles)/2N
q = frequency(a allele) = (total number of a alleles)/2N
p + q = 1
Genotype frequencies:
X = frequency(AA genotype) = (total number of AA genotypes)/N
Y = frequency(Aa genotype) = (total number of Aa genotypes)/N
Z = frequency(aa genotype) = (total number of aa genotypes)/N
X + Y + Z = 1
We do not distinguish between the heterozygotes Aa and aA and treat them as being equivalent genotypes. Therefore, we can express allele frequencies in terms of genotype frequencies by adding together the frequencies of A‐containing and a‐containing genotypes:
$$p = X + \tfrac{1}{2}Y \tag{2.2}$$
$$q = Z + \tfrac{1}{2}Y \tag{2.3}$$
Each homozygote contains two alleles of the same type, while each heterozygote contains one allele of each type, so the heterozygote genotypes are each weighted by half.
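To make the bookkeeping concrete, here is a minimal Python sketch of the conversion from genotype counts to allele frequencies. The function name `allele_frequencies` and the counts in the usage lines are hypothetical, chosen only for illustration.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) from counts of the three diploid genotypes."""
    n = n_AA + n_Aa + n_aa                    # N diploid individuals
    if n == 0:
        raise ValueError("need at least one individual")
    X, Y, Z = n_AA / n, n_Aa / n, n_aa / n    # genotype frequencies
    p = X + Y / 2    # equation 2.2: heterozygotes weighted by half
    q = Z + Y / 2    # equation 2.3
    return p, q


# Hypothetical sample of 100 individuals (illustrative counts only).
p, q = allele_frequencies(n_AA=30, n_Aa=50, n_aa=20)
print(p, q, p + q)
```

Counting alleles directly gives the same answer: the sample holds 2(30) + 50 = 110 copies of A out of 200 alleles, so p = 0.55.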
With the variables defined, we can then follow allele frequencies across one generation of reproduction. The first step is to calculate the probability that parents of any two particular genotypes will mate. Since mating is assumed to be random, the chance that two genotypes will mate is just the product of their individual frequencies. As shown in Figure 2.7, random mating can be thought of as being like gas atoms in a balloon. As with gas atoms, each genotype or gamete bumps into others at random, with the probability of a collision (or mating or union) being the product of the frequencies of the two objects colliding. To calculate the probabilities of mating among the three different genotypes, we can make a table to organize the resulting mating frequencies. This table will predict the mating frequencies among genotypes in the initial generation, which we will call generation t.
Figure 2.7 A schematic representation of random mating as a cloud of gas where the frequency of A's is 14/24 and the frequency of a's is 10/24. Any given A has a frequency of 14/24 and will encounter another A with a probability of 14/24 or an a with a probability of 10/24. This makes the frequency of an A‐A collision $(14/24)^2$ and an A‐a or a‐A collision $2(14/24)(10/24)$, just as the probability of two independent events is the product of their individual probabilities. The population of A's and a's is assumed to be large enough so that taking one out of the cloud will make almost no change in the overall frequency of its type.
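Working through the arithmetic for the figure's frequencies of 14/24 and 10/24 shows that the three kinds of collision account for every possible pairing. The a‐a term is not stated in the caption; it follows from the same product rule.

$$
\begin{aligned}
\text{A‐A:}\quad & \left(\tfrac{14}{24}\right)^2 = \tfrac{196}{576} = \tfrac{49}{144} \approx 0.340 \\
\text{A‐a or a‐A:}\quad & 2\left(\tfrac{14}{24}\right)\left(\tfrac{10}{24}\right) = \tfrac{280}{576} = \tfrac{70}{144} \approx 0.486 \\
\text{a‐a:}\quad & \left(\tfrac{10}{24}\right)^2 = \tfrac{100}{576} = \tfrac{25}{144} \approx 0.174 \\
\text{Total:}\quad & \tfrac{49 + 70 + 25}{144} = 1
\end{aligned}
$$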
A parental mating frequency table (generation t) is shown below.
Moms | Frequency | Dads: AA (X) | Dads: Aa (Y) | Dads: aa (Z)
AA | X | $X^2$ | XY | XZ
Aa | Y | XY | $Y^2$ | YZ
aa | Z | ZX | ZY | $Z^2$
The table expresses parental mating frequencies in the currency of genotype frequencies. For example, we expect matings between AA moms and Aa dads to occur with a frequency of XY.
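The parental mating table can also be sketched in a few lines of Python. The function name `mating_table` and the genotype frequencies used below are hypothetical. The only assumption carried over from the text is that random mating makes each cell the product of the two parental genotype frequencies.

```python
GENOTYPES = ("AA", "Aa", "aa")


def mating_table(X, Y, Z):
    """Frequency of every mom x dad pairing under random mating."""
    freq = dict(zip(GENOTYPES, (X, Y, Z)))
    # Each cell is the product of the mom and dad genotype frequencies.
    return {(mom, dad): freq[mom] * freq[dad]
            for mom in GENOTYPES for dad in GENOTYPES}


# Hypothetical genotype frequencies (illustrative values only).
table = mating_table(0.3, 0.5, 0.2)
total = sum(table.values())
print(table[("AA", "Aa")], total)
```

Because X + Y + Z = 1, the nine cells sum to (X + Y + Z)(X + Y + Z) = 1, which is a quick check that no mating has been left out.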
Next, we need to determine the frequency of each genotype in the offspring of any given parental mating pair. This will require that we predict the offspring genotypes resulting from each possible parental mating. We can do this easily with a Punnett square. We will use the frequencies of each parental mating (above) together with the frequencies of the offspring genotypes. Summed for all possible parental matings, this gives the frequency of offspring genotypes one generation later, or in generation t + 1. A table will help organize all the frequencies, like the offspring frequency table (generation t + 1) shown below.
Parental mating | Total frequency | Offspring AA | Offspring Aa | Offspring aa
AA × AA | $X^2$ | $X^2$ | 0 | 0
AA × Aa | 2XY | XY | XY | 0
AA × aa | 2XZ | 0 | 2XZ | 0
Aa × Aa | $Y^2$ | $Y^2/4$ | $2Y^2/4$ | $Y^2/4$
Aa × aa | 2YZ | 0 | YZ | YZ
aa × aa | $Z^2$ | 0 | 0 | $Z^2$
In this table, the total frequency is just the frequency of each parental mating pair taken from the parental mating frequency table. Matings between two different genotypes appear twice in that table, once for each assignment of the genotypes to mom and dad, so their total frequencies carry a factor of two (for example, AA moms with Aa dads plus Aa moms with AA dads gives XY + XY = 2XY). We now need to partition this total frequency of each parental mating into the frequencies of the three progeny genotypes produced. Let's look at an example. Parents with AA and Aa genotypes will produce progeny with two genotypes: half AA and half Aa (you can use a Punnett square to show this is true). Therefore, the AA × Aa parental matings, which have a total frequency of 2XY under random mating, are expected to produce (½)(2XY) = XY of each of AA and Aa progeny. The same logic applies to all of the other parental matings. Notice that each row in the offspring genotype frequency table sums to the total frequency of each parental mating.
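If you would rather not draw each Punnett square by hand, the partitioning can be sketched in Python. The function name `punnett` is hypothetical; the sketch assumes only Mendelian segregation, so that each parent passes on either of its two alleles with probability ½.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(mom, dad):
    """Offspring genotype probabilities for one mom x dad mating."""
    counts = Counter()
    # Each parent contributes one of its two alleles: four equally likely cells.
    for allele_mom, allele_dad in product(mom, dad):
        # Sorting puts "A" before "a", so "aA" and "Aa" are treated as one genotype.
        counts["".join(sorted(allele_mom + allele_dad))] += 1
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


print(punnett("AA", "Aa"))   # half AA and half Aa
print(punnett("Aa", "Aa"))   # 1/4 AA, 1/2 Aa, 1/4 aa
```

Multiplying these proportions by the total frequency of the mating gives the corresponding row of the offspring genotype frequency table.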
The columns in the offspring genotype frequency table are the basis of the final step. The sum of each column gives the total frequencies of each progeny genotype expected in generation t + 1. Let's take the sum of each column, again expressed in the currency of genotype frequencies, and then simplify the algebra to see whether Hardy and Weinberg were correct.
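$$
\begin{aligned}
f(AA)_{t+1} &= X^2 + XY + \tfrac{1}{4}Y^2 = \left(X + \tfrac{1}{2}Y\right)^2 = p^2 \\
f(Aa)_{t+1} &= XY + 2XZ + \tfrac{1}{2}Y^2 + YZ = 2\left(X + \tfrac{1}{2}Y\right)\left(Z + \tfrac{1}{2}Y\right) = 2pq \\
f(aa)_{t+1} &= Z^2 + YZ + \tfrac{1}{4}Y^2 = \left(Z + \tfrac{1}{2}Y\right)^2 = q^2
\end{aligned}
\tag{2.4}
$$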
So, we have proved that one generation of random mating produces progeny genotype frequencies of $p^2$, $2pq$, and $q^2$, and that progeny allele frequencies are identical to parental allele frequencies, or $f(A)_t = f(A)_{t+1}$. The allele frequency result follows because the frequency of A among the progeny is $p^2 + \tfrac{1}{2}(2pq) = p(p + q) = p$. The major conclusion here is that genotype frequencies remain constant over generations as long as the assumptions of Hardy–Weinberg are met. We have just proved that under Mendelian heredity, genotype and allele frequencies should not change over time unless one or more of our assumptions is not met. This simple model of expected genotype frequencies has profound implications. In fact, Hardy–Weinberg expected genotype frequencies serve as one of the most basic tools to test for the action of biological processes that alter genotype and allele frequencies.
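As a numerical check on the algebra, the whole argument can be sketched in Python: weight every parental mating by its frequency, split it into offspring genotypes, and sum the columns. The function name `next_generation` and the starting frequencies are hypothetical; the starting values are deliberately not in Hardy–Weinberg proportions.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")


def next_generation(X, Y, Z):
    """Offspring genotype frequencies after one round of random mating."""
    parent_freq = dict(zip(GENOTYPES, (X, Y, Z)))
    offspring = dict.fromkeys(GENOTYPES, 0.0)
    for mom, dad in product(GENOTYPES, repeat=2):
        mating_freq = parent_freq[mom] * parent_freq[dad]
        # Each of the four Punnett-square cells has probability 1/4.
        for allele_mom, allele_dad in product(mom, dad):
            child = "".join(sorted(allele_mom + allele_dad))
            offspring[child] += mating_freq / 4
    return offspring["AA"], offspring["Aa"], offspring["aa"]


X, Y, Z = 0.5, 0.2, 0.3            # hypothetical generation t
p, q = X + Y / 2, Z + Y / 2        # allele frequencies in generation t
X1, Y1, Z1 = next_generation(X, Y, Z)
X2, Y2, Z2 = next_generation(X1, Y1, Z1)
print((X1, Y1, Z1), (p * p, 2 * p * q, q * q))
```

With these starting values p = 0.6 and q = 0.4, so the first generation of offspring should match $p^2 = 0.36$, $2pq = 0.48$, and $q^2 = 0.16$ up to floating-point rounding, and a second round of random mating should leave those frequencies unchanged.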
You might wonder whether Hardy–Weinberg applies to loci with more than two alleles. For the last point in this section, let's explore that question. With three alleles at one locus (allele frequencies symbolized by p, q, and r), Hardy–Weinberg expected genotype frequencies are $p^2 + q^2 + r^2 + 2pq + 2pr + 2qr = 1$. These genotype frequencies are obtained by expanding $(p + q + r)^2$, a method that can be applied to any number of alleles at one locus. In general, expanding the squared sum of the allele frequencies will show:
the frequency of any homozygous genotype is the squared frequency of the single allele that composes the genotype ($[\text{allele frequency}]^2$);
the frequency of any heterozygous genotype is twice the product of the two allele frequencies that compose the genotype (2[allele 1 frequency][allele 2 frequency]); and
there are as many homozygous genotypes as there are alleles and $N(N - 1)/2$ heterozygous genotypes, where N here is the number of alleles (not the population size).
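These three rules translate directly into a short Python sketch that expands the squared sum of allele frequencies for any number of alleles. The function name `hw_genotype_frequencies`, the allele labels, and the frequencies below are hypothetical.

```python
from itertools import combinations_with_replacement


def hw_genotype_frequencies(allele_freqs):
    """Expected genotype frequencies from expanding (p + q + r + ...)^2."""
    if abs(sum(allele_freqs.values()) - 1) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    expected = {}
    # Every unordered pair of alleles, including an allele paired with itself.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        if a == b:
            expected[(a, b)] = allele_freqs[a] ** 2                   # homozygote
        else:
            expected[(a, b)] = 2 * allele_freqs[a] * allele_freqs[b]  # heterozygote
    return expected


# Three hypothetical alleles (illustrative frequencies only).
freqs = hw_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
n_alleles = 3
n_heterozygotes = sum(1 for a, b in freqs if a != b)
print(len(freqs), n_heterozygotes, sum(freqs.values()))
```

With three alleles this gives three homozygous and 3(3 − 1)/2 = 3 heterozygous genotypes, and the six expected frequencies sum to one up to rounding.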
Do you think it would be possible to prove Hardy–Weinberg for more than two alleles at one locus? The answer is absolutely, yes. This would just require constructing larger versions of the parental genotype mating table and expected offspring frequency table as we did for two alleles at one locus.