Population Genetics I
Genetic Variation

Genetic variation is essential for evolution to take place.
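The extent of genetic variation in natural populations was hard to ascertain using classical genetic analysis, since such analysis can only detect genes that are variable: a gene with a single allele produces no contrasting phenotypes to follow in crosses, so invariant loci go unseen and the fraction of loci that vary cannot be estimated.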

Early hints that populations might harbor much variation.

Quantifying allelic variation
A useful measure of variation is the Heterozygosity (H) - The proportion of loci in an individual that are heterozygous.
This is often averaged over all individuals in the population: $\bar{H}$ = mean heterozygosity

For protein-coding genes in humans, $\bar{H}$ = 0.15.
If there are 22,000 protein-coding genes in the human genome, then this amounts to an average of 0.15 × 22,000 = 3,300 heterozygous loci per individual.
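The definitions of H and mean heterozygosity can be made concrete with a short Python sketch. It assumes a hypothetical data layout (each individual is a list of allele pairs, one pair per locus), and the toy genotypes are invented purely for illustration; the last line just repeats the 0.15 × 22,000 arithmetic above.

```python
def heterozygosity(genotypes):
    """H for one individual: the fraction of its loci whose two alleles differ."""
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


def mean_heterozygosity(population):
    """Mean heterozygosity: H averaged over all individuals in the population."""
    return sum(heterozygosity(ind) for ind in population) / len(population)


# Hypothetical toy data: two individuals, each scored at the same four loci.
population = [
    [("A1", "A2"), ("B1", "B1"), ("C1", "C2"), ("D2", "D2")],  # 2 of 4 loci heterozygous
    [("A1", "A1"), ("B1", "B2"), ("C1", "C1"), ("D1", "D1")],  # 1 of 4 loci heterozygous
]
print(mean_heterozygosity(population))  # 0.375

# Expected heterozygous loci per individual = mean H * number of loci.
print(round(0.15 * 22_000))  # 3300
```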

In fact, populations generally harbor many rare recessive alleles, often at frequencies below 0.01.
On average, individual humans each have the equivalent of around 4 recessive lethal alleles.

Relationship between allele frequencies and genotype frequencies
After the rediscovery of Mendel's rules, some people interpreted them to mean that dominant alleles would spread through a population, or that evolution would lead to phenotypes being found in "Mendelian ratios" (e.g. 3:1).

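In response to this, G.H. Hardy demonstrated that dominance does not drive evolution: in the absence of other forces, allele frequencies stay constant no matter which allele is dominant.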

At the same time (but in Germany) W. Weinberg demonstrated the same thing, though for a different purpose.

Hardy's and Weinberg's papers were both published in 1908.

Calculating allele frequencies from genotype frequencies

Consider a single locus with two alleles, A1 and A2.
    Then there are three possible genotypes: A1A1, A1A2, and A2A2.

Let the frequency of the A1 allele = freq[A1] = p
If there are only two alleles, then freq[A2] = 1 - p

If we know the frequency of the genotypes, we can calculate p:

$$p = \frac{2 \times (\#\text{ of } A_1A_1 \text{ genotypes}) + (\#\text{ of } A_1A_2 \text{ genotypes})}{2 \times (\text{total } \#\text{ of individuals})}$$

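Rearranging (dividing the numerator and denominator by the total number of individuals), this gives:

$$p = \text{freq}[A_1A_1] + \tfrac{1}{2}\,\text{freq}[A_1A_2]$$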
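The allele-counting calculation, p = (2 × #A1A1 + #A1A2) / (2 × total # of individuals), can be sketched in Python as below. The function names are illustrative and the genotype counts are made up; the two functions implement the count form and the genotype-frequency form of the same formula, so they should agree.

```python
def allele_freq_from_counts(n11, n12, n22):
    """p = (2 * #A1A1 + #A1A2) / (2 * N).

    Each A1A1 individual carries two copies of A1, each A1A2 carries one,
    and N diploid individuals carry 2N alleles in total.
    """
    n = n11 + n12 + n22
    return (2 * n11 + n12) / (2 * n)


def allele_freq_from_genotype_freqs(f11, f12):
    """Same quantity written with genotype frequencies: freq[A1A1] + freq[A1A2] / 2."""
    return f11 + f12 / 2


# Hypothetical counts: 36 A1A1, 48 A1A2, 16 A2A2 (N = 100).
p = allele_freq_from_counts(36, 48, 16)
print(p)      # 0.6
print(1 - p)  # freq[A2], since there are only two alleles
```

No modelling assumptions are needed here; the code only counts alleles.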

---------------------------------------------------------------

Homework Problem (Part 1) (Due in class Feb. 8th)

Consider a single locus with four alleles: A1, A2, A3, and A4.

1) Write down all of the diploid genotypes possible at this locus.
2) Derive the equation for the frequency of allele A1 as a function of the genotype frequencies.

---------------------------------------------------------------

This calculation involves no simplifying assumptions; we simply count alleles.
Going from allele frequencies to genotype frequencies, though, does require some assumptions.

Calculating genotype frequencies from allele frequencies

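Hardy-Weinberg frequencies.
    Under the assumptions of:
    1) a diploid organism with sexual reproduction
    2) non-overlapping generations
    3) random mating
    4) a very large (effectively infinite) population, so there is no genetic drift
    5) no migration
    6) no mutation
    7) no selection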

The last four of these assumptions preclude obvious mechanisms of allele frequency change.

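Using P[x] to denote 'probability of x', and treating each offspring as the random union of two gametes (one from each parent), we can calculate the frequencies of the different genotypes as:

$$P[A_1A_1] = P[A_1] \times P[A_1] = p^2$$
$$P[A_1A_2] = 2 \times P[A_1] \times P[A_2] = 2p(1-p)$$
$$P[A_2A_2] = P[A_2] \times P[A_2] = (1-p)^2$$

(The factor of 2 arises because the A1 allele of a heterozygote can come from either the egg or the sperm.)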

A population that exhibits these genotype frequencies is said to be in Hardy-Weinberg Equilibrium.
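Given p, the Hardy-Weinberg genotype frequencies p², 2p(1 − p) and (1 − p)² are straightforward to compute. A minimal Python sketch follows; the function names are hypothetical, and the second function simply scales the frequencies by a sample size to give expected genotype numbers.

```python
def hardy_weinberg_freqs(p):
    """Expected (P[A1A1], P[A1A2], P[A2A2]) = (p^2, 2p(1-p), (1-p)^2)."""
    q = 1 - p  # freq[A2]
    return p * p, 2 * p * q, q * q


def hardy_weinberg_counts(p, n):
    """Expected number of each genotype among n individuals."""
    return tuple(n * f for f in hardy_weinberg_freqs(p))


print(hardy_weinberg_freqs(0.75))         # (0.5625, 0.375, 0.0625)
print(hardy_weinberg_counts(0.75, 1000))  # (562.5, 375.0, 62.5)
```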

If the above assumptions hold, then H-W equilibrium is reached after a single generation of random mating, whatever the starting genotype frequencies.
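The one-generation claim can be checked numerically. The sketch below assumes the H-W conditions (no selection, mutation, migration or drift) and models random mating as the random union of an egg and a sperm drawn from one gamete pool; the starting population, with no heterozygotes at all, is hypothetical.

```python
def next_generation(f11, f12, f22):
    """Offspring genotype frequencies after one round of random mating.

    Assumes the H-W conditions, so the gamete pool simply reflects the
    parents' allele frequency and nothing else changes it.
    """
    assert abs(f11 + f12 + f22 - 1) < 1e-9, "genotype frequencies must sum to 1"
    p = f11 + f12 / 2  # freq[A1] among the parents' gametes
    gametes = {"A1": p, "A2": 1 - p}
    offspring = {"A1A1": 0.0, "A1A2": 0.0, "A2A2": 0.0}
    # Random mating = random union of an egg and a sperm from the same gamete pool.
    for egg, p_egg in gametes.items():
        for sperm, p_sperm in gametes.items():
            genotype = egg + sperm if egg == sperm else "A1A2"
            offspring[genotype] += p_egg * p_sperm
    return offspring["A1A1"], offspring["A1A2"], offspring["A2A2"]


# Hypothetical start: no heterozygotes, so far from H-W proportions.
gen0 = (0.75, 0.0, 0.25)
gen1 = next_generation(*gen0)
gen2 = next_generation(*gen1)
print(gen1)  # (0.5625, 0.375, 0.0625)
print(gen2)  # (0.5625, 0.375, 0.0625), unchanged
```

The allele frequency (0.75) is the same in every generation; only the genotype frequencies move, and they settle at H-W proportions after one round.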

If a population is not in H-W equilibrium, then we can infer that at least one of the above assumptions is violated.

---------------------------------------------------------------

Homework Problem (Part 2) (Due in class Feb. 8th)

Consider the following two datasets:

At the M locus:
In a study of 1000 adults, the following genotype numbers were found:
 M1M1 - 298 individuals
 M1M2 - 489 individuals
 M2M2 - 213 individuals

At the S locus:
In a study of 9110 adults, the following genotype numbers were found:
 S1S1 - 24 individuals
 S1S2 - 1958 individuals
 S2S2 - 7128 individuals

Assignment: For both loci

----------------------------------------------------------------

Two Loci

When we consider more than one locus, there is an added complication. Sometimes the allele at locus B is not independent of the allele at locus A.

By "not independent", we mean that knowing which allele is at the A locus in a particular gamete gives us some information about which allele is at the B locus.

This is particularly likely if the loci are close to one another on a chromosome, but it can happen even if the loci are on different chromosomes.

When studying two loci together, we follow the frequency of Haplotypes rather than alleles.

A Haplotype is the set of alleles, at multiple loci, that are together in a haploid gamete (egg or sperm cell).

Example:

Consider a locus A with two alleles, A1 and A2, each with frequency 0.5, and another locus B, also with two alleles, B1 and B2, each with frequency 0.5.
Define p as the frequency of the A1 allele and q the frequency of the B1 allele.

There are four possible haplotypes in this case: A1B1, A1B2, A2B1, and A2B2.

If the alleles at the A locus and B locus are independent of one another, then given the frequencies stated above we should expect to see all of these haplotypes present in the population, each at a frequency of 0.25.
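Under independence, each expected haplotype frequency is just the product of the frequencies of the alleles it carries. A small Python sketch of that rule (the function name is hypothetical), with p = freq[A1] and q = freq[B1] as defined above:

```python
def expected_haplotype_freqs(p, q):
    """Haplotype frequencies expected at gametic equilibrium.

    p = freq[A1], q = freq[B1]. If the allele at locus A tells us nothing about
    the allele at locus B, each haplotype frequency is the product of the
    frequencies of the alleles it carries.
    """
    return {
        "A1B1": p * q,
        "A1B2": p * (1 - q),
        "A2B1": (1 - p) * q,
        "A2B2": (1 - p) * (1 - q),
    }


print(expected_haplotype_freqs(0.5, 0.5))
# {'A1B1': 0.25, 'A1B2': 0.25, 'A2B1': 0.25, 'A2B2': 0.25}
```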

This population is in gametic equilibrium because all gamete types are at the frequencies that we would expect if the allele at one locus gives us no information about the allele at the other locus.

(Note: gametic equilibrium is still sometimes called "linkage equilibrium", although this term is less descriptive, since the loci need not be linked.)

Now consider a different case, still with all alleles at frequency 0.5, but now with the following distribution of haplotypes in the population:

In this case, A1 is paired with a B1 more often, and with B2 less often, than we would expect if they were independent. Similarly, A2 is paired with a B2 more often than their independent frequencies would lead us to expect. The alleles at the two loci are thus not independent, so this is a case of gametic disequilibrium.

When two loci are in gametic disequilibrium, selection acting on locus A will lead to a change in locus B as well, even if there is no direct selection on B.

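The degree of gametic disequilibrium is denoted 'D'. If p is the frequency of the A1 allele and q the frequency of the B1 allele, then:

$$D = \text{freq}[A_1B_1] - pq$$

Equivalently, D = freq[A1B1]·freq[A2B2] − freq[A1B2]·freq[A2B1]. At gametic equilibrium, D = 0.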
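For two loci with two alleles each, D = freq[A1B1] − pq can be computed directly from the four haplotype frequencies. In the Python sketch below, the function name is hypothetical, and the "disequilibrium" numbers are invented for illustration (they are not the haplotype table from these notes); allele frequencies stay at 0.5 in both examples.

```python
def gametic_disequilibrium(h):
    """D = freq[A1B1] - p*q for two loci with two alleles each.

    `h` maps the four haplotypes to their frequencies; p = freq[A1] and
    q = freq[B1] are recovered by summing the haplotypes carrying each allele.
    """
    p = h["A1B1"] + h["A1B2"]  # haplotypes carrying A1
    q = h["A1B1"] + h["A2B1"]  # haplotypes carrying B1
    return h["A1B1"] - p * q


equilibrium = {"A1B1": 0.25, "A1B2": 0.25, "A2B1": 0.25, "A2B2": 0.25}
# Hypothetical excess of A1B1 and A2B2 haplotypes (invented numbers).
disequilibrium_example = {"A1B1": 0.375, "A1B2": 0.125, "A2B1": 0.125, "A2B2": 0.375}

print(gametic_disequilibrium(equilibrium))             # 0.0
print(gametic_disequilibrium(disequilibrium_example))  # 0.125

# Cross-check with D = freq[A1B1]*freq[A2B2] - freq[A1B2]*freq[A2B1].
h = disequilibrium_example
print(h["A1B1"] * h["A2B2"] - h["A1B2"] * h["A2B1"])  # 0.125
```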

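With random mating and no selection, gametic disequilibrium declines over time due to recombination. Letting 'r' denote the rate of recombination between the loci (the probability that a gamete carries a recombinant haplotype), the value of D in the next generation, and hence the change in D from one generation to the next, is given by:

$$D_{t+1} = (1 - r)\,D_t, \qquad \Delta D = D_{t+1} - D_t = -r\,D_t$$

so that after t generations $D_t = (1 - r)^t D_0$.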

Thus, if there is any recombination at all (r > 0), gametic disequilibrium declines toward zero with random mating and no selection.
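The decay rule D(t+1) = (1 − r)·D(t) can be iterated in a few lines of Python. This is a minimal sketch with a hypothetical function name and an arbitrary starting value of D; rejecting r outside 0 to 0.5 is a design choice that follows the range given in these notes.

```python
def d_over_time(d0, r, generations):
    """Values of D under random mating with no selection: D_{t+1} = (1 - r) * D_t.

    r is the recombination rate between the loci (0 <= r <= 0.5).
    Returns [D_0, D_1, ..., D_generations].
    """
    if not 0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    ds = [d0]
    for _ in range(generations):
        ds.append((1 - r) * ds[-1])
    return ds


# Unlinked loci (r = 0.5): D is halved each generation but never reaches zero exactly.
print(d_over_time(0.125, 0.5, 3))  # [0.125, 0.0625, 0.03125, 0.015625]

# Tightly linked loci (r = 0.01): after 100 generations roughly 37% of D_0 remains.
print(d_over_time(0.125, 0.01, 100)[-1])
```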

Note that r ranges from 0 to 0.5, with r = 0.5 for loci on different chromosomes.
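Thus, even for unlinked loci, D does not decline to zero in one generation: with r = 0.5 it is only halved each generation. (Contrast this with H-W genotype frequencies, which are reached in a single generation.)

Jul 8, 2021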