Discussion 3, Hardy Weinberg Equilibrium
1a. Frequency of M and N:
Indians
Caucasians
Blacks
Chinese
1b. Expected genotype frequencies (MM, MN, NN):
Indians
Caucasians
Blacks
Chinese
2. (8 pts) Number of individuals in populations of 2400, with genotypes MM MN NN (Observed and Expected):
Indians
Caucasians
Blacks
Chinese
3a. (10) Chi-square values:
Indians
Caucasians
Blacks
Chinese
3b. (2) HW equilibrium or not:
Indians
Caucasians
Blacks
Chinese
3c. (2)
Indians
Caucasians
Blacks
Chinese
4a. (7.5)
4b. (1.5)
Discussion 3: Hardy Weinberg Equilibrium
Human Blood Groups
Blood is a complex tissue made of many parts, and although they are all worthy of appreciation, at the moment we’re
going to focus on red blood cells (rbcs); more specifically, some of the molecules attached to some of the proteins
embedded in the cell membranes of rbcs (these molecules are called antigens; e.g., Fig. 1). And more specifically, the
antigens that constitute the blood group MN (MNS).
Figure 1. Red blood cells have many types of blood-group antigens associated with their cell membranes; the MN system is one such group.
The MN blood group system involves two separate antigens (M and N), each controlled by a codominant allele at the
same locus, and so every human has either two M alleles, two N alleles, or one of each. Humans do not naturally
produce antibodies for M or N antigens (thus they are of no concern regarding blood transfusions), and so, as far as
anyone knows, this trait is selectively neutral.
Although there are no natural antibodies to the antigens M and N, scientists have developed a means of testing for
the presence of both molecules in blood samples. The relative frequencies of the M and N alleles vary among
populations of humans. From Makroo et al. (2013), the following phenotype data (% of population) were available
(reported as they occurred in the paper; a study done in India):
              MM      MN      NN
Indians       34.6    54.1    11.3
Caucasians    28.0    22.0    50.0
Blacks        26.0    44.0    30.0
Chinese       32.6    47.1    20.3
Activity 1
1a. For each population, calculate the frequencies of the alleles M and N. Show your work. Note: to avoid rounding
error compounding over the rest of the problem, do not round these values. Use these values as you do the rest of
the calculations.
1b. Using allele frequencies calculated in 1a, calculate the expected genotype frequencies for the four populations, if
they were in H-W equilibrium at the MN locus (do not round). Use those values as you do the rest of the calculations.
Show your work.
The Chi-square Goodness of Fit Test
When you calculate the expected genotype frequencies, you’ll see that some are close to the actual (observed) data;
huh! But we want to know how close; close enough to say that the populations ARE in equilibrium for this trait? -> We
need an objective way to evaluate the deviation of observed genotype frequencies from those expected under
Hardy-Weinberg equilibrium.
Property of Carolee Caffrey. Do not distribute without permission.
Statistics is the branch of mathematics wherein the robustness of data is evaluated, and various methods and tests
provide measures of probabilities of outcomes. In this case, you are going to assess whether or not your observed
genotype frequencies differ “significantly” from expected frequencies; statistical significance has to do with the
likelihood that a particular finding could have arisen solely by chance. By convention, scientists use a level of
significance of 0.05 -> a deviation counts as significant if, with chance alone at work, deviations at least as large as
the one you see (in the observed data) would arise with a probability of less than 0.05, or 5%. (We think this happens
naturally, that chance affects things every once in a while, and if deviations the size of yours would turn up by chance
more than 5% of the time, then there is no reason to think anything special is at work.) -> Thus when we say that a
result is significant at the 5% level, we mean that chance alone would produce a deviation that large less than 5% of
the time. (It does not mean that there is a 5% chance that our conclusion is incorrect.)
You will use a Chi-square Test to determine whether or not the four populations are in HW equilibrium at the MN
locus, and for that test, genotype frequencies must be converted to numbers of individuals possessing the three
genotypes in a representative sample.
Interpreting the Data
When thinking about these exercises, know that, in most cases, statistical deviation from Hardy-Weinberg
expectations indicates violation of one or more of the assumptions of the principle. The opposite does not necessarily
follow, though; for example, migration or mutation could be occurring, but at rates so low as to be undetectable with
available data. Some forms of natural selection (e.g., stabilizing selection) can generate genotypic frequencies similar
to those expected under equilibrium conditions. It is also the case that populations in nature are finite in size, and
so chance events can always affect genotype frequencies (= genetic drift).
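To make the point about finite population size concrete, here is a minimal sketch of genetic drift, assuming a simple model in which each generation's alleles are drawn at random from the previous generation's allele frequency. The function name, the population size, and the seed are illustrative choices, not values from the exercise.

```python
import random

def simulate_drift(p0, population_size, generations, seed=0):
    """Track the frequency of one allele over time in a finite population.

    Each generation, 2 * population_size alleles are drawn at random,
    each being the tracked allele with probability equal to its
    frequency in the previous generation.
    """
    rng = random.Random(seed)          # seeded so a run can be repeated
    n_alleles = 2 * population_size    # every individual carries two alleles
    p = p0
    history = [p]
    for _generation in range(generations):
        copies = sum(1 for _draw in range(n_alleles) if rng.random() < p)
        p = copies / n_alleles
        history.append(p)
    return history

# Hypothetical run: 25 individuals, starting at p = 0.5, for 50 generations.
history = simulate_drift(0.5, population_size=25, generations=50, seed=42)
print(history[0], history[-1])
```

No selection, mutation, or migration is built in, so any change in the allele frequency in such a run comes from chance alone.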
If you were to find that a population WAS in equilibrium in one of these exercises (or elsewhere!), it does not mean
that the population is not evolving; it merely indicates that allele and genotype frequencies at this particular locus are
not changing. (A population would be said to be evolving if the frequency of alleles at just a single locus is changing
over generations.)
In natural populations, it is rare to find whole genomes (the complete set of genes in a cell or organism) in Hardy-Weinberg equilibrium, because, again, no populations are so large as to be effectively infinite in size, mutation events
might be rare but they do occur, migration occurs naturally among most populations not ill-affected by humans, and
most populations are likely under the influence of some type of selection. Plus many organisms tend to mate
assortatively…
Activity 2
2. Imagine the sizes of the populations in 1a are 2400 people each, and calculate the observed and expected numbers
of Indian, Caucasian, Black, and Chinese individuals with the genotypes MM MN NN. (Multiply genotype frequencies
by 2400.) Round values up or down to whole numbers.
To calculate the test statistic χ², you’ll utilize both the observed (O) and expected (E) numbers of individuals:
χ² = Σ [ (O − E)² / E ]
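The formula can be sketched in a few lines of Python. The function name and the counts below are hypothetical, chosen only to keep the arithmetic easy to follow; they are not from the MN data.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all genotype classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical numbers of individuals for three genotypes.
observed = [30, 50, 20]
expected = [25, 50, 25]
print(round(chi_square(observed, expected), 4))  # 2.0
```

Here the three terms are 25/25, 0/50, and 25/25, which add up to 2.0.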
Activity 3
3a. For each population, calculate (O − E)² / E for each genotype (MM MN NN); round to 4 decimal places. Add
them up (round to 4 decimal places). -> This is the population χ² value. Show your work.
3b. Without worrying about such things as [Degrees of Freedom (=1) and the Chi-square distribution itself], I’ll tell you
that the [critical value] for the MN data (at a level of 0.05 significance) is 3.841.
For each population, compare your calculated χ² value to 3.841 and indicate whether the deviation is greater (than
that expected by chance), or less (there’s no reason to say the population is not in HW equilibrium at the MN locus ->
that any deviations you see are just due to chance).
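For the curious, the link between the critical value 3.841 and the 0.05 level can be checked with the Python standard library. This is a sketch that assumes 1 degree of freedom, for which the upper-tail probability of a chi-square value x equals erfc(√(x/2)); the function names are illustrative.

```python
import math

def p_value_df1(chi2_value):
    """Probability of a chi-square value at least this large by chance (1 degree of freedom)."""
    return math.erfc(math.sqrt(chi2_value / 2.0))

def exceeds_critical(chi2_value, critical=3.841):
    """True when the deviation is greater than that expected by chance at the 0.05 level."""
    return chi2_value > critical

print(round(p_value_df1(3.841), 3))  # 0.05
print(exceeds_critical(2.0))         # False
```

A χ² value above 3.841 corresponds to a probability below 0.05, and a value below 3.841 to a probability above 0.05.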
3c. For each of the populations, pick one of the HW conditions (a different one for each population) and apply it to the
data. E.g., is a population not in equilibrium? Possibly it’s because there actually IS selection acting on the MN system,
and different genotypes are favored in the different populations… Oops; I just used one of the five conditions. -> You
have to use the other four. You can make these answers up; they just have to make sense.
Activity 4
4. I found the data below in a paper (Lidicker and McCollum 1997) wherein the authors had examined the genetic
diversity in two populations of sea otters (Enhydra lutris) in the eastern Pacific. The point of the study was to assess
the degree to which the severe bottleneck in the history of the Californian population [humans hunted them down to
50 or fewer individuals before they came under protection in 1911] had caused the loss of genetic variation. (The
population from Alaska also experienced a bottleneck, for the same reason, but it was not as severe.) The letters on
the left refer to loci, and n is the number of individual otters with the different genotypes.
                     California   Alaska
Locus    Genotype    n            n
EST      SS          37           3
         SF          20           3
         FF          7            2
ICD      SS          48           7
         SF          4            2
         FF          3            0
LA       SS          20           3
         SF          11           2
         FF          2            3
PAP      SS          16           1
         SF          7            3
         FF          10           2
ME       SS          16           1
         SF          11           2
         FF          5            1
NP       SS          17           3
         SF          4            1
         FF          5            0
4a. Determine if the sampled otters in California were in Hardy-Weinberg equilibrium at the EST locus.
To get started, given only the number of individuals, remember that each individual has two alleles at the EST locus. ->
You can add up all of the alleles to get the total, and then get the frequencies p and q by dividing the total number of
Ss and the total number of Fs by the total number of all alleles.
➔ Show your work:
Get p and q (do not round).
Calculate expected genotype frequencies (include up to six decimal places) and numbers of individuals (round
the number of expected individuals up or down to whole numbers; do not worry that the expected number
of individuals may end up one more or less than the actual population size).
Compare expected with observed using a Chi-square test. Report the total Chi-square value up to 4 decimal
places of precision. The critical value = 3.841. (We are not grading the values for the separate, different
genotypes.)
4b. Briefly interpret your findings for 4a; the critical value for equilibrium = 3.841. Provide a sentence or two that
could be in the Conclusions section of a paper (if you had to write one) -> Summarize the finding (in equilibrium at that
locus, or not) and speculate as to how at least two factors/conditions might be influencing the situation (e.g., might
some type of selection be operating?).
Literature Cited
Lidicker, WZ and FC McCollum. 1997. Allozymic variation in California sea otters. Journal of Mammalogy 78:417-425.
Makroo, RN, A Bhatia, R Gupta, and J Phillip. 2013. Prevalence of Rh, Duffy, Kell, Kidd & MNSs blood group antigens in
the Indian blood donor population. Indian Journal of Medical Research 137(3): 521–526.
Discussion Week 1, Intro
1a. 10 pts
1b. (pic) 2 pts
Hardy Weinberg Equilibrium Model
describes allele and genotype frequencies in a population when all the factors that could change them are absent
Hardy Weinberg Equilibrium law: one round of random mating will put a population into equilibrium, and thereafter:
1. in a population with two alleles (A and a) at a particular locus, the frequencies of A and a are p and q (p + q = 1) and the expected genotype frequencies for AA, Aa, and aa = p², 2pq, and q², and
2. allele and genotype frequencies in the population will then remain unchanged.
(Photos: GH Hardy, W Weinberg)
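The two parts of the law can be checked numerically. The sketch below assumes all five equilibrium conditions hold; the starting genotype frequencies and the function name are hypothetical, chosen so that the population does not start in Hardy-Weinberg proportions.

```python
def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating."""
    p = f_AA + 0.5 * f_Aa      # frequency of allele A
    q = 1.0 - p                # frequency of allele a
    return p * p, 2.0 * p * q, q * q

start = (0.5, 0.2, 0.3)        # hypothetical; p = 0.6 but not p², 2pq, q²
gen1 = next_generation(*start)
gen2 = next_generation(*gen1)

print([round(f, 4) for f in gen1])  # [0.36, 0.48, 0.16]
print([round(f, 4) for f in gen2])  # [0.36, 0.48, 0.16]
```

One round of random mating moves the population to p², 2pq, q², and a second round leaves the frequencies where they are.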
Hardy Weinberg Equilibrium Conditions
1. Large population size (= “infinite”)
2. Random mating
3. No mutation
4. No migration (= gene flow)
5. No selection
              MM      MN      NN
Indians       34.6    54.1    11.3
Caucasians    28.0    22.0    50.0
Blacks        26.0    44.0    30.0
Chinese       32.6    47.1    20.3
Activity 1
1a. For each population, calculate the frequencies of the alleles M and N. Show your work. Note: to avoid rounding
error compounding over the rest of the problem, do not round these values. Use these values as you do the rest of
the calculations.
As an example, imagine the genotype frequencies were
MM = .385 (the proportion of the population with two M alleles)
MN = .224 (the proportion of the population with one M and one N allele)
NN = .391 (the proportion of the population with two N alleles)
To find the frequency of M (= p) ->
.385 (everyone with genotype MM has only M alleles)
+ ½(.224) (half of all alleles of individuals with genotype MN are M)
= .497 = p
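The same arithmetic can be written as a short Python function. This is a sketch using the example frequencies above; the function name is illustrative.

```python
def allele_frequencies(f_mm, f_mn, f_nn):
    """Return (p, q), the frequencies of alleles M and N, from genotype frequencies."""
    total = f_mm + f_mn + f_nn
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    p = f_mm + 0.5 * f_mn   # MM carries only M; half the alleles in MN are M
    q = f_nn + 0.5 * f_mn   # NN carries only N; the other half in MN are N
    return p, q

p, q = allele_frequencies(0.385, 0.224, 0.391)
print(round(p, 3), round(q, 3))  # 0.497 0.503
```

The rounding here is only for display; carry the unrounded values into later steps.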
1b. Using allele frequencies calculated in 1a, calculate the expected genotype frequencies for the four
populations, if they were in H-W equilibrium at the MN locus (do not round). Use those values as you
do the rest of the calculations. Show your work.

              Freq M = p                   Freq N = q
Indians       .346 + ½(.541) = 0.6165      .113 + ½(.541) = 0.3835
Caucasians    0.39                         0.61
Blacks        0.48                         0.52
Chinese       0.5615                       0.4385

              Freq MM = p²                 Freq MN = 2pq                        Freq NN = q²
Indians       (0.6165)² = 0.38007225       (2 x 0.6165 x 0.3835) = 0.4728555    (0.3835)² = 0.14707225
Caucasians    0.1521                       0.4758                               0.3721
Blacks        0.2304                       0.4992                               0.2704
Chinese       0.31528225                   0.4924355                            0.19228225
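As a cross-check on the hand calculation, here is a small Python sketch that turns an allele frequency into the expected genotype frequencies under Hardy-Weinberg equilibrium. The function name is illustrative; the input is the value of p shown for the Indian population.

```python
def hw_expected(p):
    """Expected genotype frequencies (MM, MN, NN) for allele frequency p of M."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

mm, mn, nn = hw_expected(0.6165)
print(f"{mm:.8f} {mn:.7f} {nn:.8f}")  # 0.38007225 0.4728555 0.14707225
```

The three values always sum to 1, which is a quick way to catch an arithmetic slip.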
Chi-square Test
χ² = Σ [ (O − E)² / E ]
-> Add up (observed − expected)² / expected numbers of individuals for the 3 genotypes
Activity 2
2. Imagine the sizes of the populations in 1a are 2400 people each, and calculate the observed and
expected numbers of Indian, Caucasian, Black, and Chinese individuals with the genotypes MM MN
NN. (Multiply genotype frequencies by 2400.) Round values up or down to whole numbers.
Number of individuals in populations of 2400
Indians, MM
Observed: .346 x 2400 = 830
Expected: .38007225 x 2400 = 912
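The conversion from frequencies to whole numbers of individuals can be sketched as below. The names are illustrative. One assumption to be aware of: Python's built-in `round` sends exact halves to the nearest even number, which may differ from rounding by hand when a value lands exactly on .5.

```python
POPULATION_SIZE = 2400

def to_count(frequency, n=POPULATION_SIZE):
    """Convert a genotype frequency to a whole number of individuals."""
    return round(frequency * n)

observed_mm = to_count(0.346)        # observed frequency of MM, Indians
expected_mm = to_count(0.38007225)   # expected frequency of MM, Indians
print(observed_mm, expected_mm)  # 830 912
```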
Chi-square Test
χ² = Σ [ (O − E)² / E ]
For the Indian population
MM: (830 − 912)² / 912 = 7.3728
Do the same for MN and NN
Add them all together
Compare to 3.841
3a. Chi-square values
3b. (2) HW equilibrium or not:
Indians
Caucasians
Blacks
Chinese
3c. (2)
Indians
Caucasians
Blacks
Chinese
                     California   Alaska
Locus    Genotype    n            n
EST      SS          37           3
         SF          20           3
         FF          7            2
ICD      SS          48           7
         SF          4            2
         FF          3            0
LA       SS          20           3
         SF          11           2
         FF          2            3
PAP      SS          16           1
         SF          7            3
         FF          10           2
ME       SS          16           1
         SF          11           2
         FF          5            1
NP       SS          17           3
         SF          4            1
         FF          5            0
EST
To get p (freq of S)
Genotype SS all alleles S = 74
Genotype SF half of alleles S = ½ (40) = 20
All S alleles = 94
All alleles total = 74 + 40 + 14 = 128
Freq of S = 94/128 = 0.734375 = p
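The allele-counting steps above can be sketched in Python as follows. The function name is illustrative; the counts are the California EST numbers from the table (SS = 37, SF = 20, FF = 7).

```python
def allele_freqs_from_counts(n_ss, n_sf, n_ff):
    """Return (p, q), the frequencies of S and F, from numbers of individuals."""
    total_alleles = 2 * (n_ss + n_sf + n_ff)   # two alleles per individual
    s_alleles = 2 * n_ss + n_sf                # SS gives two S, SF gives one
    f_alleles = 2 * n_ff + n_sf                # FF gives two F, SF gives one
    return s_alleles / total_alleles, f_alleles / total_alleles

p, q = allele_freqs_from_counts(37, 20, 7)
print(p, q)  # 0.734375 0.265625
```

Counting alleles directly avoids converting to genotype frequencies first, so no rounding creeps in before p and q are known.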
4a. Determine if the sampled otters in California were in Hardy-Weinberg equilibrium at the EST locus.
To get started, given only the number of individuals, remember that each individual has two alleles at the EST
locus. -> You can add up all of the alleles to get the total, and then get the frequencies p and q by dividing the
total number of Ss and the total number of Fs by the total number of all alleles.
➔ Show your work:
Get p and q (do not round).
Calculate expected genotype frequencies (include up to six decimal places) and numbers of
individuals (round the number of expected individuals up or down to whole
numbers; do not worry that the expected number of individuals may end up one
more or less than the actual population size).
Compare expected with observed using a Chi-square test. Report the total Chi-square value up to
4 decimal places of precision. The critical value = 3.841. (We are not grading the
values for the separate, different genotypes.)
4b. Briefly interpret your findings for 4a; the critical value for equilibrium = 3.841. Provide a sentence or two
that could be in the Conclusions section of a paper (if you had to write one) -> Summarize the finding (in
equilibrium at that locus, or not) and speculate as to how at least two factors/conditions might be influencing
the situation (e.g., might some type of selection be operating?).