Week 5 assignment
1. Vectorising functions
To complete this part of the assignment, you need to define the calc_p()
function from the tutorial:
calc_p <- function(counts){
  # get the number of samples
  n <- sum(counts)
  # calculate frequency of 1st allele - p
  p <- ((counts[1]*2) + counts[2])/(2*n)
  return(p)
}
You have the following data frame containing genotype counts from four populations:
counts <- data.frame(
  A1A1 = c(10, 0, 84, 32),
  A1A2 = c(30, 48, 13, 15),
  A2A2 = c(75, 60, 3, 28),
  location = c("loc1", "loc2", "loc3", "loc4")
)
Use calc_p() together with apply() to calculate \(p\) for each population. Add the values to counts as a column. Hint: you have to select only the numeric columns of the data frame to use apply() on it.
Use ifelse() to create a column that says “above 0.5” if \(p\) in the population is larger than 0.5, and “below 0.5” if it is not.
Print the data frame at the end and show the output in your hand-in.
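As a reminder of how the two functions behave, here is a minimal sketch on toy data that is unrelated to the genotype counts above. The names row_total and toy are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: any function that takes one row as a numeric vector
row_total <- function(x) {
  sum(x)
}

# Toy data frame with two numeric columns and one character column
toy <- data.frame(
  a = c(1, 5, 9),
  b = c(2, 6, 10),
  id = c("r1", "r2", "r3")
)

# MARGIN = 1 runs the function once per row. Select the numeric columns
# first, otherwise apply() coerces every value to character.
toy$total <- apply(toy[, c("a", "b")], 1, row_total)

# ifelse() is vectorised: it returns one label per row
toy$size <- ifelse(toy$total > 10, "above 10", "10 or below")

print(toy)
```

The same pattern should carry over to calc_p() and counts, since calc_p() also takes one row of numbers at a time.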
2. A worked example of \(F_{ST}\)
We sample two fish populations, one in a lake and the other in a stream. We genotype them at locus B. In the lake, the genotype counts are B1B1 = 32, B1B2 = 12 and B2B2 = 6. In the stream, the genotype counts are B1B1 = 10, B1B2 = 16 and B2B2 = 43.
Calculate the average expected heterozygosity for the two populations.
- Optional: make a function to calculate average expected heterozygosity.
Calculate the expected heterozygosity for the two populations as a metapopulation.
- Optional: make a function to calculate expected heterozygosity in the metapopulation.
Calculate \(F_{ST}\) between the lake and stream fish. How do you interpret this value?
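For reference, the three quantities can be written out as below. This assumes the unweighted two-population definitions used by the tutorial's calc_fst() function (given in part 3), where \(p_1\) and \(p_2\) are the frequencies of the first allele in each population and \(q_i = 1 - p_i\). Whether to weight by sample size is not stated here, so treat the simple mean as an assumption.

\[
H_S = \frac{2 p_1 q_1 + 2 p_2 q_2}{2}, \qquad
p_t = \frac{p_1 + p_2}{2}, \quad q_t = 1 - p_t, \qquad
H_T = 2 p_t q_t, \qquad
F_{ST} = \frac{H_T - H_S}{H_T}
\]

Here \(H_S\) is the average expected heterozygosity of the two populations and \(H_T\) is the expected heterozygosity of the metapopulation.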
3. More on \(F_{ST}\)
To complete this part of the assignment, you need to define the calc_fst()
function, in addition to the calc_p()
function defined in question 1.
calc_fst <- function(p_1, p_2){

  # calculate q1 and q2
  q_1 <- 1 - p_1
  q_2 <- 1 - p_2

  # calculate total allele frequency
  p_t <- (p_1 + p_2)/2
  q_t <- 1 - p_t

  # calculate expected heterozygosity
  # first calculate expected heterozygosity for the two populations
  # pop1
  hs_1 <- 2*p_1*q_1
  # pop2
  hs_2 <- 2*p_2*q_2
  # then take the mean of this
  hs <- (hs_1 + hs_2)/2

  # next calculate expected heterozygosity for the metapopulation
  ht <- 2*p_t*q_t

  # calculate fst
  fst <- (ht - hs)/ht

  # return output
  return(fst)
}
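Before applying the function to real data, it can help to check it against cases where the answer is known. The sketch below uses a condensed, hypothetical restatement called fst_sketch() with the same arithmetic as calc_fst(), so that the block runs on its own. The allele frequencies are made up for illustration.

```r
# Hypothetical condensed version of calc_fst(), same arithmetic
fst_sketch <- function(p_1, p_2) {
  hs <- (2 * p_1 * (1 - p_1) + 2 * p_2 * (1 - p_2)) / 2  # mean within-population heterozygosity
  p_t <- (p_1 + p_2) / 2                                 # metapopulation allele frequency
  ht <- 2 * p_t * (1 - p_t)                              # metapopulation heterozygosity
  (ht - hs) / ht
}

# Identical allele frequencies: no differentiation
fst_sketch(0.5, 0.5)  # 0

# Populations fixed for different alleles: complete differentiation
fst_sketch(1, 0)      # 1

# Intermediate case
fst_sketch(0.8, 0.2)  # approximately 0.36
```

Note that if both populations are fixed for the same allele, ht is 0 and the result is NaN, because the formula divides by ht.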
Using the calc_p() and calc_fst() functions we developed during the tutorial and the lct_freq data, calculate \(F_{ST}\) between the Han_China and the Swedish_and_Finnish_Scandinavia populations. What might be a biological explanation of the \(F_{ST}\) value you calculate? Hint: think about what the LCT gene does, and the geographical patterns of lactose intolerance.
Using the functions we developed in the tutorial, calculate \(F_{ST}\) around the LCT gene between Americans of European descent and African Americans. Plot this like we plotted the \(F_{ST}\) between European Americans and East Asians. What is the highest value of \(F_{ST}\)? How does this compare with the highest \(F_{ST}\) between European Americans and East Asians that we investigated in the tutorial?