Week 5 assignment 4
1. Vectorising functions
To complete this part of the assignment, you need to define the calc_p() function from the tutorial.
You have the following data frame containing genotype counts from four populations:
counts <- data.frame(
  A1A1 = c(10, 0, 84, 32),
  A1A2 = c(30, 48, 13, 15),
  A2A2 = c(75, 60, 3, 28),
  location = c("loc1", "loc2", "loc3", "loc4")
)
Use calc_p() together with apply() to calculate \(p\) for each population. Add the values to counts as a column. Hint: you have to select only the numeric columns of the data frame to use apply() on it.
Use ifelse() to create a column that says “above 0.5” if \(p\) in the population is larger than 0.5, and “below 0.5” if it is not.
You should print the data frame at the end and show the output in the hand-in.
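The apply() and ifelse() pattern asked for above can be sketched on a small made-up data frame. The tutorial's definition of calc_p() is not reproduced in this text, so the version below is an assumption: it takes a vector of three genotype counts (A1A1, A1A2, A2A2) and returns the frequency of the A1 allele. The data frame toy and its values are hypothetical.

```r
# Assumed version of calc_p(): counts = c(A1A1, A1A2, A2A2)
calc_p <- function(counts) {
  n <- sum(counts)                              # number of individuals
  p <- (2 * counts[1] + counts[2]) / (2 * n)    # frequency of the first allele
  unname(p)                                     # drop the inherited "A1A1" name
}

# Hypothetical genotype counts for two populations
toy <- data.frame(
  A1A1 = c(20, 5),
  A1A2 = c(10, 10),
  A2A2 = c(10, 25),
  location = c("popA", "popB")
)

# apply() needs numeric input, so keep only the three count columns;
# MARGIN = 1 runs calc_p() once per row, i.e. once per population
toy$p <- apply(toy[, c("A1A1", "A1A2", "A2A2")], 1, calc_p)

# ifelse() is vectorised: it tests every element of toy$p at once
toy$p_class <- ifelse(toy$p > 0.5, "above 0.5", "below 0.5")

print(toy)
```

Selecting the count columns by name rather than by position keeps the call working even if the column order of the data frame changes.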
2. A worked example of \(F_{ST}\)
We sample two fish populations, one in a lake and the other in a stream, and genotype them at locus B. In the lake, the genotype counts are B1B1 = 32, B1B2 = 12 and B2B2 = 6. In the stream, the genotype counts are B1B1 = 10, B1B2 = 16 and B2B2 = 43.
Calculate the average expected heterozygosity for the two populations.
- Optional: make a function to calculate average expected heterozygosity.
Calculate the expected heterozygosity for the two populations as a metapopulation.
- Optional: make a function to calculate expected heterozygosity in the metapopulation.
Calculate \(F_{ST}\) between the lake and stream fish. How do you interpret this value?
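The three steps above can be sketched on made-up numbers. The counts below are hypothetical (they are not the lake and stream data), the helper name het() is an illustration rather than a tutorial function, and the metapopulation allele frequency is taken as the unweighted mean of the two population frequencies, as calc_fst() in part 3 does.

```r
# Hypothetical genotype counts, ordered c(B1B1, B1B2, B2B2)
pop1 <- c(18, 4, 3)
pop2 <- c(2, 6, 17)

# Frequency of the B1 allele: two copies per homozygote, one per heterozygote
p1 <- (2 * pop1[1] + pop1[2]) / (2 * sum(pop1))
p2 <- (2 * pop2[1] + pop2[2]) / (2 * sum(pop2))

# Illustrative helper: expected heterozygosity 2pq at a biallelic locus
het <- function(p) 2 * p * (1 - p)

# Average expected heterozygosity within populations (H_S)
hs <- mean(c(het(p1), het(p2)))

# Expected heterozygosity of the metapopulation (H_T),
# based on the unweighted mean allele frequency
ht <- het((p1 + p2) / 2)

# F_ST: share of the total heterozygosity lost to population structure
fst <- (ht - hs) / ht
print(round(fst, 2))
```

With these numbers \(p_1 = 0.8\) and \(p_2 = 0.2\), so \(H_S = 0.32\), \(H_T = 0.5\) and \(F_{ST} = 0.36\).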
3. More on \(F_{ST}\)
To complete this part of the assignment, you need to define the calc_fst() function, in addition to the calc_p() function defined in question 1:
calc_fst <- function(p_1, p_2){
  # calculate q1 and q2
  q_1 <- 1 - p_1
  q_2 <- 1 - p_2
  # calculate total allele frequency
  p_t <- (p_1 + p_2)/2
  q_t <- 1 - p_t
  # calculate expected heterozygosity
  # first calculate expected heterozygosity for the two populations
  # pop1
  hs_1 <- 2*p_1*q_1
  # pop2
  hs_2 <- 2*p_2*q_2
  # then take the mean of this
  hs <- (hs_1 + hs_2)/2
  # next calculate expected heterozygosity for the metapopulations
  ht <- 2*p_t*q_t
  # calculate fst
  fst <- (ht - hs)/ht
  # return output
  return(fst)
}
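A quick way to check a function like this is to feed it values whose answer is known in advance. The sketch below restates calc_fst() in condensed form (same arithmetic, fewer intermediate variables) so that it runs on its own, then tries two limiting cases and a vector input. All input values are made up.

```r
# Condensed restatement of calc_fst() from above (same arithmetic)
calc_fst <- function(p_1, p_2) {
  # mean expected heterozygosity within the two populations
  hs <- (2 * p_1 * (1 - p_1) + 2 * p_2 * (1 - p_2)) / 2
  # mean allele frequency across the two populations
  p_t <- (p_1 + p_2) / 2
  # expected heterozygosity of the metapopulation
  ht <- 2 * p_t * (1 - p_t)
  (ht - hs) / ht
}

# Identical frequencies: no differentiation, so F_ST is 0
same <- calc_fst(0.5, 0.5)

# Alternative alleles fixed: complete differentiation, so F_ST is 1
fixed <- calc_fst(1, 0)

# The arithmetic is element-wise, so vectors of allele frequencies
# (for example one value per SNP) give one F_ST per position
per_site <- calc_fst(c(0.5, 1, 0.9), c(0.5, 0, 0.1))

print(same)
print(fixed)
print(per_site)
```

One edge case worth knowing: if both populations are fixed for the same allele, ht is 0 and the division returns NaN.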
Using the calc_p() and calc_fst() functions we developed during the tutorial and the lct_freq data, calculate \(F_{ST}\) between the Han_China and the Swedish_and_Finnish_Scandinavia populations. What might be a biological explanation of the \(F_{ST}\) value you calculate? Hint: think about what the LCT gene does, and the geographical patterns of lactose intolerance.
Using the functions we developed in the tutorial, calculate \(F_{ST}\) around the LCT gene between Americans of European descent and African Americans. Plot this like we plotted the \(F_{ST}\) between European Americans and East Asians. What is the highest value of \(F_{ST}\)? How does this compare with the highest \(F_{ST}\) between European Americans and East Asians that we investigated in the tutorial?