Week 8 assignment
1. Visualising complex data
This part of the assignment uses the iris data set, which is already available in R.
Make a scatterplot of Sepal.Length against Sepal.Width. Facet the plot by Species, and arrange the plots so that they are below each other. Color the points by Petal.Length.
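If you are unsure how to stack facets vertically, here is a minimal sketch of the technique on a different built-in data set (mtcars), so the column names below are not the ones you need for the assignment. It assumes ggplot2 is installed, and uses facet_wrap() with ncol = 1 as one possible way to get the panels below each other.

```r
library(ggplot2)

# mtcars is only a stand-in data set for illustration
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  # ncol = 1 puts all panels in a single column, one below the other
  facet_wrap(~cyl, ncol = 1)

p
```

Mapping a numeric column to colour, as done here with hp, gives a continuous colour scale.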
Optional: Let’s try to visualise all the iris data at once. Use pivot_longer() to get all measurements (i.e. all columns except Species) in a single column. Make a boxplot of species against measurement value, and facet the plot by measurement type.
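As a rough illustration of the reshaping step, here is a sketch using a small made-up data frame. The data frame toy and its columns group, height and weight are hypothetical, as are the names measurement and value chosen for the new columns. It assumes tidyr and ggplot2 are installed.

```r
library(tidyr)
library(ggplot2)

# Hypothetical toy data: one grouping column and two measurement columns
toy <- data.frame(
  group  = c("a", "a", "b", "b"),
  height = c(1.2, 1.4, 2.1, 2.3),
  weight = c(10, 12, 20, 22)
)

# Gather every column except `group` into name/value pairs
toy_long <- pivot_longer(toy, -group,
                         names_to = "measurement",
                         values_to = "value")

# One box per group, one panel per measurement type
ggplot(toy_long, aes(x = group, y = value)) +
  geom_boxplot() +
  facet_wrap(~measurement, scales = "free_y")
```

The scales = "free_y" argument is an optional choice here; it can help when the measurement types are on very different scales.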
2. Sparrow data
For this part of the assignment you should start out with the sparrow_data data frame that we created in section 7.3.2. Many of the operations you need are covered in section 7.4, so look there if you don’t know where to start!
Calculate the mean pairwise \(F_{ST}\) and also the mean pairwise \(d_{XY}\) for all of the different species comparisons. N.B. if you use a tidyverse solution, it may be easier to use t() to transpose and see the final result.
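To show why t() can help, here is a sketch on made-up data. The data frame toy and its column names are hypothetical and will not match the columns in sparrow_data. It assumes dplyr is installed and uses summarise() with across() as one possible tidyverse approach.

```r
library(dplyr)

# Hypothetical toy data; these column names are invented for illustration
toy <- data.frame(
  a_b_fst = c(0.1, 0.3),
  a_c_fst = c(0.2, 0.4),
  a_b_dxy = c(0.01, 0.03)
)

# Mean of every column whose name contains "fst" or "dxy"
means <- summarise(
  toy,
  across(contains(c("fst", "dxy")), ~ mean(.x, na.rm = TRUE))
)

# The result is a single wide row; t() turns it into one column,
# with one row per original column, which is easier to read
t(means)
```

With many comparisons the one-row result becomes too wide to read in the console, which is where transposing pays off.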
Use a boxplot to visualise the distribution of \(F_{ST}\) and \(d_{XY}\) for all the different species comparisons. Hint: You need to use pivot_longer on the columns containing \(F_{ST}\) and \(d_{XY}\) data respectively.
Draw a composite plot (fst, nucleotide diversity, dxy) for the house versus italian comparison. Is there a similar pattern to the house/spanish data we examined in the tutorial?
Plot the relationship between \(d_{XY}\) for house vs spanish and recombination rate. This is similar to the very last plot in the tutorial (NOT the one in the footnotes!), but with \(d_{XY}\) instead of \(F_{ST}\). Is the relationship the same as that for \(F_{ST}\)? Explain why/why not.