Week 8 assignment

1. Visualising complex data

This part of the assignment uses the iris data set which is already available in R.

  1. Make a scatterplot of Sepal.Length against Sepal.Width. Facet the plot by Species, and arrange the plots so that they are below each other. Color the points by Petal.Length

  2. Optional Let’s try to visualise all the iris data at once. Use pivot_longer() to get all measurements (i.e. all columns except Species) in a single column. Make a boxplot of species against measurement value, and facet the plot by measurement type.

2. Sparrow data

For this part of the assignment you should start out with the sparrow_data data frame that we created in section 7.3.2. A lot of the operations you need to do is covered in section 7.4, so look there if you don’t know where to start!

  1. Calculate the mean pairwise Fst and also the mean pairwise dxy for all of the different species comparisons. N.B. if you use a tidyverse solution, it may be easier to use t() to transpose and see the final result

  2. Use a boxplot to visualise the distribution of \(F_{ST}\) and \(d_{XY}\) for all the different species comparisons. Hint: You need to use pivot_longer on the columns containing \(F_{ST}\) and \(d_{XY}\) data respectively.

  3. Draw a composite plot (fst, nucleotide diversity, dxy) for the house versus italian comparison. Is there a similar pattern to the house/spanish data we examined in the tutorial?

  4. Plot the relationship between \(d_{XY}\) for house vs spanish and recombination rate (similar to the very last plot in the tutorial (NOT the one in the footnotes!), but with \(d_{XY}\) instead of \(F_{ST}\)). Is it the same as that for Fst? Explain why/why not.