Week 7 assignment

1. Working with DNA sequence data

For this part of the assignment, you need to download the assignment-7.fasta file here.

  1. Import the data and view the sequences in R.

  2. How long are the sequences, and what are the base frequencies?

  3. Calculate the standardised number of segregating sites and the nucleotide diversity of the data.
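
As a rough illustration of what the two statistics in question 3 measure, the sketch below computes them by hand in base R on a made-up alignment. The toy matrix and the helper names `count_seg_sites` and `nucleotide_diversity` are assumptions for illustration only, and "standardised" is assumed here to mean the number of segregating sites divided by the alignment length. For the assignment data you would normally use the functions from the tutorial rather than this code.

```r
# Toy alignment: 4 sequences (rows) x 8 sites (columns). Invented data.
aln <- rbind(
  seq1 = c("A", "C", "G", "T", "A", "C", "G", "T"),
  seq2 = c("A", "C", "G", "T", "A", "C", "G", "A"),
  seq3 = c("A", "T", "G", "T", "A", "C", "G", "A"),
  seq4 = c("A", "T", "G", "T", "A", "C", "G", "T")
)

# A site is segregating if more than one base is observed in its column.
count_seg_sites <- function(x) {
  sum(apply(x, 2, function(col) length(unique(col)) > 1))
}

# Nucleotide diversity: mean number of differences over all pairs of
# sequences, divided by the alignment length.
nucleotide_diversity <- function(x) {
  pairs <- combn(nrow(x), 2)
  diffs <- apply(pairs, 2, function(p) sum(x[p[1], ] != x[p[2], ]))
  mean(diffs) / ncol(x)
}

S <- count_seg_sites(aln)            # raw count of segregating sites
S_std <- S / ncol(aln)               # assumed standardisation: per site
pi_hat <- nucleotide_diversity(aln)  # per-site nucleotide diversity
```

In this toy alignment only the second and eighth columns vary, so the raw count is 2 and the standardised value is 2/8.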

2. The woodmouse data

  1. How many segregating sites (i.e. the actual number, not the standardised value) are there if we subset the woodmouse dataset to 12 individuals and 500 base pairs? What is the nucleotide diversity?

The Tajima’s D value we estimated for the woodmouse data is not significantly different from zero. However, for our purposes here, imagine that it is.

  2. What was the value of Tajima’s D, and how do you interpret it? Several factors influence the sign of Tajima’s D, and you need to argue which you think is most likely in this specific case. Recall that the data are from mitochondrial DNA (believed to be neutral) and were used in a study of the woodmouse’s demographic history since the last ice age, in which it seems likely that the species survived in a refugium in Southern Europe and then recolonised Northern Europe following ice retreat.
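
To make the sign of Tajima’s D more concrete, here is a minimal base R sketch of the standard formula from Tajima (1989). The function name `tajima_d` and the input values are hypothetical and are not taken from the woodmouse data; in practice you would use the test function from the tutorial, which also reports p-values.

```r
# Tajima's D from sample size n, number of segregating sites S, and k,
# the mean number of pairwise differences (not divided by sequence length).
tajima_d <- function(n, S, k) {
  i <- seq_len(n - 1)
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)
  b1 <- (n + 1) / (3 * (n - 1))
  b2 <- 2 * (n^2 + n + 3) / (9 * n * (n - 1))
  c1 <- b1 - 1 / a1
  c2 <- b2 - (n + 2) / (a1 * n) + a2 / a1^2
  e1 <- c1 / a1
  e2 <- c2 / (a1^2 + a2)
  # Numerator: difference between the two estimators of theta.
  (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
}

# Hypothetical inputs: 10 sequences with 20 segregating sites.
d_low_k  <- tajima_d(n = 10, S = 20, k = 4)   # k below S / a1
d_high_k <- tajima_d(n = 10, S = 20, k = 10)  # k above S / a1
```

The sign depends only on whether the mean number of pairwise differences is smaller or larger than S / a1. Many rare variants add segregating sites while contributing little to pairwise differences, which pushes D below zero.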

3. Sparrows

  1. In your own words, what is a sliding window analysis? Why did we need to use a sliding window analysis to visualise this data?

  2. Plot the sliding window nucleotide diversity for both the Spanish sparrow (using red) and the Italian sparrow (using gold). Do you see the same region of reduced diversity as in the house sparrow? If there is a shared pattern, what might explain it?
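
For reference, the mechanics of a sliding window can be sketched in base R as below. The per-site values are randomly generated, not sparrow data, and the helper name `sliding_mean` and the window and step sizes are arbitrary assumptions chosen for illustration.

```r
# Hypothetical per-site diversity values along a 1000 bp stretch, with an
# artificial dip between positions 401 and 600. Invented data.
set.seed(1)
per_site <- runif(1000, min = 0, max = 0.02)
per_site[401:600] <- per_site[401:600] / 10

# Slide a window of width `size` along the sequence in steps of `step`
# and average the statistic within each window.
sliding_mean <- function(x, size, step) {
  starts <- seq(1, length(x) - size + 1, by = step)
  data.frame(
    start = starts,
    stop  = starts + size - 1,
    mid   = starts + (size - 1) / 2,
    value = vapply(starts, function(s) mean(x[s:(s + size - 1)]), numeric(1))
  )
}

windows <- sliding_mean(per_site, size = 100, step = 25)
```

The result can be drawn with, for example, `plot(windows$mid, windows$value, type = "l", col = "red")`. Because each window averages over many noisy sites, the dip is much easier to see than in the raw per-site values.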