7.2 Returning to the sparrow dataset
In the last session, we used the GenoPop package to calculate sliding window estimates of nucleotide diversity across chromosome 8 of the house sparrow with data from Ravinet et al. (2018). We will now return to this example and use it to demonstrate why we must interpret the genomic landscape of differentiation with caution.
7.2.1 Preparing to read in the sparrow VCF
The first step we need to take is to read our VCF of the sparrow chromosome 8 into the R environment. This is exactly the same procedure as the last session but just in case you missed those steps, here they are again.
- First, download the VCF and its index (we need this to read it into R)
- Move the downloaded VCF into your working directory (use getwd() if you don’t know where that is)
We will work with this file soon, but first some housekeeping. Once the VCF is in your working directory, set a variable to hold its path.
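As a minimal sketch of this housekeeping step, the path can be stored as shown here. The file name `sparrow_chr8.vcf.gz` is a placeholder rather than the actual name of the download, so swap in whatever your file is called. The `.tbi` suffix for the index is likewise an assumption.

```r
# Placeholder file name (an assumption): replace with the name of your downloaded VCF
vcf_path <- file.path(getwd(), "sparrow_chr8.vcf.gz")

# Confirm that R can see the VCF in the working directory
file.exists(vcf_path)

# The index is assumed to sit alongside the VCF with a ".tbi" suffix
file.exists(paste0(vcf_path, ".tbi"))
```

If either check returns FALSE, the file is probably not in the working directory or has a different name.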
We eventually want to investigate differences between populations, but the data does not currently contain information about the populations, only individuals. Download the population data and put it in your working directory. The following code reads in the population data and creates a vector of individuals which are house sparrows.
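A minimal sketch of that step might look like this, assuming the population file is tab-delimited with a header row and columns named `ind` and `pop`. The file name `sparrow_pops.txt`, the vector name `house` and the population label `"House"` are assumptions, so check them against your own download.

```r
# Placeholder file name (an assumption): replace with your downloaded population file
pop_file <- "sparrow_pops.txt"

if (file.exists(pop_file)) {
  # Assumed layout: tab-delimited, header row, columns "ind" and "pop"
  sparrow_info <- read.table(pop_file, sep = "\t", header = TRUE)

  # Keep only the individuals labelled as house sparrows ("House" is an assumed label)
  house <- sparrow_info$ind[sparrow_info$pop == "House"]
} else {
  message("Could not find ", pop_file, " in ", getwd())
}
```

Running `table(sparrow_info$pop)` afterwards is a quick way to see which population labels are actually present and how many individuals carry each one.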
You can also create these vectors for each of the other species present. Have a look at the sparrow_info object to see what they are. We will also do this below.
bactrianus <- sparrow_info$ind[sparrow_info$pop == "Bactrianus"]
spanish <- sparrow_info$ind[sparrow_info$pop == "Spanish"]
italian <- sparrow_info$ind[sparrow_info$pop == "Italian"]
tree <- sparrow_info$ind[sparrow_info$pop == "Tree"]

Now that this is done, we are ready to calculate some parameters on genomic data!