Week 10 Advancing Further in R

Throughout these tutorials, we have introduced you to evolutionary genetic concepts using R. By now, you should have some familiarity with how versatile R is for data analysis and what is possible with it. We have gone from manipulating vectors and data frames to processing genome resequencing data and calculating population genomic statistics. With a course as broad as this, we understand that it is difficult to feel you fully understand every aspect of the analyses you are conducting; there is a lot to learn! Ultimately, it is impossible to achieve this within a single course, because using R properly takes experience and practice. Indeed, we would argue that there is no real mastery of R: it is a tool with which you can always learn new things. We are still learning new things in R all the time, and we have been using it for quite some time!

Nonetheless, there are basics that you can master and build upon in your own work, research and analysis. You are already familiar with many of the most important ones: data structures, how to use functions, and how to manipulate and visualise data. How can we go further and introduce you to some more advanced techniques in R? In this final session, we will take a step back from population and evolutionary genetics to focus once more on how R itself works, this time with an emphasis on programming. We will revisit some programming topics you have already encountered, but in more detail, explaining them piece by piece.

What to expect

In this section we will:

  • learn about some advanced features of RStudio
  • learn about joining data sets
  • learn more about vectorisation, and how to use lapply() and sapply()

Getting started

As always, we need to set up our R environment. We'll load the tidyverse as usual; it is the only package we will use today.

# clear the R environment
rm(list = ls())

# load packages
library(tidyverse)
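If you have ever wondered what `rm(list = ls())` actually does, the sketch below may help. `ls()` returns a character vector of the object names in the current environment, and the `list` argument of `rm()` accepts such a vector, so combining them removes every listed object. The object names `demo_vector` and `demo_text` are hypothetical, created only for illustration.

```r
# Create a couple of throwaway objects (hypothetical names, for illustration only)
demo_vector <- c(1, 2, 3)
demo_text <- "hello"

# ls() returns the names of the objects in the current (global) environment
print(ls())

# rm() accepts a character vector of object names through its `list` argument,
# so passing it the output of ls() removes every object that ls() listed
rm(list = ls())

# The environment is now empty
print(ls())
# prints character(0)

# Note: ls() skips hidden objects whose names start with a dot unless you use
# ls(all.names = TRUE), so rm(list = ls()) leaves those in place
```

Bear in mind that this only removes objects: it does not detach packages you have loaded with `library()` or reset any options, which is why restarting your R session is the more thorough way to start from a clean slate.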