1.3 Data import

This part of the tutorial is a simplified version of this tutorial. If you want to know more about data import in R, check that one out!

Throughout this tutorial, I’ve frequently asked you to imagine entering data for every country in the world. You may already have realised that manually typing in data for the entire world, like we did with the Nordic countries above, is time-consuming and error-prone. You’re right, and we should be able to do something smarter. Luckily, we can download the data as a text file and import it into R as a data frame!

1.3.1 Data formats

Start by downloading the file worlddata.csv and make sure you know where on your computer it’s saved (suggestion: the same folder as your R script). Open the file in a plain text editor (e.g. Notepad on Windows or TextEdit on Mac). You can see that there is one row of variable names, followed by one row of data for each country. The values are separated by commas, which is why this format is called comma-separated values (csv). It is a very common format for storing data. Data entries can also be separated by e.g. spaces, tabs, semicolons and much more. All of these can be imported into R, but you always need to be aware of how your data are formatted.
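If you want to peek at a file’s raw contents from inside R instead of a text editor, readLines() does that. The sketch below writes a tiny csv file with made-up values to a temporary location (so it runs without worlddata.csv) and then reads it back line by line. The file name and values are illustrative only.

```r
# Write a tiny csv file with made-up values to a temporary location
# (tempfile() just gives a safe path; this is NOT worlddata.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,population,area",
             "Norway,5.4,385",
             "Sweden,10.4,450"), tmp)

# readLines() returns the raw text, one element per line,
# much like opening the file in a plain text editor
raw_lines <- readLines(tmp)
print(raw_lines)

# Splitting each line on the comma shows the separate values
strsplit(raw_lines, split = ",")
```

The first element is the header row and each following element is one row of data, exactly the structure you should see in worlddata.csv.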

1.3.2 Working directories

The next step is to make R look in the same folder that your data is in. An important concept is that R only works in one folder at a time. You can find out which folder R is currently looking in, called the working directory, by running the command

getwd()

This may or may not be the folder where your text file (and/or R script) is located (probably it isn’t). If R is looking in a different folder than you want it to, you need to tell it where to look. This can be done in several ways, the most manual being the function setwd():

# set the working directory to the BIOS1140 folder within "Documents"
setwd("C:/Documents/BIOS1140")
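setwd() also returns the previous working directory (invisibly), which makes it easy to change folder temporarily and then go back. Here is a small sketch that uses R’s temporary folder, tempdir(), as a stand-in for your own BIOS1140 folder so that it runs on any computer:

```r
# Remember where R is looking right now
original_dir <- getwd()

# setwd() changes the working directory and invisibly returns the old one
# (tempdir() is only a stand-in for your own data folder)
old_dir <- setwd(tempdir())
print(getwd())

# Change back to the folder we started in
setwd(old_dir)
print(getwd())
```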

Another way to set the working directory in RStudio is to go to Session > Set Working Directory > Choose Directory… and select the folder you want. A third way is to find the “Files” tab in RStudio (in the bottom-right pane), navigate to your folder there, and click More > Set As Working Directory.

Exercise: Download the csv file to your computer (make sure you know where), and set the working directory to the same folder as the file is in.

Important concept: R only works in one folder at a time, and this folder is called the working directory. Get your current working directory with getwd() and set a new one either with setwd() or by navigating in RStudio. To import a file into R using only its file name, the file has to be in your current working directory!
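A quick way to check that you set the working directory correctly is to ask R which files it can see there. This is a minimal check, assuming the file is saved with exactly the name worlddata.csv:

```r
# list.files() shows the files in the current working directory
list.files()

# file.exists() gives TRUE if worlddata.csv is in the working directory,
# and FALSE if it isn't (in that case, check your working directory again)
in_wd <- file.exists("worlddata.csv")
print(in_wd)
```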

1.3.3 Importing

When importing data, there are three things you need to consider:

  1. Does the first row in the data contain variable names? If so, this is called a header.
  2. How are the values separated? (e.g. comma, semicolon, tab)
  3. What is the decimal marker? It is usually a period (.) but can sometimes be a comma (,).

Look at the file worlddata.csv and answer the three questions above for that file.

The function we will use for reading data into R is read.table(). Its first argument is the file name in quotes. It also has three more arguments[5] that answer the three questions above:

  1. Set header = TRUE if the data has a header, and header = FALSE if it doesn’t.
  2. sep is the separator: sep = "," for comma-separated values, sep = ";" for semicolons, sep = "\t" for tabs, and much more.
  3. dec is the decimal marker, either dec = "." or dec = ",".
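To see how the three arguments work together, here is a sketch that reads the same small table (made-up values) stored in two different formats. It uses the text = argument of read.table(), which reads from a string instead of a file, so it runs without downloading anything:

```r
# The same made-up table in two common formats
comma_version <- "country,population
Norway,5.4
Sweden,10.4"

semicolon_version <- "country;population
Norway;5,4
Sweden;10,4"

# header, sep and dec must match how each text is formatted
d1 <- read.table(text = comma_version, header = TRUE, sep = ",", dec = ".")
d2 <- read.table(text = semicolon_version, header = TRUE, sep = ";", dec = ",")

# With the right arguments, both give the same data frame
identical(d1, d2)
```

If sep or dec doesn’t match the text, you typically get a single column or numbers read as text instead. R also has read.csv() and read.csv2(), whose defaults match these two formats; see ?read.table.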

Now you should be able to import the data! I’ve made some skeleton code below for you to fill out to simplify things.

Exercise: import worlddata.csv into R by filling in the blanks in the code below.

# fill in the blanks (after "=")
world_data <- read.table("worlddata.csv", header = , sep = , dec = )

# solution (try it yourself first!)
world_data <- read.table("worlddata.csv", header = TRUE, sep = ",", dec = ".")

If you managed to do this, you now have data for all countries in the world stored in a data frame! Try doing some operations on it to get familiar with the data. Some suggestions (remember that you can write ?functionname to find out more about a function):

  • head() shows the first rows of the data frame
  • summary() gives a summary of key aspects of the data frame
  • use the $ operator to extract columns, add or multiply them together, or whatever you want!
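If you want to try these functions before your import works, here is a sketch on a tiny made-up data frame. The column names and values are hypothetical and may not match the columns in worlddata.csv:

```r
# A small made-up data frame standing in for world_data
# (hypothetical columns, not necessarily those in worlddata.csv)
toy <- read.table(text = "country,population,area
Norway,5.4,385
Sweden,10.4,450
Finland,5.5,338", header = TRUE, sep = ",", dec = ".")

head(toy, n = 2)   # the first two rows
summary(toy)       # min, quartiles, mean and max for the numeric columns

# Extract columns with $ and divide one by the other to make a new column
toy$ratio <- toy$population / toy$area
toy$ratio
```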

[5] See section 1.2.5 if you forgot what an argument is.