3.1 R programming: for-loops

In this section we will learn about for-loops, an important concept in any kind of programming (not just in R!). The section is written so that you can follow along even if you’ve never heard of loops in programming before, but it should also be useful for those who already have experience with loops in another programming language (like Python) and want to learn how they work in R.

3.1.1 Motivation: why loop?

Let’s start by making a numeric vector that we want to do some operations on.

x <- seq(2, 20, 2)
x
#>  [1]  2  4  6  8 10 12 14 16 18 20

Say you want to multiply every element by 5. You’ve already learned to do that in R:

x * 5
#>  [1]  10  20  30  40  50  60  70  80  90 100

However, what if we want to add each pair of adjacent elements of the vector together? With our vector x, this would be 2+4, 4+6, 6+8 and so on. There is no simple way to do this that you’ve learned yet. We could use the square brackets to extract individual elements and do 9 calculations like this:

x[1] + x[2]
x[2] + x[3]
x[3] + x[4]
x[4] + x[5]
x[5] + x[6]
x[6] + x[7]
x[7] + x[8]
x[8] + x[9]
x[9] + x[10]

But this is a lot of typing for doing a repetitive task, which we want to avoid¹⁴. What if we could generalize this, so the computer does the repeated operations for us instead? That’s where for-loops come in.

3.1.2 How a for-loop works

Before we go into solving our example, we have to learn a bit about for-loops. A for-loop in R conceptually looks like this:

for (variable in vector){ # variable starts as the first element of vector
  
  # do something involving variable
  # the part within the curly brackets is called
  # the body of the for-loop
  
} # when the curly bracket ends, variable becomes the next element of vector

When the loop starts, variable is set to the first element of vector. Within the curly brackets (“krølleparentes” in Norwegian) some operation is done using this variable. Then, after the closing curly bracket, variable becomes the next element in vector, and whatever is inside the curly brackets gets repeated with this updated variable. This goes on until you’ve been through all elements of vector. Since it repeats an operation for all the elements of vector, we say that it “loops over” vector.

This might be easier to understand if you see a real example of a for loop¹⁵:

for (element in 1:10){
  print(element)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10

Here you can see more of what’s actually happening. In the first round, element is the first element of 1:10, i.e. 1, which is printed. Then, element becomes the second element of 1:10, which is 2, and print() prints it. This goes on until you have looped over the entire vector, and the loop ends after printing 10. Below is what actually happens for each round of the loop.

print(1) # round 1, element is 1
print(2) # round 2, element is 2
print(3) # round 3, element is 3
print(4) # ... and so on
print(5)
print(6)
print(7)
print(8)
print(9)
print(10)

Notice how many lines you saved by writing a loop! The power here comes from the fact that it doesn’t matter how long your vector is, and that you can do any operation on the elements of the vector. Say we want to multiply each element of our vector x by 5, which we did at the start of this tutorial.

for(element in x){
  print(element * 5)
}
#> [1] 10
#> [1] 20
#> [1] 30
#> [1] 40
#> [1] 50
#> [1] 60
#> [1] 70
#> [1] 80
#> [1] 90
#> [1] 100

You can also use this with other kinds of vectors, e.g. a vector of strings.

animals <- c("cat", "dog", "horse", "badger", "unicorn")

for (animal in animals){
  print(animal)
}
#> [1] "cat"
#> [1] "dog"
#> [1] "horse"
#> [1] "badger"
#> [1] "unicorn"

Important concept:
Use a for-loop to do the same operation over and over on the elements of your vector. The basic structure of a for-loop looks like this:

for (variable in vector){
  # do something
}

Tip:

You may have noticed that I’ve called the variable that is changing for each iteration different things in all the examples, namely variable, element and animal. Actually, you can call this variable anything (within reason). All you have to remember is to call it the same thing within the loop as when starting it. In other words, this works¹⁶:

for (whatever_you_want_to_call_the_variable in 1:10){
  print(whatever_you_want_to_call_the_variable)
}

But this doesn’t:

for (some_name in 1:10){
  print(another_name)
}
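
If you want to see for yourself why the second loop fails, here is a minimal sketch. It assumes that you have not created an object called another_name earlier in your session. tryCatch() is not covered in this tutorial; I only use it here to catch the error message so that the rest of a script can keep running.

```r
# tryCatch() runs the code in the curly brackets and, if an error occurs,
# hands the error to the function given as `error`
outcome <- tryCatch(
  {
    for (some_name in 1:10){
      print(another_name)   # another_name was never created
    }
    "the loop finished"
  },
  error = function(e) conditionMessage(e)   # keep only the error message
)

# outcome now holds the error message instead of "the loop finished"
outcome
```

The message should tell you that R could not find an object called another_name: the loop only ever creates some_name.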

Exercise: Create a vector containing the names of five countries. Use a for-loop to print the countries. Optional: use the paste() function to output “country is a country” for each element.
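
In case paste() is new to you, here is a small sketch of what it does on its own, outside a loop. The object name sentence is just something I made up for the illustration.

```r
# paste() glues its arguments together into one string,
# with a single space between the pieces by default
sentence <- paste("badger", "is an animal")
sentence
#> [1] "badger is an animal"
```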

3.1.3 Indexing with for-loops

To solve our initial problem (and also for the things we will be doing later), we need to introduce one more concept: using the changing variable in your for-loop as an index for your vectors. If we look once more at our animals vector above, there are actually two ways of printing every element:

# printing the element like we did earlier
for (animal in animals){
  print(animal)
}

# printing the element using an index
for (index in 1:5){
  print(animals[index])
}

The execution of the latter for-loop looks like this:

print(animals[1])
print(animals[2])
print(animals[3])
print(animals[4])
print(animals[5])

Note that rather than looping over the animals vector itself, we loop over a vector from 1 to 5, using those numbers to access the values inside animals. Here we made this vector by writing 1:5, but a better way would be writing 1:length(animals) so we can be sure that the index vector is the same length as the animals vector. Looping this way has the advantage that we can loop over several vectors at the same time:

score <- c("good", "great", "fine", "best", "probably not real")

# looping over both animals and score
# note the use of 1:length(animals) instead of 1:5

for (index in 1:length(animals)){
  # paste together the current element in animals and score
  # grading will be overwritten every round of the loop
  grading <- paste(animals[index], "is", score[index])
  print(grading)
}
#> [1] "cat is good"
#> [1] "dog is great"
#> [1] "horse is fine"
#> [1] "badger is best"
#> [1] "unicorn is probably not real"
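
The reasoning for preferring 1:length(animals) over 1:5 can be taken one step further. The sketch below uses seq_along(), a base R function that is not used elsewhere in this tutorial, and a made-up empty vector called empty to show an edge case where the two approaches differ.

```r
animals <- c("cat", "dog", "horse", "badger", "unicorn")

# for a non-empty vector, both give the same index vector
identical(seq_along(animals), 1:length(animals))
#> [1] TRUE

# hypothetical edge case: a vector with no elements
empty <- character(0)

1:length(empty)   # this is 1:0, which counts down from 1 to 0
#> [1] 1 0

seq_along(empty)  # no indices at all, so a loop over it would never run
#> integer(0)
```

For the vectors in this tutorial the two are interchangeable, so treat seq_along() as an optional habit rather than something you need here.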

We can also access more than one element of a vector at once, by using e.g. index - 1 to access the previous element.

for (index in 2:length(animals)){ # note: starting at 2
  friends <- paste(animals[index], "and", animals[index - 1], "are friends")
  print(friends)
}
#> [1] "dog and cat are friends"
#> [1] "horse and dog are friends"
#> [1] "badger and horse are friends"
#> [1] "unicorn and badger are friends"

The code below shows how this would look if done manually.

# first round, remember that writing animals[2 - 1] is
# exactly the same as writing animals[1]
friends <- paste(animals[2], "and", animals[2 - 1], "are friends")
print(friends)

# second round
friends <- paste(animals[3], "and", animals[3 - 1], "are friends")
print(friends)

# third round
friends <- paste(animals[4], "and", animals[4 - 1], "are friends")
print(friends)

# fourth round
friends <- paste(animals[5], "and", animals[5 - 1], "are friends")
print(friends)

Now we know all we need to solve our initial problem, which we will return to in the next section.

Exercise: In addition to your country vector from before, make a corresponding vector containing continents. Use indexing with for-loops and the paste() function to print “country is in continent” for each of the countries and continents in your vectors.

Important concept:
Use for-loops with indexing when you want to access several elements of one or more vectors at once. Most of the time you will use indexing rather than looping over a vector directly.

3.1.4 Solving our problem

To remind you of where we started: we want to add the adjacent elements of our vector x together in the smartest way possible.

x <- seq(2, 20, 2)

Below is how this can be done with for-loops. I encourage you to try to solve it yourself before looking at the solution.

Hint: Loop over the vector 2:length(x)¹⁷. For every round, add x[index] and x[index - 1] together and print this.

for (index in 2:length(x)){
  added <- x[index] + x[index - 1]
  print(added)
}
#> [1] 6
#> [1] 10
#> [1] 14
#> [1] 18
#> [1] 22
#> [1] 26
#> [1] 30
#> [1] 34
#> [1] 38
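
For comparison, here is a sketch of the same calculation without a loop. It relies on negative indexing (x[-1] means “everything except the first element”), which you may not have seen yet, and pair_sums is just a name I chose. It is not a replacement for the loop above, only a way to check that the loop gives the right numbers.

```r
x <- seq(2, 20, 2)

# x[-1] drops the first element, x[-length(x)] drops the last one,
# so adding them lines up every element with its neighbour
pair_sums <- x[-1] + x[-length(x)]
pair_sums
#> [1]  6 10 14 18 22 26 30 34 38
```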

3.1.5 Storing values from a for-loop

One final concept before going on to work with evolutionary biology. In the last section, we printed our results. What if we wanted to work further with the results we got? We could copy the numbers from the console and create a new vector manually, but (like so many of my stupid suggestions throughout these tutorials) this is bothersome and doesn’t scale well. The best way to store values from a for-loop is to create an empty vector before the loop, and fill in that vector as we loop.

First we use the rep() function to create a vector containing NA. You can read the following rep(NA, 10) as “repeat NA, 10 times”, i.e. you get a vector of 10 NA. We use NA for our vector because our calculations within the loop will not produce any NA (unless something goes horribly wrong). This way, we can easily see if something went wrong in our loop (if there are any NA left in our results vector after our loop, something is probably off).

# repeat NA, 10 times
results <- rep(NA, 10)

# to ensure that results is the same length as x, we should instead write:
results <- rep(NA, length(x))

Then, we loop over the same index as before, but instead of printing our result, we store it as an element of our results vector.

for (index in 2:length(x)){
  results[index] <- x[index] + x[index - 1]
}

This doesn’t print anything yet, but the results are now stored in the results vector.

results
#>  [1] NA  6 10 14 18 22 26 30 34 38

We see that the first element is still NA (since we started on the second element), which isn’t perfect, but good enough for now!
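
If the leading NA bothers you, one possible way around it (a sketch, not the only option) is to make the results vector one element shorter and store each sum at position index - 1. The check with anyNA() at the end is the “did anything go wrong?” test described above; anyNA() is a base R function that is not otherwise used in this tutorial.

```r
x <- seq(2, 20, 2)

# one slot per pair of neighbours, so one fewer than length(x)
results <- rep(NA, length(x) - 1)

for (index in 2:length(x)){
  # the sum of elements index - 1 and index goes in slot index - 1
  results[index - 1] <- x[index] + x[index - 1]
}

results
#> [1]  6 10 14 18 22 26 30 34 38

# no NA left, so every slot was filled in by the loop
anyNA(results)
#> [1] FALSE
```

A simpler alternative is to keep the loop exactly as it was and drop the first element afterwards.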

That concludes this week’s R-focused part of the tutorial. Some of the Evolutionary biology-focused part will use the concepts you have learned here, so check back here if there is a part of the code you don’t understand.

¹⁴ because it’s bothersome and there’s a great chance of making errors, and imagine how long it would take if our vector had 1000 or 10000 elements!

¹⁵ Note that to print your result to console inside a for-loop, you explicitly have to use print(), unlike what you’ve done so far. Something like
```
for (element in 1:10){
  element
}
```
won’t actually print anything.

¹⁶ but it’s kind of dumb, don’t do it

¹⁷ Something to think about: Why do we start the vector we’re looping over at 2, not 1? What would happen if we started at 1? (hint: what is 1 - 1, and does that correspond to an element in our vector?)