4.1 R programming: Making custom functions

4.1.1 Motivation

In last week’s tutorial and assignment, you used a for-loop to simulate genetic drift. If you did the assignment, you are probably tired of seeing code-snippets looking like this or similar:

N <- 10
ngen <- 1000
p_init <- 0.5

n10 <- rep(NA, ngen)
n10[1] <- p_init

for (i in 2:ngen){
  nA1 <- rbinom(1, 2*N, n10[i-1])
  n10[i] <- nA1 / (2*N)
}

Although the for-loop saved you thousands of lines of code compared to writing out the simulation manually, you still had to copy and paste the for loop itself many times. You had to write it 3 times for the tutorial and another 3 times for the assignment, just changing a single number each time! You may have thought that there has to be a better way of doing this, and there is! You can make this simpler by making your own custom functions, which you will learn about in the R-part of this tutorial.

4.1.2 Function basics

You have already used a lot of functions in R, with names like mean(), sum() and ggplot(). Common for these is that they take some input—arguments—does something with that input and returns the result. For making your own function, you need three things:

  • A name for your function. This could be anything, but it’s best to give them a sensible name.
  • The arguments to your function. These are the values that you want to be able to change each time you run the function.
  • What the function does with the arguments and what it returns. This is called the “body” of the function.

A basic custom function typically looks like this:


function_name <- function(argument){
  #the body of the function is inside curly brackets
  #here you do something with the arguments, for example:
  result <- argument^2 
  
  # functions typically end with returning some value
  return(result)
}

The above function will take an input value, square it, and return the result. You can now use the function by writing function_name(somevalue), like you would with the functions you’ve encountered so far. The arguments go into the parentheses of function(), and the body is inside curly brackets. The value to return is wrapped inside return() Note that the name of the function and arguments can be whatever you’d like. You can also have as many arguments as you want, separated by comma.

Important concept:
Custom functions makes a piece of code reusable, either for doing something many times, or for using for other purposes later. A function takes one or more arguments, does something with the arguments in the body and returns the result.

4.1.3 A simple example

Let’s make a simple function that allows you to see clearly what the function is doing. The following function takes an argument, and prints it in a sentence:

print_arg <- function(argument){
  
  # make a string explaining what the argument is
  sentence <- paste("the argument is:", argument)
  
  # return the string
  return(sentence)
}

When you run this code, nothing is printed, but the function has been saved. You can now call your new function with any argument.

print_arg(5)
#> [1] "the argument is: 5"

# you can store the returned value to an object and print it
horse_sentence <- print_arg("horse")
horse_sentence
#> [1] "the argument is: horse"

# you can put any expression as the argument,
# for example a logical statement
print_arg(pi > exp(1))
#> [1] "the argument is: TRUE"

You can also reference the argument by name when calling the function, like you learned about in week 1.

print_arg(argument = "zebra")
#> [1] "the argument is: zebra"

While this particular function is quite useless, notice how much typing you have saved already. We will move on to some more useful functions shortly, after an exercise.

Exercise: create a new function called print_args() that takes two arguments as input, and returns the text “the first argument is [argument1] and the second is [argument2]”.

print_args <- function(argument1, argument2){
  # make a string
  sentence <- paste("the first argument is:", argument1, "and the second is", argument2)
  # return the string
  return(sentence)
}

# test the code

print_args(5, "horse")
#> [1] "the first argument is: 5 and the second is horse"

4.1.4 Some important function properties

One important thing about functions is that any objects you create only exist inside the function. This means that even though the above function creates an object named sentence, and we’ve run the function many times, there is no object called sentence in your R session:

sentence
#> Error in eval(expr, envir, enclos): object 'sentence' not found

This means that you don’t need to worry about naming conflicts when running a function, which is convenient. However, it also means that you can’t access any objects that are made inside your function, only whatever you put inside return()

Another important thing is that you can only return 1 thing from a function. This means that if you want to return multiple objects, you need to make these into a vector. In the following three examples, only the last function will work as intended:

return_2 <- function(x, y){
  return(x, y) # will produce an error, you can only return a single object!
}
return_2(2, 4)
#> Error in return(x, y): multi-argument returns are not permitted

return_twice <- function(x, y){
  return(x) # x is returned, and the function stops
  return(y) # the function stops before this line, so y is lost forever
}
return_twice(2, 4)
#> [1] 2

return_vector <- function(x, y){
  return_vec <- c(x, y)
  return(return_vec) # this is OK, only one object is returned
}
return_vector(2, 4)
#> [1] 2 4

Important concept:

  • Any variables you create inside a function only exists within that function.

  • If you want to return more than 1 value, you need to bind the values together in e.g. a vector.

4.1.5 A slightly more useful example

A more useful function is often one that does some calculations and returns the result. Here’s an example of a function that takes two arguments and divides the first by the other.

divide <- function(numerator, denominator){
  result <- numerator/denominator
  return(result)
}

divide(5, 3)
#> [1] 1.666667
divide(10, 0)
#> [1] Inf
# oops

Exercise: create a function that takes two arguments and returns the sum of those arguments. Tip: call it something other than sum (e.g. add) to avoid overwriting R’s built in sum function.

add <- function(x, y){
  result <- x + y
  return(result)
}

4.1.6 Example: calculating genotype frequencies

In last week’s tutorial, you used the following snippet of code to calculate genotype frequencies from allele frequencies:

# first we set the frequencies
p <- 0.8
q <- 1 - p 

# calculate the expected genotype frequencies (_e denotes expected)
A1A1_e <- p^2
A1A2_e <- 2 * (p * q)
A2A2_e <- q^2
# show the genotype frequencies in the console
c(A1A1_e, A1A2_e, A2A2_e)
#> [1] 0.64 0.32 0.04

This is something I imagine you will do often in this course and later in life if you pursue evolutionary biology, so it would make sense to make it into a function. I will let you try to do it on your own before showing you how to do this. Use the hints if you get stuck!

Exercise: create a function that takes a single argument, p, and returns the expected genotype frequencies according to the Hardy-Weinberg expectation.

Show hint

Start by wrapping the whole code snippet above inside the curly brackets of a function.

Show another hint

Remember that you can only return 1 object, so you need to make the genotype frequencies into a vector. Also, remember that you should be able to change p, so you should not have the line p <- 0.8 in the body of the function.

calc_geno <- function(p){
  
  # calculate q from p
  q <- 1 - p 
  
  # calculate the expected genotype frequencies (_e denotes expected)
  A1A1_e <- p^2
  A1A2_e <- 2 * (p * q)
  A2A2_e <- q^2
  
  # return the genotype frequencies
  geno_freq <- c(A1A1_e, A1A2_e, A2A2_e)
  return(geno_freq)
}

When testing your function, it should show the following output:

calc_geno(0.2)
#> [1] 0.04 0.32 0.64
calc_geno(0.5)
#> [1] 0.25 0.50 0.25
calc_geno(0.7)
#> [1] 0.49 0.42 0.09

Congratulations, you have now made your first truly useful function! Any time you need to calculate genotype frequencies from allele frequencies from now on, it will be a breeze!

Important concept:
Making a function isn’t all that hard when you get used to it. A general recipe for converting code into a function is:

  1. Paste your code inside the function body.
  2. Make any objects you want to be able to change into an argument of the function (p in this example). Remember to remove those objects from the body of the function to avoid overwriting your arguments.
  3. Wrap your result in return().

4.1.7 Creating a function of the drift simulation

Now we’re ready to make a function to simulate drift. Just to remind you, this is the code we used last week for the simulation:

# set population size
N <- 8

# set number of generations
ngen <- 100

# set initial frequency of A1
p_init <- 0.5

# create vector for storing results
p <- rep(NA, ngen)
# set first element to initial p
p[1] <- p_init


for (i in 2:ngen){
  # sample number of A1 alleles based on p in previous generation
  nA1 <- rbinom(1, 2*N, p[i-1])
  
  # set frequency of A1 as p in the current generation
  p[i] <- nA1 / (2*N)
}

p

Notice that there are three parameters we will probably want to be able to change for each simulation: N, ngen and p_init. It makes sense, then, to make these into function arguments.

Exercise: Make the above drift simulation into a function. It should take three arguments: N, ngen and p_init, and return a vector of p’s for each generation.

Show hint

Use the three steps from the previous section: paste the code into the body, turn N, ngen and p_init into arguments and return your result.

drift_sim <- function(N, ngen, p_init){

  # create vector for storing results
  p <- rep(NA, ngen)
  # set first element to initial p
  p[1] <- p_init
  
  
  for (i in 2:ngen){
    # sample number of A1 alleles based on p in previous generation
    nA1 <- rbinom(1, 2*N, p[i-1])
    
    # set frequency of A1 as p in the current generation
    p[i] <- nA1 / (2*N)
  }
  
  return(p)
}

That concludes the R-part of this tutorial! The rest of the tutorial will be about fitness and selection, but will make use of functions for exploring this. Remember to check back here if you become unsure about how functions work!

Exercise: when working through the rest of the tutorial, think once in a while: “Will I need to do this many times? Could I make this code into a function? How would I do that?”