4.1 R programming: Making custom functions
4.1.1 Motivation
In last week’s tutorial and assignment, you used a for-loop to simulate genetic drift. If you did the assignment, you are probably tired of seeing code-snippets looking like this or similar:
N <- 10
ngen <- 1000
p_init <- 0.5
n10 <- rep(NA, ngen)
n10[1] <- p_init
for (i in 2:ngen){
nA1 <- rbinom(1, 2*N, n10[i-1])
n10[i] <- nA1 / (2*N)
}
Although the for-loop saved you thousands of lines of code compared to writing out the simulation manually, you still had to copy and paste the for loop itself many times. You had to write it 3 times for the tutorial and another 3 times for the assignment, just changing a single number each time! You may have thought that there has to be a better way of doing this, and there is! You can make this simpler by making your own custom functions, which you will learn about in the R-part of this tutorial.
4.1.2 Function basics
You have already used a lot of functions in R, with names like mean()
, sum()
and ggplot()
. Common for these is that they take some input—arguments—does something with that input and returns the result. For making your own function, you need three things:
- A name for your function. This could be anything, but it’s best to give them a sensible name.
- The arguments to your function. These are the values that you want to be able to change each time you run the function.
- What the function does with the arguments and what it returns. This is called the “body” of the function.
A basic custom function typically looks like this:
function_name <- function(argument){
#the body of the function is inside curly brackets
#here you do something with the arguments, for example:
result <- argument^2
# functions typically end with returning some value
return(result)
}
The above function will take an input value, square it, and return the result. You can now use the function by writing function_name(somevalue)
, like you would with the functions you’ve encountered so far. The arguments go into the parentheses of function()
, and the body is inside curly brackets. The value to return is wrapped inside return()
Note that the name of the function and arguments can be whatever you’d like. You can also have as many arguments as you want, separated by comma.
Important concept:
Custom functions makes a piece of code reusable, either for doing something many times, or for using for other purposes later. A function takes one or more arguments, does something with the arguments in the body and returns the result.
4.1.3 A simple example
Let’s make a simple function that allows you to see clearly what the function is doing. The following function takes an argument, and prints it in a sentence:
print_arg <- function(argument){
# make a string explaining what the argument is
sentence <- paste("the argument is:", argument)
# return the string
return(sentence)
}
When you run this code, nothing is printed, but the function has been saved. You can now call your new function with any argument.
# you can store the returned value to an object and print it
horse_sentence <- print_arg("horse")
horse_sentence
#> [1] "the argument is: horse"
# you can put any expression as the argument,
# for example a logical statement
print_arg(pi > exp(1))
#> [1] "the argument is: TRUE"
You can also reference the argument by name when calling the function, like you learned about in week 1.
While this particular function is quite useless, notice how much typing you have saved already. We will move on to some more useful functions shortly, after an exercise.
Exercise: create a new function called print_args()
that takes two arguments as input, and returns the text “the first argument is [argument1] and the second is [argument2]”.
4.1.4 Some important function properties
One important thing about functions is that any objects you create only exist inside the function. This means that even though the above function creates an object named sentence
, and we’ve run the function many times, there is no object called sentence in your R session:
This means that you don’t need to worry about naming conflicts when running a function, which is convenient. However, it also means that you can’t access any objects that are made inside your function, only whatever you put inside return()
Another important thing is that you can only return 1 thing from a function. This means that if you want to return multiple objects, you need to make these into a vector. In the following three examples, only the last function will work as intended:
return_2 <- function(x, y){
return(x, y) # will produce an error, you can only return a single object!
}
return_2(2, 4)
#> Error in return(x, y): multi-argument returns are not permitted
return_twice <- function(x, y){
return(x) # x is returned, and the function stops
return(y) # the function stops before this line, so y is lost forever
}
return_twice(2, 4)
#> [1] 2
return_vector <- function(x, y){
return_vec <- c(x, y)
return(return_vec) # this is OK, only one object is returned
}
return_vector(2, 4)
#> [1] 2 4
Important concept:
Any variables you create inside a function only exists within that function.
If you want to return more than 1 value, you need to bind the values together in e.g. a vector.
4.1.5 A slightly more useful example
A more useful function is often one that does some calculations and returns the result. Here’s an example of a function that takes two arguments and divides the first by the other.
divide <- function(numerator, denominator){
result <- numerator/denominator
return(result)
}
divide(5, 3)
#> [1] 1.666667
Exercise: create a function that takes two arguments and returns the sum of those arguments. Tip: call it something other than sum
(e.g. add
) to avoid overwriting R’s built in sum
function.
4.1.6 Example: calculating genotype frequencies
In last week’s tutorial, you used the following snippet of code to calculate genotype frequencies from allele frequencies:
# first we set the frequencies
p <- 0.8
q <- 1 - p
# calculate the expected genotype frequencies (_e denotes expected)
A1A1_e <- p^2
A1A2_e <- 2 * (p * q)
A2A2_e <- q^2
# show the genotype frequencies in the console
c(A1A1_e, A1A2_e, A2A2_e)
#> [1] 0.64 0.32 0.04
This is something I imagine you will do often in this course and later in life if you pursue evolutionary biology, so it would make sense to make it into a function. I will let you try to do it on your own before showing you how to do this. Use the hints if you get stuck!
Exercise: create a function that takes a single argument, p
, and returns the expected genotype frequencies according to the Hardy-Weinberg expectation.
Show hint
Start by wrapping the whole code snippet above inside the curly brackets of a function.
Show another hint
Remember that you can only return 1 object, so you need to make the genotype frequencies into a vector. Also, remember that you should be able to change p, so you should not have the line p <- 0.8
in the body of the function.
When testing your function, it should show the following output:
Congratulations, you have now made your first truly useful function! Any time you need to calculate genotype frequencies from allele frequencies from now on, it will be a breeze!
Important concept:
Making a function isn’t all that hard when you get used to it. A general recipe for converting code into a function is:
- Paste your code inside the function body.
- Make any objects you want to be able to change into an argument of the function (
p
in this example). Remember to remove those objects from the body of the function to avoid overwriting your arguments. - Wrap your result in
return()
.
4.1.7 Creating a function of the drift simulation
Now we’re ready to make a function to simulate drift. Just to remind you, this is the code we used last week for the simulation:
# set population size
N <- 8
# set number of generations
ngen <- 100
# set initial frequency of A1
p_init <- 0.5
# create vector for storing results
p <- rep(NA, ngen)
# set first element to initial p
p[1] <- p_init
for (i in 2:ngen){
# sample number of A1 alleles based on p in previous generation
nA1 <- rbinom(1, 2*N, p[i-1])
# set frequency of A1 as p in the current generation
p[i] <- nA1 / (2*N)
}
p
Notice that there are three parameters we will probably want to be able to change for each simulation: N
, ngen
and p_init
. It makes sense, then, to make these into function arguments.
Exercise: Make the above drift simulation into a function. It should take three arguments: N
, ngen
and p_init
, and return a vector of p’s for each generation.
Show hint
Use the three steps from the previous section: paste the code into the body, turn N
, ngen
and p_init
into arguments and return your result.
drift_sim <- function(N, ngen, p_init){
# create vector for storing results
p <- rep(NA, ngen)
# set first element to initial p
p[1] <- p_init
for (i in 2:ngen){
# sample number of A1 alleles based on p in previous generation
nA1 <- rbinom(1, 2*N, p[i-1])
# set frequency of A1 as p in the current generation
p[i] <- nA1 / (2*N)
}
return(p)
}
That concludes the R-part of this tutorial! The rest of the tutorial will be about fitness and selection, but will make use of functions for exploring this. Remember to check back here if you become unsure about how functions work!
Exercise: when working through the rest of the tutorial, think once in a while: “Will I need to do this many times? Could I make this code into a function? How would I do that?”