Most programming languages have a number of common elements. These are the properties that are relatively similar across everything from C to Ruby. For example, the way we define an integer can differ across languages, but an addition operation using those integers looks much the same in the vast majority of them.
It’s the elements that aren’t shared that typically define a language, such as how functions can be called. Mastering these elements unlocks the unique power of any programming language. With R, one of the more important examples of this concept is do.call. Once you’ve learned a few of its main points, do.call is an elegantly designed and easy-to-use way of applying functions. But first we need to properly define the do.call function.
What Is do.call?
R’s do.call takes a function and a list, then calls the function with each element of the list as a separate argument. This makes it easier to work with a large number of values. Instead of writing every argument out by hand, we can keep them together in one list and pass that list as a whole. If the list elements are named, the names are matched to the function’s parameter names. The utility of this approach is easier to see with some actual code samples.
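As a minimal sketch of that definition, the snippet below calls the same function twice: once directly and once through do.call with a named list. The function describeChange and its parameters are hypothetical names made up for this illustration.

```r
# Hypothetical helper used only for illustration
describeChange <- function(region, change) {
  paste(region, "changed by", change)
}

# Calling the function directly, one argument at a time
direct <- describeChange("North", 5)

# Passing the same arguments as a single named list.
# The list names are matched to the parameter names.
ourArgs <- list(region = "North", change = 5)
viaDoCall <- do.call(describeChange, ourArgs)

# Both calls produce the same value
print(identical(direct, viaDoCall))
```

Both routes reach the function with the same arguments, so the comparison prints TRUE.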
Basic Use of do.call
We can begin with a simple example to see just how do.call interacts with functions and variables. Take a look at the following code.
ourData <- list(5,10,15,20,25)
print(sum(ourData))
We begin by defining a collection of numbers and assigning it to ourData. We then pass it to sum and attempt to print the result to the screen. However, the R interpreter will stop with an error telling us that a list is an invalid type of argument for sum, which expects numeric values rather than a list. This is where do.call comes in. Take a look at this simple example.
ourData <- list(5,10,15,20,25)
ourResult <- do.call(sum, ourData)
print(ourResult)
As you can see, this is quite similar to the initial code sample. We create the same collection of numbers and assign it to ourData. But when we define ourResult we use do.call to run the sum function. The first argument to do.call is the function to run, and the second is the list of values to hand to it. You can also change the called function to achieve different results. For example, try changing sum to max or min and you’ll see the appropriate result from the new R function. Note that this only works as expected for functions that accept their values as separate arguments. mean expects a single vector as its first argument, so do.call(mean, ourData) would not average all five values.
But the most significant change is what happens when we try to print the result to the screen. The previous code example exited with an error before it could print anything. This time sum runs properly and its result is assigned to ourResult, so print can display the summed value. Ease of variable management is one of the key benefits of do.call. But the following example highlights another benefit.
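The sketch below swaps the called function while keeping the same ourData list. It assumes only base R. The name ourVector is introduced here for illustration, and flattening with unlist is one possible way to get an average, not the only one.

```r
ourData <- list(5, 10, 15, 20, 25)

# sum, max and min accept any number of separate values,
# so they work directly with do.call
print(do.call(sum, ourData))  # 75
print(do.call(max, ourData))  # 25
print(do.call(min, ourData))  # 5

# mean expects one vector as its first argument, so we
# flatten the list into a numeric vector first
ourVector <- unlist(ourData)
print(mean(ourVector))        # 15
```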
ourFunction <- function() {
  listData <- list(5,10,15,20,25)
  print(sum(listData))
}
#print(ourFunction())
print("done")
In this example we see the previously error-producing code moved into a function called ourFunction. The call to it is commented out though. If you run this script you’ll simply see "done" printed to the screen. You might be surprised to find that the code in ourFunction didn’t produce an error. This is because R is dynamically typed and doesn’t check the body of a function until the function is actually called. It’s in the same spirit as R’s lazy evaluation of function arguments, where an argument is only evaluated when it’s actually used. This is unlike most compiled languages, where type errors like this one are pointed out during compilation.
Since we never call ourFunction, the interpreter has no reason to evaluate its contents. If you uncomment the call to ourFunction you’ll see the expected error message. This is relevant to do.call because it highlights how easily incompatible data types can get into your code and stay hidden until that code runs. Anything that reduces the chance of passing incompatible arguments is a significant boon.
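One way to watch this happen without halting a script is to wrap the call in tryCatch. The sketch below is an illustration under that assumption. The names ourMessage and ourFixedFunction are made up for this example, and the functions return their result instead of printing it so the values can be inspected.

```r
ourFunction <- function() {
  listData <- list(5, 10, 15, 20, 25)
  sum(listData)
}

# Defining ourFunction raised no error. Calling it does, and
# tryCatch captures the error message instead of stopping the script.
ourMessage <- tryCatch(
  ourFunction(),
  error = function(e) conditionMessage(e)
)
print(ourMessage)

# The same body written with do.call runs cleanly
ourFixedFunction <- function() {
  listData <- list(5, 10, 15, 20, 25)
  do.call(sum, listData)
}
print(ourFixedFunction())
```

The first print shows the text of the error raised by sum, and the second shows the summed value.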
These are fairly simple examples, but they demonstrate the basics of how to use do.call. With that in mind, let’s take a look at something more advanced that reflects the power of using do.call to pass lists, apply functions to them, and return the results.
Building on What You’ve Seen So Far
In the following example we’ll put ourselves in the shoes of someone who needs to evaluate data from multiple regions. Instead of handling things individually we can use do.call on rbind to join every data frame together.
ourData1 <- data.frame(Region=c('One', 'Two', 'Three', 'Four'),
                       Change=c(5, 10, 15, 20))
ourData2 <- data.frame(Region=c('Five', 'Six', 'Seven', 'Eight'),
                       Change=c(25, 30, 35, 40))
ourData3 <- data.frame(Region=c('Nine', 'Ten', 'Eleven', 'Twelve'),
                       Change=c(45, 50, 55, 60))
ourData4 <- data.frame(Region=c('Thirteen', 'Fourteen', 'Fifteen', 'Twenty'),
                       Change=c(65, 70, 75, 80))
ourList <- list(ourData1, ourData2, ourData3, ourData4)
ourResult <- do.call(rbind, ourList)
print(ourResult)
We begin by creating four ourData data frames. Each contains four regions and four change values. The frames are then collected in ourList, a list with one data frame per element. With the data gathered in one collection, we can pass it as the second argument to do.call. do.call hands each element of ourList to rbind as a separate argument, which is equivalent to writing rbind(ourData1, ourData2, ourData3, ourData4). The combined data frame that rbind returns is then assigned to ourResult.
Now try changing the ourResult assignment to the following.
ourResult <- rbind(ourList)
This one change shows how significant a difference do.call can make in how arguments are handled. When we pass ourList directly to rbind, we’re telling it to act on the list itself rather than on its contents. The result is a single-row structure that holds the four data frames as cells, not one combined data frame. With do.call, each element of the list becomes its own argument, so rbind works with the data frames themselves.
How arguments are passed to functions can seem like a subtle point at first glance. But as you can see, it has larger-scale consequences. It’s also an effect that can go unnoticed in your code, as it won’t always raise an error. If you think back to our earlier examples, you’ll recall that the faulty code in ourFunction stayed silent until the function was actually called.
The issue of argument interpretation is similar. R does a good job of preventing operations on incompatible data types. But because problems only surface when code runs, and some calls return an unintended result rather than an error, data incompatibility can slip through the cracks. Functions like do.call can help keep code resistant to those types of issues.
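The difference is easy to confirm by inspecting the shape of each result. The sketch below uses two smaller data frames to keep it short. The names combined and wrapped are introduced here for illustration.

```r
ourData1 <- data.frame(Region = c("One", "Two"), Change = c(5, 10))
ourData2 <- data.frame(Region = c("Three", "Four"), Change = c(15, 20))
ourList <- list(ourData1, ourData2)

# do.call unpacks the list, which is equivalent to
# rbind(ourData1, ourData2)
combined <- do.call(rbind, ourList)
print(is.data.frame(combined))
print(dim(combined))  # 4 rows, 2 columns

# A direct call treats the whole list as a single argument,
# so the data frames end up as cells of a one-row matrix
wrapped <- rbind(ourList)
print(is.data.frame(wrapped))
print(dim(wrapped))   # 1 row, 2 columns
```

Neither call raises an error, which is why checking the result with dim or is.data.frame is a useful habit when combining lists of data frames.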