Every popular programming language has advantages that set it apart from its peers. And languages like R take that a step further by focusing on a specific subject. In R's case, that means excelling in areas related to statistics and data science. In short, R is at its best when you're juggling large amounts of numerical data, and the language makes it easy to assign, unpack, and manipulate these collections. For example, take the question of how you'd assign and unpack the results of a function that returns multiple values. This can be a rather convoluted process in many other programming languages. And R's take on the idea might seem complex in its own way at first glance. But as you'll soon see, working with multiple values is relatively easy with R's syntax, at least once we clarify a few points.
Functions, Data, and Returns
If you're coming to R from another programming language, you might assume that R's returns would be fairly straightforward. Most programming languages follow a similar order of events. Variables are passed to functions. The functions then process those variables. And, finally, the results are returned through some kind of explicit command at the end of the function.
There are two points in R that can complicate this otherwise straightforward process. The first concerns what happens when there's no return statement. In many languages, a function only hands data back when you explicitly tell it to. Leave out the return statement, or its equivalent, and the function typically returns nothing, and in some languages the omission is an outright error. In R, however, a function without a return statement still hands back a value, and whether it's the value you intended depends on how the function is written. The concept might seem a little strange. After all, programming languages generally insist on strict standards, both for their internal data management and for how we format our code. But you'll soon see that this behavior follows a simple, predictable rule.
The second point concerns how data is handed off to functions and how it's returned. R's return() function accepts exactly one argument, so a function can hand back only a single object. That raises a pressing question: how do we send multiple pieces of data through a single return statement?
Stepping Into Practical Data Management
Both points are easier to understand through practical examples. We can begin with a simple example of how functions and variables work together.
ourSingleFunction <- function(ourSingleValue) {
  x <- ourSingleValue * 2
  return(x)
}
ourSingleVector <- ourSingleFunction(2)
print(ourSingleVector)
This example begins with a user-defined function called ourSingleFunction. As the name suggests, we pass a single value to it. That value is multiplied by 2 and assigned to x before being sent back through a return statement. We call the function with an argument of 2 and assign the result to ourSingleVector. And, finally, we print out the result. The final total probably isn't very surprising, since this is a fairly straightforward process. However, your expectations might be subverted if you simply remove the return statement at the end of ourSingleFunction.
When the return statement is present, we receive our answer of 4 just as we'd expect. But if we remove it, we still get an answer of 4. Some programming languages are designed to tolerate this kind of omission, but R's handling of it will probably seem strange to people coming from languages that require an explicit return.
When there's no return statement, R returns the value of the last expression evaluated in the function. In our example, that's the assignment x <- ourSingleValue * 2, whose value is 4. This default can save us from a forgotten return, but it also means the result depends on whichever line happens to run last. If we were working with more data, simply changing the order of operations would change the value returned at the function's end, with no error to warn us. For that reason, it's safer to always use an explicit return statement. Not necessarily because we have to, but because leaving it out creates too many opportunities for errors to creep in.
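To make the last-expression rule concrete, here's a small sketch with three hypothetical functions (implicitDouble, reorderedDouble, and explicitDouble are illustrative names, not part of the article's example). They differ only in which expression runs last and whether return() is used.

```r
# Hypothetical functions illustrating R's implicit return rule

# No return(): the assignment is the last expression evaluated,
# so its value (x) is returned, but invisibly
implicitDouble <- function(value) {
  x <- value * 2
}

# Same calculation, but another line now runs last,
# so its value comes back instead of x
reorderedDouble <- function(value) {
  x <- value * 2
  y <- value + 1
}

# An explicit return() states which value should come back,
# regardless of the helper lines computed before it
explicitDouble <- function(value) {
  x <- value * 2
  y <- value + 1
  return(x)
}

print(implicitDouble(2))   # [1] 4
print(reorderedDouble(2))  # [1] 3
print(explicitDouble(2))   # [1] 4
```

Because implicitDouble ends with an assignment, its result is invisible: calling implicitDouble(2) on its own at the console prints nothing, while wrapping it in print(), as the article does, still shows 4.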
But what about the second point we’d considered? How does return handle single and multiple values in a return statement? Restore our previous example to its initial state, but change the return statement to the following line.
return(x,x)
You might assume that this would simply send two instances of our x variable, each holding 4, back to the initial call. Instead, you'll see an error stating that “multi-argument returns are not permitted”. A statistics and data-focused language will obviously need to work with a large number of values, so this might seem a little strange. But in reality, it shows just how focused the language is on larger data structures. You typically won't send two or three separate numbers back from a function. In R, even a single number is a vector of length one, and variables tend to be collections of data rather than singular entities. It's far more common to send a large batch of values back as a single vector and then select the items you want to work with. But with that in mind, how exactly do we work with more than one value on either side of a function?
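As a minimal sketch of the usual workaround, the version below (ourPairFunction and ourBrokenFunction are hypothetical names) wraps both copies of x in one atomic vector with c(), so return() still receives a single object. The tryCatch() call captures the error from return(x, x) without stopping the script.

```r
# Broken version: return() accepts only one argument
ourBrokenFunction <- function(ourSingleValue) {
  x <- ourSingleValue * 2
  return(x, x)
}

# Working version: combine both values into one vector first
ourPairFunction <- function(ourSingleValue) {
  x <- ourSingleValue * 2
  return(c(x, x))
}

# Capture the error message instead of halting
ourErrorMessage <- tryCatch(
  ourBrokenFunction(2),
  error = function(e) conditionMessage(e)
)
print(ourErrorMessage)  # [1] "multi-argument returns are not permitted"

ourPairVector <- ourPairFunction(2)
print(ourPairVector)          # [1] 4 4
print(length(ourPairVector))  # [1] 2
```

Note that defining ourBrokenFunction succeeds; the error only appears when the function is called and return(x, x) is actually evaluated.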
Juggling and Processing Multiple Data Points
Once again, the best way to understand the language is to see it in action. Try running the following code.
ourMultiFunction <- function(ourFirstArgument, ourSecondArgument) {
  x <- ourFirstArgument * 1
  y <- ourSecondArgument * 2
  z <- ourFirstArgument * 3
  ourReturnValues <- list(x, y, z)
  return(ourReturnValues)
}
ourMultiVector <- ourMultiFunction(2, 9)
print(ourMultiVector)
We begin in a similar way to the previous example. But this time around, the function expects two arguments: ourFirstArgument and ourSecondArgument. We multiply ourFirstArgument by 1 and by 3 and assign the results to x and z, respectively. The ourSecondArgument value is multiplied by 2 and assigned to y. Next, we create a list containing these new variables and assign it to ourReturnValues. This list is then passed back through a return statement. This is how we get around the problem of multi-argument returns. We're sending multiple values through return(), but those values are all contained within a single list, which R treats as a kind of vector.
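The list returned by ourMultiFunction has no names, so its values can only be reached by position. As a minimal sketch, assuming named elements suit your code, the hypothetical ourNamedFunction below is a variant of ourMultiFunction that names each element. The caller can then unpack values by name with $ or [[ ]], or by position with [[ ]].

```r
# Hypothetical variant of ourMultiFunction that returns a named list
ourNamedFunction <- function(ourFirstArgument, ourSecondArgument) {
  ourReturnValues <- list(
    x = ourFirstArgument * 1,
    y = ourSecondArgument * 2,
    z = ourFirstArgument * 3
  )
  return(ourReturnValues)
}

ourNamedResult <- ourNamedFunction(2, 9)

# Unpack into separate variables by name or by position
x <- ourNamedResult$x       # 2
y <- ourNamedResult[["y"]]  # 18
z <- ourNamedResult[[3]]    # 6

print(names(ourNamedResult))  # [1] "x" "y" "z"
```

Single brackets behave differently: ourNamedResult["y"] returns a one-element list rather than the number itself, which is why [[ ]] is the usual choice for unpacking.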
Our main code block begins by calling ourMultiFunction with arguments of 2 and 9. Note that we're passing multiple arguments without needing to format our data specially for the task. We can't hand return() multiple arguments, but a function we call can accept as many arguments as it defines.
Because ourMultiFunction returns a list, ourMultiVector receives all three values at once. And, finally, we print that value to the screen. Note that this approach works with any single object that can hold multiple elements, not just lists. For example, try replacing the ourReturnValues list assignment with the following line.
ourReturnValues <- data.frame(x, y, z)
When you run the code, you'll see that ourMultiVector is now a data frame with one row and three columns named x, y, and z. The function decides how its results are packaged, so there's no need to change the code that assigns to ourMultiVector.
Of course, there are also situations where you want individual values rather than a collection containing them. What if we needed to see each individual value returned from ourMultiFunction? We could do so with a simple loop. Try adding the following code underneath print(ourMultiVector) in our prior example.
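As a rough sketch of how the packaging choice affects the caller, the block below builds two single-object alternatives directly from the same three values that ourMultiFunction(2, 9) computes (2, 18, and 6). A data frame built with data.frame(x, y, z) takes its column names from the variables, so columns can be read with $. A named atomic vector made with c() is another option when every value has the same type; ourFrame and ourNamedVector are hypothetical names.

```r
# The three values ourMultiFunction(2, 9) computes
x <- 2 * 1
y <- 9 * 2
z <- 2 * 3

# Data frame version: one row, three columns named after the variables
ourFrame <- data.frame(x, y, z)
print(dim(ourFrame))  # [1] 1 3
print(ourFrame$y)     # [1] 18

# Named atomic vector version: one object, three named elements
ourNamedVector <- c(x = x, y = y, z = z)
print(ourNamedVector[["z"]])  # [1] 6
```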
for (ourIndividualItem in ourMultiVector) {
  print(ourIndividualItem)
  print(typeof(ourIndividualItem))
}
We begin by creating a for loop that works through the contents of ourMultiVector. The loop steps through every individual item in ourMultiVector and makes it available for further action. (If ourMultiVector is the data frame version, each item is a whole column.) In this example, we're just printing out each item's value and its type. But the loop gives us the chance to perform any needed action on every item in the returned object. Using a for loop also makes it easy to add conditional logic. For example, we might discard values that are larger or smaller than a specified value.
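The conditional filtering mentioned above might look something like this minimal sketch. The cutoff of 10 and the names ourThreshold, ourKeptValues, and ourFiltered are hypothetical choices for illustration. The loop skips any value above the threshold with next, and Filter() shows a loop-free equivalent.

```r
ourMultiFunction <- function(ourFirstArgument, ourSecondArgument) {
  x <- ourFirstArgument * 1
  y <- ourSecondArgument * 2
  z <- ourFirstArgument * 3
  return(list(x, y, z))
}

ourMultiVector <- ourMultiFunction(2, 9)  # list(2, 18, 6)
ourThreshold <- 10                        # hypothetical cutoff

ourKeptValues <- c()
for (ourIndividualItem in ourMultiVector) {
  if (ourIndividualItem > ourThreshold) {
    next  # discard values above the threshold
  }
  ourKeptValues <- c(ourKeptValues, ourIndividualItem)
}
print(ourKeptValues)  # [1] 2 6

# Loop-free alternative: Filter() keeps list elements where the test is TRUE
ourFiltered <- Filter(function(v) v <= ourThreshold, ourMultiVector)
print(unlist(ourFiltered))  # [1] 2 6
```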
All of this highlights the fact that R's syntax and lexicon can differ from what you'll find in languages that are less focused on statistics and data science. But those differences are there for a good reason. Being limited to a single-argument return can feel restrictive at first. But in reality, that single-object rule pushes us toward a coding style more in line with the large data sets found throughout data science.
R is essentially encouraging us to future-proof our code for situations where we're dealing with larger data sets. Forcing a single return object can feel a little pointless when we're just trying to send back three values. But it'd be quite a different situation if those three values grew to three thousand and our return statement suddenly needed three thousand individual arguments.
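To illustrate that point, here's a small sketch with a hypothetical ourScaledFunction. Because R's arithmetic is vectorized, the same single return() that sends back three values works unchanged for three thousand.

```r
# Hypothetical function: doubles every value it receives
ourScaledFunction <- function(ourValues) {
  ourDoubled <- ourValues * 2
  return(ourDoubled)  # still one object, however many elements it holds
}

ourSmallResult <- ourScaledFunction(c(1, 2, 3))
ourLargeResult <- ourScaledFunction(1:3000)

print(ourSmallResult)          # [1] 2 4 6
print(length(ourLargeResult))  # [1] 3000
```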