Getting Loopy - Loops in R (How to Use For, While, Apply & Friends)

Ask anyone involved with data science what their most valuable tool is and there’s a fair chance that they’ll bring up R programming. There are a number of reasons why R is so popular among people working with data science, statistics, and machine learning. But one of the most important stems from the fact that R provides some powerful techniques to work with data sets. One of the most interesting aspects of these techniques is that you’ll typically find elements of them in other programming languages. However, R rethinks common assumptions to make its techniques more useful within the context of data analysis.

Redesigned functionality that focuses on data science can be found in many elements related to R. But it’s perhaps most easily seen with R loops. How do loops in R differ from other programming languages? And how can you get the most out of loops in R? Read on to discover how to use standard for and while loops in R. And then prepare yourself to move on to a whole new level of loopiness with R’s apply function.

An Introduction to Loops

As the name suggests, a loop repeats a block of code, typically once for each item in a data set. This might constitute an endless process. For example, loops might be used as part of a continually running program that checks equipment output on a regular basis. But in R it’s more common for loops to be paired with finite data sets.

A programmer typically sets up a loop to work through a data set, pulling out values or performing a calculation on each one. A counter is usually checked on every iteration to decide whether the loop should end. And that counter can itself be used within the loop’s internal logic. If you’re familiar with loops in other programming languages then you’ll recognize some core elements of R’s implementation. And all of this can be easily demonstrated by looking at some code samples. Try running the following code.

ourDataFrame <- data.frame(
  alpha = c(111, 80, 70, 60, 50, 40, 30, 20, 10, 101010),
  beta = c(222, 191, 192, 193, 194, 195, 196, 197, 198, 2020),
  gamma = c(333, 322, 323, 324, 325, 326, 327, 328, 329, 3333))

ourRowLength <- nrow(ourDataFrame)
ourRowPosition <- 1
# Keep looping while the position is at most one less than the row count
while (ourRowPosition <= ourRowLength - 1) {
  # Print the value in the first column (alpha) of the current row
  print(ourDataFrame[ourRowPosition, 1])
  ourRowPosition <- ourRowPosition + 1
}

One of the most important elements of R’s loops can be seen in the first line. We begin by creating a data frame called ourDataFrame and filling it with rows and columns. We’re intentionally making the numbers distinct so that it’s easy to recognize their position within the data frame during the loop’s progression.

The fact that we’re using a data frame is important since it’s one of R’s core data structures. It’s notable that we don’t need to perform any special procedures on the data frame in order to iterate over it. Instead, we assign the number of rows in the frame to ourRowLength. Next, we create a variable called ourRowPosition to track the current row during the process. We then use a while loop to move over the contents of ourDataFrame.

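The body of a while loop runs for as long as its condition holds. In this case, we keep looping while ourRowPosition is less than or equal to ourRowLength - 1. With ten rows in the data frame, the loop therefore prints the first nine values and stops before the last row. Changing the condition to ourRowPosition <= ourRowLength would include the final row as well. This technique can also be described as an example of control statements since we’re controlling the process with conditional logic.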
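As a minimal sketch, the loop above can be adjusted to visit every row by comparing against the full row count. The visitedValues vector is a hypothetical helper that isn’t part of the original example. It’s only there to record which values the loop reached.

```r
ourDataFrame <- data.frame(
  alpha = c(111, 80, 70, 60, 50, 40, 30, 20, 10, 101010),
  beta = c(222, 191, 192, 193, 194, 195, 196, 197, 198, 2020),
  gamma = c(333, 322, 323, 324, 325, 326, 327, 328, 329, 3333))

ourRowPosition <- 1
visitedValues <- c()  # hypothetical helper, used only to record progress

# Compare against the full row count so the last row is included
while (ourRowPosition <= nrow(ourDataFrame)) {
  currentValue <- ourDataFrame[ourRowPosition, 1]
  print(currentValue)
  visitedValues <- c(visitedValues, currentValue)
  ourRowPosition <- ourRowPosition + 1
}
```

Once the loop ends, visitedValues holds all ten entries from the alpha column, including the final 101010.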

With each step, we print the value at ourDataFrame[ourRowPosition , 1] and then increase ourRowPosition by 1. The first index selects the row and the second selects the column, so using 1 as the second index means that we’re specifically reading from the alpha column. We can also create a nested loop to access the totality of the data frame. Try replacing the previous while loop with the following code.

for (a in 1:nrow(ourDataFrame)) {
  for (b in 1:ncol(ourDataFrame)) {
    # a is the row index and b is the column index
    print(paste('row', a, 'column', b, '=', ourDataFrame[a, b]))
  }
}

In this example, we create a nested loop in which the outer loop moves over the rows and the inner loop moves over the columns of each row. This is somewhat similar to a foreach loop in some other programming languages. Note again how easy it is to work with R’s data types. The same is generally true when we use other native types like a matrix. But this also leads to the apply function family and its relationship with these data types.
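To illustrate the point about matrices, here is a small sketch of the same nested loop pattern applied to a matrix. The names ourMatrix and runningTotal are assumptions made up for this illustration. The sketch also uses seq_len, which behaves like 1:n but safely produces an empty sequence when n is 0.

```r
# Hypothetical 2 x 3 matrix, filled column by column with the values 1 to 6
ourMatrix <- matrix(1:6, nrow = 2, ncol = 3)
runningTotal <- 0

for (a in seq_len(nrow(ourMatrix))) {    # outer loop: rows
  for (b in seq_len(ncol(ourMatrix))) {  # inner loop: columns
    print(paste('row', a, 'column', b, '=', ourMatrix[a, b]))
    runningTotal <- runningTotal + ourMatrix[a, b]
  }
}
```

The indexing syntax is identical to the data frame version. Only the container changes.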

More Advanced Looping Techniques

There are some ways we could make the previous loops a little more concise. But the most idiomatic approach usually comes from forgoing manual loops in order to leverage R’s built-in features through the apply function and its variants. These include the main apply function, lapply, sapply, and tapply.

Each has specialized functionality that’s more or less consistent with its name. For example, lapply applies a function to each element of a list or vector and returns a list. Sapply does the same but simplifies the result to a vector or matrix where it can. Tapply applies a function to groups of values defined by the levels of a factor. The functions in this family work with and return different data types. But they all provide one amazing piece of functionality.
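A brief sketch may help make those differences concrete. The data and variable names below (ourList, ourValues, ourGroups, and the result variables) are invented for illustration only.

```r
# Hypothetical sample data
ourList <- list(alpha = c(1, 2, 3), beta = c(10, 20, 30))

# lapply returns a list with one result per element
listResult <- lapply(ourList, sum)

# sapply runs the same operation but simplifies the result to a vector
vectorResult <- sapply(ourList, sum)

# tapply applies a function to groups of values defined by a factor
ourValues <- c(1, 2, 3, 4)
ourGroups <- factor(c('x', 'y', 'x', 'y'))
groupResult <- tapply(ourValues, ourGroups, sum)

print(listResult)
print(vectorResult)
print(groupResult)
```

All three calls follow the same shape. We pass in the data first and the function to run second, with tapply taking the grouping factor in between.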

Apply’s variants can all run a function on the elements of a data collection without you needing to write the loop manually. Apply’s syntax essentially assumes a loop will be used and sets everything up for you. The concept might seem complex. But the implementation is surprisingly simple. Take a look at the following code.

ourDataFrame <- data.frame(
  alpha = c(111, 80, 70, 60, 50, 40, 30, 20, 10, 101010),
  beta = c(222, 191, 192, 193, 194, 195, 196, 197, 198, 2020),
  gamma = c(333, 322, 323, 324, 325, 326, 327, 328, 329, 3333))

ourFunction <- function(x) {
  print(paste('The modified var =', x + 1))
}

apply(ourDataFrame, c(1, 2), ourFunction)

We begin by defining our frame and its familiar data under ourDataFrame. Next, we create a function called ourFunction. The ourFunction function takes a single parameter and prints out the value with 1 added to it. Our function is simple by design as we’re using it to show how writing functions can add functionality to apply’s loops.

In the next line, we use apply’s functionality on ourDataFrame by passing it as the first argument. The second argument is c(1,2). A value of 1 will work over rows. A value of 2 will work over columns. And by passing c(1,2) we specify that we want to work with both rows and columns, which means the function is called on each individual element. Doing so essentially replicates the nested loop from our earlier example, although apply visits the elements column by column rather than row by row.

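To summarize, the arguments we pass to apply mean that we’re going to loop through both the rows and columns of ourDataFrame. Each element is passed to ourFunction where we print out the result of a simple addition operation. When we run the code it prints out “The modified var =” followed by the incremented value for every item in ourDataFrame. Because print also returns its argument, apply gathers those strings into a matrix, which R displays after the looping is done when the call is run at the top level.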
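The row and column values described above can also be passed on their own. The following sketch reuses ourDataFrame with R’s built-in sum and max functions. The result names rowTotals and columnMaxima are assumptions chosen for this illustration.

```r
ourDataFrame <- data.frame(
  alpha = c(111, 80, 70, 60, 50, 40, 30, 20, 10, 101010),
  beta = c(222, 191, 192, 193, 194, 195, 196, 197, 198, 2020),
  gamma = c(333, 322, 323, 324, 325, 326, 327, 328, 329, 3333))

# A margin of 1 calls the function once per row
rowTotals <- apply(ourDataFrame, 1, sum)

# A margin of 2 calls the function once per column
columnMaxima <- apply(ourDataFrame, 2, max)

print(rowTotals)
print(columnMaxima)
```

Here apply returns one value per row or per column instead of printing inside the function, which is usually the more practical pattern when the results are needed later.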

This basic syntax is generally consistent with apply’s larger family of functions. Of course, each variant has unique quirks and benefits as well. But mastering apply’s concept and syntax will provide you with the skills needed to work with the full range of R’s looping capabilities.
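As a rough sketch of that consistency, the same small function can be handed to several members of the family. The names addOne, ourVector, and ourMatrix are hypothetical and exist only for this illustration.

```r
# Hypothetical helper that returns its input plus 1
addOne <- function(x) x + 1

ourVector <- c(10, 20, 30)
ourMatrix <- matrix(1:4, nrow = 2)

# sapply returns a simplified vector
sapplyResult <- sapply(ourVector, addOne)

# lapply returns a list with one entry per element
lapplyResult <- lapply(ourVector, addOne)

# apply with c(1, 2) works element by element and keeps the matrix shape
applyResult <- apply(ourMatrix, c(1, 2), addOne)

print(sapplyResult)
print(applyResult)
```

In each case the data comes first and the function comes last. What differs between the variants is mainly the kind of input they expect and the shape of what they return.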

Our Comprehensive Guide To Loops in R

How To Use the apply function (matrix or data frame)
How To Use the sapply function (simplified version of lapply)
How To Use the lapply function (list or vector)
How To Use the mapply function (applying a function to multiple lists or vectors)
How To Use the tapply function (levels of a factor)
While Loops
For Loops
Creating Anonymous Functions in R