How to Use apply in R to apply a function to a matrix or data frame

The R programming language is one of the most popular options among people working in data science. Much of that popularity stems from how well R handles data manipulation. Base R gives you a variety of powerful ways to work with large collections of numbers, along with many data types that let you elegantly manipulate, import, and export information. The apply function and its relatives are among the best examples of how R excels in this area. And you’ll soon see just how easy it is to use apply in R to run a function across the rows or columns of a matrix or data frame.

What Are the Apply-Related Functions?

Before looking at solutions that use the apply function, we need to consider exactly why it’s such an important part of the base R language. This gets to the heart of why R is such a popular choice for research and statistical work.

Almost all programming languages can work with massive amounts of information. But there are always larger questions to ask at the beginning of a project. It’s important to consider how much work will go into creating functions to manipulate those large data sets, and it’s equally important to consider overall processing speed and efficiency. The ideal is a programming language that lets you quickly write human-readable code that runs at an equally impressive speed. The R programming language provides a number of coding shortcuts to make that ideal a reality.

R’s basic syntax is relatively high-level and easy to work with. R also has ways to streamline procedures that would otherwise be processor- and memory-heavy. One of the key examples is how R handles its data frames, matrices, lists, and similar collections. In many other languages you’d need to manually iterate through every element within a collection. But many of R’s built-in functions are vectorized. A vectorized function acts on a whole collection of elements, or a part of one, with no need to loop through them manually.
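As a minimal illustration (the vector name prices is hypothetical, chosen only for this sketch), R’s arithmetic operators and summary functions such as sum() act on a whole vector in a single call:

```r
# A hypothetical numeric vector used only for this illustration
prices <- c(10, 20, 30, 40)

# Arithmetic is vectorized: every element is doubled in one expression
doubled <- prices * 2
print(doubled)  # 20 40 60 80

# sum() is vectorized too: it consumes the whole vector, no loop required
total <- sum(prices)
print(total)    # 100
```

Neither line needs a for loop, because the iteration happens inside R itself.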

Vectorization might not seem very important at first. But it’s part of what makes R one of the best choices for statistical work. Manually iterating over a huge numerical set in interpreted R code can be slow, and it also adds a lot of bloat to your code. Built-in vectorized functions such as sum() do their looping internally in compiled code, which reduces the computational strain while also making those procedures easier to write.

Starting Out With the Basics of Apply

The benefits of apply are easiest to see through examples. We can begin by demonstrating how we’d write a basic addition procedure in a non-vectorized style, using an explicit loop. Take a look at the following code.

# Build a data frame with five numeric columns of ten values each
ourDataFrame <- data.frame(
  first = c(1,2,3,4,5,6,7,8,9,10),
  second = c(11,12,13,14,15,16,17,18,19,20),
  third = c(21,22,23,24,25,26,27,28,29,30),
  fourth = c(31,32,33,34,35,36,37,38,39,40),
  fifth = c(2,2,2,2,2,2,2,2,2,2))

# Add up the fifth column one element at a time
y <- 0
for (x in ourDataFrame$fifth) {
  y <- y + x
}
print(y)  # 20

In this example we add up all of the numbers in the fifth column of a data frame. We begin by creating a new data frame called ourDataFrame and populating it with five columns. Next, we create the y variable and set it to 0. We then enter a for loop that works through the fifth column of ourDataFrame, adding the current value in that column to y on each iteration. Finally, we print the final value of y, which is 20.

This method of working through a set obviously works. But it’s verbose, and most importantly, it doesn’t take advantage of R’s vectorized functions. Now take a look at the following code to see a more concise approach.

ourDataFrame <- data.frame(
  first = c(1,2,3,4,5,6,7,8,9,10),
  second = c(11,12,13,14,15,16,17,18,19,20),
  third = c(21,22,23,24,25,26,27,28,29,30),
  fourth = c(31,32,33,34,35,36,37,38,39,40),
  fifth = c(2,2,2,2,2,2,2,2,2,2))
# Sum the fifth column: MARGIN = 2 means "call sum once per column"
print(apply(ourDataFrame[5], 2, sum))

We begin by defining ourDataFrame in the same way as the previous example. But all of the functionality from the rest of that example’s code is now wrapped up in a single line. The print statement wraps a call to apply. We pass apply ourDataFrame[5], which is a one-column data frame holding the fifth column. We then supply two additional arguments: the number 2 and the function sum.

The number 2 is passed to apply’s MARGIN argument, which tells apply whether to work along rows, columns, or both. If we supply 1, apply calls the function once for each row. If we supply 2, it calls the function once for each column. MARGIN can also cover rows and columns together, but contrary to what you might expect, this doesn’t mean passing the number 3 (a MARGIN of 3 only makes sense for arrays with three or more dimensions). Instead we pass both dimensions as c(1,2), which makes apply call the function on every individual element. In this case we’re only concerned with the fifth column, so we pass 2.
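To see each MARGIN value side by side, here is a small sketch on a hypothetical 2 x 3 matrix m that is not part of the original examples:

```r
# A small 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

rowTotals <- apply(m, 1, sum)  # one result per row:    9 12
colTotals <- apply(m, 2, sum)  # one result per column: 3 7 11

# MARGIN = c(1, 2): the function receives one cell at a time
tenTimes <- apply(m, c(1, 2), function(v) v * 10)
print(tenTimes)
#      [,1] [,2] [,3]
# [1,]   10   30   50
# [2,]   20   40   60
```

Because the function in the last call returns a single value per cell, apply keeps the original 2 x 3 shape.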

Finally, we pass sum as the third argument, which apply calls FUN. Here sum refers to R’s built-in sum function, but FUN can be any R function, including functions that you’ve written yourself.
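As a sketch of passing your own function, the helper spread below is hypothetical and written only for this example. The code rebuilds the same data frame (using ranges such as 1:10 for the same values) and applies the helper, plus an anonymous function, to every column at once:

```r
# Rebuild the example data frame with the same values as above
ourDataFrame <- data.frame(
  first  = 1:10,
  second = 11:20,
  third  = 21:30,
  fourth = 31:40,
  fifth  = rep(2, 10)
)

# Hypothetical user-written helper: distance between largest and smallest value
spread <- function(values) {
  max(values) - min(values)
}

# MARGIN = 2 calls spread() once per column; results are named after the columns
spreads <- apply(ourDataFrame, 2, spread)
print(spreads)  # 9 for the first four columns, 0 for fifth

# An anonymous function works the same way
halfTotals <- apply(ourDataFrame, 2, function(column) sum(column) / 2)
print(halfTotals)  # 27.5 77.5 127.5 177.5 10
```

Passing the whole data frame, rather than ourDataFrame[5], is what produces one result per column.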

This concept also works directly with matrices, which are the data type apply was designed for. For example, the following code shows how we can adapt this idea to use a matrix instead of a data frame.

# 10 x 5 matrix filled by column: columns 1-4 hold 1 to 40, column 5 holds ten 2s
ourMatrix <- matrix(c(1:40, rep(2, 10)), nrow = 10, ncol = 5)
# Keep column 5 as a one-column matrix (drop = FALSE), then sum it
print(apply(ourMatrix[, 5, drop = FALSE], 2, sum))

We start by assigning ourMatrix a matrix of ten rows and five columns, using the same values as in the previous example. The call to apply once again uses sum and is wrapped in a print call. Likewise, we’re still passing 2 to indicate that we want to work on columns. The main difference comes from how we index a location in a different data type. Here we select the fifth column while preserving the matrix structure with drop = FALSE (often abbreviated drop=F). Without it, R would simplify the selection to a plain vector with no dimensions, and apply would stop with an error because it needs an object with a dim attribute.
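The following sketch shows what happens without drop = FALSE, and compares apply with base R’s colSums(), which the R documentation describes as equivalent to apply with FUN = sum but a lot faster. The variable names column5, message5, viaApply, and viaColSums are hypothetical, used only for this illustration.

```r
# Same 10 x 5 matrix as above: columns 1-4 hold 1 to 40, column 5 holds ten 2s
ourMatrix <- matrix(c(1:40, rep(2, 10)), nrow = 10, ncol = 5)

# Without drop = FALSE, a single selected column collapses to a plain vector
column5 <- ourMatrix[, 5]
print(dim(column5))  # NULL: no dimensions left for MARGIN to refer to

# apply() therefore stops with an error; tryCatch captures its message
message5 <- tryCatch(
  apply(column5, 2, sum),
  error = function(e) conditionMessage(e)
)
print(message5)

# Summing every column: apply() and the dedicated colSums() agree
viaApply   <- apply(ourMatrix, 2, sum)
viaColSums <- colSums(ourMatrix)
print(viaApply)                         # 55 155 255 355 20
print(all.equal(viaApply, viaColSums))  # TRUE
```

For plain row or column sums and means, rowSums(), colSums(), rowMeans(), and colMeans() are the natural choice; apply earns its keep when the function is something those helpers don’t cover.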

In both examples we get the result of a for loop without writing the loop ourselves, and with far less code. Keep in mind that apply still loops over the rows or columns internally, so it isn’t necessarily faster than a well-written for loop; its main benefit is concise, readable code. However, apply in R is actually just one function in a family of similar options, and we can get even more out of the language by matching each function to our needs.

Other Variations on Apply

The apply function itself is built for matrices and arrays; data frames work because apply converts them to a matrix first. For other jobs, R offers related variants, and we can write cleaner code by matching each data type to the appropriate one. For example, if we’re working with a list we can use the lapply function, or its simplifying cousin sapply. If we need to call a function on corresponding elements of several vectors at once, we can use mapply. And if we need to work on subsets of a vector that are broken down by a grouping value, we can use tapply. Take a look at the following code to see lapply and sapply in action.

# The same five columns, this time stored as a named list of vectors
ourList <- list(
  first = c(1,2,3,4,5,6,7,8,9,10),
  second = c(11,12,13,14,15,16,17,18,19,20),
  third = c(21,22,23,24,25,26,27,28,29,30),
  fourth = c(31,32,33,34,35,36,37,38,39,40),
  fifth = c(2,2,2,2,2,2,2,2,2,2))

# lapply() always returns a list
ourLAppliedValue <- lapply(ourList[5], sum)
print(ourLAppliedValue)
print(typeof(ourLAppliedValue))  # "list"

# sapply() simplifies the result to a vector when it can
ourSAppliedValue <- sapply(ourList[5], sum)
print(ourSAppliedValue)
print(typeof(ourSAppliedValue))  # "double"

We start out with the same values as our initial example, but this time we store them in a list rather than a data frame. Next, we use the lapply function much as we used apply earlier, except that lapply has no MARGIN argument: it simply calls the function on each element of the list. We assign the result to ourLAppliedValue and print its value and type. Then we repeat the process using the sapply function.

We wind up with the same summed value, 20, in both cases. But lapply always returns a list, so typeof reports "list". sapply, on the other hand, tries to simplify its result. Because sum reduced each list element to a single number, sapply can collapse the results into a named numeric vector, so typeof reports "double".
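The section also mentions mapply and tapply without showing them. Here is a minimal sketch of each; the values and the names pairSums, scores, groups, and groupTotals are hypothetical, chosen only for illustration.

```r
# mapply(): call a function on corresponding elements of several vectors
pairSums <- mapply(function(a, b) a + b, 1:3, c(10, 20, 30))
print(pairSums)  # 11 22 33

# tapply(): apply a function to subsets of a vector defined by a grouping vector
scores <- c(4, 8, 6, 10, 2)           # hypothetical values
groups <- c("a", "b", "a", "b", "a")  # hypothetical group labels
groupTotals <- tapply(scores, groups, sum)
print(groupTotals)  # a: 12, b: 18
```

mapply walks its inputs in parallel (first with first, second with second), while tapply splits one vector by group before calling the function on each piece.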
