There’s little doubt that Excel provides a wealth of easy-to-use functionality. However, it’s equally clear that Excel isn’t the most robust platform for data science. It’s a good place to start out with basic details or tracking. But as a project grows people will typically find themselves needing to transfer their methodology and data sets to a more robust platform such as R.

Moving from Excel to R is generally a fairly straightforward process since most of Excel’s functionality can be seen as a subset of full data science platforms. However, actually finding analogous functionality to Microsoft’s proprietary implementations can sometimes be a little tricky. For example, how would we go about emulating Excel’s sumif function in R?

## A Quick Look at Sumif

Before moving on to R we need to clarify exactly what sumif does. On the surface, sumif is extremely simple. It basically just returns a sum of the supplied data. This is where the sum in its name comes from. But the “if” is where the function’s true power comes into play. Excel lets you supply selective criteria to decide which data will be summed. For example, imagine that you had a margin of error which meant that you could disregard values under 20. You could easily use sumif to only work with data that meets that criteria. Or you could easily add sums from one column while ignoring another. Likewise, sumif can be used to selectively work with data using almost any given criteria. We could also use sumif to join multiple elements together by adding their contents. Sumif essentially acts like an if conditional that can work with multidimensional collections. As such, it’s fairly easy to emulate this within a full programming language like R.

## Implementing Sumif in R

There’s a variety of different ways to simulate sumif in R. But one R function in particular stands out as a dropin replacement for sumif – aggregate. Take a look at the following example.

df <- data.frame(

Planet = c(“mercury”, “mercury”, “venus”, “venus”),

Landings = c(“landing1”, “landing2”, “landing3”, “landing4”),

Finding1 = c(3, 6, 9, 12),

Finding2 = c(80, 90, 100, 110)

)

print(df)

In this example we’ll assume that we’re working with data from multiple hypothetical landings on the two inner planets from the sun. We’ll need to use an ad-hoc sumif on the data. As you can see from print, the data’s all formatted and ready to use. So replace the print statement with the following.

print(aggregate(cbind(Finding1, Finding2) ~ Planet, data = df, FUN = sum))

This statement is, essentially, emulating sumif. But the mechanisms by which it does this need a little more explanation.

## Simulating Sumif With Aggregation and Cbind in R

In the prior solution, we were able to simulate sumif with just a single line of code. But R can pack a lot of power into a small space. And that’s exactly what’s happening here. We’re really dealing with two separate solutions that come together as an exponential whole. Namely, the aggregate and cbind functions work in tandom to produce the needed result.

The order of operations works from inside out, so we begin by using cbind on Finding1 and Finding2. This combines the two for easier manipulation. This is then sent to the aggregate function. However, note the FUN assignment at the end of our call to aggregate. This is the function that’ll be used for our data. As you’d expect for a sumif replacement, we’re using sum to aggregate the sum value. But keep in mind that you can easily use custom functions with aggregate for more detailed conditional processing.

## Approaching the Problem Through Dplyr

Aggregate is perhaps the most concise way of replicating sumif functionality in R. But it’s hardly the only one. We also have a fairly simple way to accomplish the same goal with a library called dplyr. The dplyr library provides an extensive collection of data manipulation functions that make a sumif emulation fairly easy. Take a look at the following code to see how we could implement sumif using dplyr.

library(dplyr)

df <- data.frame(

Planet = c(“mercury”, “mercury”, “venus”, “venus”),

Landings = c(“landing1”, “landing2”, “landing3”, “landing4”),

Finding1 = c(3, 6, 9, 12),

Finding2 = c(80, 90, 100, 110)

)

df %<%

group_by(Planet) %<%

summarise(across(-Landings , sum))

We begin by using the dplyr library and redefining our data frame. Next, we use dplyr’s group_by and summarise functions. It’s important to remember that these two items come as a pair. When used together they essentially emulate GROUP BY in SQL. We can join this together with a sum to essentially recreate both sumif and the earlier use of aggregate and cbind.

One of the most important takeaways from both is their methodological similarity. The methods used to recreate sumif are different. But they both depend on the fact that we can easily process data within a function and then work with the result. R might require a little more fine-tuned control than many functions within Excel. But at the same time, the ability to chain functionality together means that R gives you far more freedom to work with data however you want. Whether that means replicating sumif or taking that functionality into all new and more powerful directions.