Sum in R! Tally Ho! Sum Saves Lives!

We’re going to profile one of my favorite – and most used – functions in the entire base R package. Meet the sum() function.

You’re probably thinking this is sarcasm. Sure, sum() is used everywhere. We’re talking about a ten-thousand-year-old mathematical operator that has been reduced to a function and used throughout R. But this goes much deeper than R.

The Real Value of Sum in R

Are you ready to hear why learning sum() will make you an infinitely better analyst? Why mastering sum() in R is a step toward becoming intern of the year, the hero of your summer associate class, the first director of your year? Nay! A bold LEAP in the general direction of analytics greatness!

Because it gives you an easy way to check your work!

And down that road… lies a path to eminence!

Basic Syntax of the sum() Function

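sum(..., na.rm = FALSE)

  • ... – one or more numeric, logical, or complex vectors to add together; you can pass several at once
  • na.rm – whether to drop missing (NA) values before adding; with the default FALSE, a single NA makes the whole result NA
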
# sum in R example
> sum(c(1,2,3,4))
[1] 10

# sum in R - handling missing values
> sum(c(1,2,3,4, NA))
[1] NA
> sum(c(1,2,3,4, NA), na.rm=TRUE)
[1] 10
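sum() actually collects its inputs through the ... argument, so you can hand it several vectors in one call, and logical vectors are added up as 1s (TRUE) and 0s (FALSE). That second trick makes sum() a quick counter. A short sketch; the sales vector is made-up data for illustration:

```r
# Several vectors in one call: everything gets added together
sum(1:3, c(10, 20))               # 1 + 2 + 3 + 10 + 20
# [1] 36

# Logical values count as 1 (TRUE) and 0 (FALSE), so sum() doubles as a counter
sales <- c(120, 0, 75, NA, 300)   # made-up values
sum(sales > 100, na.rm = TRUE)    # how many entries exceed 100
# [1] 2
sum(is.na(sales))                 # how many entries are missing
# [1] 1
```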

How to Sum a Column in R

The real value of sum, to a typical R user, is checking your work.

Pull a query from the database? Check the column totals, ideally against a third-party source (such as your financials or production report, for those of you who are working in industry). Compare the sum for a column with past reports that you’ve run. Sum it all. Compare it all.

Seriously – this is my number one complaint with junior analysts. Check. Your. Work. It takes 5 seconds to run a sum total. You can even script it.

Don’t make me figure out that the sales database didn’t update and we’re looking at an old data set. You should have seen that within 5 minutes of starting your QA process after pulling the data. Same total as last month? For a multi-million dollar sales ledger? Nope. It’s wrong. Sum saves lives.
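Since the check takes seconds, it’s worth scripting. Here is a minimal sketch of the “same total as last month?” test. The check_total() helper and the ledger numbers are hypothetical, invented for illustration; all.equal() is used so that tiny floating-point differences still count as a match:

```r
# Hypothetical QA helper (not from any package): total a column and flag
# two common red flags, missing values and a total identical to last month's.
check_total <- function(x, last_month_total) {
  n_missing <- sum(is.na(x))            # summing a logical vector counts the NAs
  if (n_missing > 0) {
    warning(n_missing, " missing value(s) dropped from the total")
  }
  current_total <- sum(x, na.rm = TRUE)
  # all.equal() tolerates floating-point noise; isTRUE() turns its
  # character-string "difference" report into FALSE
  if (isTRUE(all.equal(current_total, last_month_total))) {
    warning("Total matches last month; the data may be stale")
  }
  current_total
}

# Made-up ledger values for illustration only
ledger <- c(1500000, 2250000, 980000)
check_total(ledger, last_month_total = 4730000)  # warns: same total as last month
```

In practice you’d pull last_month_total from the prior report rather than typing it in.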

You’re probably looking for syntax. See below.

# sum in R - tally a column
> sum(mydata$mycolumn)
[1] 42
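The one-liner above assumes the column has no missing values. A slightly fuller sketch, using a made-up mydata data frame (the column names are invented for illustration), shows what an NA does to a column total and how base R’s colSums() can total every numeric column in one go:

```r
# A small made-up data frame standing in for mydata
mydata <- data.frame(
  region = c("East", "West", "North"),
  sales  = c(100, NA, 250),
  units  = c(10, 4, 12)
)

sum(mydata$sales)                  # NA, because one value is missing
sum(mydata$sales, na.rm = TRUE)    # 350
sum(mydata[["units"]])             # 26; [[ ]] also works when the name is in a variable

# Totals for every numeric column at once; drop = FALSE keeps a data frame
# even if only one column is numeric
num_cols <- sapply(mydata, is.numeric)
colSums(mydata[, num_cols, drop = FALSE], na.rm = TRUE)
```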