Many statistical procedures require you to find the sum of a column of numbers as part of your calculations. Fortunately, there is a built-in function (the colSums function) to find column sums in R. This reduces the task to a single line of code using a single function in the base R language, which is especially helpful when you end up calculating multiple column sums within an R data frame.
colSums Function.
Calculating column sums in R involves the colSums function, which has the form colSums(dataset) and returns the sum of each column in the data set. The function also has several optional parameters, one of which is the logical parameter na.rm, which tells the function whether to remove missing values before summing.
> x = matrix(rep(1:8),6,4)
>
> x
     [,1] [,2] [,3] [,4]
[1,]    1    7    5    3
[2,]    2    8    6    4
[3,]    3    1    7    5
[4,]    4    2    8    6
[5,]    5    3    1    7
[6,]    6    4    2    8
>
> colSums(x)
[1] 21 25 29 33
Here is an example of the colSums function in use. If you add up column 1, you will get 21, just as colSums reports. Along with it, you get the sums of the other three columns. As you can see, by default colSums returns the sums of all the columns in the matrix or data frame, not just a specific column.
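The na.rm parameter mentioned earlier matters once missing values appear. As a minimal sketch (the matrix m below is made up for illustration and is not part of the example above), a column containing NA sums to NA by default, while na.rm = TRUE drops the missing value first:

```r
# Hypothetical 3 x 2 matrix with one missing value (illustration only)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3, ncol = 2)

# Default behaviour: the NA in column 1 propagates into that column's sum
default_sums <- colSums(m)

# na.rm = TRUE removes missing values before summing each column
clean_sums <- colSums(m, na.rm = TRUE)

print(default_sums)
print(clean_sums)
```

Whether dropping missing values is appropriate depends on your data, so treat na.rm = TRUE as a deliberate choice rather than a default.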
Application.
There are numerous applications for summing up a column in a data set. This is commonly done in spreadsheets and other formats. In data science, it can be used to gather the totals from a list of values. Below, we have an example based on arrest rates in each state of the United States.
> head(USArrests)
           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7
>
> colSums(USArrests)
  Murder  Assault UrbanPop     Rape
   389.4   8538.0   3277.0   1061.6
Here, the first six rows of this data set show the first six states in alphabetical order. The data set contains arrest rates per 100,000 residents for murder, assault, and rape, along with the percentage of the population living in urban areas. After the colSums function is applied, we have a total in all four columns for all 50 states combined.
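If you only need some of the columns, one approach is to subset the data frame before calling colSums. The sketch below uses the same USArrests data; the variable names selected and murder_total are only illustrative:

```r
# USArrests ships with base R in the datasets package
# Sum only the Murder and Assault columns by subsetting first
selected <- colSums(USArrests[, c("Murder", "Assault")])

# For a single column, sum() on the extracted vector is enough
murder_total <- sum(USArrests$Murder)

print(selected)
print(murder_total)
```

Both totals should match the corresponding entries in the full colSums(USArrests) output above.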
Potential Errors
There are a couple of potential errors you can throw with this function. For example, the colSums() function isn’t tolerant of non-numeric data, and a missing value turns a column’s sum into NA unless you set na.rm = TRUE. You can easily generate lovely errors such as…
Error in colSums(x, na.rm = TRUE) : ‘x’ must be numeric
Should this lovely fail-whale appear, the cause is simple enough. Check the data you’ve fed into your process to see if you are handing it numeric columns with a proper column name. Something in there isn’t numeric, and the colSums function throws a little tantrum to communicate that to you. My best suggestion is to filter the missing or incorrect data point from your data and proceed from there.
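One way to do that filtering, sketched here on a made-up data frame (df and its column names are assumptions for illustration), is to keep only the numeric columns before summing:

```r
# Hypothetical data frame mixing a text column with numeric columns
df <- data.frame(
  region = c("north", "south", "east"),
  sales  = c(10, 20, 30),
  units  = c(1, 2, 3)
)

# colSums(df) would stop with "'x' must be numeric" because of 'region'

# Flag the numeric columns, then sum only those
numeric_cols <- sapply(df, is.numeric)
totals <- colSums(df[, numeric_cols, drop = FALSE])

print(totals)
```

The drop = FALSE argument keeps the result a data frame even if only one numeric column remains, so colSums still receives two-dimensional input.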
You may also get:
Error in colSums(x) : ‘x’ must be an array of at least two dimensions
This occurs when you feed a vector (a one-dimensional series of values) into a function that expects an array with at least two dimensions.
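A common way to trip over this is extracting a single column, which R silently turns into a vector. The sketch below (variable names are illustrative) shows two ways around it: use sum() on a vector, or keep the matrix shape with drop = FALSE:

```r
# A plain vector has no dimensions, so colSums(v) would fail
v <- c(1, 2, 3, 4)
total <- sum(v)  # sum() is the right tool for a vector

# Extracting one column normally drops to a vector;
# drop = FALSE keeps it as a 6 x 1 matrix that colSums accepts
x <- matrix(1:8, 6, 4)
one_col <- colSums(x[, 1, drop = FALSE])

print(total)
print(one_col)
```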
Related Functions & Broader Usage
There are several functions designed to help you calculate the totals and averages of columns and rows in R. In addition to rowMeans, this family of functions includes colMeans, rowSums, and colSums. Here are some specifics on where you use them:
- colMeans – calculate the mean of each column in R
- colSums – sum each column in R
- rowSums – sum each row in R
These functions are extremely useful when you’re doing advanced matrix manipulation or implementing a statistical function in R. They form the building blocks of many basic statistical operations and linear algebra procedures. This is why you sometimes see an error message from this cluster of functions show up as part of a higher-level package.
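As a quick illustration, here is the rest of the family applied to the same 6 x 4 matrix used earlier in this article (the result variable names are only for illustration):

```r
# Same values as the matrix from the colSums example
x <- matrix(1:8, 6, 4)

col_means <- colMeans(x)  # one mean per column
row_sums  <- rowSums(x)   # one total per row
row_means <- rowMeans(x)  # one mean per row

print(col_means)
print(row_sums)
print(row_means)
```

All of these base R functions accept the same na.rm argument as colSums.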
In the event you need them, there are also functions such as rowMedians (the median of each row) and rowSds (the standard deviation of each row), both available in the matrixStats package. Given the existence of the above, be sure to do a quick search of the various R packages if you need anything more exotic, since it most likely exists.
If you are looking to calculate means or sums by group, check out the aggregate function (one of the items we addressed in our article about descriptive statistics).
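A minimal sketch of a grouped sum with aggregate, using a made-up data frame (scores and its columns are assumptions for illustration):

```r
# Hypothetical data: two numeric columns and a grouping column
scores <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(1, 2, 3, 4),
  y     = c(10, 20, 30, 40)
)

# Sum x and y separately within each group
group_sums <- aggregate(cbind(x, y) ~ group, data = scores, FUN = sum)

print(group_sums)
```

Swapping FUN = sum for FUN = mean gives group means instead.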