Adding up the columns of a data set is a handy data science technique. In most programming languages, this would mean writing the code to load and add up each column. In R, this task is reduced to a single line of code and a single function.
Summing columns in R uses the colSums function, which has the form colSums(dataset) and returns the sum of each column in the data set. It also has several optional parameters, one of which is the logical parameter na.rm, which tells the function whether to remove NA (missing) values before summing.
    > x = matrix(rep(1:8),6,4)
    > x
         [,1] [,2] [,3] [,4]
    [1,]    1    7    5    3
    [2,]    2    8    6    4
    [3,]    3    1    7    5
    [4,]    4    2    8    6
    [5,]    5    3    1    7
    [6,]    6    4    2    8
    > colSums(x)
    [1] 21 25 29 33
Here is an example of the use of the colSums function. If you add up column 1, you will get 21, just as you get from the colSums function. Along with it, you get the sums of the other three columns. As you can see, colSums in R returns the sums of all the columns and not just a selected one.
There are numerous applications for summing up a column in a data set. This is commonly done in spreadsheets and other formats. In data science, it can be used to gather the totals from a list of values. Below, we have an example based on arrest rates in each state of the United States.
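As a minimal sketch of the na.rm parameter described earlier, and of how to total only some columns, consider the following. The matrix m and its values are made up purely for illustration.

```r
# Hypothetical matrix with one missing value, for illustration only
m <- matrix(c(1, 2, NA, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

# Default (na.rm = FALSE): a column containing NA sums to NA
sums_default <- colSums(m)

# na.rm = TRUE drops missing values before summing
sums_clean <- colSums(m, na.rm = TRUE)

# To total only some columns, subset first and then sum
sums_subset <- colSums(m[, c(2, 3)])

print(sums_default)
print(sums_clean)
print(sums_subset)
```

Subsetting before the call is the simplest way to get totals for a chosen set of columns, since colSums itself always sums every column it is given.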
    > head(USArrests)
               Murder Assault UrbanPop Rape
    Alabama      13.2     236       58 21.2
    Alaska       10.0     263       48 44.5
    Arizona       8.1     294       80 31.0
    Arkansas      8.8     190       50 19.5
    California    9.0     276       91 40.6
    Colorado      7.9     204       78 38.7
    > colSums(USArrests)
      Murder  Assault UrbanPop     Rape
       389.4   8538.0   3277.0   1061.6
Here, head() shows the first six rows of the data set, which are the first six states in alphabetical order. The data set contains arrest rates for murder, assault, and rape, along with the percentage of the population living in urban areas. After the colSums function is applied, we have a total for each of the four columns across all 50 states combined.
There are a couple of errors you can trigger with this function. The R colSums() function isn’t very tolerant of missing or non-numeric data: a missing value turns a column's sum into NA unless you set na.rm = TRUE, and non-numeric data stops the function outright. You can easily generate lovely errors such as…
Error in colSums(x, na.rm = TRUE) : ‘x’ must be numeric
Should this lovely fail-whale appear, the cause is simple enough. Check the data you’ve fed into your process. Something in there isn’t numeric, and the colSums function throws a little tantrum to communicate that to you. My best suggestion is to filter the missing or incorrect data points out of your data and proceed from there.
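One way to do that filtering, sketched here under the assumption that your data lives in a data frame, is to keep only the numeric columns before summing. The data frame df and its column names are hypothetical.

```r
# Hypothetical data frame mixing a character column with numeric ones
df <- data.frame(
  region = c("north", "south", "east"),
  sales  = c(10, 20, 30),
  visits = c(1, NA, 3)
)

# colSums(df) would stop with "'x' must be numeric" because of `region`

# Keep only the numeric columns, then sum them, ignoring missing values
numeric_cols <- df[sapply(df, is.numeric)]
totals <- colSums(numeric_cols, na.rm = TRUE)

print(totals)
```

Whether dropping a column or removing missing values is appropriate depends on your data, so treat this as a starting point rather than a universal fix.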
You may also get:
Error in colSums(x) : ‘x’ must be an array of at least two dimensions
This occurs when you feed a vector (a one-dimensional series of values) into a function that expects an array with rows and columns.
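Here is a short sketch of two ways around the dimension error, using made-up values: call sum() directly on a plain vector, or keep the two-dimensional shape when pulling a single column out of a matrix.

```r
v <- c(1, 2, 3, 4)

# colSums(v) would stop: 'x' must be an array of at least two dimensions

# For a plain vector, sum() is usually all you need
total <- sum(v)

# Subsetting one column of a matrix normally returns a vector;
# drop = FALSE keeps the result as a 6 x 1 matrix that colSums accepts
x <- matrix(1:24, nrow = 6, ncol = 4)
one_col <- x[, 1, drop = FALSE]
col_total <- colSums(one_col)

print(total)
print(col_total)
```

The drop = FALSE argument is worth remembering, because a single-column subset is a common way for a matrix to turn into a vector without you noticing.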
Related Functions & Broader Usage
There are several functions designed to help you calculate the total and average value of columns and rows in R. In addition to rowMeans in R, this family of functions includes colMeans, rowSums, and colSums. Here are some specifics on where you use them:
- colMeans – calculate the mean of each column in R
- colSums – sum each column in R
- rowSums – sum each row in R
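To see the four functions side by side, here is a small sketch on a made-up 2 x 3 matrix. All four are part of base R.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)
# x is
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

col_totals <- colSums(x)   # 3 7 11
col_means  <- colMeans(x)  # 1.5 3.5 5.5
row_totals <- rowSums(x)   # 9 12
row_means  <- rowMeans(x)  # 3 4

print(col_totals)
print(col_means)
print(row_totals)
print(row_means)
```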
These functions are extremely useful when you’re doing advanced matrix manipulation or implementing a statistical function in R. These form the building blocks of many basic statistical operations and linear algebra procedures. This is why you sometimes see an error message from this cluster of functions show up as part of a higher level package.
In the event you need them, there are also functions such as rowMedians (the median of each row) and rowSds (the standard deviation of each row), which come from the matrixStats package rather than base R. Given the existence of the above, be sure to do a quick search of the various R packages if you need anything more exotic, since it most likely exists…
If you are looking to solve for rowMeans or rowSums by group, check out the aggregate function (one of the items we addressed in our article about descriptive statistics).
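If installing an extra package is not an option, the same row-wise statistics can be sketched in base R with apply(). This is an alternative shown for illustration on made-up data, not the packaged implementation, and it will generally be slower on large matrices.

```r
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
# Rows are (1, 4, 7), (2, 5, 8) and (3, 6, 9)

# MARGIN = 1 applies the function to each row (2 would mean each column)
row_medians <- apply(x, 1, median)  # 4 5 6
row_sds     <- apply(x, 1, sd)      # 3 3 3

print(row_medians)
print(row_sds)
```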
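As a rough sketch of the grouped approach, the example below uses aggregate() with a formula to total and average two columns within each group. The data frame and its column names are hypothetical.

```r
# Hypothetical data: two numeric columns and a grouping column
df <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(1, 2, 3, 4),
  y     = c(10, 20, 30, 40)
)

# Sum x and y separately within each group
group_sums <- aggregate(cbind(x, y) ~ group, data = df, FUN = sum)

# The same call with FUN = mean gives per-group averages
group_means <- aggregate(cbind(x, y) ~ group, data = df, FUN = mean)

print(group_sums)
print(group_means)
```

The result is a data frame with one row per group, so it can be passed straight into further analysis.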