In data science, finding the mean value of a column in a data set can be helpful. Doing it by hand usually requires loading a data set, finding the values in a given column, adding them together, and dividing by the number of values. R reduces this to a single step once you have the data set.
The colMeans Function
Obtaining column means in R uses the colMeans function, which has the format colMeans(dataset) and returns the mean value of each column in that data set. The function has several optional parameters. One of these is the logical parameter na.rm, which determines whether the function skips NA (missing) values.
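To see what na.rm changes, here is a minimal sketch. The matrix m is made-up illustration data, not part of the examples in this article.

```r
# Illustrative matrix with one missing value (hypothetical data)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3, ncol = 2)

# Default is na.rm = FALSE: a column containing NA gets a mean of NA
default_means <- colMeans(m)

# na.rm = TRUE drops the NA before averaging that column
clean_means <- colMeans(m, na.rm = TRUE)

print(default_means)
print(clean_means)
```

With na.rm = TRUE, the first column is averaged over its two remaining values only, so it is worth checking how many values are missing before relying on the result.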
> x = matrix(rep(1:8),6,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    7    5    3
[2,]    2    8    6    4
[3,]    3    1    7    5
[4,]    4    2    8    6
[5,]    5    3    1    7
[6,]    6    4    2    8
> # example - colMeans in R
> colMeans(x)
[1] 3.500000 4.166667 4.833333 5.500000
In this example, the matrix has six rows and four columns filled with the numbers one through eight, repeated as needed. The values are arranged so that the columns differ, resulting in different mean values. Applying the colMeans function to the data set returns the mean value of every column in one easy step.
Finding column means in R has many applications, including obtaining the mean values of items in the same category. If you have a data set in which each category is a column, with the values for individual items running down the rows, the colMeans function will give you the mean value for each category.
> head(USArrests)
           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7
> # example - colMeans in R
> colMeans(USArrests)
  Murder  Assault UrbanPop     Rape
   7.788  170.760   65.540   21.232
Running this data set of United States arrests through the colMeans function returns the mean values across the 50 states for murder, assault, and rape arrests (each per 100,000 residents) and for the percentage of the population living in urban areas. These averages provide a baseline against which each state’s figures can be compared.
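One way to put those averages to work is to subtract them from each state's row. The sketch below uses base R's sweep function for that; the variable names are only illustrative, and converting to a matrix first is a choice made here for simplicity.

```r
# USArrests ships with base R (datasets package)
arrests <- as.matrix(USArrests)

# Column means across all states
state_means <- colMeans(arrests)

# Subtract each column's mean from every value in that column
# (sweep's default FUN is "-")
deviations <- sweep(arrests, 2, state_means)

# Positive values are above the average, negative values below it
print(head(deviations, 3))
```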
Obtaining the mean value of columns of data has many uses. The colMeans function is a tool that makes it possible to quickly get that data. It is a useful tool in R’s arsenal for statistical analysis.
There are a couple of potential errors you can throw with this function. For example, the R colMeans() function isn’t very tolerant of non-numeric data, and missing values turn a column’s mean into NA unless you set na.rm = TRUE. You can easily generate lovely errors such as…
Error in colMeans(x, na.rm = TRUE) : ‘x’ must be numeric
Should this lovely fail-whale appear, the cause is simple enough. Check the data you’ve fed into your process. Something in there isn’t numeric, and the colMeans function throws a little tantrum to communicate that to you. My best suggestion is to filter the non-numeric columns or incorrect data points out of your data and proceed from there.
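As a rough sketch of that filtering step, the snippet below keeps only the numeric columns of a data frame before calling colMeans. The data frame df and its column names are hypothetical.

```r
# Hypothetical data frame mixing a character column with numeric ones
df <- data.frame(
  name   = c("a", "b", "c"),
  height = c(1.5, 1.7, NA),
  weight = c(60, 72, 81)
)

# colMeans(df) would stop with "'x' must be numeric" because of 'name'
numeric_cols <- vapply(df, is.numeric, logical(1))

# Keep only the numeric columns and skip NA values when averaging
means <- colMeans(df[, numeric_cols, drop = FALSE], na.rm = TRUE)
print(means)
```

Here drop = FALSE keeps the result a data frame even if only one numeric column survives the filter.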
You may also get:
Error in colMeans(x) : ‘x’ must be an array of at least two dimensions
This occurs when you feed a vector (a one-dimensional series of values) into a function that expects an array with at least two dimensions.
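A few ways around this, sketched with made-up values: use mean() for a plain vector, reshape the vector into a one-column matrix, or keep the matrix shape when pulling out a single column.

```r
v <- c(2, 4, 6, 8)

# colMeans(v) would fail because a plain vector has no dimensions;
# mean() is the right tool for a single vector
vector_mean <- mean(v)

# Alternatively, reshape the vector into a one-column matrix first
as_column <- colMeans(matrix(v, ncol = 1))

# A common trap: selecting one column of a matrix drops it to a vector
x <- matrix(1:6, nrow = 3)
kept <- colMeans(x[, 1, drop = FALSE])  # drop = FALSE keeps it a matrix

print(c(vector_mean, as_column, kept))
```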
Related Functions & Broader Usage
There are several functions designed to help you calculate the total and average value of columns and rows in R. In addition to rowMeans, this family of functions includes colMeans, rowSums, and colSums. Here are some specifics on where you use them…
- colMeans – calculate the mean of each column in R
- colSums – sum each column in R
- rowSums – sum each row in R
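The whole family can be tried on the six-by-four matrix from the first example. This is just a quick side-by-side sketch:

```r
# Same matrix as in the first colMeans example
x <- matrix(rep(1:8), 6, 4)

col_sums  <- colSums(x)   # one total per column
row_sums  <- rowSums(x)   # one total per row
row_means <- rowMeans(x)  # one average per row

print(col_sums)
print(row_sums)
print(row_means)
```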
These functions are extremely useful when you’re doing advanced matrix manipulation or implementing a statistical function in R. These form the building blocks of many basic statistical operations and linear algebra procedures. This is why you sometimes see an error message from this cluster of functions show up as part of a higher level package.
In the event you need them, there are also functions such as rowMedians (the median of each row) and rowSds (the standard deviation of each row), available in the matrixStats package. Given the existence of the above, be sure to do a quick search of the various R packages if you need anything more exotic, since it most likely exists.
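If installing a package isn't an option, base R's apply function can compute the same kind of row-wise statistics. This is a sketch of that alternative, not the packaged implementation, and it will generally be slower on large matrices.

```r
# Same matrix as in the first colMeans example
x <- matrix(rep(1:8), 6, 4)

# MARGIN = 1 applies the function to each row (2 would mean each column)
row_medians <- apply(x, 1, median)
row_sds     <- apply(x, 1, sd)

print(row_medians)
print(row_sds)
```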
If you are looking to solve for means or sums by group, check out the aggregate function (one of the items we addressed in our article about descriptive statistics).
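For a sense of how that looks, here is a minimal sketch using aggregate with a formula. The data frame and its column names are invented for illustration.

```r
# Hypothetical data: two measurements for items in two groups
df <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(1, 3, 10, 20),
  y     = c(2, 4, 6, 8)
)

# Mean of every other column, computed separately for each group
group_means <- aggregate(. ~ group, data = df, FUN = mean)
print(group_means)

# Swap in FUN = sum to get grouped totals instead
group_sums <- aggregate(. ~ group, data = df, FUN = sum)
print(group_sums)
```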