Averaging across columns in R is a common data-processing task. It is the same as averaging along each row of a data frame. This is often necessary because of the relationships that can exist between the values within a row. When such relationships exist, the row average can be an important part of understanding what is going on.
Description of the function.
The rowMeans() function has the form rowMeans(x), with the optional argument na.rm. The na.rm argument is either TRUE or FALSE (the default) and tells the function whether to exclude NA values. The rowMeans() function calculates the mean value across multiple columns, giving one result per row, which makes it particularly useful for data frames. It is not the function for averaging individual columns; that is colMeans(). Averaging values that sit in the same row but in different columns is a tool for understanding the relationship between them, if one exists.
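As a minimal sketch of how na.rm changes the result, and of how rowMeans() differs from colMeans(), consider a small made-up data frame. The name `scores` and its values are assumptions for illustration only.

```r
# Hypothetical data frame used only for illustration
scores <- data.frame(q1 = c(4, NA, 6),
                     q2 = c(8, 2, 6))

# Default na.rm = FALSE: any row containing an NA yields NA
row_default <- rowMeans(scores)

# na.rm = TRUE: NA values are dropped before each row is averaged
row_clean <- rowMeans(scores, na.rm = TRUE)

# colMeans() averages down each column instead of across each row
col_clean <- colMeans(scores, na.rm = TRUE)

print(row_default)  # 6 NA 6
print(row_clean)    # 6 2 6
print(col_clean)    # q1 = 5, q2 = 5.333333
```

The second row shows the difference most clearly: it is NA by default, and 2 once the missing value is excluded.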
Explanation of the function.
The rowMeans() function returns a numeric vector holding the average of each row of a data frame or other multi-column data set, such as an array or a matrix. It provides a descriptive statistic for the rows of the data set. It works by summing the items in each row and dividing by the number of columns in the data frame, array, or matrix; when na.rm is TRUE, NA values are left out of both the sum and the count. This is the arithmetic mean, a very important statistic when dealing with numeric data. Real-world data seldom falls as neatly as we would like or as theory predicts. This is because a real-world data frame is often affected by factors outside a particular model: multiple variables, missing values or column names, null values, a blank cell here and there, and so on.
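The arithmetic can be checked by hand. The sketch below uses an illustrative matrix `m` (an assumption for this example, not data from the article) to compare rowMeans() with rowSums() divided by the column count. It also shows that with na.rm = TRUE the divisor becomes the number of non-missing values in each row.

```r
# Illustrative matrix; rowMeans() accepts matrices as well as data frames.
# matrix() fills column by column, so the rows are (8, 9, 6), (6, 3, 7), (NA, 3, 8).
m <- matrix(c(8, 6, NA,
              9, 3, 3,
              6, 7, 8),
            nrow = 3)

# Without na.rm: sum of each row divided by the number of columns
manual_full <- rowSums(m) / ncol(m)

# With na.rm = TRUE: divide by the count of non-NA values in each row
manual_narm <- rowSums(m, na.rm = TRUE) / rowSums(!is.na(m))

# Both comparisons should report TRUE
print(all.equal(rowMeans(m), manual_full))
print(all.equal(rowMeans(m, na.rm = TRUE), manual_narm))
```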
Examples of the function in action.
There are several ways to illustrate this. The first example shows a straightforward use of the function on existing columns, and the second shows how to add the result to the data frame as a new column.
# average across columns in r
> x = data.frame(a= c(8, 6, 2, NA, 9),
+ b = c(9, 3, 8, 3, 2),
+ c = c(6, 7, 5, 8, 9),
+ d=c(8, 5, 7, 9, 6))
> x
   a b c d
1  8 9 6 8
2  6 3 7 5
3  2 8 5 7
4 NA 3 8 9
5  9 2 9 6
> rowMeans(x, na.rm=TRUE)
[1] 7.750000 5.250000 5.500000 6.666667 6.500000
Here we have a straightforward illustration of the function in action: we get the row averages with the one NA value removed, so row 4 is averaged over its three remaining values.
# how to average across columns in r
> x = data.frame(a= c(8, 6, 2, NA, 9),
+ b = c(9, 3, 8, 3, 2),
+ c = c(6, 7, 5, 8, 9),
+ d=c(8, 5, 7, 9, 6))
> x$mean = rowMeans(x, na.rm=TRUE)
> x
   a b c d     mean
1  8 9 6 8 7.750000
2  6 3 7 5 5.250000
3  2 8 5 7 5.500000
4 NA 3 8 9 6.666667
5  9 2 9 6 6.500000
Here the averages are added to the data frame as a separate column named mean. This puts both sets of data together in a convenient format.
Application of this function.
This applies to any situation involving a data frame or another multi-column data set. It produces meaningful results when the values in each row are related to one another. In such cases, taking an average provides useful information about that relationship.
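In practice, a data frame often carries non-numeric columns, such as an identifier, alongside the values to be averaged, and rowMeans() stops with an error when given non-numeric data. The sketch below shows one possible way to handle this by keeping only the numeric columns. The data frame `readings`, its column names, and the helper vector `num_cols` are all hypothetical.

```r
# Hypothetical data frame with a non-numeric identifier column
readings <- data.frame(id = c("s1", "s2", "s3"),
                       t1 = c(2, 4, NA),
                       t2 = c(4, 8, 10),
                       stringsAsFactors = FALSE)

# rowMeans(readings) would fail because of the character column,
# so flag the numeric columns and average only those
num_cols <- sapply(readings, is.numeric)
readings$mean <- rowMeans(readings[, num_cols], na.rm = TRUE)

print(readings)  # mean column: 3, 6, 10
```

Selecting columns by name, for example readings[, c("t1", "t2")], would work equally well when the columns of interest are known in advance.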
Averaging the non-missing numeric values across existing columns is an important part of data science. It helps reveal relationships between individual pieces of information that would otherwise not be evident.