How to Calculate Mean in R
Because the mean is such a standard statistic, base R includes a function for calculating it. It is, quite appropriately, named “mean”. Here is an example of how to use the mean function in R.
# find mean in r - example
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32)
> mean(test)
[1] 35.41667
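To make the result less of a black box, here is a minimal sketch that reproduces it by hand: the arithmetic mean is the sum of the values divided by how many there are. The variable name manual_mean is only an illustrative choice.

```r
# Same sample vector as in the example above
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32)

# Arithmetic mean by hand: total of the values divided by their count
manual_mean <- sum(test) / length(test)
print(manual_mean)  # 35.41667

# Check that the manual result agrees with the built-in function
print(all.equal(manual_mean, mean(test)))
```

In practice you would simply call mean; the manual version is only a check on what the function computes.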
Mean in R – Remove NA Values
A common annoyance is missing values in your data. Fortunately, this is fairly easy to address with the na.rm option, as shown below.
# find mean in r - example
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32, NA,NA)
# mean in R - calculation fails due to missing values
> mean(test)
[1] NA
# find mean in R - success with na.rm=TRUE option
> mean(test, na.rm=TRUE)
[1] 35.41667
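If you want to see what na.rm=TRUE is doing, the sketch below counts the missing values with is.na and then drops them by hand before averaging. The names n_missing and filtered_mean are illustrative only.

```r
# Sample vector with two missing values
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32, NA, NA)

# Count how many values are missing before deciding to drop them
n_missing <- sum(is.na(test))
print(n_missing)  # 2

# Drop the NA entries manually, then take the mean of what is left
filtered_mean <- mean(test[!is.na(test)])

# This should match the na.rm = TRUE shortcut
print(all.equal(filtered_mean, mean(test, na.rm = TRUE)))
```

Checking the count first is a reasonable habit: dropping two missing values out of fourteen is one thing, while silently dropping most of a column would be another.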
Calculate Mean in R – Summary Statistics
You can also obtain the mean of a sample as part of the output of the “summary” descriptive statistics function. This is a handy tool for getting a first look at a new data set.
# find mean in R
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32, NA,NA)
> summary(test)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  24.00   32.00   34.00   35.42   39.50   43.00       2
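The summary output is a named object, so you can pull out a single entry rather than reading it off the screen. The sketch below assumes a reasonably recent R version, where the stored value keeps full precision and only the printed display is rounded; older versions may store the rounded figure. The name summary_mean is illustrative.

```r
# Sample vector with two missing values
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32, NA, NA)

# summary() returns a named object; keep it instead of only printing it
s <- summary(test)
print(names(s))

# Extract the "Mean" entry by name as a plain number
summary_mean <- as.numeric(s[["Mean"]])
print(summary_mean)
```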
Need the mean of multiple columns? You can either run summary on the whole data frame or use sapply to map mean across its columns. Both approaches are demonstrated below.
# find mean in R - multiple columns
> head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
# find mean in R - calculated using summary
> summary(ChickWeight)
     weight           Time           Chick     Diet
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120
 Median :103.0   Median :10.00   20     : 12   3:120
 Mean   :121.8   Mean   :10.72   10     : 12   4:118
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12
 Max.   :373.0   Max.   :21.00   19     : 12
                                 (Other):506
# find mean in R - calculated using sapply; limit to first two columns
> sapply(ChickWeight[,1:2], mean)
   weight      Time
121.81834  10.71799
As the second example indicates, you can use a one-liner to calculate the mean of two columns in R. This is useful for quick comparisons and data validation.
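Hard-coding the column positions works here because the first two columns of ChickWeight happen to be the numeric ones. As a sketch of a more general approach, you could select the numeric columns programmatically with is.numeric and then use base R's colMeans. The names cw, num_cols and col_means are illustrative.

```r
# Work on a plain data frame copy of the built-in ChickWeight data
cw <- as.data.frame(ChickWeight)

# Flag the numeric columns; the factor columns Chick and Diet are skipped
num_cols <- sapply(cw, is.numeric)

# colMeans computes the mean of every selected column in one call
col_means <- colMeans(cw[num_cols])
print(col_means)

# Should agree with the sapply approach shown earlier
print(all.equal(col_means, sapply(cw[, 1:2], mean)))
```

Selecting by type avoids accidentally calling mean on a factor column, which returns NA with a warning.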
Calculate Mean in R By Group
Another common use case is calculating the mean for each subset of the data within a data frame. For this example, we’re going to use the built-in mtcars data set. Can we see a difference in the mean fuel efficiency (miles per gallon) between cars with different numbers of cylinders?
# introduce data - calculate mean in r by group
> data("mtcars")
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
To derive our answer, we’re going to use the aggregate function and apply R’s mean function to each of the subgroups. Results are displayed below.
# calculate mean in r by group / aggregate function solution
> aggregate(mtcars$mpg, by=list(mtcars$cyl), FUN=mean)
  Group.1        x
1       4 26.66364
2       6 19.74286
3       8 15.10000
As the example indicates, mean fuel efficiency drops as the number of cylinders rises: roughly 26.7 mpg for four cylinders, 19.7 for six, and 15.1 for eight. R’s aggregate function gives us yet another one-liner we can use to quickly inspect data.
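The generic column names Group.1 and x can be avoided. As a sketch of two base R alternatives, aggregate also accepts a formula, which keeps the original column names, and tapply returns the group means as a named vector. The names by_cyl and cyl_means are illustrative.

```r
data("mtcars")

# Formula interface: mean of mpg for each value of cyl,
# with the result columns named cyl and mpg
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)

# tapply gives the same group means, named by cylinder count
cyl_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(cyl_means)
```

The formula version is often easier to read when you later group by more than one variable, while tapply is convenient when you only need the numbers.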