Beating Average – How to Find The Mean in R

How to Calculate Mean in R

Since the mean is such a standard statistic, base R includes a function to calculate it. It is, quite appropriately, named "mean". Here is an example of how to use the mean function in R.

# find mean in R - example
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32)
> mean(test)
[1] 35.41667
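
The mean is simply the sum of the values divided by how many values there are. The sketch below recomputes it by hand with sum and length and checks the result against the mean function. The variable name manual_mean is illustrative only.

```r
# Recompute the mean by hand: sum of the values divided by their count
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32)

# manual_mean is an illustrative name, not part of base R
manual_mean <- sum(test) / length(test)   # 425 / 12
print(manual_mean)

# all.equal() compares the two results with a small numeric tolerance
stopifnot(isTRUE(all.equal(manual_mean, mean(test))))
```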

Mean in R – Remove NA Values

A common annoyance is missing values in your data. By default, mean returns NA if any value is missing. Fortunately, this is easy to address with the na.rm argument, as shown below.

# find mean in R - example
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32, NA,NA)

# mean in R - calculation fails due to missing values
> mean(test)
[1] NA

# find mean in R - success with na.rm=TRUE option
> mean(test, na.rm=TRUE)
[1] 35.41667
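
Setting na.rm = TRUE amounts to dropping the missing values yourself before averaging. The sketch below shows that equivalence using is.na, plus one edge case worth knowing: if every value is missing, nothing is left to average and mean returns NaN rather than NA. The variable names are illustrative.

```r
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32, NA, NA)

# Drop the missing values manually, then take the mean
# (complete_values and manual_mean are illustrative names)
complete_values <- test[!is.na(test)]
manual_mean <- mean(complete_values)

# Same result as letting mean() drop them with na.rm = TRUE
stopifnot(identical(manual_mean, mean(test, na.rm = TRUE)))

# Edge case: an all-NA vector leaves no values to average, so the result is NaN
all_missing <- c(NA_real_, NA_real_)
print(mean(all_missing, na.rm = TRUE))
```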

Calculate Mean in R – Summary Statistics

You can also obtain the mean of a sample from the "summary" function, which reports a set of descriptive statistics: minimum, quartiles, median, mean, maximum, and the number of missing values. This is a handy tool for a first look at a new data set.

# find mean in R 
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32, NA,NA)
> summary(test)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  24.00   32.00   34.00   35.42   39.50   43.00       2 
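
summary drops the missing values before computing its statistics and reports how many there were in the NA's column. To get the mean back as a number instead of reading it off the printed table, you can index the result by name. This is a minimal sketch. Depending on your R version, the stored value may be rounded to a few significant digits, which is why the check below uses a loose tolerance.

```r
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32, NA, NA)
stats <- summary(test)

# Pull individual statistics out of the summary by name
summary_mean <- stats[["Mean"]]
na_count     <- stats[["NA's"]]

# The summary mean matches mean() with missing values removed.
# Older R versions round summary() values, hence the loose tolerance.
stopifnot(isTRUE(all.equal(summary_mean, mean(test, na.rm = TRUE),
                           tolerance = 1e-3)))
print(na_count)
```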

Need to run mean for multiple columns? You can either run summary on the whole data frame or use sapply to apply mean to each column of the data frame. Both approaches are demonstrated below.

# find mean in R - multiple columns
> head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1

# find mean in R - calculated using summary
> summary(ChickWeight)
     weight           Time           Chick     Diet   
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
 Median :103.0   Median :10.00   20     : 12   3:120  
 Mean   :121.8   Mean   :10.72   10     : 12   4:118  
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
 Max.   :373.0   Max.   :21.00   19     : 12          
                                 (Other):506  

# find mean in R - calculated using sapply; limit to first two columns
> sapply(ChickWeight[,1:2], mean)
   weight      Time 
121.81834  10.71799 

As the second example indicates, a one-liner is enough to calculate the mean of each of several columns in R. This is useful for quick comparisons and data validation.
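
Base R also provides colMeans, which averages every column of a numeric data frame or matrix in one call. colMeans stops with an error on non-numeric columns such as Chick and Diet, so the sketch below first picks out the numeric columns with sapply(..., is.numeric) rather than hard-coding positions 1:2. The names num_cols and chick_numeric are illustrative.

```r
# ChickWeight ships with base R (datasets package)
data("ChickWeight")

# Keep only the numeric columns (weight and Time); Chick and Diet are factors
# (num_cols and chick_numeric are illustrative names)
num_cols <- sapply(ChickWeight, is.numeric)
chick_numeric <- ChickWeight[, num_cols]

# Two equivalent ways to get per-column means
means_sapply   <- sapply(chick_numeric, mean)
means_colmeans <- colMeans(chick_numeric)
print(means_colmeans)

# all.equal() allows for tiny floating-point differences between the two methods
stopifnot(isTRUE(all.equal(means_sapply, means_colmeans)))
```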

Calculate Mean in R By Group

Another common use case is calculating the mean for a subset of data within a data frame. For this example, we’re going to use the built-in mtcars data set. Can we see a difference in mean fuel efficiency (miles per gallon) between cars with different numbers of cylinders?

# introduce data - calculate mean in R by group
> data("mtcars")
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

To answer this, we’re going to use the aggregate function, which splits mpg by cyl and applies R’s mean function to each subgroup. Results are displayed below.

# calculate mean in R by group / aggregate function solution
> aggregate(mtcars$mpg, by=list(mtcars$cyl), FUN=mean)
  Group.1        x
1       4 26.66364
2       6 19.74286
3       8 15.10000

As the example indicates, average fuel efficiency falls as the number of cylinders rises, from about 26.7 mpg for four-cylinder cars to 15.1 mpg for eight-cylinder cars. And R’s aggregate function gives us yet another one-liner we can use to quickly inspect data.
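
aggregate also accepts a formula, which keeps meaningful column names in the output (cyl and mpg instead of Group.1 and x). tapply returns the same group means as a named vector. Both are standard base R alternatives to the call above, shown here as a sketch. The names by_formula and by_tapply are illustrative.

```r
data("mtcars")

# Formula interface: mean mpg for each value of cyl, with readable column names
# (by_formula and by_tapply are illustrative names)
by_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_formula)

# tapply(): the same group means as a named vector, indexed by cylinder count
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(by_tapply)

# Both approaches agree (groups are sorted 4, 6, 8 in each)
stopifnot(isTRUE(all.equal(by_formula$mpg, as.vector(by_tapply))))
```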
