How To Calculate Standard Deviation in R

The standard deviation of a sample is one of the most commonly cited descriptive statistics. It describes how widely the observations spread around the sample mean, and it is commonly included in a table of summary statistics as part of exploratory analysis. If you are working on an R programming project that requires this statistic, you can generate it with the sd() function in base R. The function accepts any numeric vector, so it can calculate the standard deviation of a standalone vector, of the values in an array, or of a single data frame column.

How to Find Standard Deviation in R

You can calculate standard deviation in R using the sd() function. It is part of base R (the stats package, which is loaded by default), so no extra packages are needed. sd() returns the sample standard deviation, which divides by n - 1. To get the population standard deviation instead, multiply the result by sqrt((n-1)/n), where n is the number of observations.

# set up standard deviation in R example
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32)

# standard deviation R function
# sample standard deviation in r
> sd(test)
[1] 5.501377
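
To make the sample versus population distinction concrete, here is a minimal sketch that reuses the test vector from above. It writes out the sample formula by hand and then applies the sqrt((n-1)/n) adjustment. The variable names (manual_sd, pop_sd, pop_sd_direct) are made up for this illustration.

```r
# Same vector as in the example above
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32)
n <- length(test)

# Sample standard deviation written out by hand (divides by n - 1)
manual_sd <- sqrt(sum((test - mean(test))^2) / (n - 1))

# Population standard deviation: rescale the sample value ...
pop_sd <- sd(test) * sqrt((n - 1) / n)

# ... or compute it directly by dividing by n
pop_sd_direct <- sqrt(mean((test - mean(test))^2))

all.equal(manual_sd, sd(test))    # TRUE
all.equal(pop_sd, pop_sd_direct)  # TRUE
```

Both checks should agree up to floating point tolerance, which is why all.equal() is used rather than ==.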

As you can see, calculating standard deviation in R is as simple as that: the sd() function does the work for you. Need the standard deviation for several columns of a data set? Use the sapply() function to apply sd() to each relevant column. For this example, we’re going to use the ChickWeight dataset that ships with base R. This will help us calculate the standard deviation of columns in R.

# standard deviation in R - dataset example
# using head to show the first handful of records
> head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1

# standard deviation in R - using sapply to map across columns
# how to calculate standard deviation in r data frame
# only weight and Time are numeric; Chick and Diet are factors
> sapply(ChickWeight[, c("weight", "Time")], sd)
  weight     Time 
71.07196  6.75840 

Learning how to calculate standard deviation in R is quite simple, but it is an invaluable skill for any programmer. The same pattern works for any set of numeric data frame columns. Note that Chick and Diet are stored as factors, and current versions of R raise an error when sd() is called on a factor, so those two columns are left out here. To get the standard deviation of rows rather than columns, use apply() with MARGIN = 1 on a numeric matrix or data frame.
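
When a data frame mixes numeric and non-numeric columns, one option is to pick out the numeric columns programmatically instead of listing them by position. The sketch below shows that approach, along with a row-wise calculation using apply(). The names chicks, numeric_cols, col_sds, m and row_sds are made up for this illustration, and the small matrix m is invented data.

```r
# Drop the extra grouped-data classes so plain data frame indexing applies
chicks <- as.data.frame(ChickWeight)

# Flag the numeric columns, then apply sd() to those columns only
numeric_cols <- sapply(chicks, is.numeric)
col_sds <- sapply(chicks[numeric_cols], sd)
col_sds

# Row-wise standard deviation on a small made-up matrix
m <- rbind(c(2, 4, 6), c(1, 1, 1))
row_sds <- apply(m, 1, sd)  # MARGIN = 1 means "by row"
row_sds
```

Selecting by is.numeric keeps the code working if columns are added or reordered later, which positional indexing such as [,1:4] does not.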

Interpreting Results

A low standard deviation relative to the mean value of a sample means the observations are tightly clustered. Larger values indicate that many observations lie far from the sample mean. This metric has many practical applications in statistics, ranging from measuring the risk of an error in hypothesis testing to building the confidence interval around a forecast or pricing the risk of an event in finance or insurance. Many data science and statistical learning algorithms incorporate some form of the standard deviation for automated screening and analysis. This measure also plays a key role in analyzing the results of a linear regression procedure. While the metric is broadly applicable, many of these uses assume the data values were drawn from a normal distribution, so check that assumption before you rely on the statistic for risk estimation or quantitative analysis.
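
One common way to express "standard deviation relative to the mean" is the coefficient of variation, the standard deviation divided by the mean. The sketch below computes it for the earlier test vector and also counts the share of observations that fall within one standard deviation of the mean. The names cv and within_one_sd are made up for this illustration.

```r
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32)

# Coefficient of variation: spread expressed as a fraction of the mean
cv <- sd(test) / mean(test)
cv

# Share of observations lying within one standard deviation of the mean
within_one_sd <- mean(abs(test - mean(test)) <= sd(test))
within_one_sd
```

For this vector, 8 of the 12 observations fall inside that band. Roughly normal data would put about 68% of values there, so a markedly different share is a hint to look at the distribution before leaning on normal-theory interpretations.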

As noted above, the sd() function uses the sample standard deviation formula, which divides by n - 1 rather than n. If you need the population standard deviation, you will need to make the appropriate adjustment (multiply by sqrt((n-1)/n)). The sample estimate is also sensitive to sample size: small samples give estimates that vary more from one sample to the next, which has implications if you are watching the results of a repeated sampling process.
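
A small simulation can illustrate the sample size point. This is a sketch under stated assumptions: the data come from a standard normal distribution, the sample sizes and the 1000 repetitions are arbitrary choices, and the names sample_sizes and sd_spread are made up for this illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

sample_sizes <- c(5, 50, 500)

# For each sample size, draw 1000 samples from a standard normal
# distribution and measure how much the sample sd varies across draws
sd_spread <- sapply(sample_sizes, function(n) {
  sds <- replicate(1000, sd(rnorm(n)))
  sd(sds)
})
names(sd_spread) <- paste0("n=", sample_sizes)
sd_spread
```

The spread of the estimates should shrink as n grows, so a standard deviation computed from a handful of observations deserves more caution than one computed from hundreds.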

