You can use the var function to calculate the sample variance in R. It is part of the stats package, which ships with R and is attached by default, so you don’t need to install or load additional libraries.
What is Variance?
In descriptive statistics, variance measures how far the values of a variable are spread around their mean. The population variance is the average of the squared distances from the mean. The sample variance, which is what var computes, divides the sum of squared distances by n - 1 instead of n. Variance is the square of the standard deviation; equivalently, the sample standard deviation is the square root of the sample variance. Learning to compute variance can help you improve your data analysis and descriptive statistics skills, and it underlies many statistical tests that assess whether an independent variable has an effect on a dependent variable.
Variance describes the average squared deviation of a random variable from its expected value. It also shows up in regression: the share of the dependent variable's variance that a linear model explains is one way to judge whether an explanatory variable is a useful predictor. A larger sample size gives a more reliable estimate, but calculating variance in R is easy even if you do not know the sample size or the mean in advance, because var works both out from the data. Simply pass a numeric vector (or a numeric column of a data frame) to var, and you are on your way to linear regression and many other types of data analysis.
# calculate variance in R
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32)
> var(test)
[1] 30.26515
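To connect the result above to the definition, here is a minimal sketch that rebuilds the sample variance by hand and compares it with var. The names n, squared_dev, manual_var, and pop_var are illustrative and not part of the original example. Note that var has no option for the population variance, so the sketch divides by n directly.

```r
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32)

n <- length(test)
squared_dev <- (test - mean(test))^2       # squared distance of each value from the mean

manual_var <- sum(squared_dev) / (n - 1)   # sample variance: divide by n - 1
pop_var    <- sum(squared_dev) / n         # population variance: divide by n

all.equal(manual_var, var(test))   # TRUE: matches var()
all.equal(sd(test)^2, var(test))   # TRUE: variance is the squared standard deviation
```

If you need the population variance from var itself, multiplying var(test) by (n - 1) / n gives the same value as pop_var.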
Variance With Missing Values
A common problem with sample data in a vector or data frame is missing values. As the code below indicates, missing values do not make the calculation crash; var simply returns NA. You can set the na.rm argument of var to TRUE to remove missing values, so that the variance is computed from the non-missing values only.
# calculate variance in R - missing values example
> test <- c(41,34,39,34,34,32,37,32,43,43,24,32, NA,NA)
# calculate variance in R - result is NA because of the missing values
> var(test)
[1] NA
# calculate variance in R; remove missing values, correct result
> var(test, na.rm=TRUE)
[1] 30.26515
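As a rough sketch of what na.rm = TRUE does, the code below drops the missing values by hand and counts how many observations the variance is actually based on. It then applies the same idea to each column of a data frame. The data frame scores and the names n_used, var_narm, var_manual, and col_var are hypothetical, made up for illustration.

```r
test <- c(41, 34, 39, 34, 34, 32, 37, 32, 43, 43, 24, 32, NA, NA)

n_used     <- sum(!is.na(test))          # number of non-missing values
var_narm   <- var(test, na.rm = TRUE)    # let var() drop the NAs
var_manual <- var(test[!is.na(test)])    # drop the NAs yourself, then call var()

all.equal(var_narm, var_manual)          # TRUE: both approaches agree

# Hypothetical data frame; compute the variance of each column separately,
# dropping the NAs column by column
scores  <- data.frame(a = c(41, 34, 39, 34, NA),
                      b = c(32, 37, NA, 43, 43))
col_var <- sapply(scores, var, na.rm = TRUE)
```

The column-by-column call is deliberate: passing a whole data frame to var returns a covariance matrix rather than one variance per column.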