Calculate Skewness in R

Base R does not include a function for calculating skewness. We will need the “moments” package, which provides the skewness() function: install it once with install.packages("moments") and load it with library(moments).

Skewness is a commonly used measure of the asymmetry of a statistical distribution. A negative skewness indicates that the distribution is left skewed: the longer tail is on the left, and the mean of the data (average) is typically less than the median value (the 50th percentile, ranking items by value). A positive skewness indicates the reverse: the distribution is right skewed. A right skewed distribution has a long tail of high values, which typically pulls the mean of the distribution above the median.
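To make the definition concrete, here is a minimal sketch of a moment-based skewness calculation in base R. The helper name moment_skewness and the small example vector are made up for illustration. The formula (third central moment divided by the second central moment raised to the power 1.5) is the standard moment coefficient, and it is assumed here to agree with what the package function reports.

```r
# Hypothetical helper: moment coefficient of skewness, written in base R.
moment_skewness <- function(x) {
  x <- x[!is.na(x)]   # drop missing values
  d <- x - mean(x)    # deviations from the mean
  m2 <- mean(d^2)     # second central moment
  m3 <- mean(d^3)     # third central moment
  m3 / m2^1.5
}

# Illustrative data: one high value stretches the right tail.
x <- c(1, 2, 3, 4, 10)

mean(x)             # 4
median(x)           # 3
moment_skewness(x)  # about 1.138, positive, so right skewed
```

The mean (4) sits above the median (3), and the skewness is positive, which matches the right skewed case described above.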

Right skewed distributions are fairly common in the social sciences and often indicate the presence of a handful of exceptionally high outliers. For example, look at the distribution of income and wealth in many societies. There is usually a handful of very high observations which raise the average above the median value. This is a positive skew: the tail of the numeric vector in your data frame or dataset stretches to the right, towards values in the upper quantiles.
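As a rough illustration of the income example, the sketch below simulates incomes from a log-normal distribution. That choice is only an assumption, made because it produces a long right tail; the parameters, the seed and the variable names are invented and do not come from real data.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

# Simulated incomes (assumed log-normal, not real data).
income <- rlnorm(1000, meanlog = 10, sdlog = 1)

mean_income <- mean(income)
median_income <- median(income)

# A few very high incomes pull the mean above the median.
mean_income > median_income

# Share of observations that fall below the mean; well over half here.
share_below_mean <- mean(income < mean_income)
share_below_mean
```

When most observations fall below the average, the average is being driven by the upper tail, which is what a positive skew looks like in practice.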

A normal distribution has neither a positive nor a negative skew; its probability distribution is a symmetrical bell curve with a skewness of zero, and its mean, median and mode coincide. A sample that passes a normality test shows no evidence of skew in either direction, although zero skewness on its own does not prove that the data are normal, since other symmetric distributions also have zero skew. A sample with exactly zero skewness is uncommon, as real data almost always show a small negative or positive value, but with a large enough sample drawn from a symmetric distribution the sample skewness will be close to zero.
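One way to see this in practice is to draw a large sample from a normal distribution and check it. This sketch uses base R's rnorm() and shapiro.test(); the seed, the sample size and the hand-written skewness expression are illustrative choices, and the exact numbers you get will depend on the seed.

```r
set.seed(1)  # arbitrary seed

z <- rnorm(1000)  # 1000 draws from a standard normal

# Moment-based sample skewness, computed by hand in base R.
d <- z - mean(z)
z_skew <- mean(d^3) / mean(d^2)^1.5
z_skew  # close to zero, but not exactly zero

# Shapiro-Wilk normality test (works for 3 to 5000 observations).
# A large p-value means no evidence against normality.
normality <- shapiro.test(z)
normality$p.value
```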

If the skewness value (the sample skewness) of your data frame or data set is negative, you have a left skewed distribution. In descriptive statistics, a negative skewness means the bulk of your data sits at the higher values while a long tail of unusually low values stretches to the left, pulling the mean below the median.
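A small made-up example shows the pattern. The exam scores below are invented: most students score high and one scores very low, which creates the left tail.

```r
# Invented exam scores: one very low score forms a left tail.
scores <- c(35, 80, 85, 90, 95)

mean(scores)    # 77
median(scores)  # 85

# Moment-based sample skewness in base R.
d <- scores - mean(scores)
score_skew <- mean(d^3) / mean(d^2)^1.5
score_skew < 0  # TRUE: negative skewness, left skewed
```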

# calculate skewness in R; test is your numeric vector

> library(moments)
> skewness(test)
[1] -0.296553
