We’re going to show you how to calculate a percentile in R. This is particularly useful for exploratory analysis, especially if the underlying data doesn’t match the normal distribution.
A percentile ranks your data points relative to each other: the pth percentile is the value at or below which roughly p percent of the observations fall, which ties it to a probability-based view of the data. You can use percentile ranks to compare a given value with the median of the distribution (50th percentile), the typical span (25th and 75th percentiles), and the extremes (5th and 95th percentiles).
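To make the idea of a percentile rank concrete, here is a minimal sketch that asks what share of a sample sits at or below a given value. It uses base R's ecdf function, which is not otherwise covered in this article, and the same sample values as the examples further down; the names rank_of and pct_rank_9 are just illustrative.

```r
# Sample data (same values as the test vector used in this article)
test <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, 13, 11, 10)

# ecdf() returns a function giving the share of observations <= its argument
rank_of <- ecdf(test)

# Percentile rank of the value 9: proportion of data points at or below 9
pct_rank_9 <- rank_of(9)

# The same quantity computed by hand
pct_rank_9_manual <- mean(test <= 9)

print(pct_rank_9)
print(pct_rank_9_manual)
```

Ten of the 17 observations are 9 or lower, so both lines report the same proportion. This is the reverse of what the quantile function does: here you supply a value and get back a rank, while quantile takes a rank and returns a value.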
We’re going to use R’s quantile function; this utility is part of base R (so you don’t need to load any packages) and can be adapted to generate any “rank based” statistic about your sample. To calculate a percentile in R, pass the percentile to the quantile function as a probability between 0 and 1 (so .27 for the 27th percentile). See the example below.
# percentile in r example
> test = c(9,9,8,9,10,9,3,5,6,8,9,10,11,12,13,11,10)
> quantile(test, .27)
 27%
8.32
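If you’re wondering where 8.32 comes from when no data point has that value: by default, quantile interpolates linearly between the two nearest sorted observations (this is its default "type 7" method). The sketch below reproduces that arithmetic by hand. The variable names are illustrative, and the shortcut assumes the percentile falls strictly inside the data range, so it is not a general replacement for quantile.

```r
test <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, 13, 11, 10)
p <- 0.27

sorted <- sort(test)
n <- length(sorted)

# Fractional position of the percentile among the sorted values (type 7 rule)
h <- (n - 1) * p + 1
lower <- floor(h)      # position of the sorted value just below the percentile
frac <- h - lower      # how far to move toward the next sorted value

# Linear interpolation between the two neighbouring sorted values
# (assumes lower < n, i.e. p is below 1)
manual <- sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])

print(manual)
print(unname(quantile(test, p)))
```

With 17 values, the 27th percentile sits at position 5.32, so the result lands 32% of the way from the 5th sorted value (8) to the 6th (9).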
Need to calculate a percentile in R despite missing values in your data? You can use the na.rm option to remove missing values before the calculation. Sample code shown below:
# calculate percentile in R with missing values
> othertest = c(9,9,8,9,10,9,3,5,6,8,9,10,11,12,NA,NA,NA)
> quantile(othertest,.23)
Error in quantile.default(othertest, 0.23) :
  missing values and NaN's not allowed if 'na.rm' is FALSE
> quantile(othertest,.23, na.rm=TRUE)
 23%
7.98
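It can help to check how many values are being dropped before trusting the result. Here is a small sketch, using only base R, that counts the missing values and shows that na.rm=TRUE gives the same answer as filtering them out yourself; the names n_missing, cleaned, by_hand and with_na_rm are illustrative.

```r
othertest <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, NA, NA, NA)

# How many values are missing?
n_missing <- sum(is.na(othertest))

# Drop the missing values by hand, then compute the percentile
cleaned <- othertest[!is.na(othertest)]
by_hand <- quantile(cleaned, 0.23)

# Let quantile() drop them instead
with_na_rm <- quantile(othertest, 0.23, na.rm = TRUE)

print(n_missing)
print(by_hand)
print(with_na_rm)
```

Both approaches compute the percentile over the 14 remaining observations, so keep in mind that the result describes a smaller sample than the original vector.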
Finally, you have the option of generating multiple percentiles using the same function call; explicitly declare the “probs” option (R also accepts the abbreviation prob through partial argument matching) and pass the percentiles as a vector rather than using a single percentile value. This is particularly useful if you need to quickly size up a distribution.
# calculate percentile in R - multiple values
> test = c(9,9,8,9,10,9,3,5,6,8,9,10,11,12,13,11,10)
> quantile(test, prob=c(.1,.25,.5,.75,.9))
 10%  25%  50%  75%  90%
 5.6  8.0  9.0 10.0 11.4
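The vector of percentiles doesn’t have to be typed out by hand. As a rough sketch, you can generate evenly spaced probabilities (deciles here) and then pull a single result back out by its name; deciles and median_from_deciles are just illustrative names.

```r
test <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, 13, 11, 10)

# Probabilities 0.1, 0.2, ..., 0.9 generated instead of typed out
deciles <- quantile(test, probs = (1:9) / 10)
print(deciles)

# Results are named by percentile, so one value can be looked up by name
median_from_deciles <- deciles[["50%"]]
print(median_from_deciles)
```

Because the result is a named numeric vector, it drops straight into further calculations or a quick table without any reshaping.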
If you need a quick way to check a variable, you can also use the summary function. It covers most of the example above (the quartiles and the median), along with the minimum, maximum, and mean:
# calculate percentile in R - summary function
> test = c(9,9,8,9,10,9,3,5,6,8,9,10,11,12,13,11,10)
> summary(test)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  3.000   8.000   9.000   8.941  10.000  13.000
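The summary output isn’t just for reading off the screen. Since it comes back as a named object, you can grab individual statistics from it, for example to get the spread between the quartiles. A minimal sketch, with s, q1, q3 and spread as illustrative names:

```r
test <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, 13, 11, 10)

s <- summary(test)

# Look up individual statistics by the names shown in the printed output
q1 <- s[["1st Qu."]]
q3 <- s[["3rd Qu."]]

# Distance between the 25th and 75th percentiles
spread <- q3 - q1
print(spread)

# The same quartiles straight from quantile()
print(quantile(test, probs = c(0.25, 0.75)))
```

For anything other than the quartiles and the median (say, the 5th or 95th percentile), you still need to call quantile directly.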