Percentages and percentiles are similar in many ways, and the terms are sometimes used interchangeably, but they are different things. A percentage expresses a fraction out of 100, while a percentile is the value in a data set below which a given percentage of the data points fall. Both provide useful information about a data set, but they are not the same.
A percentile within a data set is the value that has a certain percentage of the data points below it. To demonstrate how the process works, I will find the 12th, 37th, 62nd, and 87th percentiles of the following data set.
5 10 12 15 20 24 27 30 35
Here is our example, already in numerical order. There are nine values in this data set. To find a percentile, we take that percentage of the number of values in the data set, count up that many whole values, and then go to the next value up. That value is our percentile.
- 12% of 9 = 1.08 – percentile = 10
- 37% of 9 = 3.33 – percentile = 15
- 62% of 9 = 5.58 – percentile = 24
- 87% of 9 = 7.83 – percentile = 30
These results show that roughly 12% of the values are less than 10, 37% are less than 15, 62% are less than 24, and 87% are less than 30. With only nine values the match is approximate: exactly 1 of the 9 values (about 11%) lies below 10, for example. This process naturally works better with larger data sets, in part because you need at least a hundred data points before every percentile can have its own value.
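The counting rule used above can be sketched in R. This is a minimal illustration of the hand method only; the helper name `hand_percentile` is made up for this example, and it is not how the built-in `quantile()` function computes its default result.

```r
# Data set from the example, sorted in ascending order
x <- sort(c(5, 10, 12, 15, 20, 24, 27, 30, 35))

# Hypothetical helper: count up floor(p * n) values, then take the next one.
# Intended for fractions p below 1; p = 1 would index past the end of x.
hand_percentile <- function(x, p) {
  n <- length(x)
  x[floor(p * n) + 1]
}

p <- c(0.12, 0.37, 0.62, 0.87)
result <- hand_percentile(x, p)
print(result)

# Share of values strictly below each result, for comparison with p
share_below <- sapply(result, function(v) mean(x < v))
print(round(share_below, 2))
```

Comparing `share_below` with `p` shows how coarse the match is when the data set has only nine values.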
The three quartiles of a data set are the values that sit at the quarter marks of the data set. Specifically, they are the values at 25%, 50%, and 75%. These calculations are the same as the percentile calculations above.
- 25% of 9 = 2.25 – quartile 1 = 12
- 50% of 9 = 4.50 – quartile 2 = 20
- 75% of 9 = 6.75 – quartile 3 = 27
This shows how closely percentile and quartile calculations are related: both are special cases of quantiles. This is why R uses the same function for both.
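As a quick check, the counting rule can be compared with R's built-in function at the quarter marks. The sketch below assumes the nine-value data set from above. For this data the two approaches agree, though that is not guaranteed in general, since `quantile()` interpolates between values by default.

```r
x <- c(5, 10, 12, 15, 20, 24, 27, 30, 35)
n <- length(x)
probs <- c(0.25, 0.50, 0.75)

# Hand method: count up floor(p * n) values, then take the next one
by_hand <- sort(x)[floor(probs * n) + 1]

# Built-in function with default settings; unname() drops the "25%" style labels
built_in <- unname(quantile(x, probs = probs))

print(by_hand)
print(built_in)
```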
How to find percentiles in R
You find a percentile in R by using the quantile function. It returns each requested percentage together with the value that sits at that percentile.
# how to find percentiles in R - quantile in R
> x = c(5,10,12,15,20,24,27,30,35)
> quantile(x)
  0%  25%  50%  75% 100%
   5   12   20   27   35
This is the default version of this function. It produces the 0th, 25th, 50th, 75th, and 100th percentiles, that is, the minimum, the three quartiles, and the maximum.
# how to find percentiles in R - quantile in R
> x = c(5,10,12,15,20,24,27,30,35)
> quantile(x, probs = c(0.125,0.375,0.625,0.875))
12.5% 37.5% 62.5% 87.5%
   10    15    24    30
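Here, the probs (probabilities) option lets you request other percentages, given as fractions between 0 and 1. Note that the probabilities used here (12.5%, 37.5%, 62.5%, and 87.5%) differ slightly from the 12%, 37%, 62%, and 87% used in the hand calculation.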
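One point worth illustrating is that `quantile()` does not apply the hand-counting rule by default. According to R's documentation, the default method (`type = 7`) interpolates between neighbouring values, while `type = 1` returns one of the observed values. The sketch below compares the two on the example data at the percentages used in the hand calculation; it is only an illustration, not a recommendation for either type.

```r
x <- c(5, 10, 12, 15, 20, 24, 27, 30, 35)
probs <- c(0.12, 0.37, 0.62, 0.87)

# Default method: interpolates between neighbouring data values
interpolated <- quantile(x, probs = probs)

# type = 1: returns one of the observed values (inverse empirical CDF)
observed <- quantile(x, probs = probs, type = 1)

print(interpolated)
print(observed)
```

With `type = 1` the results match the hand calculation (10, 15, 24, 30), while the default gives slightly smaller interpolated values, such as 9.8 for the 12th percentile.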
Finding percentiles in R has many applications. A good example is treering, a long data set consisting of 7,980 data points.
# how to find percentiles in R using treering data
> quantile(treering)
   0%   25%   50%   75%  100%
0.000 0.837 1.034 1.197 1.908
Here, we have the quartiles along with the minimum and maximum values. One thing this reveals about these tree rings is that they tend to be concentrated in the middle. The interquartile range (IQR) is 0.36 while the full range is 1.908, meaning that the IQR, which holds the middle half of the values, makes up only about 19% of the range of the data set.
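The IQR comparison can be reproduced with a few lines of R. This is a small sketch that assumes the built-in treering data set is available, as it is in a standard R installation; the variable names are chosen for this example only.

```r
# treering ships with R in the datasets package
q <- quantile(treering, probs = c(0.25, 0.75))

iqr <- unname(q[2] - q[1])           # same quantity as IQR(treering)
full_range <- diff(range(treering))  # maximum minus minimum

# Fraction of the full range covered by the middle half of the data
share <- iqr / full_range
print(round(c(IQR = iqr, range = full_range, share = share), 3))
```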
Finding the values that sit at given percentages of a data set can tell you much about it, such as how concentrated and how skewed the values are. It is one example of R as a tool in data science.