How to Find the Interquartile Range in R

In data science, finding the spread of a data set can often give clues about the statistical relationship between the data points. The interquartile range (IQR) is a useful measure of spread because it is not affected much by extreme outliers. The interquartile range of a data set is the difference between the values that fall at the 25% and 75% points when the data points are placed in numerical order.

Interquartile range

Before finding the interquartile range in R, let us look at a simple way to find it manually so that you can better understand what the function is doing. To do this, we start by finding the median value.

5 10 12 15 (20) 25 27 30 35

In this example, the data set has an odd number of data points. Start by arranging the values in numerical order. Next, find the median, which is the number right in the middle of the sequence. Here, the median is 20. If you have an even number of data points, there is no single middle value, so take the average of the two values on either side of the middle.
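The median step can be written out in a few lines of R. This is a minimal sketch for illustration: `manual_median` is a hypothetical helper name, not a base R function, and it simply sorts the values and then picks the middle one (odd count) or averages the two middle ones (even count). Base R's `median()` does the same job.

```r
# manual_median is a hypothetical helper, written only to mirror the manual steps.
manual_median <- function(v) {
  v <- sort(v)                      # arrange the values in numerical order
  n <- length(v)
  if (n %% 2 == 1) {
    v[(n + 1) / 2]                  # odd count: the single middle value
  } else {
    mean(v[c(n / 2, n / 2 + 1)])    # even count: average of the two middle values
  }
}

odd_data  <- c(5, 10, 12, 15, 20, 25, 27, 30, 35)
even_data <- c(5, 10, 12, 15, 18, 22, 25, 27, 30, 35)

manual_median(odd_data)    # 20
manual_median(even_data)   # 20, the average of 18 and 22
median(odd_data)           # base R agrees: 20
```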

5 10 [12] 15 (20) 25 [27] 30 35

Next, find the middle of each half on either side of the median. With an odd number of points, as here, the median is counted in both halves, so the low half is 5 10 12 15 20 and the high half is 20 25 27 30 35. That gives 12 in the middle of the low end (the first quartile, Q1) and 27 in the middle of the high end (the third quartile, Q3). From here, it is just a matter of subtracting the first quartile from the third quartile to get the interquartile range.

IQR = Q3-Q1 = 27-12 = 15
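The halves method can be sketched in R as well. This assumes the convention used in the worked example, where the median is kept in both halves when the count is odd; `halves_iqr` is a hypothetical helper name. The quartiles it produces coincide with Tukey's hinges, which base R reports as the second and fourth values of `fivenum()`.

```r
# halves_iqr is a hypothetical helper that follows the manual halves method.
halves_iqr <- function(v) {
  v <- sort(v)
  n <- length(v)
  mid <- (n + 1) %/% 2              # position of the (lower) middle value
  lower <- v[1:mid]                 # includes the median when n is odd
  upper <- v[(n - mid + 1):n]       # includes the median when n is odd
  q1 <- median(lower)
  q3 <- median(upper)
  c(Q1 = q1, Q3 = q3, IQR = q3 - q1)
}

x <- c(5, 10, 12, 15, 20, 25, 27, 30, 35)
halves_iqr(x)           # Q1 = 12, Q3 = 27, IQR = 15
fivenum(x)[c(2, 4)]     # Tukey's hinges: 12 and 27
```

Keeping the median in both halves is one of several textbook conventions; dropping it instead would give different quartiles for this odd-sized data set.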

Finding the IQR in R is a simple matter of using the IQR() function to do all this work for you. You can also get the median and the first and third quartiles with the summary() function.

IQR function

Finding the interquartile range in R is a simple matter of applying the IQR() function to the data set you are using. It has the format IQR(x), where x is a numeric vector, and returns the interquartile range of that data. Its companion, summary(), has the format summary(x) and returns the minimum, first quartile, median, mean, third quartile and maximum.
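Because the IQR is the distance between the 25% and 75% points, the same number can be recovered from base R's `quantile()`. The sketch below, using the article's example data, also shows the `na.rm` argument of `IQR()`: with missing values present, `IQR()` stops with an error unless `na.rm = TRUE` is passed.

```r
x <- c(5, 10, 12, 15, 20, 25, 27, 30, 35)

# The 25% and 75% quantiles are the first and third quartiles.
q <- quantile(x, probs = c(0.25, 0.75))
q                          # 25% = 12, 75% = 27
unname(q[2] - q[1])        # 15, the same value IQR(x) returns

# With an NA in the data, IQR(x_na) would stop with an error;
# na.rm = TRUE drops the missing value first.
x_na <- c(x, NA)
IQR(x_na, na.rm = TRUE)    # 15
```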

# how to find the interquartile range in R
> x <- c(5, 10, 12, 15, 20, 25, 27, 30, 35)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00   12.00   20.00   19.89   27.00   35.00 
> IQR(x)
[1] 15

This is the data set from our earlier example, run through both summary() and IQR(). It shows the same median, quartiles and interquartile range that we calculated manually.

# interquartile range in R; summary() procedure
> x <- c(5, 10, 12, 15, 18, 22, 25, 27, 30, 35)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00   12.75   20.00   19.90   26.50   35.00 
> IQR(x)
[1] 13.75

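Here is an example of a data set with an even number of data points. Notice how the median of 20 is the average of 18 and 22. Also notice that the quartiles have shifted and are no longer values from the data set: by default, IQR() and summary() interpolate between neighboring values, so R reports 12.75 and 26.5 rather than the 12 and 27 that the manual halves method would give.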
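The fractional quartiles come from R's default quantile rule (type 7 in the `quantile()` documentation): the p-th quantile sits at position h = (n - 1) * p + 1 in the sorted data, with linear interpolation when h is not a whole number. The sketch below reproduces 12.75 and 26.5 by hand; `type7_quantile` is a hypothetical helper written only to expose the arithmetic. The `type` argument of `IQR()` selects other conventions, which can give noticeably different results on small data sets.

```r
y <- c(5, 10, 12, 15, 18, 22, 25, 27, 30, 35)

# type7_quantile is a hypothetical helper mirroring the type 7 rule.
type7_quantile <- function(v, p) {
  v <- sort(v)
  n <- length(v)
  h <- (n - 1) * p + 1              # fractional position in the sorted data
  lo <- floor(h)
  hi <- min(lo + 1, n)              # avoid running past the last value
  v[lo] + (h - lo) * (v[hi] - v[lo])
}

type7_quantile(y, 0.25)   # position 3.25: 12 + 0.25 * (15 - 12) = 12.75
type7_quantile(y, 0.75)   # position 7.75: 25 + 0.75 * (27 - 25) = 26.5

# A different convention (type 6) gives a different IQR for the same data.
IQR(y, type = 6)          # 16.25, versus 13.75 with the default type 7
```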

Finding the interquartile range in R is helpful for understanding the spread of a data set. IQR() and summary() each replace the whole manual sort-and-split procedure with a single function call. Yet another reason R is such an excellent data science tool.