Guide to Using Qt in R - The Inverse Cumulative Density Function of Student's T Distribution

Modern computing and programming languages have been a tremendous boon to anyone who works with large amounts of data. The days of tediously entering statistics into a calculator are a thing of the past. Today we can import data into a system and make it instantly available to a variety of computational tools.

But not all programming languages relate to data in the same way, and R is especially useful when dealing with statistics. In fact, it has several built-in functions for common challenges. For example, imagine that you needed to calculate the value of the inverse cumulative density function of a Student t distribution, which is more formally known as the quantile function or inverse cumulative distribution function. The R programming language has a whole set of functions dedicated to working with the Student t distribution. And you’ll soon see just how easy they make distribution-related calculations.

Starting Out With the Basics

We can begin by defining all of the elements that are in play. First, the Student t-distribution is a probability distribution used to estimate population parameters, such as the mean, when the sample size is small or the population variance is unknown. It resembles the normal distribution but has heavier tails, which accounts for the extra uncertainty that comes with estimating the variance from a limited number of observations.

Note that t-distributions are defined by degrees of freedom. Degrees of freedom are closely tied to the number of independent observations within a sample set or data frame. For a single sample of n observations, for instance, the t-distribution used has n - 1 degrees of freedom. The t-distribution is incredibly useful within statistical tests because it can help show whether the difference between two means is significant.
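To make the role of degrees of freedom more concrete, here is a minimal sketch that compares quantiles of the t-distribution with the matching quantile of the normal distribution. It uses the qt function, which is covered in detail later in this guide. The probability of 0.975 and the particular degrees of freedom are illustrative assumptions rather than values from this guide.

```r
# 97.5% quantiles of the t-distribution for several degrees of freedom.
# The vector of values below is an illustrative choice.
degreesOfFreedom <- c(2, 5, 30, 1000)
tQuantiles <- qt(0.975, degreesOfFreedom)

# The matching quantile of the standard normal distribution, for comparison.
normalQuantile <- qnorm(0.975)

print(tQuantiles)
print(normalQuantile)
```

The t quantiles should shrink toward the normal quantile as the degrees of freedom grow, which reflects the heavier tails of the t-distribution when there are few observations.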

R’s Relationship to T-Distribution

We’ve previously noted that R has a number of built-in functions related to t-distributions. The main functions are dt, qt, pt, and rt, which give the density, the quantile, the cumulative probability, and random draws respectively. Each of these functions provides a useful twist on t-distribution calculations, and they all share a relatively similar syntax. The dt function provides the most basic example. It returns the value of the probability density function at a given point when we pass that point and the degrees of freedom. This can be written in shorthand as dt(v, df). So if we had a data set with 20 degrees of freedom and a t value of 2, we could call the dt function like so.

ourCalculation <- dt(2,20)
print(ourCalculation)

Note just how easy it is to perform the calculation with dt. This pattern remains constant through most of the related t-distribution functions. There are some minor quirks that depend on extended functionality. For example, rt generates a vector of random values that follow a t-distribution, so its first argument is the number of values to generate rather than a point on the distribution. But aside from those minor quirks, you can generally assume that t-distribution calculations are relatively consistent in R. With all of that in mind, we’re now ready to move on to the use of qt in R to determine the inverse cumulative density of Student’s t-distribution. While this proposition might seem difficult, it’s just as easy as our earlier example with dt.
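The shared syntax and the rt quirk mentioned above can be sketched side by side. This is only an illustration: the 20 degrees of freedom and the t value of 2 come from the dt example, while the probability of 0.85, the count of 5 draws, and the seed are arbitrary assumptions.

```r
set.seed(42)  # arbitrary fixed seed so the random draws are repeatable

densityAtTwo <- dt(2, 20)     # height of the density curve at t = 2
probBelowTwo <- pt(2, 20)     # cumulative probability P(T <= 2)
quantile85   <- qt(0.85, 20)  # t value with 85% of the distribution below it
randomDraws  <- rt(5, 20)     # 5 random values; the first argument is a count

print(densityAtTwo)
print(probBelowTwo)
print(quantile85)
print(randomDraws)
```

Each call takes the degrees of freedom as its second argument. Only the meaning of the first argument changes from one function to the next.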

Moving Into a Full Implementation Using Qt

The qt function provides an example of something whose ease of use belies its underlying complexity. And this is really the true charm of any data-focused language. It gives us a way to essentially write things out in shorthand while still having access to the underlying structure. So, without further ado, take a look at how we’d go about solving this problem in R if our probability value is 0.85 with 6 degrees of freedom.

ourStat <- qt(0.85, 6)
print(ourStat)

As in the previous example with dt, we just need to supply qt with two arguments: our probability value of 0.85 and our 6 degrees of freedom. The result is the inverse cumulative density value, which is the t value below which 85% of a t-distribution with 6 degrees of freedom falls. In other words, qt has already given us the quantile for our probability and degrees of freedom. We can also build on this example while still keeping the code relatively concise.
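Before building on the example, it may help to check what the value returned by qt means. The sketch below passes the result back through pt, the cumulative distribution function, which should recover the original probability. The variable name recoveredProbability is introduced here purely for illustration.

```r
ourStat <- qt(0.85, 6)

# pt is the cumulative distribution function, so it should undo qt.
recoveredProbability <- pt(ourStat, 6)

print(recoveredProbability)

# all.equal compares with a small numerical tolerance rather than exact equality.
print(isTRUE(all.equal(recoveredProbability, 0.85)))
```

This round trip is what "inverse" refers to in the name: qt maps a probability to a t value, and pt maps that t value back to the probability.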

We can begin by removing the hardcoding of our variables, like so.

ourProbability <- 0.85
ourDegreeOfFreedom <- 6
ourStat <- qt(ourProbability, ourDegreeOfFreedom)
print(ourStat)

If you run this example you’ll see that the return value stored in ourStat remains consistent with the previous code. Now, let’s see how we can build even more on top of this foundation.

ourProbability <- 0.85
ourDegreeOfFreedom <- 6
ourStat <- qt(ourProbability, ourDegreeOfFreedom)
# qnorm uses the standard normal distribution; lower.tail = FALSE selects the upper tail
ourStat2 <- qnorm(ourProbability, lower.tail = FALSE)

print(ourStat)
print(ourStat2)

We begin by assigning our values to ourProbability and ourDegreeOfFreedom, once again using 0.85 and 6. We then pass those values to qt and assign the result to ourStat. However, we’re switching things up a bit in the next line, where we also pass ourProbability to a function called qnorm. The qnorm function is the counterpart of qt for the normal distribution. It calculates the inverse cumulative density of the standard normal distribution by default, so it takes no degrees of freedom. Specifically, because we’re setting lower.tail to FALSE, the function treats ourProbability as the area in the upper tail of the distribution. By default the flag is set to TRUE, which makes the function treat the probability as the area in the lower tail. The qt function accepts the same lower.tail argument.

It’s important to note that the flag can run somewhat counter to what we might assume at first glance. There is no separate upper argument, so setting lower.tail to FALSE is how we ask for the upper tail. The default simply reflects the most common usage scenario, where the probability sits in the lower tail of the distribution. So with the flag set to FALSE, we get the value that has a probability of 0.85 above it rather than below it, which is why ourStat2 comes out negative. And, finally, we finish things off by printing out the results from both the qt and qnorm functions.
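The effect of the flag can also be seen on qt itself. The following is a minimal sketch that reuses the article's values of 0.85 and 6. The names lowerTail, upperTail, and complement are hypothetical and introduced only for this illustration.

```r
ourProbability <- 0.85
ourDegreeOfFreedom <- 6

# Default behavior: the value x where P(T <= x) = 0.85.
lowerTail <- qt(ourProbability, ourDegreeOfFreedom)

# Upper tail: the value x where P(T > x) = 0.85.
upperTail <- qt(ourProbability, ourDegreeOfFreedom, lower.tail = FALSE)

# Lower-tail quantile of the complementary probability, for comparison.
complement <- qt(1 - ourProbability, ourDegreeOfFreedom)

print(lowerTail)
print(upperTail)
print(complement)
```

Because the t-distribution is symmetric around zero, upperTail should be the negative of lowerTail, and it should match complement up to floating point precision.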

Keep in mind that we went through the process in a somewhat overly verbose manner with explicit declarations and calls to print the results to screen. The actual calculations themselves only took one line for each approach to the data manipulation. That’s an impressive amount of power considering this is all default functionality that doesn’t need any additional libraries.

The main takeaway from these examples should be just how fluid numbers are in R. It’s extremely easy to run multiple complex calculations on a single defined variable. This also fits into the ease with which R allows us to loop without writing an explicit loop, since functions like qt are vectorized and accept a whole vector of inputs. We could easily expand these examples with the apply family of functions to chart out a progression of the value over multiple points in time. This could then be easily visualized. For example, we could plot the values as a scatterplot with connections highlighting their progression over time.
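As a rough sketch of that idea, the code below computes the quantile across a range of degrees of freedom in two ways. The range 1:10 is an assumption chosen for illustration, standing in for whatever progression of values you want to chart, and sapply is used here as one member of the apply family.

```r
ourProbability <- 0.85
degreesOfFreedom <- 1:10  # illustrative range, not a value from the article

# qt accepts a vector, so one call handles every degree of freedom at once.
vectorized <- qt(ourProbability, degreesOfFreedom)

# The same calculation with sapply, which calls the function once per element.
applied <- sapply(degreesOfFreedom, function(df) qt(ourProbability, df))

print(vectorized)
print(isTRUE(all.equal(vectorized, applied)))
```

Passing the results to plot(degreesOfFreedom, vectorized, type = "b") would draw the points joined by lines, which is one way to get the connected scatterplot described above.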