When working with data in R, you will sometimes need to run a paired t-test. It is one of several statistical tests for comparing groups, and it is closely related to the Wilcoxon signed-rank test and Student's t-test. Each test answers a slightly different question about the data.

### What is a paired t-test?

A paired t-test compares two sets of measurements taken on the same subjects or on matched pairs. In R it is effectively three tests in one, because the function tests the null hypothesis against one of three alternative hypotheses. The base R function has the form t.test(x, y, paired = TRUE, alternative = "two.sided"), where x and y are the numeric vectors being evaluated and must have the same length. The paired argument is a TRUE or FALSE option that tells the function whether to pair the individual values; it defaults to FALSE, so it must be set to TRUE for a paired test. The alternative argument selects the alternative hypothesis and can be "two.sided", "greater", or "less". It is optional and defaults to "two.sided".

Besides the t statistic itself, the output of t.test() reports the degrees of freedom, the p-value, a confidence interval (95 percent by default), and the mean of the differences. These are values you would most likely want whenever you run a statistical test. The main goal of the paired t-test is to compare two sets of paired observations to find out whether the mean difference between them is equal to zero.
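The values listed above can also be pulled out of the result individually instead of being read off the printed output. The following is a minimal sketch: the before and after vectors are made-up illustrative measurements, not data from this article, and the component names are those of the object that t.test() returns.

```r
# Hypothetical before/after measurements on the same five subjects
# (illustrative values only).
before <- c(12.1, 11.4, 13.0, 12.7, 11.9)
after  <- c(12.9, 11.8, 13.4, 13.5, 12.0)

result <- t.test(before, after, paired = TRUE)

result$statistic  # the t statistic
result$parameter  # the degrees of freedom
result$p.value    # the p-value
result$conf.int   # the confidence interval for the mean difference
result$estimate   # the mean of the differences
```

Storing the result in a variable like this is handy when you want to use the p-value or the confidence interval in later calculations.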

### How does a Paired t-test work?

The paired t-test uses the following formula:

$$t = \frac{m}{s / \sqrt{n}}$$

- m = The mean difference.
- s = Standard deviation of the differences.
- n = The sample size which is equivalent to the length of the vectors being evaluated.

Effectively, the test statistic is the mean difference divided by its standard error, which is $s / \sqrt{n}$. The mean difference is similar to, but not identical to, a sample mean. A sample mean is the average of all the observations in one set of data, whereas the mean difference is the average of the differences between the paired values. The statistic is compared against a t distribution with n - 1 degrees of freedom.
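To make the formula concrete, here is a small sketch that computes the statistic by hand and checks it against t.test(). The variable names d, t_manual and p_manual are chosen for illustration only, and the vectors are the same ones used in the first example further down.

```r
x <- c(1, 2, 3)
y <- c(4, 6, 8)

d <- x - y          # the paired differences
m <- mean(d)        # mean difference
s <- sd(d)          # standard deviation of the differences
n <- length(d)      # number of pairs

t_manual <- m / (s / sqrt(n))                    # about -6.93
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

builtin <- t.test(x, y, paired = TRUE)

# Both comparisons should return TRUE.
all.equal(t_manual, unname(builtin$statistic))
all.equal(p_manual, builtin$p.value)
```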

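The null hypothesis depends on which alternative is selected.

- For "two.sided", the null hypothesis is that the mean difference is equal to zero.
- For "greater", the null hypothesis is that the mean difference is less than or equal to zero.
- For "less", the null hypothesis is that the mean difference is greater than or equal to zero.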

Each alternative hypothesis corresponds to one value of the alternative argument.

- The mean difference is not equal to zero ("two.sided").
- The mean difference is greater than zero ("greater").
- The mean difference is less than zero ("less").
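The three alternatives are closely linked, which the following sketch tries to show. It runs the same data through each alternative and compares the p-values; the variable names p_two, p_greater and p_less are just illustrative.

```r
x <- c(1, 2, 3)
y <- c(4, 6, 8)

p_two     <- t.test(x, y, paired = TRUE, alternative = "two.sided")$p.value
p_greater <- t.test(x, y, paired = TRUE, alternative = "greater")$p.value
p_less    <- t.test(x, y, paired = TRUE, alternative = "less")$p.value

# The two one-sided p-values cover opposite tails, so they add up to 1.
all.equal(p_greater + p_less, 1)

# The two-sided p-value is twice the smaller one-sided p-value.
all.equal(p_two, 2 * min(p_greater, p_less))
```

Both comparisons should return TRUE. In practice this means a one-sided test in the wrong direction gives a p-value close to 1, which is what happens in the last two examples below.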

With this information, you will be able to understand the results that you get when using the t-testing function.

### Examples of paired t-tests

Here we have five paired test examples using a variety of paired samples and all three alternative hypotheses.

> x = c(1, 2, 3)

> y = c(4, 6, 8)

> t.test(x, y, paired = TRUE)

Paired t-test

data: x and y

t = -6.9282, df = 2, p-value = 0.0202

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-6.484138 -1.515862

sample estimates:

mean of the differences

-4

This example examines the case where the second vector has higher values than the first, along with the default alternative.

> x = c(4, 6, 8)

> y = c(1, 2, 3)

> t.test(x, y, paired = TRUE)

Paired t-test

data: x and y

t = 6.9282, df = 2, p-value = 0.0202

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

1.515862 6.484138

sample estimates:

mean of the differences

4

This example examines the case where the first vector has higher values than the second, along with the default alternative.

> x = c(1, 2, 3)

> y = c(1, 2, 3)

> t.test(x, y, paired = TRUE, alternative = "two.sided")

Paired t-test

data: x and y

t = NaN, df = 2, p-value = NA

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

NaN NaN

sample estimates:

mean of the differences

0

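This example examines the case where both vectors are identical, with "two.sided" as the alternative. Every difference is zero, so the standard deviation of the differences is also zero and the t statistic becomes 0/0, which R reports as NaN.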
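The NaN values in this output can be traced by hand. The sketch below repeats the formula on the identical vectors; d and t_stat are illustrative names, and the final check is only one possible way to guard against this situation before running the test.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 3)

d <- x - y                                 # every difference is 0
s <- sd(d)                                 # standard deviation is 0
t_stat <- mean(d) / (s / sqrt(length(d)))  # 0 / 0

is.nan(t_stat)  # TRUE

# A simple guard you might add before testing real data.
if (s == 0) {
  message("All paired differences are identical; the t statistic is undefined.")
}
```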

> x = c(1, 2, 3)

> y = c(4, 6, 8)

> t.test(x, y, paired = TRUE, alternative = "greater")

Paired t-test

data: x and y

t = -6.9282, df = 2, p-value = 0.9899

alternative hypothesis: true difference in means is greater than 0

95 percent confidence interval:

-5.685854 Inf

sample estimates:

mean of the differences

-4

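This example examines the case where the second vector has higher values than the first, along with "greater" as the alternative. Because the mean difference is negative, the data give no support to the "greater" alternative, and the p-value is close to 1.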

> x = c(4, 6, 8)

> y = c(1, 2, 3)

> t.test(x, y, paired = TRUE, alternative = "less")

Paired t-test

data: x and y

t = 6.9282, df = 2, p-value = 0.9899

alternative hypothesis: true difference in means is less than 0

95 percent confidence interval:

-Inf 5.685854

sample estimates:

mean of the differences

4

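This example examines the case where the first vector has higher values than the second, along with "less" as the alternative. Because the mean difference is positive, the data give no support to the "less" alternative, and the p-value is again close to 1.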

### Applications of paired t-tests

The main application of paired t-testing is comparing paired datasets to see if there is a significant difference. These datasets can be individual vectors or columns in a data frame, but each value in one must correspond to a specific value in the other for the results to be meaningful. One such situation would be comparing the grades of the same students across two semesters. Another would be comparing the heights of men and women when the observations are naturally matched, such as husbands and wives or brothers and sisters; unmatched groups of men and women call for an independent two-sample t-test instead. Another practical application would be comparing the gas mileage of the same vehicles in highway and city driving. Comparing the performance of two political parties across the same set of races would be another. A business example would be comparing the sales of two versions of the same product in the same stores. Paired t-testing applies any time you have two sets of data that can be paired and need to be compared for statistical differences.
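As a rough illustration of the student grades case, the sketch below runs the test on two columns of a data frame. The grades data frame and its values are entirely hypothetical. It also shows that a paired t-test gives the same result as a one-sample t-test on the differences, which is why the pairing has to be meaningful.

```r
# Hypothetical grades for the same six students in two semesters.
grades <- data.frame(
  student = c("A", "B", "C", "D", "E", "F"),
  fall    = c(72, 85, 90, 64, 78, 88),
  spring  = c(75, 83, 94, 70, 80, 91)
)

# Paired test on two columns of the data frame.
paired_result <- t.test(grades$spring, grades$fall, paired = TRUE)

# One-sample test on the per-student differences.
diff_result <- t.test(grades$spring - grades$fall, mu = 0)

# Both comparisons should return TRUE.
all.equal(unname(paired_result$statistic), unname(diff_result$statistic))
all.equal(paired_result$p.value, diff_result$p.value)
```

If the rows of the two columns were shuffled independently, the per-student differences would change and so would the result, even though each column would still hold the same numbers.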

A paired t-test is a handy tool for comparing two paired datasets. While R will run the test on any two vectors of equal length, the values have to be pairable in some meaningful way for the results to make sense. Used properly, it will provide you with useful statistical information.