How To Generate A Confusion Matrix in R

Normally in programming you do not want confusion, but a confusion matrix in R is an exception. It is a handy way of checking how well the classes a model predicts match the known classes. Whether you are testing a two-class classifier or a multinomial logistic regression, a confusion matrix provides an objective way of testing your model. A regression model with continuous output does not fit directly, because its predictions would first have to be converted into classes.

Description of the Function (ConfusionMatrix in R)

The confusionMatrix() function has the form confusionMatrix(data, reference), where “data” is a factor of predicted classes and “reference” is a factor of the true classes that the predictions are compared against. The function returns the confusion matrix itself, the overall statistics, and the statistics by class for the comparison. The matrix holds the count of each predicted/actual combination, and the by-class statistics include the sensitivity (true positive rate) and the specificity (one minus the false positive rate). The overall statistics include the classification accuracy of the classifier. This gives you an objective way to measure the performance of any model that predicts classes.
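As a rough illustration of what the returned object holds, the sketch below builds two small factors and inspects the result. The vectors `predicted` and `actual` are made-up data, and `positive = "yes"` is set only so that the by-class statistics are reported for the "yes" class. It assumes the caret package is already installed.

```r
library(caret)

# Made-up predictions and true labels for a two-class problem (illustrative only)
predicted <- factor(c("yes", "no", "yes", "yes", "no", "no"), levels = c("no", "yes"))
actual    <- factor(c("yes", "no", "no",  "yes", "no", "yes"), levels = c("no", "yes"))

cm <- confusionMatrix(data = predicted, reference = actual, positive = "yes")

cm$table                     # the confusion matrix itself (counts)
cm$overall["Accuracy"]       # overall statistics, here just the accuracy
cm$byClass["Sensitivity"]    # true positive rate for the "yes" class
cm$byClass["Specificity"]    # one minus the false positive rate
```

Printing `cm` on its own shows all three parts together.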

Key Features of the Function (ConfusionMatrix in R)

The main feature of the confusionMatrix() function is its ability to provide an objective determination of model performance. Though this function is intended to work on vectors, it can be used to compare columns of a data frame as long as they are converted to factors first and hold the same type of data. The prediction factor and the reference factor need to have the same set of levels, so every class that appears in one must also be a level of the other. For such a simple function to use, it provides a lot of information.
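The data frame case might look like the following sketch. The data frame `results` and its column names are hypothetical; the point is only that both columns are converted to factors with one shared set of levels before the call.

```r
library(caret)

# Hypothetical data frame of predictions and true labels stored as plain text
results <- data.frame(
  predicted = c("cat", "dog", "cat", "cat", "dog"),
  actual    = c("cat", "dog", "bird", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Collect every class seen in either column so both factors share the same levels
lv   <- sort(union(results$predicted, results$actual))
pred <- factor(results$predicted, levels = lv)
ref  <- factor(results$actual,    levels = lv)

cm <- confusionMatrix(data = pred, reference = ref)
cm$table
```

Here "bird" is never predicted, so its row in the table contains only zeros and some of its by-class statistics may be reported as NA or NaN.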

Examples of the ConfusionMatrix function in action

Before trying the examples, run this little bit of code to install and load the package that provides the confusionMatrix() function.

install.packages('caret')
library(caret)

Here we have three examples of the use of this function.

> x = c(0,1,2,0,1,2,0,1,2)
> r = factor(x)
> d = factor(x)
> t = confusionMatrix(d, r)
> t

This first example produces the following confusion matrix.

          Reference
Prediction 0 1 2
         0 3 0 0
         1 0 3 0
         2 0 0 3

In this confusion matrix, the prediction is the left-hand column and the reference is the top row. Every diagonal cell holds a “3” and every other cell is zero because, in this case, the prediction and the reference are identical.

> x = c(0,1,2,0,2,1,0,1,2)
> y = c(0,1,2,0,1,2,0,1,2)
> r = factor(y)
> d = factor(x)
> t = confusionMatrix(data=d, r)
> t

This second example produces the following confusion matrix.

          Reference
Prediction 0 1 2
         0 3 0 0
         1 0 2 1
         2 0 1 2

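In this confusion matrix, the prediction is the left-hand column and the reference is the top row. In this case, two predictions are swapped between classes 1 and 2, producing two misclassifications. Each one counts as a false positive for the predicted class and a false negative for the true class.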
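To see where those errors land for each class, the counts can be read straight off the table. The sketch below uses base R's table() rather than caret, and the variable names `tab`, `fp`, and `fn` are just illustrative. Rows are predictions and columns are the reference, matching the layout above.

```r
# Same data as the second example
x <- c(0, 1, 2, 0, 2, 1, 0, 1, 2)   # predictions
y <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)   # reference

# Rows are predictions, columns are the reference
tab <- table(Prediction = factor(x), Reference = factor(y))

correct  <- diag(tab)                # correctly classified cases per class
fp       <- rowSums(tab) - correct   # predicted as the class but actually another
fn       <- colSums(tab) - correct   # actually the class but predicted as another
accuracy <- sum(correct) / sum(tab)

fp        # false positives per class: 0, 1, 1
fn        # false negatives per class: 0, 1, 1
accuracy  # 7 of 9 predictions are correct
```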

> a = 0:2
> b = c(2,1,0)
> x = c(a,b,a)
> y = c(a,b,b)
> r = factor(y)
> d = factor(x)
> t = confusionMatrix(data=d, r)
> t

This third example produces the following confusion matrix.

          Reference
Prediction 0 1 2
         0 2 0 1
         1 0 3 0
         2 1 0 2

In this confusion matrix, the prediction is the left-hand column and the reference is the top row. In this example, we also have two swapped predictions, but this time between classes 0 and 2. They again produce two misclassifications, but in different cells of the matrix.
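Because the errors now involve classes 0 and 2, the per-class statistics shift accordingly. As a sketch, assuming caret is installed, the by-class part of the result can be pulled out for this example. For a problem with more than two classes it is a matrix with one row per class.

```r
library(caret)

# Same data as the third example
a <- 0:2
b <- c(2, 1, 0)
d <- factor(c(a, b, a))   # predictions
r <- factor(c(a, b, b))   # reference

cm <- confusionMatrix(data = d, reference = r)

# One row per class; class 1 is classified perfectly, classes 0 and 2 are not
cm$byClass[, c("Sensitivity", "Specificity")]
```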

The confusionMatrix() function is a simple but powerful tool for your R programming tool belt. It lets you compare a model's predictions to actual data in an objective, numerical way, making it easier to choose the best model.