Fixing R Error Message: ‘train’ and ‘class’ have different lengths

Error messages are a nuisance, but you will run into them from time to time. Some make no sense at first and become obvious once you understand what is going on. The “‘train’ and ‘class’ have different lengths” error message is a good example. At first glance it is baffling, but once you know what the function is comparing, it makes perfect sense.

The circumstances of this error.

This error occurs when using the knn() function from the “class” package. knn() performs k-nearest-neighbor classification: each row of a test dataset is assigned the class that is most common among its k nearest rows in a training dataset, whose class labels you supply as a separate argument.
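The roles of the three data arguments are easiest to see when they are named. The sketch below uses R's built-in iris data instead of the article's data frames (an assumption made only for illustration): `train` and `test` hold the numeric features, and `cl` holds exactly one class label per training row.

```r
# Minimal sketch of a correct knn() call, assuming the built-in iris data.
library(class)

set.seed(1)                         # knn() breaks voting ties at random
idx     <- sample(nrow(iris), 100)  # 100 of the 150 rows are used for training
train_x <- iris[idx, 1:4]           # numeric feature columns only
test_x  <- iris[-idx, 1:4]          # the remaining 50 rows are the test set
train_y <- iris$Species[idx]        # one label (factor) per training row

pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

length(train_y) == nrow(train_x)    # TRUE: 'class' matches 'train'
length(pred)                        # 50: one prediction per test row
```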

> library(class)
> df1 = read.table(text = '
+ A B C D
+ 1 3 20 81 21
+ 2 4 34 13 22
+ 3 5 46 18 23
+ 4 6 42 16 24
+ 5 7 65 26 25', header=TRUE)
> df2 = read.table(text = '
+ A B C D
+ 1 3 20 81 21
+ 2 4 34 13 22
+ 3 5 46 18 23
+ 4 6 42 16 24
+ 5 7 65 26 25', header=TRUE)
> knn(df1, df2, df1, k = 3)
Error in knn(df1, df2, df1, k = 3) :
‘train’ and ‘class’ have different lengths

It usually occurs when the wrong type of data structure is passed as the classification argument (cl) of knn(). The function expects a vector with one class label per training row, but here it receives a data frame. knn() compares the length of cl with the number of rows in train. The training data frame df1 has five rows, but in R the length of a data frame is its number of columns, so df1 passed as cl has a length of four. In the terms of the error message, “train” has a length of five and “class” has a length of four. Seen this way, the error message makes perfect sense.
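You can confirm both lengths directly. The sketch below rebuilds df1 from the article, prints the two sizes that knn() compares, and catches the error with tryCatch() so the script keeps running instead of stopping.

```r
# Reproduce the length mismatch using the article's df1.
library(class)

df1 <- read.table(text = '
A B C D
1 3 20 81 21
2 4 34 13 22
3 5 46 18 23
4 6 42 16 24
5 7 65 26 25', header = TRUE)  # first column becomes row names

nrow(df1)    # 5: number of training rows ("train")
length(df1)  # 4: a data frame's length is its number of columns ("class")

# Capture the error message instead of halting.
msg <- tryCatch(knn(df1, df1, df1, k = 3),
                error = function(e) conditionMessage(e))
msg
```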

How to fix this error.

Fixing this error is quite simple. Make sure the classification argument is a vector with exactly one label per row of the training data frame, which is the same length as any one of its columns. In a simple case you can type this vector in by hand, but larger or changing datasets call for a more dynamic approach.
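One way to catch the mistake early is to check the labels before calling knn(). The function below, safe_knn(), is a hypothetical helper written for this example and is not part of the class package. It unwraps a one-column data frame into a vector and stops with a more explicit message when the label count does not match the training rows.

```r
# Hypothetical helper (not part of the class package).
library(class)

safe_knn <- function(train, test, cl, k = 1) {
  # A one-column data frame is a common slip: use its single column.
  if (is.data.frame(cl)) {
    if (ncol(cl) != 1) stop("'cl' must be a vector or a one-column data frame")
    cl <- cl[[1]]
  }
  # knn() needs exactly one label per training row.
  if (length(cl) != NROW(train)) {
    stop(sprintf("'cl' has %d labels but 'train' has %d rows",
                 length(cl), NROW(train)))
  }
  knn(train = train, test = test, cl = factor(cl), k = k)
}

# Usage: labels supplied as a data frame are unwrapped instead of failing.
pts    <- data.frame(x = c(1, 2, 10, 11), y = c(1, 2, 10, 11))
labels <- data.frame(group = c("a", "a", "b", "b"))
pred   <- safe_knn(pts, data.frame(x = 1.5, y = 1.5), labels, k = 1)
pred   # both nearest training rows are labelled "a"
```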

> library(class)
>
> df1 = read.table(text = '
+ A B C D
+ 1 3 20 81 21
+ 2 4 34 13 22
+ 3 5 46 18 23
+ 4 6 42 16 24
+ 5 7 65 26 25', header=TRUE)
>
> df2 = read.table(text = '
+ A B C D
+ 1 3 20 81 21
+ 2 4 34 13 22
+ 3 5 46 18 23
+ 4 6 42 16 24
+ 5 7 65 26 25', header=TRUE)
> v1 = df1[,1]
> v1
[1] 3 4 5 6 7
> knn(df1, df2, v1, k = 3)
[1] 6 4 4 4 6
Levels: 3 4 5 6 7

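A more dynamic method is to take one column of the training data frame as the vector of classifications (v1 = df1[, 1]). Because every column of a data frame has one entry per row, this guarantees that the classifications have the same length as the training data and prevents this error. Note that every label in this example is unique, so each vote among the three nearest neighbors is a three-way tie, which knn() breaks at random; your output may differ from the one shown.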
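In the example above, column A serves as both a feature and the label. One possible refinement, which is not from the original post, is to name the label column once and pass only the remaining columns as features. The choice of "A" as the label column is an assumption carried over from the example, and set.seed() only makes the random tie-breaking repeatable.

```r
# Variant sketch: labels come from one column, features from the others.
library(class)

df1 <- read.table(text = '
A B C D
1 3 20 81 21
2 4 34 13 22
3 5 46 18 23
4 6 42 16 24
5 7 65 26 25', header = TRUE)
df2 <- df1                        # the article's test data is identical to df1

label_col <- "A"                  # assumption: column A holds the classes
features  <- setdiff(names(df1), label_col)

set.seed(42)                      # voting ties are broken at random
pred <- knn(train = df1[, features],
            test  = df2[, features],
            cl    = factor(df1[[label_col]]),  # always nrow(df1) labels
            k     = 3)
pred
```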