Fixing the R Warning – Invalid Factor Level, NA Generated

When doing data science in the R programming language, you will get the “invalid factor level, NA generated” warning message when you try to add a value to categorical data that is not one of its defined levels. The way to fix it is to convert the data to another class, such as character, add the value, and then convert it back to a factor.

Description of the warning: Invalid Factor Level, NA Generated

This warning message occurs when you try to assign a value to a factor vector that is not one of its levels. It is not specific to the rbind function; it is raised whenever a value outside the defined levels is assigned directly into a factor. It does not occur with a logical vector or numeric vector, but it can occur with an ordered factor. It can also occur when assigning into a data frame, whether by row or by column, if the column being written to is a factor. Because the new value is not a level of the factor, R cannot store it and inserts a missing value (NA) instead.
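
The difference between a factor and other vector types can be checked directly. The sketch below is illustrative and not part of the original examples: the names n, f and msg are made up for the demonstration, and tryCatch is used only to capture the warning text so it can be inspected.

```r
# A numeric vector accepts any new number without complaint.
n <- c(1, 2, 3)
n[4] <- 10

# A factor only accepts values that are listed in its levels.
f <- factor(c("A", "B", "C"))
"D" %in% levels(f)  # FALSE, so assigning "D" would generate NA

# Capture the warning text instead of letting R print it.
# The handler exits the assignment, so f itself is left unchanged.
msg <- tryCatch(
  {
    f[4] <- "D"
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg
```

Checking a value with %in% levels(f) before assigning is one way to avoid silently introducing NA values.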


Explanation of the warning: Invalid Factor Level, NA Generated

Here are two examples of code that produce this warning message in different situations.

> x = factor(c("A", "B", "A", "C", "B", "C"))
> x
[1] A B A C B C
Levels: A B C
> b = x
> b[7] = "D"
Warning message:
In `[<-.factor`(`*tmp*`, 7, value = "D") :
  invalid factor level, NA generated
> b
[1] A    B    A    C    B    C    <NA>
Levels: A B C

In this example, we try to insert a new value into a factor whose levels do not include it. The assignment does extend the vector to seven elements, but because “D” is not one of the levels, the new element is stored as an NA value.

> df = data.frame(A = c("A","B","C","A","B"),
+ B = c(1, 2, 3, 4, 5))
> df$A[5] = "D"
Warning message:
In `[<-.factor`(`*tmp*`, 5, value = c(1L, 2L, 3L, 1L, NA)) :
  invalid factor level, NA generated
> df
     A B
1    A 1
2    B 2
3    C 3
4    A 4
5 <NA> 5

In this example, we have a data frame with a column of character strings. In R versions before 4.0.0, data.frame converts such a column to a factor by default; from R 4.0.0 onward the default is stringsAsFactors = FALSE, so you would need to pass stringsAsFactors = TRUE to reproduce the warning. Columns of numeric or integer values are never converted this way. Here we try to replace the last “B” with “D”, but because “D” is not a level of the factor column, it is replaced with an NA value.
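
Before assigning into a data frame column, it can help to check whether that column is actually a factor. The following is a minimal sketch; stringsAsFactors = TRUE is set explicitly here as an assumption, so that the column is a factor regardless of the R version, and col_classes is a name chosen only for illustration.

```r
df <- data.frame(A = c("A", "B", "C", "A", "B"),
                 B = c(1, 2, 3, 4, 5),
                 stringsAsFactors = TRUE)  # explicit, so the result is the same on any R version

# Class of every column: A is a factor, B is numeric.
col_classes <- sapply(df, class)
col_classes

# Only values listed here can be assigned to df$A without generating NA.
is.factor(df$A)
levels(df$A)
```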

How to fix the Warning: Invalid Factor Level, NA Generated

Here we have two examples of how to fix this warning message.

> x = factor(c("A", "B", "A", "C", "B", "C"))
> x
[1] A B A C B C
Levels: A B C
> b = x
> b = as.character(b)
> b[7] = "D"
> b = as.factor(b)
> b
[1] A B A C B C D
Levels: A B C D

In this example, we turn the vector into a character vector, add the “D” and turn it back into a factor vector. This is done using the as.character and as.factor functions. The result is that we have successfully added the “D” to the factor vector.
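
An alternative that avoids the round trip through character is to register the new level before assigning the value. This is a sketch of a different technique from the one shown above, using the base R levels function on the same example vector.

```r
x <- factor(c("A", "B", "A", "C", "B", "C"))
b <- x

# Add "D" to the set of allowed levels first.
levels(b) <- c(levels(b), "D")

# The assignment now succeeds without a warning.
b[7] <- "D"
b
```

This approach keeps the existing levels in their current order and appends the new one at the end, whereas as.factor on a character vector sorts the levels.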

> df = data.frame(A = c("A","B","C","A","B"),
+ B = c(1, 2, 3, 4, 5),
+ stringsAsFactors = FALSE)
> df$A[5] = "D"
> df
  A B
1 A 1
2 B 2
3 C 3
4 A 4
5 D 5

In this example, we pass stringsAsFactors = FALSE to data.frame (the default from R 4.0.0 onward), which prevents the character vector from becoming a factor. We then assign “D” to the fifth row. The result is that we have successfully added the “D” to the column.
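
If the data frame already exists and its column is already a factor, the same convert-and-convert-back idea from the first fix can be applied to the column alone. The following is a minimal sketch under that assumption; stringsAsFactors = TRUE is used only to set up a factor column for the demonstration.

```r
df <- data.frame(A = c("A", "B", "C", "A", "B"),
                 B = c(1, 2, 3, 4, 5),
                 stringsAsFactors = TRUE)  # start from a factor column

df$A <- as.character(df$A)  # factor -> character
df$A[5] <- "D"              # any string can be assigned now
df$A <- factor(df$A)        # character -> factor, levels rebuilt from the data
df
```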

The “invalid factor level, NA generated” warning message is easy to trigger when working with a factor vector. The same assignment works without complaint for any other vector type, which makes it an easy mistake to make. However, it is also easy to fix, though the right fix depends on the situation.
