Fixing R error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : NA/NaN/Inf in 'x'

This error can occur when you fit a linear model to a data frame that contains infinite (Inf), not-a-number (NaN), or missing (NA) values. Somewhat ironically, the fix is to change the Inf and NaN values into NA values. The problem is easy to understand and fix, although the solution looks a little unexpected at first glance.

Description of the error

The lm function fits a linear model in the R programming language. It estimates the relationship between one or more independent variables and a dependent variable, so that the dependent variable can be estimated for other values of the independent variables. This error message occurs when the data frame passed to lm contains values of the following kinds:
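As a minimal sketch of what lm is normally used for, the following fits a model to clean data and then estimates the dependent variable for new values. The data frame toy and the values in new_x are made up for illustration and do not come from the example in this article.

```r
# Hypothetical toy data: y is exactly 2 * x + 1
toy <- data.frame(x = c(1, 2, 3, 4, 5))
toy$y <- 2 * toy$x + 1

# Fit the linear model y ~ x
fit <- lm(y ~ x, data = toy)

# Estimate y for values of x that were not in the data
new_x <- data.frame(x = c(6, 10))
pred <- predict(fit, newdata = new_x)
print(pred)
```

Because every value in toy is finite, the fit goes through without the error discussed here.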

NA – missing values
NaN – not a number values
Inf – infinite values

The problem occurs because the fitting routine, lm.fit, needs finite numeric values and stops if it receives an Inf, NaN, or NA. The lm function, however, drops rows containing NA before the data reaches lm.fit (its default na.action is na.omit). Because is.na() is also TRUE for NaN, those rows are dropped as well, so in practice it is usually an Inf or -Inf value that triggers the message. The most direct fix is therefore to replace the non-finite values with NA. While it may not be intuitive, this solution not only works but can be done automatically.
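Before replacing anything, it can help to see where the problem values are. The sketch below uses base R's is.finite(), which is FALSE for NA, NaN, Inf, and -Inf alike. It assumes all columns are numeric, as they are in this article's example data.

```r
df1 <- data.frame(y = c(1, 2, 3, 4, 5, 6, 7),
                  x = c(1, NA, 3, Inf, 5, NaN, 7))

# Row positions where x is NA, NaN, Inf or -Inf
bad_rows <- which(!is.finite(df1$x))
print(bad_rows)

# Number of non-finite entries per column (assumes numeric columns)
bad_counts <- sapply(df1, function(col) sum(!is.finite(col)))
print(bad_counts)
```

For this data, bad_rows points at rows 2, 4, and 6, and only the x column has a nonzero count.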

Explanation of the R error

Here is a code example that shows what causes this problem.

> df1 = data.frame(y = c(1,2,3,4,5,6,7),
+ x = c(1,NA,3,Inf,5,NaN,7))
> df1
  y   x
1 1   1
2 2  NA
3 3   3
4 4 Inf
5 5   5
6 6 NaN
7 7   7
> lm(y ~ x, df1)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  NA/NaN/Inf in ‘x’

As you can see, the "x" column of the example data contains NA, Inf, and NaN values. Passing this data frame to lm produces the error message.
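To see which of the three kinds of value actually stops the fit, each one can be tried on its own. This is only a sketch: fits_ok is a hypothetical helper written for this illustration, and na.action = na.omit is passed explicitly on the assumption that this matches R's default setting.

```r
y <- c(1, 2, 3, 4, 5, 6, 7)

# Hypothetical helper: TRUE if lm() fits, FALSE if it stops with an error
fits_ok <- function(x) {
  tryCatch({
    lm(y ~ x, data = data.frame(y = y, x = x), na.action = na.omit)
    TRUE
  }, error = function(e) FALSE)
}

ok_na  <- fits_ok(c(1, NA, 3, 4, 5, 6, 7))   # only an NA
ok_nan <- fits_ok(c(1, 2, 3, 4, 5, NaN, 7))  # only a NaN
ok_inf <- fits_ok(c(1, 2, 3, Inf, 5, 6, 7))  # only an Inf

print(c(na = ok_na, nan = ok_nan, inf = ok_inf))
```

With na.omit in effect, the NA and NaN rows are dropped before fitting, so only the Inf case should fail.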

How to fix the R error

Here is a code example that shows how to fix this problem.

> df1 = data.frame(y = c(1,2,3,4,5,6,7),
+ x = c(1,NA,3,Inf,5,NaN,7))
> df1
  y   x
1 1   1
2 2  NA
3 3   3
4 4 Inf
5 5   5
6 6 NaN
7 7   7
> df2 = df1
> df2[is.na(df2) | df2 == "Inf"] = NA
> df2
  y  x
1 1  1
2 2 NA
3 3  3
4 4 NA
5 5  5
6 6 NA
7 7  7
> lm(y ~ x, df2)

Call:
lm(formula = y ~ x, data = df2)

Coefficients:
(Intercept)            x
          0            1

As before, the "x" column of the example data contains NaN, Inf, and NA values, but this time every one of them is converted to NA before fitting. The model now fits without the error because lm drops the rows containing NA values and uses only the four complete rows. Note that the comparison with "Inf" does not match -Inf, so negative infinities would need to be handled separately. The problem has been fixed, even if in an unexpected manner.
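The same idea can be wrapped in a small function that also catches -Inf and leaves non-numeric columns alone. This is a sketch rather than the only way to do it, and non_finite_to_na is a hypothetical name chosen for this example.

```r
# Hypothetical helper: set every non-finite entry in numeric columns to NA
non_finite_to_na <- function(df) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(col) {
    col[!is.finite(col)] <- NA  # catches NaN, Inf and -Inf; NA stays NA
    col
  })
  df
}

# Same data as above, but with -Inf in place of Inf
df1 <- data.frame(y = c(1, 2, 3, 4, 5, 6, 7),
                  x = c(1, NA, 3, -Inf, 5, NaN, 7))
df2 <- non_finite_to_na(df1)

fit <- lm(y ~ x, data = df2)
print(coef(fit))
```

An alternative that leaves the data frame untouched is to skip the affected rows at fitting time, for example with lm(y ~ x, data = df1, subset = is.finite(x)).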

The reason for this error message is straightforward, and the message itself states clearly what the problem is. The solution is harder to see because it is a little counterintuitive: non-finite values cause the problem in the first place, so you would not expect NA values to pass through the lm function without trouble. Once you understand that lm drops rows containing NA before fitting, the problem becomes easy to deal with.