Fixing the R Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

You will get this error message when you use R's linear regression function, lm(), on a data frame in which a column used as an independent variable contains nothing but missing values. Fixing it requires removing that column from the model, because lm() can only fit the model from rows that have observed (non-NA) values for every variable in the formula.

Description of the R Error

The lm() function fits linear regression models. Before fitting, it builds a model frame from the variables named in the formula and, with the default missing-value handling (na.omit), drops every row that has a missing value in any of those variables. Linear models cannot be estimated from missing values, so skipping incomplete rows is usually harmless. When one of the variables is missing in every row, however, no rows survive, lm.fit() receives an empty model matrix, and it stops with "0 (non-NA) cases". When the fit succeeds, the numeric model matrix built from the original input values produces the coefficients, and the resulting object can be used for regression diagnostics, predictions of the linear predictor, and an analysis-of-variance table of its terms.
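To see this mechanism directly, the sketch below rebuilds the data frame used in the examples of this article and checks how many rows survive the missing-value handling. It passes na.action = na.omit explicitly so it does not depend on your options() settings; the variable names rows_left, mf, and msg are only illustrative.

```r
# Rebuild the example data: z is NA in every row
set.seed(314159)
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

# na.omit() drops every row that contains at least one NA
rows_left <- nrow(na.omit(df))
print(rows_left)  # 0

# model.frame() applies the same rule to the variables in the formula
mf <- model.frame(y ~ ., data = df, na.action = na.omit)
print(nrow(mf))

# lm() therefore hands lm.fit() a zero-row model matrix;
# tryCatch() captures the error message instead of aborting
msg <- tryCatch(lm(y ~ ., data = df), error = conditionMessage)
print(msg)  # "0 (non-NA) cases"
```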

Explanation of the R Error

Here we have two examples illustrating what causes this message.

> set.seed(314159)
> df = data.frame(y = rnorm(10),
+ x = rnorm(10),
+ z = NA)
> df
             y          x  z
1  -0.79928868 -1.6354905 NA
2  -0.73009689  0.8243093 NA
3   1.43687692 -0.9979524 NA
4   0.30502316 -1.6794770 NA
5  -0.39728401 -0.2965286 NA
6   0.08889039 -1.3458833 NA
7   1.16870410  1.1970303 NA
8  -0.77574378  1.1022340 NA
9  -1.75551631 -1.1513109 NA
10 -2.30219147 -1.1602625 NA
> lm(y ~ ., df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) :
0 (non-NA) cases

In this example, the formula y ~ . tells lm() to use every other column of the data frame, here x and z, as independent variables. Note that column z contains only NA values. Because every row is missing a value for z, all ten rows are dropped before fitting, and lm.fit() reports that it has 0 (non-NA) cases to work with.
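The key step is that the dot pulls z into the model even though the formula never names it. A minimal sketch, using terms() from base R's stats package on the same example data, makes the expansion visible:

```r
set.seed(314159)
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

# terms() expands the "." in y ~ . into every column except the response
predictors <- attr(terms(y ~ ., data = df), "term.labels")
print(predictors)  # "x" "z": z is included automatically

# z has no observed values, so every row is incomplete
print(sum(!is.na(df$z)))  # 0
```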

> set.seed(314159)
> df = data.frame(y = rnorm(10),
+ x = rnorm(10),
+ z = NA)
> df
             y          x  z
1  -0.79928868 -1.6354905 NA
2  -0.73009689  0.8243093 NA
3   1.43687692 -0.9979524 NA
4   0.30502316 -1.6794770 NA
5  -0.39728401 -0.2965286 NA
6   0.08889039 -1.3458833 NA
7   1.16870410  1.1970303 NA
8  -0.77574378  1.1022340 NA
9  -1.75551631 -1.1513109 NA
10 -2.30219147 -1.1602625 NA
> lm(y ~ z, df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, …) :
0 (non-NA) cases

In this example, z is the only independent variable. The same message appears, which confirms that the all-NA column z is the offending variable.
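In a data frame with many columns, checking each one by eye is impractical. The sketch below uses a small helper, all_na_columns(), which is a hypothetical name introduced here and not part of base R, to list every column that has no observed values at all.

```r
# Hypothetical helper (not part of base R):
# returns the names of columns in which every value is NA
all_na_columns <- function(data) {
  names(data)[vapply(data, function(col) all(is.na(col)), logical(1))]
}

set.seed(314159)
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

bad <- all_na_columns(df)
print(bad)  # "z"
```

Any column this returns will trigger the error as soon as it appears in the model formula.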

How to Fix the R Error

Here is an example of how to fix this problem.

> lm(y ~ x, df)

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
    -0.2859       0.1753

In this example, z is still a column of missing values, but it is left out of the formula, so lm() never looks at it. All ten rows are complete in y and x, so the model can be fitted and used to predict the dependent variable. This fix is needed only because z has no observed values at all; if z had numeric values in some rows, lm() would drop the incomplete rows and fit the model on the rest instead of failing.
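Naming x explicitly in the formula works well for a small model. If you prefer to keep the y ~ . shorthand, one option, sketched here under the assumption that discarding every all-NA column is acceptable for your analysis, is to drop those columns before calling lm().

```r
set.seed(314159)
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

# Keep only columns that have at least one observed value
keep <- vapply(df, function(col) any(!is.na(col)), logical(1))
df_clean <- df[, keep, drop = FALSE]

# y ~ . now expands to y ~ x, so no rows are dropped
fit <- lm(y ~ ., data = df_clean)
print(coef(fit))
```

Dropping the column also works by assignment (df$z <- NULL), but filtering on the missing-value count handles any number of empty columns at once.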

Each column of the data frame supplies one variable to the model: either the response or a predictor. The lm() function copes with scattered missing values by dropping incomplete rows, but it cannot handle a variable that is missing in every row. This error message can be tricky to diagnose, particularly in a large data frame. However, the actual solution is easy: exclude that column from the model. Once you understand this problem, you will find it easy to fix; the key is proper diagnosis.
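To make the diagnosis automatic, you could wrap lm() in a check that names the empty variables before fitting. The wrapper below, lm_checked(), is a hypothetical sketch, not an existing R function; it reads the variables that the formula actually uses (after any "." is expanded) and stops with a clearer message if any of them is entirely NA.

```r
# Hypothetical wrapper (not part of base R): refuses to fit if any
# variable used by the formula has no observed values
lm_checked <- function(formula, data, ...) {
  tt <- terms(formula, data = data)                 # expands "." using data
  vars <- intersect(all.vars(attr(tt, "variables")), names(data))
  empty <- vars[vapply(vars, function(v) all(is.na(data[[v]])), logical(1))]
  if (length(empty) > 0) {
    stop("no observed values in: ", paste(empty, collapse = ", "), call. = FALSE)
  }
  lm(formula, data = data, ...)
}

set.seed(314159)
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

msg <- tryCatch(lm_checked(y ~ ., df), error = conditionMessage)
print(msg)  # "no observed values in: z"

fit <- lm_checked(y ~ x, df)  # fits normally
```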