When fitting linear regressions, your data frame may have several columns. In such cases, you may want to regress on only one of them, or on all of them. Unfortunately, if one of the columns used in the model contains nothing but missing values, lm will stop with the “Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases” error message. Fortunately, it is an easy problem to fix.
Description of the process
Here we use the base R function lm to fit a linear regression of y on all of the other columns in a data frame.
> t = as.numeric(Sys.time())
> set.seed(t)
> df = data.frame(y = rnorm(10),
+ x = rnorm(10),
+ z = rnorm(10))
> df
y x z
1 -0.19323667 0.97373349 0.47965006
2 -0.30155848 0.69689923 0.08563622
3 -1.27467920 1.58765094 -0.46997593
4 1.66891511 -0.01964070 0.13134181
5 -0.66454622 0.15755350 0.09420765
6 -0.86709293 0.42064576 -0.91947970
7 -0.62145694 -1.01818482 1.49554708
8 0.49335731 1.40569659 1.23552949
9 0.01849929 -0.08046865 0.13269836
10 -0.46315189 0.78309382 0.18512502
>
> lm(y ~ ., df)
Call:
lm(formula = y ~ ., data = df)
Coefficients:
(Intercept) x z
-0.26135 -0.07714 0.32123
To accomplish this, we use the lm function with a dot on the right-hand side of the formula. The dot stands for every column in the data frame other than the response y, so both x and z are used as predictors. No column is entirely missing, so the model is fitted without any problems. If one of the columns had contained only NA values, the call would have produced our error message.
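As a quick check of what the dot expands to, the sketch below asks R for the term labels of the formula. The fixed seed and the name `predictors` are illustrative choices and are not part of the example above.

```r
# Small data frame with the same column names as the example above.
set.seed(42)  # fixed seed, chosen only so the sketch is reproducible
df <- data.frame(y = rnorm(10), x = rnorm(10), z = rnorm(10))

# terms() expands the dot against the columns of the data frame.
predictors <- attr(terms(y ~ ., data = df), "term.labels")
print(predictors)  # the predictor columns that `.` stands for

# The fitted model has one coefficient per predictor, plus the intercept.
fit <- lm(y ~ ., data = df)
print(names(coef(fit)))
```

Every column other than y appears as a predictor, which is why a single unusable column is enough to break the whole call.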
Explanation of the error
In this example, the same model is fitted to a data frame in which one column contains nothing but missing values. Note that the other two columns still have normal numeric values.
> t = as.numeric(Sys.time())
> set.seed(t)
> df = data.frame(y = rnorm(10),
+ x = rnorm(10),
+ z = NA)
> df
y x z
1 -0.19323667 0.97373349 NA
2 -0.30155848 0.69689923 NA
3 -1.27467920 1.58765094 NA
4 1.66891511 -0.01964070 NA
5 -0.66454622 0.15755350 NA
6 -0.86709293 0.42064576 NA
7 -0.62145694 -1.01818482 NA
8 0.49335731 1.40569659 NA
9 0.01849929 -0.08046865 NA
10 -0.46315189 0.78309382 NA
> lm(y ~ ., df)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Note that column z contains nothing but NA values, which is what this message is about. The formula y ~ . uses both x and z as predictors, and by default lm drops every row that has a missing value in any variable of the model. Because z is missing in every row, no rows are left to fit, and the result is our error message.
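The row-dropping behaviour can be checked before calling lm at all. The following is a minimal sketch, assuming the default `na.action` option (`na.omit`) is in effect; the fixed seed and the name `usable_rows` are illustrative.

```r
# Data frame whose z column is entirely missing.
set.seed(42)  # fixed seed, chosen only so the sketch is reproducible
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

# Count the rows that have no missing value in any column.
usable_rows <- sum(complete.cases(df))
print(usable_rows)

# na.omit() removes every incomplete row, mirroring what lm() does
# by default to the variables named in the formula.
print(nrow(na.omit(df)))
```

With no complete rows left, there are no cases to fit, which matches the wording of the error message.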
> t = as.numeric(Sys.time())
> set.seed(t)
> df = data.frame(y = rnorm(10),
+ x = rnorm(10),
+ z = c(1, 2, 3, NA, 4, 5, 6, 7, 8, 9))
> df
y x z
1 0.8453196 0.8663052 1
2 0.4430680 1.1835039 2
3 0.1926877 -1.5038923 3
4 1.0258648 0.3770533 NA
5 1.4514865 0.3370155 4
6 -0.8430806 -0.9882684 5
7 0.3953997 1.3818584 6
8 0.9177905 -1.9618789 7
9 0.6577815 -1.9513699 8
10 0.1132665 0.2029384 9
>
> lm(y ~ ., df)
Call:
lm(formula = y ~ ., data = df)
Coefficients:
(Intercept) x z
0.58965 0.04365 -0.02282
Note that in this example, column z has one missing value, but it also has numeric values. lm drops the row with the missing value and fits the model on the remaining nine rows. This shows that the key to this problem is a column with nothing but missing values.
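To confirm how many rows were actually used in a fit like this one, you can inspect the fitted object. This is a small sketch with a fixed seed chosen for reproducibility; the names `fit` and `dropped` are illustrative.

```r
# Data frame where z has a single missing value in row 4.
set.seed(42)  # fixed seed, chosen only so the sketch is reproducible
df <- data.frame(y = rnorm(10),
                 x = rnorm(10),
                 z = c(1, 2, 3, NA, 4, 5, 6, 7, 8, 9))

fit <- lm(y ~ ., data = df)

# Number of observations that went into the fit.
print(nobs(fit))

# Row indices that lm() removed because of missing values.
dropped <- as.integer(fit$na.action)
print(dropped)
```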
How to fix the error
Fixing this problem is simple. All you need to do is write the formula so that it names only the variables that contain usable values. The model then skips the variable with the missing values and is fitted as normal on the remaining column.
> t = as.numeric(Sys.time())
> set.seed(t)
> df = data.frame(y = rnorm(10),
+ x = rnorm(10),
+ z = NA)
> df
y x z
1 -0.19323667 0.97373349 NA
2 -0.30155848 0.69689923 NA
3 -1.27467920 1.58765094 NA
4 1.66891511 -0.01964070 NA
5 -0.66454622 0.15755350 NA
6 -0.86709293 0.42064576 NA
7 -0.62145694 -1.01818482 NA
8 0.49335731 1.40569659 NA
9 0.01849929 -0.08046865 NA
10 -0.46315189 0.78309382 NA
> lm(y ~ x, df)
Call:
lm(formula = y ~ x, data = df)
Coefficients:
(Intercept) x
-0.1402 -0.1636
Note that in this example the formula refers only to columns y and x, so lm never uses column z. The solution is to leave out the column that is causing the problem, which makes the fix a simple one.
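When the column names are not known in advance, the same idea can be automated by removing every all-NA column before fitting. The following is one possible sketch, not the only approach; the names `has_data` and `df_clean` are hypothetical and the seed is fixed only for reproducibility.

```r
# Data frame whose z column is entirely missing.
set.seed(42)  # fixed seed, chosen only so the sketch is reproducible
df <- data.frame(y = rnorm(10), x = rnorm(10), z = NA)

# TRUE for every column that has at least one non-missing value.
has_data <- colSums(!is.na(df)) > 0

# Keep only those columns; drop = FALSE keeps the result a data frame.
df_clean <- df[, has_data, drop = FALSE]

# The dot formula is now safe because no remaining column is all NA.
fit <- lm(y ~ ., data = df_clean)
print(coef(fit))
```

This only guards against columns that are completely missing. Columns with a few NA values are kept, and lm drops the affected rows as usual.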
You are most likely to encounter this error message when you do not control the data, because otherwise you could simply delete the offending column beforehand. It is, however, an extremely easy problem to fix, and one that you do not have to worry about once you understand it.