R warning message - glm.fit: fitted probabilities numerically 0 or 1 occurred

So the good news is: your proposed logistic regression model is very, very good at predicting the response variable. Suspiciously good. To the point of being problematic, since a logistic regression model rarely predicts the outcome that perfectly without cheating. This will trigger a warning message in R.

Description of the glm.fit Warning

The “glm.fit: fitted probabilities numerically 0 or 1 occurred” warning message occurs when the predicted probabilities of a glm logistic regression model are too good. The glm function estimates a fitted probability for each row of the data frame, and those fitted probabilities are what allow the model to predict other values. If some of them are so close to one or zero that R cannot tell them apart from exactly one or zero, it suggests that something is wrong with the data. The result is a warning. The irony is that this message results from your data being too good, which is why it is a warning and not an error message. This way, the program can continue to run and supply the output while letting you know there is a potential problem.

Explanation of the warning

When doing a logistic regression, the model works on probabilities: it fits a probability between zero and one to every observation. When some of those fitted probabilities come out numerically indistinguishable from one or zero, the data looks suspiciously good to the function. As a result, the function triggers a warning message to alert you to the potential problem.
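To see what “numerically 0 or 1” means in practice, here is a minimal sketch. It assumes the tolerance is ten times the machine epsilon, which is what the glm.fit source appears to use; the log-odds values in eta are made up for illustration and do not come from any model.

```r
# Tolerance assumed to mirror the check inside glm.fit for binomial models.
eps <- 10 * .Machine$double.eps

# Hypothetical linear predictor values (log-odds), chosen for illustration.
eta <- c(-40, -5, 0, 5, 40)

# plogis() is the inverse logit: it turns log-odds into probabilities.
p <- plogis(eta)

# TRUE where the probability cannot be told apart from 0 or 1.
saturated <- p < eps | p > 1 - eps
print(data.frame(eta, p, saturated))
```

Only the two extreme log-odds values end up flagged. Moderate values such as 5 or -5 give probabilities close to one or zero, but not close enough to count.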

> x = c(5,-2,6,-7,8,-4,3,-5,2,-1)
> y = c(1,0,1,0,1,0,1,0,1,0)
> df = data.frame(x, y)
> glm(y ~ x, df, family = "binomial")

Call:  glm(formula = y ~ x, family = "binomial", data = df)

Coefficients:
(Intercept)            x
     -7.725       15.450

Degrees of Freedom: 9 Total (i.e. Null); 8 Residual
Null Deviance: 13.86
Residual Deviance: 3.443e-10 AIC: 4
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

In this example, the explanatory variable is a perfect predictor of the response variable: every positive x has a y of 1 and every negative x has a y of 0. As a result, the fit triggers the warning message.
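One way to confirm that the fit really is perfect is to compare the model's classifications with the observed response. The sketch below refits the same data quietly and uses a 0.5 cutoff, which is an arbitrary choice made here for illustration.

```r
x <- c(5, -2, 6, -7, 8, -4, 3, -5, 2, -1)
y <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
df <- data.frame(x, y)

# Refit quietly; the warning is already known at this point.
fit <- suppressWarnings(glm(y ~ x, data = df, family = "binomial"))

# Classify each row at a 0.5 cutoff (an assumption for this sketch).
predicted <- as.integer(fitted(fit) > 0.5)

# Cross-tabulate the predictions against the observed response.
confusion <- table(predicted, actual = y)
print(confusion)
```

Every observation lands on the diagonal of the table, so there is not a single misclassified row.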

How to fix the warning

Ironically, you may not actually WANT to fix this warning. I mean, your model is perfect, right? You predict the outcome with 100% accuracy.

Consider this a polite suggestion to check your program and underlying data to make sure you haven’t made a broader mistake in your data prep.
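A quick check along these lines is to compare the range of the predictor within each outcome group. The helper below, is_separated, is a hypothetical name introduced for this sketch. It only handles a single numeric predictor and a 0/1 response, so treat it as a starting point rather than a general test.

```r
x <- c(5, -2, 6, -7, 8, -4, 3, -5, 2, -1)
y <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)

# Range of the predictor within each outcome group.
print(tapply(x, y, range))

# Hypothetical helper: TRUE when the two groups' ranges do not overlap,
# meaning a single cutoff on x splits the outcomes perfectly.
is_separated <- function(x, y) {
  r0 <- range(x[y == 0])
  r1 <- range(x[y == 1])
  r0[2] < r1[1] || r1[2] < r0[1]
}

print(is_separated(x, y))
```

If the ranges do not overlap, look at how x was built. A predictor that was derived from the response during data prep produces exactly this pattern.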

If you truly intend to get rid of this warning in test or demonstration data, include an imperfection in the data being analyzed, as in the example below. Alternatively, because the program still runs and provides an output, you do have the option of simply tolerating the warning message.

> x = c(5,-2,6,7,8,-4,3,-5,2,-1)
> y = c(1,0,1,0,1,0,1,0,1,0)
> df = data.frame(x, y)
> glm(y ~ x, df, family = "binomial")

Call:  glm(formula = y ~ x, family = "binomial", data = df)

Coefficients:
(Intercept)            x
    -0.8435       0.4077

Degrees of Freedom: 9 Total (i.e. Null); 8 Residual
Null Deviance: 13.86
Residual Deviance: 8.918 AIC: 12.92

It is important to note, as illustrated in this example, that fixing this warning does not require completely breaking the relationship between the explanatory variable and the response variable. In this example, we simply change a negative seven to a positive seven, so that one observation with a y of 0 now has a large positive x. This one change disrupts the pattern just enough to avoid the warning.
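If you would rather tolerate the warning than change the data, one option is to silence only this message and let any other warning through. The sketch below does that with withCallingHandlers on the original, perfectly separated data. The variable name known is just an illustrative choice, and matching on the message text assumes R is running with English messages.

```r
x <- c(5, -2, 6, -7, 8, -4, 3, -5, 2, -1)
y <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
df <- data.frame(x, y)

# Text of the one warning we have decided to tolerate.
known <- "fitted probabilities numerically 0 or 1 occurred"

fit <- withCallingHandlers(
  glm(y ~ x, data = df, family = "binomial"),
  warning = function(w) {
    # Muffle only the known warning; anything else still surfaces.
    if (grepl(known, conditionMessage(w), fixed = TRUE)) {
      invokeRestart("muffleWarning")
    }
  }
)

print(coef(fit))
```

Compared with wrapping the call in suppressWarnings, this keeps unrelated warnings visible, which matters if the model later runs on different data.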

This warning simply lets you know that some of the probabilities the glm function is fitting to your data are numerically indistinguishable from one or zero. In other words, the predictive nature of your data is perfect, or nearly so, which is an unusual situation. Therefore, the warning is triggered. Eliminating the warning simply requires adding an imperfection to the data. It is most likely to occur when producing test or demonstration data frames, because real-world data is rarely that good. Fortunately, once you understand why this warning occurs, you can avoid it.