Error messages are fairly common when fitting a logistic regression in R. The message "Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : NA/NaN/Inf in 'y'" occurs when the glm function is called without a family argument on a data frame whose response column is a factor. The good news is that it is easy to understand and fix.

### Description of the error

This error comes from the glm function, which ships with base R in the stats package. Its basic format is glm(formula, data), plus optional arguments such as family. Here "formula" names the response column on the left of the ~ and the predictor columns on the right, and "data" is the data frame that holds those columns. When family is omitted, glm defaults to the gaussian family, which expects a numeric response.
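The default can be checked directly from the function definition. The following is a small sketch that assumes the stock stats::glm is the one on the search path; the variable names `default_family` and `default_link` are only illustrative.

```r
# Look up the default value of glm()'s family argument.
default_family <- deparse(formals(stats::glm)$family)
print(default_family)

# The gaussian family uses the identity link, i.e. an ordinary linear model.
default_link <- gaussian()$link
print(default_link)
```

In other words, a glm call with no family argument is not a logistic regression at all, which matters for the error discussed here.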

> x = factor(c(1,0,0,1,0,1,1))

> y = c("ape", "bat", "cat", "dog", "cow", "horse", "whale")

> df = data.frame(x,y)

> df

x y

1 1 ape

2 0 bat

3 0 cat

4 1 dog

5 0 cow

6 1 horse

7 1 whale

> glm(x~y,data=df)

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, :

NA/NaN/Inf in ‘y’

This example reproduces the error. Note that x, the response on the left side of the formula, is created with the factor function, and that no family argument is passed to glm.
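To inspect the message without stopping a script, the call can be wrapped in tryCatch. This is a minimal sketch that rebuilds the same data; the names `df_err` and `msg` are hypothetical and chosen only for illustration.

```r
# Rebuild the example: a factor response and a character predictor.
x <- factor(c(1, 0, 0, 1, 0, 1, 1))
y <- c("ape", "bat", "cat", "dog", "cow", "horse", "whale")
df_err <- data.frame(x, y)

# With no family given, glm() uses gaussian, which does arithmetic on the
# response. Arithmetic on a factor yields NA, which leads to the error.
msg <- tryCatch(
  suppressWarnings(glm(x ~ y, data = df_err)),
  error = function(e) conditionMessage(e)
)

print(is.factor(df_err$x))  # confirms the response is a factor
print(msg)                  # the captured error text
```

Checking is.factor on the response column is a quick way to tell whether this is the cause when the same message shows up on real data.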

### Explanation of the error

The following example avoids the error by leaving x as a plain numeric vector instead of wrapping it in the factor function.

> x = c(1,0,0,1,0,1,1)

> y = c("ape", "bat", "cat", "dog", "cow", "horse", "whale")

> df = data.frame(x,y)

> df

x y

1 1 ape

2 0 bat

3 0 cat

4 1 dog

5 0 cow

6 1 horse

7 1 whale

> glm(x~y,data=df)

Coefficients:

(Intercept) ybat ycat ycow ydog yhorse ywhale

1.000e+00 -1.000e+00 -1.000e+00 -1.000e+00 8.077e-16 4.852e-16 4.852e-16

Degrees of Freedom: 6 Total (i.e. Null); 0 Residual

Null Deviance: 1.714

Residual Deviance: 1.43e-30 AIC: -458.8

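The only change is that x is now a plain numeric vector, and this is the key to understanding the error. Without a family argument, glm fits a gaussian model, which needs a numeric response, so a factor on the left side of the formula produces the NA/NaN/Inf message. The character column y is not the problem, because glm handles categorical predictors on the right side of the formula. Keep in mind that this call fits an ordinary linear model to the 0/1 values rather than a logistic regression.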
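When the column already arrives as a factor, removing the factor means converting it back to numbers, and the obvious conversion has a trap. The sketch below uses the illustrative names `x_factor`, `codes` and `x_numeric` to show the difference.

```r
x_factor <- factor(c(1, 0, 0, 1, 0, 1, 1))

# as.numeric() on a factor returns the internal level codes (1, 2, ...),
# not the original labels.
codes <- as.numeric(x_factor)

# Going through as.character() first recovers the original 0/1 values.
x_numeric <- as.numeric(as.character(x_factor))

print(codes)
print(x_numeric)
```

Only the second form gives back the 0/1 values used in the example above; the first would silently recode them as 1 and 2.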

### How to fix the error

In this example, we fix the error while keeping the factor function. This is done by setting the optional family argument of the glm function.

> x = factor(c(1,0,0,1,0,1,1))

> y = c("ape", "bat", "cat", "dog", "cow", "horse", "whale")

> df = data.frame(x,y)

> df

x y

1 1 ape

2 0 bat

3 0 cat

4 1 dog

5 0 cow

6 1 horse

7 1 whale

> glm(formula = x ~ y, family = binomial, data = df)

Coefficients:

(Intercept) ybat ycat ycow ydog yhorse ywhale

2.457e+01 -4.913e+01 -4.913e+01 -4.913e+01 1.102e-06 1.104e-06 1.104e-06

Degrees of Freedom: 6 Total (i.e. Null); 0 Residual

Null Deviance: 9.561

Residual Deviance: 3.001e-10 AIC: 14

As you can see, adding the "family = binomial" argument to the glm function fixes the error. The binomial family accepts a factor response, treating the first level (here 0) as failure and every other level as success. So there are actually two ways of fixing this problem, but this is the one that lets you keep the factor in your data frame, and it is the one that actually fits a logistic regression.
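To check how the factor levels map to the fitted probabilities, the fit can be stored and inspected. This is a hedged sketch on the same toy data; `df_ok`, `fit` and `p` are illustrative names. The call is wrapped in suppressWarnings on the assumption that R may warn about fitted probabilities of 0 or 1, since every animal appears exactly once.

```r
x <- factor(c(1, 0, 0, 1, 0, 1, 1))
y <- c("ape", "bat", "cat", "dog", "cow", "horse", "whale")
df_ok <- data.frame(x, y)

# The first level is treated as "failure", every other level as "success".
print(levels(df_ok$x))

# One observation per animal means the model reproduces the data exactly.
fit <- suppressWarnings(glm(x ~ y, family = binomial, data = df_ok))

# Fitted values are probabilities of "success" (x equal to "1").
p <- round(unname(fitted(fit)))
print(p)
```

On real data with repeated predictor values the fitted probabilities would fall between 0 and 1 instead of sitting at the extremes.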

While this is an easy error to get, it is also an easy one to fix. It results from using a factor as the response variable while glm is running with its default gaussian family. If you encounter this message, fixing it is simply a matter of adding family = binomial to the glm call, or of converting the response to numeric values. Knowing this will help you avoid the problem and fix it should it occur.