R Errors: Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars

Running into an error message means you are actually programming, and some messages come from the simplest of mistakes. The “Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars” message is an example of this. It appears when the columns of a data frame are referred to inside a model formula in a way that R does not accept.

The circumstances of this error.

This error message occurs when you are using the rpart() function. It results from misformatting the column names of the data frame when you refer to them in the formula passed to rpart().

> library("rpart")
> x = rep(1, 4)
> y = rep(2, 4)
> z = rep(3, 4)
> t = rep(4, 4)
> df = data.frame(x, y, z, t)
> df
  x y z t
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
> q = rpart(t ~ "x" + "y" + "z", data = df)
Error in terms.formula(formula, data = data) :
  invalid model formula in ExtractVars

As you can see, t, "x", "y" and "z" refer to the column names in the data frame df, but the three predictors are written inside quotes.
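The message names terms.formula() rather than rpart(), which suggests that the formula itself is rejected before any tree is fitted. A minimal sketch of that idea, assuming base R's lm() as a stand-in for rpart() (lm() is not part of the original example) and capturing the message with tryCatch():

```r
# Same toy data frame as in the example above
df <- data.frame(x = rep(1, 4), y = rep(2, 4), z = rep(3, 4), t = rep(4, 4))

# Assumption: lm() stands in for rpart(); both build a model frame
# from the formula before fitting anything
msg <- tryCatch(
  {
    lm(t ~ "x" + "y" + "z", data = df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# msg holds the text of the error, if one was raised
print(msg)
```

If the same text shows up here, the problem lies in the formula and not in anything specific to rpart().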

What is causing this error?

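This error message occurs because the column names are not formatted properly for use in the rpart() formula. The mistake here is putting quotes around the column names of the data frame. Inside a formula, a quoted name is a character string rather than a reference to a column, so R cannot extract a variable from it.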
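To see the difference between a quoted and an unquoted name, the two formulas can be taken apart without fitting a model. This is only an illustrative sketch; the names f_bad, f_good, last_bad and last_good are made up for it:

```r
f_bad  <- t ~ "x" + "y" + "z"   # quoted names, as in the failing call
f_good <- t ~ x + y + z         # bare names, as in the working call

# Element 3 of a two-sided formula is its right-hand side; element 3 of
# that call is the last operand of the outermost `+`
last_bad  <- f_bad[[3]][[3]]
last_good <- f_good[[3]][[3]]

print(class(last_bad))   # class of the quoted "z"
print(class(last_good))  # class of the bare z
```

The quoted version is stored as a character string, while the bare version is stored as a name (a symbol) that R can look up among the columns of the data frame.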

> library("rpart")
> x = rep(1, 4)
> y = rep(2, 4)
> z = rep(3, 4)
> t = rep(4, 4)
> df = data.frame(x, y, z, t)
> df
  x y z t
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
> q = rpart(t ~ x + y + z, data = df)

In this example, the column names in data frame df are not contained in quotes, and the result is that there is no error message. This shows that the cause of the error message was putting the column names in quotes.

How to fix this error.

There are two ways to fix this problem. Both aim to format the column names so that they are acceptable to the rpart() function.

q = rpart(t ~ "x" + "y" + "z", data = df)

This is the rpart() call with quotes around the column names. It is the one that produces the error message.

q = rpart(t ~ x + y + z, data = df)

The first solution simply eliminates the quotes to produce a format for the column names that is acceptable to the rpart() function.
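Quotes often end up in a formula because the column names are held as strings, for instance when they come from names(df). One possible way to drop them programmatically, offered here as a sketch rather than as part of the original fix, is base R's reformulate(), which builds a formula from character vectors. The variable names predictors and f are made up for this illustration:

```r
library(rpart)

df <- data.frame(x = rep(1, 4), y = rep(2, 4), z = rep(3, 4), t = rep(4, 4))

# Column names as strings: every column except the response t
predictors <- setdiff(names(df), "t")

# reformulate() turns the strings into unquoted terms of a formula
f <- reformulate(predictors, response = "t")
print(f)

# The generated formula can be passed to rpart() like a hand-written one
q <- rpart(f, data = df)
```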

q = rpart(df$t ~ df$x + df$y + df$z, data = df)

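The second solution uses the standard $ notation for accessing the columns of a data frame. It eliminates the error because each term is now a valid R expression rather than a string, and the notation is easy to remember since you use it elsewhere. Note that the data = df argument becomes redundant in this form, and because the formula is tied to df itself, the fitted model may ignore new data passed to predict(), so the first solution is usually the safer choice.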
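One way to compare the two solutions is to look at the term labels R derives from each formula. This is a hedged sketch that calls terms() from base R directly instead of going through rpart(); the names labels_plain and labels_dollar are made up for the illustration:

```r
df <- data.frame(x = rep(1, 4), y = rep(2, 4), z = rep(3, 4), t = rep(4, 4))

# Term labels for the formula with bare column names
labels_plain <- attr(terms(t ~ x + y + z, data = df), "term.labels")

# Term labels for the formula written with the $ notation
labels_dollar <- attr(terms(df$t ~ df$x + df$y + df$z, data = df), "term.labels")

print(labels_plain)
print(labels_dollar)
```

With bare names the terms are plain column names, which are looked up in whatever data frame is supplied. With the $ form each term carries df along with it.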