R Errors Explained: Undefined Columns Selected

If you get error messages every now and then, it is a sign that you are actually writing programs. The “undefined columns selected” error message can show up when you create a subset of data from a data frame. The message does give you a clue about what is going on, but that clue is only obvious once you understand what R is complaining about.

The circumstances of this error.

This error occurs when you create a subset of data from a data frame. It is caused by a small formatting mistake in the indexing expression that produces the subset.

> a = read.table(text = '
+ Distance Duration
+ 1 20.072 81.082
+ 2 34.802 39.072
+ 3 46.174 84.886
+ 4 42.416 68.576
+ 5 65.586 62.143
+ 6 71.162 84.926
+ 7 79.369 19.494', header = TRUE)

> a[a$Distance > 40 ]
Error in `[.data.frame`(a, a$Distance > 40) : undefined columns selected

This code produces the “undefined columns selected” error message because the expression a[a$Distance > 40] is missing part of its input: there is no comma separating a row selection from a column selection.

What is causing this error?

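The reason you get an error message in this situation is that the indexing expression is incomplete. The proper format is dataframe[condition(dataframe$column), column#], where the part before the comma selects rows and the part after it selects columns. The example above uses the format dataframe[condition(dataframe$column)], with no comma. With a single index, R treats a data frame like a list of columns, so the seven TRUE/FALSE values are read as a column selection. Because the data frame has only two columns, the TRUE values in positions 3 to 7 point to columns that do not exist, which produces our error message.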
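To see why the missing comma matters, here is a small sketch that re-creates the example data frame and compares the length of the row condition with the number of columns. It uses base R's tryCatch() to capture the error message as a string instead of stopping; the names cond and msg are only illustrative.

```r
# Re-create the example data; the unlabeled first column becomes the row names.
a <- read.table(text = '
Distance Duration
1 20.072 81.082
2 34.802 39.072
3 46.174 84.886
4 42.416 68.576
5 65.586 62.143
6 71.162 84.926
7 79.369 19.494', header = TRUE)

cond <- a$Distance > 40   # one TRUE/FALSE value per row
print(length(cond))       # 7 values
print(ncol(a))            # but only 2 columns

# Without a comma, a single index selects columns, as with a list.
# A logical vector of length 2 is therefore a valid column selector:
print(a[c(FALSE, TRUE)])  # only the Duration column, still a data frame

# The 7-element row condition asks for columns 3 to 7, which do not exist.
msg <- tryCatch(a[cond], error = function(e) conditionMessage(e))
print(msg)
```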

> a = read.table(text = '
+ Distance Duration
+ 1 20.072 81.082
+ 2 34.802 39.072
+ 3 46.174 84.886
+ 4 42.416 68.576
+ 5 65.586 62.143
+ 6 71.162 84.926
+ 7 79.369 19.494', header = TRUE)

> a[a$Distance > 40, 2 ]
[1] 84.886 68.576 62.143 84.926 19.494

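This example returns the values in column 2 (Duration) for the rows where the condition on column 1 (Distance) is met. It shows that the error came from the expression not getting all of the information it was looking for: once a comma separates the row condition from the column number, R knows which part selects rows and which part selects columns.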
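As a variation, assuming you prefer names over positions, the column after the comma can also be given by name. This minimal sketch also shows the drop = FALSE argument of [ for data frames, which keeps a single selected column as a data frame instead of simplifying it to a vector. The variable names rows, by_name and as_df are only illustrative.

```r
# Re-create the example data; the unlabeled first column becomes the row names.
a <- read.table(text = '
Distance Duration
1 20.072 81.082
2 34.802 39.072
3 46.174 84.886
4 42.416 68.576
5 65.586 62.143
6 71.162 84.926
7 79.369 19.494', header = TRUE)

rows <- a$Distance > 40   # row condition: one TRUE/FALSE per row

# Column chosen by name rather than by position; same values as a[rows, 2].
by_name <- a[rows, "Duration"]
print(by_name)

# With a single column selected, R simplifies the result to a vector.
# drop = FALSE keeps it as a one-column data frame, row names included.
as_df <- a[rows, "Duration", drop = FALSE]
print(as_df)
```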

How to fix this error.

You are most likely to get the error message when you want all of the columns in the data frame. It is natural to expect that using the format of dataframe[condition(dataframe$column)] will give you all the columns, but instead, it produces the “undefined columns selected” error message.

> a = read.table(text = '
+ Distance Duration
+ 1 20.072 81.082
+ 2 34.802 39.072
+ 3 46.174 84.886
+ 4 42.416 68.576
+ 5 65.586 62.143
+ 6 71.162 84.926
+ 7 79.369 19.494', header = TRUE)

> a[a$Distance > 40, ]
  Distance Duration
3   46.174   84.886
4   42.416   68.576
5   65.586   62.143
6   71.162   84.926
7   79.369   19.494

As you can see in this example, the correct format is dataframe[condition(dataframe$column), ] if you want all the columns in the data frame. Leaving the space after the comma empty selects every column, so the result is a data frame containing only the rows that meet the condition. Otherwise, put the specific column you want after the comma.
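The same row filter can be written in more than one way. This sketch compares the empty-column form with base R's subset() function, which evaluates the condition inside the data frame so the a$ prefix is not needed, and with a character vector of column names after the comma. One difference worth knowing: subset() drops rows where the condition is NA, while the bracket form returns them as rows of NAs; the example data has no missing values, so the results match here.

```r
# Re-create the example data; the unlabeled first column becomes the row names.
a <- read.table(text = '
Distance Duration
1 20.072 81.082
2 34.802 39.072
3 46.174 84.886
4 42.416 68.576
5 65.586 62.143
6 71.162 84.926
7 79.369 19.494', header = TRUE)

rows <- a$Distance > 40

# Nothing after the comma: keep every column for the matching rows.
all_cols <- a[rows, ]

# Same filter with subset(); Distance is looked up inside a.
via_subset <- subset(a, Distance > 40)

# A character vector after the comma picks several columns by name.
two_cols <- a[rows, c("Distance", "Duration")]

print(all_cols)
print(all.equal(all_cols, via_subset))
print(all.equal(all_cols, two_cols))
```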