Error in fix.by(by.y, y) : ‘by’ must specify a uniquely valid column

If you have never encountered an error message, you are not writing programs. Even the simplest programs produce one from time to time. The error "Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column" occurs when merging two data frames with merge().

Circumstances of this error message.

This error occurs when you merge two data frames on a column that exists in only one of them. It is easy to cause by accidentally copying the wrong column heading. Here is an example of code that produces this error.

# triggering Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
> df = read.table(text = '
 +       Trip  Distance  Duration
 + 1     "#3"  2027.072  081.082
 + 2     "#4"  3476.802  139.072
 + 3     "#5"  4622.174  184.886
 + 4     "#6"  4214.416  168.576
 + 5     "#7"  6553.586  262.143
 + 6     "#8"  7123.162  284.926
 + 7     "#9"  7987.369  319.494', header=TRUE)
 > de = read.table(text = '
 +       Trip  Distance  Start
 + 1     "#3"  2027.072  1.082
 + 2     "#4"  3476.802  9.072
 + 3     "#5"  4622.174  4.886
 + 4     "#6"  4214.416  8.576
 + 5     "#7"  6553.586  2.143
 + 6     "#8"  7123.162  4.926
 + 7     "#9"  7987.369  9.494', header=TRUE)
 > merge(x = de, y = df, by = "Start")
 Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

The column label “Start” is found only in the data frame “de.” Consequently, trying to merge on the “Start” column will not work. Unfortunately, the error message does little to help a programmer see what the problem actually is: it reports that the check on “by” failed, but not which column is missing. Even so, the cause is easy to understand. You are trying to merge two data frames on a column that they do not have in common.
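A quick way to see which columns two data frames share is to compare their names before merging. The sketch below assumes two small data frames with the same column layout as the example above (only the first two rows are reproduced, built with data.frame() for brevity); the variable name `shared` is introduced here for illustration.

```r
# Shortened versions of the two data frames from the example above
de <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Start = c(1.082, 9.072))
df <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Duration = c(81.082, 139.072))

# Columns present in both data frames: the only valid choices for 'by'
shared <- intersect(names(de), names(df))
print(shared)

# Check a candidate key in each data frame before calling merge()
"Start" %in% names(de)   # TRUE
"Start" %in% names(df)   # FALSE, so merge(by = "Start") cannot work
```

Any column name that is not in `shared` will trigger the error when passed to “by.”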

How to fix this error message.

Fixing this error is simple. All you need to do is make sure that the column name you pass to “by” is in both data frames. Here is the same code as before (using read.table()), this time done properly.

# solving Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
> df = read.table(text = '
 +       Trip  Distance  Duration
 + 1     "#3"  2027.072  081.082
 + 2     "#4"  3476.802  139.072
 + 3     "#5"  4622.174  184.886
 + 4     "#6"  4214.416  168.576
 + 5     "#7"  6553.586  262.143
 + 6     "#8"  7123.162  284.926
 + 7     "#9"  7987.369  319.494', header=TRUE)
 > de = read.table(text = '
 +       Trip  Distance  Start
 + 1     "#3"  2027.072  1.082
 + 2     "#4"  3476.802  9.072
 + 3     "#5"  4622.174  4.886
 + 4     "#6"  4214.416  8.576
 + 5     "#7"  6553.586  2.143
 + 6     "#8"  7123.162  4.926
 + 7     "#9"  7987.369  9.494', header=TRUE)
 > merge(x = de, y = df, by = "Trip")
   Trip Distance.x Start Distance.y Duration
 1   #3   2027.072 1.082   2027.072   81.082
 2   #4   3476.802 9.072   3476.802  139.072
 3   #5   4622.174 4.886   4622.174  184.886
 4   #6   4214.416 8.576   4214.416  168.576
 5   #7   6553.586 2.143   6553.586  262.143
 6   #8   7123.162 4.926   7123.162  284.926
 7   #9   7987.369 9.494   7987.369  319.494

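As you can see, when you use “Trip” the merge works. It does so because the column “Trip” is in both data frames. Because “Distance” is also in both data frames but was not named in “by,” it appears twice in the result, as Distance.x and Distance.y.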
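Two variations on the fix may be useful; both are sketches on shortened data, not the only way to do it. The first merges on every shared column, so the output does not carry the Distance.x and Distance.y pair seen above. The second covers the case where the key column has a different name in each data frame, using merge()'s by.x and by.y arguments. The column name "TripID" and the variables `df2`, `merged_all`, and `merged_xy` are invented for this illustration.

```r
# Shortened versions of the two data frames from the example above
de <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Start = c(1.082, 9.072))
df <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Duration = c(81.082, 139.072))

# Variation 1: merge on all shared columns so Distance appears only once
merged_all <- merge(x = de, y = df, by = c("Trip", "Distance"))
print(names(merged_all))

# Variation 2: the key has a different name in each data frame.
# "TripID" is a hypothetical name used only for this example.
df2 <- df
names(df2)[names(df2) == "Trip"] <- "TripID"

# by = "Trip" would fail here because df2 has no "Trip" column,
# so name the key separately for each side.
merged_xy <- merge(x = de, y = df2, by.x = "Trip", by.y = "TripID")
print(merged_xy)
```

In the second variation the key column in the result takes its name from the first data frame, here “Trip.”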
