If you have never encountered an error message, you are not writing programs. Even the simplest programs produce them from time to time. The error in fix.by(by.y, y) occurs when merging two data frames.
Circumstances of this error message.
This error occurs when you merge two data frames on a column that exists in only one of them. You can trigger it by accidentally copying the wrong column heading. Here is an example of code that produces this error.
# triggering Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
> df = read.table(text = '
+ Trip Distance Duration
+ 1 "#3" 2027.072 081.082
+ 2 "#4" 3476.802 139.072
+ 3 "#5" 4622.174 184.886
+ 4 "#6" 4214.416 168.576
+ 5 "#7" 6553.586 262.143
+ 6 "#8" 7123.162 284.926
+ 7 "#9" 7987.369 319.494', header=TRUE)
>
> de = read.table(text = '
+ Trip Distance Start
+ 1 "#3" 2027.072 1.082
+ 2 "#4" 3476.802 9.072
+ 3 "#5" 4622.174 4.886
+ 4 "#6" 4214.416 8.576
+ 5 "#7" 6553.586 2.143
+ 6 "#8" 7123.162 4.926
+ 7 "#9" 7987.369 9.494', header=TRUE)
>
> merge(x = de, y = df, by = "Start")
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
The column label “Start” is found only in the data frame “de”, so trying to merge the two data frames on “Start” cannot work. The wording of the error message does not make the problem obvious, but it does contain a clue: fix.by(by.y, y) tells you that the column is missing from the data frame passed as y, which here is “df”. Once you know that, the cause is easy to understand. You are trying to merge two data frames on a column that they do not have in common.
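A quick way to confirm this diagnosis is to compare the column names of the two data frames before merging. The following is a minimal sketch that assumes cut-down, two-row versions of “df” and “de” built with data.frame() instead of read.table(); the variable name shared is introduced here only for illustration.

```r
# Cut-down stand-ins for the two data frames in the example above
# (assumption: two rows are enough to show the column check).
df <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Duration = c(81.082, 139.072))
de <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Start = c(1.082, 9.072))

# Columns present in both data frames: the only valid choices for `by`.
shared <- intersect(names(de), names(df))
print(shared)            # "Trip" "Distance"

# Check the intended key against each data frame separately.
"Start" %in% names(de)   # TRUE
"Start" %in% names(df)   # FALSE, which is why fix.by(by.y, y) fails
```

Any name that does not appear in shared will trigger the error when passed to “by”.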
How to fix this error message.
Fixing this error is simple. All you need to do is make sure that the column name you pass to “by” is in both data frames. Here is the same code as before (using read.table()), this time done properly.
# solving Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
> df = read.table(text = '
+ Trip Distance Duration
+ 1 "#3" 2027.072 081.082
+ 2 "#4" 3476.802 139.072
+ 3 "#5" 4622.174 184.886
+ 4 "#6" 4214.416 168.576
+ 5 "#7" 6553.586 262.143
+ 6 "#8" 7123.162 284.926
+ 7 "#9" 7987.369 319.494', header=TRUE)
>
> de = read.table(text = '
+ Trip Distance Start
+ 1 "#3" 2027.072 1.082
+ 2 "#4" 3476.802 9.072
+ 3 "#5" 4622.174 4.886
+ 4 "#6" 4214.416 8.576
+ 5 "#7" 6553.586 2.143
+ 6 "#8" 7123.162 4.926
+ 7 "#9" 7987.369 9.494', header=TRUE)
>
> merge(x = de, y = df, by = "Trip")
Trip Distance.x Start Distance.y Duration
1 #3 2027.072 1.082 2027.072 81.082
2 #4 3476.802 9.072 3476.802 139.072
3 #5 4622.174 4.886 4622.174 184.886
4 #6 4214.416 8.576 4214.416 168.576
5 #7 6553.586 2.143 6553.586 262.143
6 #8 7123.162 4.926 7123.162 284.926
7 #9 7987.369 9.494 7987.369 319.494
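As you can see, when you use “Trip” the merge works. It does so because the column “Trip” is in both data frames. Because “Distance” is also in both data frames but was not named in “by”, it appears twice in the result, as Distance.x and Distance.y.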
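If you merge data frames often, it may help to check the key before calling merge() so that a wrong column name produces a clearer message. The sketch below is one possible approach, not part of base R: merge_checked is a hypothetical helper, and the two-row data frames are cut-down stand-ins for the ones used above.

```r
# Hypothetical helper (not a base R function): verify that every key
# column exists in both data frames before handing off to merge().
merge_checked <- function(x, y, by) {
  missing_x <- setdiff(by, names(x))
  missing_y <- setdiff(by, names(y))
  if (length(missing_x) > 0 || length(missing_y) > 0) {
    stop("merge key not found; missing in x: [",
         paste(missing_x, collapse = ", "),
         "], missing in y: [",
         paste(missing_y, collapse = ", "), "]",
         call. = FALSE)
  }
  merge(x = x, y = y, by = by)
}

# Cut-down stand-ins for the example data (assumption: two rows each).
df <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Duration = c(81.082, 139.072))
de <- data.frame(Trip = c("#3", "#4"),
                 Distance = c(2027.072, 3476.802),
                 Start = c(1.082, 9.072))

# A key found in both data frames merges as usual.
merged <- merge_checked(de, df, by = "Trip")
print(merged)

# A key found in only one data frame now fails with a message that
# names the missing column; tryCatch() captures it for display.
result <- tryCatch(merge_checked(de, df, by = "Start"),
                   error = function(e) conditionMessage(e))
print(result)
```

The helper only changes the wording of the failure. The fix itself is still the same: pass a column name that both data frames share.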