The Best Way to Merge by Different Column Names in R

Merge Two Data Frames in R with Different Columns

Once you get past the “basic contrived examples” and “academic exercises” in R, you’re going to need to know how to combine data frames in R. For example, you may be able to pull sales by customer from a sql database but you need to merge it with the notes from someone’s csv file. And probably match it up against 2 or 3 other files while you’re at it.

By the way, this is pretty much my world as a commercial analyst. There are days where I’m basically a short order cook for data. Knowing how to merge datasets in r is essential to the role.

Storing data under different column names is a common and annoying issue when joining “real world” data sets. Marketing calls it one thing. Engineering another. Finance helpfully contributes a third name for the same field. In the off chance you’re pulling in data from customers and suppliers, it only gets weirder. In some cases this is system generated (pulling from different reports and BI tables). In others, the path is more circuitous. The column names never match…

Merge by Different Column Names in R

In any event, the solution is very easy to implement. Take your existing merge statement (You can add an x and y parameter to your by statement. Example code provided below:

# r merge by different column names in R

> result <- merge (x=stuff, y=moarstuff, by.x=stuffid, by.y=ebayid, x.all=FALSE, y.all=FALSE)

The above will generate an inner join between the two data frames which bridges the little column naming issue.

Presto – you can now merge by different column names in r.

Need more merge / join insights for R?

Check out the rest of our tutorials: