How To Do a Left Join in R on Multiple Columns (Using the Merge Function)

R’s data types are among the language’s most impressive features. This stems in part from how well-suited structures like data frames are to advanced mathematical functions. But there’s another important aspect to R’s variables. They’re not just powerful, they’re also extremely flexible.

R essentially gives us the power of a full programming language with the information management capabilities of databases. In fact, we can even use fairly analogous techniques to what’s found in SQL. Take joins for example. When you’re performing a database query with SQL you’ll often want to organize the results by a particular index value. We can also do something similar with R’s merge function. And you’ll soon see just how easily we can perform a left join on multiple columns in R.

R, Data Frames, Joins, and Merge

If you’re familiar with left joins within databases then you might be surprised to see it come up in the context of a programming language. Or at least within the language’s default lexicon. This stands as a testament to just how well-suited R is to data analysis and manipulation. R variables can essentially function as two-dimensional databases. And they can be manipulated in a similar way. This even includes using joins on multiple data frames to bring them together into a singular whole. Or, in other words, to merge multiple R entities into a single variable.

The most significant point when considering these types of operations is how semi-compatible data sets are going to come together. For example, we’ll often find ourselves in situations where we have more categories in one dataset than the other. We deal with this in part by defining a central element to organize our data around. But we also need to orient data around a spatial component such as left or right frames. When we orient the merge around the left frame then it’s considered a left join as the leftmost element sets the pattern followed by the rest of the data. This can sound like an intimidatingly complex process. But R’s merge function makes joins a comparatively easy process.

Performing a Simple Left Join With Merge

The best way to see how R handles joins is to simply jump in and see it for yourself. Try running the following example.

ourData1 <- data.frame(
id = c(1, 2, 3),
name1 = c(“Velociraptor”, “Triceratops”, “Brachiosaurus”)
)

ourData2 <- data.frame(
id = c(1, 2, 3, 4),
researcherInterest = c(30, 50, 80, 20),
averageInterest = c(90, 100, 1, 1)
)

ourMergedFrame <- merge(ourData1, ourData2, by = “id”, all.x = TRUE)
print(ourMergedFrame)

We begin by taking on the role of a programmer tasked with charting both public and professional interest in various dinosaurs. Our first data frame, ourData1, consists of id numbers and dinosaur names. The ourData2 data frame consists of id numbers, research interest, and the general public’s average interest. Our id numbers are all in order and interest is ranked between 1 and 100. However, note that ourData2 tracks four entities while ourData1 only considers three. This is where the left join will work its magic.

We begin by creating the ourMergedFrame variable to hold the result of the merge. And we create that information by simply passing four arguments. The first two are the data frames that will be merged. Next up is the by argument. By tells merge which element to join the data around. This is essentially the focus of our join operation.

Next, we pass all.x = TRUE to merge. The .x specifies the left position. As such we’re keeping all of the rows from ourData1 even if there isn’t an equivalent matching row in ourData2. Conversely, when data is present in ourData2 but no corresponding element is in ourData1 it’ll be ignored. This is because we’re joining on our left – ourData1. And when we print the result we can see that we are indeed missing the additional element found in ourData2.

Note that we can change the nature of the join by simply changing the .x to .y. This will make it a right join and we’ll have the missing value from ourData2. And if we entirely removed the “all.” argument the merge function would decide what to merge based on whether there’s a match in both frames. This results in what’s known as an inner join.

Moving Up to Tables

Now that you have the basics down let’s take a look at a slightly more advanced form of left join. Try running the following code.

ourData1 <- data.frame(
id = c(1, 2, 3),
name1 = c(“Velociraptor”, “Triceratops”, “Brachiosaurus”)
)

ourData2 <- data.frame(
id = c(1, 2, 3, 4),
researcherInterest = c(30, 50, 80, 20),
averageInterest = c(90, 100, 1, 1)
)

ourLookupTable <- data.frame(id = c(1, 2, 3), name2 = c(“Bob’s Favorite”, “Bill’s favorite”, “Becky’s favorite”))
merged1 <- merge(ourData1, ourLookupTable, by = “id”, all.x = TRUE)
ourMergedFrame <- merge(merged1, ourData2, by = “id”, all.x = TRUE)
print(ourMergedFrame)

We’re going through a fairly similar process when compared to the first example. And this too is a left join. But it’s a left join that uses a lookup table. This might sound complex, but it’s essentially just an additional data frame that can provide extra context for existing data. In this case, we’re adding a name2 that shows who nominated a dinosaur for consideration in our hypothetical analysis. Our other data frame declarations are constructed just as they were in the previous example. But we also create a lookup table in the form of a new data frame called ourLookupTable. This table contains the additional name values.

Next, we perform an initial left join using ourData1 and the table. We then use that modified form of ourData1 as our leftmost point for an additional merge. This essentially stacks the joins on top of each other. We’re still using the same dynamic between the left data frame and the right data frame. It’s just that we’ve expanded the contents of the left before finalizing the join process. We can print out the results now that we have everything in place. And we now see an additional name2 column showing who nominated each dinosaur.

Scroll to top
Privacy Policy