We’re going to walk through how to add columns to and delete columns from a data frame in R, including how to create calculated fields.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame, which is included in the standard R distribution. You can load it by typing data(ChickWeight) in the R console. This data frame records the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea.
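If you want to confirm the data loaded before following along, the snippet below loads ChickWeight and prints its structure. It uses only base R functions (data(), str(), head() and table()); the columns you should see are weight, Time, Chick and Diet.

```r
# Load the built-in ChickWeight data set (part of R's datasets package)
data(ChickWeight)

# Inspect the structure: weight, Time (age in days), Chick (id) and Diet
str(ChickWeight)
head(ChickWeight)

# Count how many measurements were taken on each day
table(ChickWeight$Time)
```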
This series has several parts; feel free to skip ahead to the ones most relevant to you.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- Create an R Data Frame
- Sort an R Data Frame
- Add and Remove Columns
- Renaming Columns
- Add and Remove Rows
- Merge Two Data Frames
Adding and Deleting Columns To A Data Frame
Ever wanted to add a calculated field to your data? This could be something like a flag or value bracket indicator (hot, cold, just right) or even a separate calculation combining information from several existing fields.
Continuing our chicken farming example, let’s sort our chickens into groups. We’re going to analyze the birds that were measured on the final day (Time == 21) and sort them into groups based on weight.
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")
This little script creates a new field called weightclass. Because ifelse() is vectorized, it applies a simple if-then test to every row in one call, labeling birds heavier than 250 grams as “large” and the rest as “small”.
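To see exactly what ifelse() returns, here is the same test on a tiny hand-made data frame. The data frame birds and its weights are hypothetical, chosen so the result is easy to check by eye.

```r
# Hypothetical four-bird data frame for illustration
birds <- data.frame(weight = c(180, 240, 260, 320))

# ifelse() checks every element of weight at once: "large" where the
# test is TRUE, "small" where it is FALSE
birds$weightclass <- ifelse(birds$weight > 250, "large", "small")

birds$weightclass
# "small" "small" "large" "large"
```

Note that the comparison is strict, so a bird weighing exactly 250 would be labeled “small”.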
Got more than two outcomes? Here’s a way to code that version by “doing it in slices”:
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"
The first assignment statement adds a new column named wclass to the data frame and fills the rows that meet the condition (weight > 300) with “Huge”. The remaining rows start out as NA and are filled in as the other two statements run.
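An alternative the slices approach doesn’t use is base R’s cut() function, which bins a numeric vector in a single call. This is a sketch assuming the same cut points as above (200 and 300); the column name wclass_cut is hypothetical. cut() returns a factor rather than a character vector, so wrap it in as.character() if you want plain strings.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# With the default right = TRUE the intervals are (-Inf, 200], (200, 300]
# and (300, Inf], matching the <= 200, > 200 & <= 300 and > 300 slices
oldbirds$wclass_cut <- cut(oldbirds$weight,
                           breaks = c(-Inf, 200, 300, Inf),
                           labels = c("Small", "Typical", "Huge"))

# Count how many birds fall into each band
table(oldbirds$wclass_cut)

# Check the boundary behavior on a few hand-picked weights
as.character(cut(c(150, 200, 201, 300, 301),
                 breaks = c(-Inf, 200, 300, Inf),
                 labels = c("Small", "Typical", "Huge")))
# "Small" "Small" "Typical" "Typical" "Huge"
```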
This same logic can be applied to a mathematical calculation, where you combine the values of multiple fields to create a new column. For example, let’s look at the average weight gained per day of age for our chickens.
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time
In this last example, R evaluates the formula for every row of the data frame at once and stores the result in a new column.
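Here is a runnable version of the calculation, along with base R’s transform(), which names the new column inline and returns a modified copy. The copy name oldbirds2 is hypothetical. Note that R column names are case-sensitive: the age column is Time, not time.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# transform() returns a copy of the data frame with the new column added
oldbirds2 <- transform(oldbirds, weightperday = weight / Time)

# The $ assignment adds the column to oldbirds itself
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time

head(oldbirds[, c("Chick", "weight", "Time", "weightperday")])
```

On the full ChickWeight data the day-0 rows have Time equal to 0, so the same division would produce Inf for them; filtering to day 21 first sidesteps that.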
Remove Columns in R (By Name)
Suppose you want to delete columns in an R data frame by name. You can do this by simply setting that column to NULL, as demonstrated by the code below.
dataframe$columntoremove <- NULL
A data frame is stored as a list of columns, so assigning NULL to a column removes that element from the list, effectively deleting the column from the data frame. R’s garbage collector reclaims the memory once nothing else refers to the old column. A simple but efficient way to remove columns.
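A minimal sketch on a hypothetical data frame df with a temporary QA column shows the effect: after the NULL assignment, the column is simply gone from the list of names.

```r
# Hypothetical data frame with a temporary QA column
df <- data.frame(id = 1:3,
                 qa_flag = c(TRUE, FALSE, TRUE),
                 score = c(10, 20, 30))

# A data frame is a list of equal-length columns
is.list(df)   # TRUE

# Assigning NULL removes the qa_flag element from that list
df$qa_flag <- NULL

names(df)     # "id" "score"
```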
This is a very useful technique when working on project code that may be shared across multiple team members. It is good form to build checks and audits into your work, and sometimes you may want to add extra calculations and flags to your data frame to validate the data. Over the course of a large project, however, these QA columns can add significant overhead (not to mention a mess that anyone doing quality assurance on later steps must wade through). Inserting code to remove these columns once you no longer need them, before passing the data to the next step, makes life easier for everyone. This is particularly useful in industry or consulting, where you may need to shelve a project for several months due to business priorities, or reuse code for a new client.
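One way to make that cleanup routine is to give every QA column a common prefix and drop them all in one pass before handing the data on. This is a sketch assuming a hypothetical naming convention (QA columns start with qa_) and a hypothetical helper drop_qa_columns(); startsWith() itself is base R (version 3.3.0 and later).

```r
# Hypothetical helper: drop every column whose name starts with prefix.
# drop = FALSE keeps the result a data frame even if one column remains.
drop_qa_columns <- function(data, prefix = "qa_") {
  keep <- !startsWith(names(data), prefix)
  data[, keep, drop = FALSE]
}

# Hypothetical data with two QA columns added during validation
checked <- data.frame(id = 1:3,
                      score = c(10, 20, 30),
                      qa_in_range = c(TRUE, TRUE, TRUE),
                      qa_score_missing = c(FALSE, FALSE, FALSE))

# Run the checks while the QA columns exist...
stopifnot(all(checked$qa_in_range), !any(checked$qa_score_missing))

# ...then strip them before passing the data to the next step
clean <- drop_qa_columns(checked)
names(clean)  # "id" "score"
```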
Another Approach: Remove Multiple Columns By Name
A twist on the prior example: if you need to remove several columns from a data frame, consider the following snippet.
dataset$firstcol <- dataset$nextcol <- dataset$anothercol <- NULL
This is a clean, one-line way to delete columns in R, especially if you have a handful (say, three to five) that you want to drop. It is ideal for the cleanup scenario above. You may also want to rename columns so the final results are easy to read.
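The chained assignment works because each <- returns the value it assigned (NULL), which is then passed to the next assignment on its left. The sketch below runs it on a hypothetical data frame and also shows a base R variant for when the column names are already stored in a character vector (dataset2 and drop_these are hypothetical names).

```r
# Hypothetical data frame with three columns to drop
dataset <- data.frame(keep = 1:3, firstcol = 4:6,
                      nextcol = 7:9, anothercol = 10:12)

# Evaluated right to left; every step assigns NULL to one column
dataset$firstcol <- dataset$nextcol <- dataset$anothercol <- NULL
names(dataset)  # "keep"

# Variant: assign NULL to a character vector of column names
dataset2 <- data.frame(keep = 1:3, firstcol = 4:6,
                       nextcol = 7:9, anothercol = 10:12)
drop_these <- c("firstcol", "nextcol", "anothercol")
dataset2[drop_these] <- NULL
# or keep everything else: dataset2[setdiff(names(dataset2), drop_these)]
```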
[Reader Update! June 2018] Yet Another Way to Build a Multi-Level Flag in R
Apparently ifelse() can be used more broadly than we showed in the example above: you can nest ifelse() calls to handle multiple conditions. So we could implement the second example (the three-level flag) using the following code. This gives us yet another way to add a calculated column in R.
oldbirds$weightclass <- ifelse(oldbirds$weight > 300, "Huge", ifelse(oldbirds$weight > 200, "Typical", "Small"))
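As a sanity check, the sketch below builds the three-level flag both ways on the day-21 birds and confirms they agree row for row. It assumes the 200 and 300 cut points from the slices example and initializes wclass to NA so every row has a defined value before the slices are filled in.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Slices approach, starting from an all-NA character column
oldbirds$wclass <- NA_character_
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"

# Nested ifelse(): the inner result is used only where weight > 300 is FALSE
oldbirds$weightclass <- ifelse(oldbirds$weight > 300, "Huge",
                               ifelse(oldbirds$weight > 200, "Typical", "Small"))

# Both approaches should produce the same labels
identical(oldbirds$wclass, oldbirds$weightclass)  # TRUE
```

Nesting reads well for three levels; for more bands, cut() or the slices approach is usually easier to maintain.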
Next up: how to add and remove rows from an R data frame.