We’re going to walk through how to add columns to, and delete columns from, a data frame in R. This includes creating calculated fields.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame, which is included in the standard R distribution. You can load it by typing data(ChickWeight) in the R console. This data frame captures the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea.
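As a quick orientation, here is a minimal sketch that loads the data set and lists its columns. Column names in R are case-sensitive, so it is worth noting that the age column is spelled Time with a capital T.

```r
# Load the built-in data set (ships with base R's datasets package)
data(ChickWeight)

# The four columns used throughout this article
col_names <- names(ChickWeight)
print(col_names)

# Peek at the first few measurements
head(ChickWeight)
```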
This series has several parts; feel free to skip ahead to the most relevant ones.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- Create an R Data Frame
- Sort an R Data Frame
- Add and Remove Columns
- Renaming Columns
- Add and Remove Rows
- Merge Two Data Frames
Adding and Deleting Columns To A Data Frame
Ever wanted to add a calculated field to your data? This could be something like a flag or value bracket indicator (hot, cold, just right) or even a separate calculation combining information from several existing fields.
Continuing our chicken farming example, let’s sort our chickens into groups. We’re going to analyze the birds that were measured on the final day and sort them into groups based on weight.
# add and delete column in r examples
# keep only the birds measured on day 21 (before adding column)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]
# flag each bird as "large" or "small" based on its weight
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")
This little script will create a new field called weightclass and spin through our data frame, using a simple if-then conditional test to assess which rows represent “large” birds and which rows are “small” birds.
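To see what the new field looks like, the sketch below rebuilds the same column and then counts the birds in each group. The 250 cutoff comes from the example above; the use of table() and head() to inspect the result is our own addition.

```r
data(ChickWeight)

# Birds measured on the final day of the experiment
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# ifelse() is vectorized: it tests every row at once and returns one label per row
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")

# How many birds landed in each group?
print(table(oldbirds$weightclass))

# Spot-check a few rows
head(oldbirds[, c("Chick", "weight", "weightclass")])
```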
Got more than two outcomes? Here’s a way to code that version by “doing it in slices”:
# add and delete column in r examples
# adding column for categorical variable, one slice at a time
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"
R adds a new column to the data frame with the first assignment statement, naming it “wclass” and populating the rows that meet the condition (weight > 300) with a value of “Huge”. The remaining rows are left as NA (missing) until the other statements fill them in.
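The fill-in behavior can be checked directly. This is a minimal sketch that pauses after the first slice to count the rows still waiting for a label; the helper variable n_unfilled is ours, introduced only for illustration.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# First slice only: rows that fail the test are left as NA
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"
n_unfilled <- sum(is.na(oldbirds$wclass))  # hypothetical helper for this check
print(n_unfilled)

# The remaining slices fill in the gaps
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"

# Every row should now carry a label
print(table(oldbirds$wclass, useNA = "ifany"))
```

Because the three conditions cover every possible weight without overlapping, no NA values should remain once all three statements have run.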
This same logic can be applied to a mathematical calculation, where you combine the values of multiple fields to create a new column. For example, let’s look at the average weight per day of age for our chickens.
# add and delete columns in R examples
# add column for mathematical calculation (the age column is Time, with a capital T)
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time
In this last example, the formula will be evaluated and applied to each row of the data frame, creating a new column with the calculated amount.
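Here is a short, self-contained sketch of that calculation. Note that the age column in ChickWeight is spelled Time with a capital T, and R column names are case-sensitive.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Element-wise division: one result per row, no explicit loop needed
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time

# Compare the inputs with the calculated column
head(oldbirds[, c("weight", "Time", "weightperday")])
```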
How to Remove a Column in R
Suppose you want to drop a column in an R data frame by name. You can accomplish this by simply setting that column to NULL, as demonstrated by the code below.
# how to remove a column in r / delete column in R
# this version will remove column in r by name
dataframe$columntoremove <- NULL
Assigning NULL removes the column from the R data frame and releases the memory it was using. It is a simple but efficient way to drop columns.
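Applied to our chicken data, the pattern might look like the sketch below. It adds the weightclass flag from earlier, removes it again, and compares the column names before and after; the names_before and names_after variables are ours, used only for the check.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Add a temporary column
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")
names_before <- names(oldbirds)

# Drop it again by assigning NULL
oldbirds$weightclass <- NULL
names_after <- names(oldbirds)

# Shows which column disappeared
print(setdiff(names_before, names_after))
```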
This is actually a very useful technique when working on project code that is potentially shared across multiple team members. It is good form to build checks and audits into your work. Sometimes you may want to incorporate additional calculations and flags into your data frame to validate data. However, over the course of a large project, these QA calculations can add significant overhead to a project (not to mention a huge mess you need to wade through in quality assurance for later steps). Inserting code to remove columns after you need them, before passing the information to the next step, makes life easier for everyone.
Another Approach: Remove Multiple Columns By Name
A twist on the prior example. If you needed to remove several columns from a data frame, consider using the following snippet.
# delete multiple columns in r
# delete columns in R by assigning NULL to each of them in one chained statement
dataset$firstcol <- dataset$nextcol <- dataset$anothercol <- NULL
This can be used for removing columns in R, especially if you need to run “drop columns” on three to five at a time. Better yet, since the underlying operation (remove column in r by name) is very transparent, it will be easy for others to understand your code. You may also want to look at changing column names to ensure the final results are easy to read.
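If the list of columns grows longer, a variation that may be easier to maintain is to keep the names in a character vector and filter on them. This is a different technique from the chained assignment above, sketched here under the assumption that the columns were added by earlier steps; the drop_cols vector is a hypothetical name.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Add two scratch columns so there is something to remove
oldbirds$weightclass  <- ifelse(oldbirds$weight > 250, "large", "small")
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time

# Names of the columns to drop (hypothetical list for this example)
drop_cols <- c("weightclass", "weightperday")

# Keep every column whose name is NOT in drop_cols;
# drop = FALSE keeps the result a data frame even if one column remains
oldbirds <- oldbirds[, !(names(oldbirds) %in% drop_cols), drop = FALSE]

print(names(oldbirds))
```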
[Reader Update! June 2018] Yet Another Way to Add a Multi-Level Column in R
Apparently the ifelse function can be used more broadly than we showed in the example above: nesting ifelse calls lets it support multiple conditions. So we could implement the second example (the three-level flag) using the following code. This gives us yet another way to add a categorical column in R.
# add a three-level categorical column in R with nested ifelse
# same cut points as the slices example: over 300, over 200, everything else
oldbirds$weightclass <- ifelse(oldbirds$weight > 300, "Huge", ifelse(oldbirds$weight > 200, "Typical", "Small"))
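For comparison, base R’s cut() function can build the same kind of three-level flag in a single call. The sketch below uses the 200 and 300 cut points from the slices example and checks that a nested ifelse and cut() agree. cut() is a swapped-in alternative rather than the method described above, and it returns a factor instead of a character vector.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Nested ifelse: the outer test runs first, the inner test handles the rest
nested <- ifelse(oldbirds$weight > 300, "Huge",
          ifelse(oldbirds$weight > 200, "Typical", "Small"))

# cut() with its default right-closed intervals:
# (-Inf, 200] -> Small, (200, 300] -> Typical, (300, Inf) -> Huge
binned <- cut(oldbirds$weight,
              breaks = c(-Inf, 200, 300, Inf),
              labels = c("Small", "Typical", "Huge"))

oldbirds$weightclass <- nested

# Do the two approaches label every bird the same way?
print(all(nested == as.character(binned)))
```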
Next up, how to add and remove rows from an R data frame. Or if you want to skip ahead…