We’re going to walk through how to add columns to, and delete columns from, a data frame in R. This includes creating calculated fields.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame example which is included in the standard R distribution. You can easily get to this by typing: data(ChickWeight) in the R console. This data frame captures the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea….
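If you want to get your bearings first, here is a minimal sketch that loads the data set and takes a quick look at it. It assumes nothing beyond base R and the built-in datasets package.

```r
# Load the built-in ChickWeight data set (part of R's standard datasets package)
data(ChickWeight)

# Show the structure: four columns named weight, Time, Chick and Diet
str(ChickWeight)

# Look at the first few rows
head(ChickWeight)

# Count how many measurements were recorded on each day
table(ChickWeight$Time)
```

Note that column names are case sensitive: weight is lowercase while Time, Chick and Diet are capitalized.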
This series has a couple of parts – feel free to skip ahead to the most relevant parts.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- Create an R Data Frame
- Sort an R Data Frame
- Add and Remove Columns
- Renaming Columns
- Add and Remove Rows
- Merge Two Data Frames
Adding and Deleting Columns To A Data Frame
Ever wanted to add a calculated field to your data? This could be something like a flag or value bracket indicator (hot, cold, just right) or even a separate calculation combining information from several existing fields.
Continuing our chicken farming example, let’s sort our chickens into groups. We’re going to analyze the birds that were measured on the final day and sort them into groups based on weight.
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")
This little script will create a new field called weightclass and spin through our data frame, using a simple if-then conditional test to assess which rows represent “large” birds and which rows are “small” birds.
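To confirm the flag behaves as described, one option is to tabulate the new column and compare it with the weights it came from. This is only a sketch using base R; the 250 cut-off is the one from the example above.

```r
data(ChickWeight)

# Keep only the measurements taken on the final day (Time == 21)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# ifelse() is vectorized: it tests every row at once and returns one label per row
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")

# Count how many birds landed in each class
table(oldbirds$weightclass)

# Show the new column next to the weights it was derived from
head(oldbirds[, c("Chick", "weight", "weightclass")])
```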
Got more than two outcomes? Here’s a way to code that version by “doing it in slices”….
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"
With the first assignment statement, R adds a new column named “wclass” to the data frame and fills the rows that meet the condition (weight > 300) with the value “Huge”. The remaining rows are set to NA (R’s marker for missing values) and are filled in as the other statements execute.
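The intermediate state is easy to inspect. The sketch below runs the first slice on its own, counts the rows that are still unfilled, and then finishes the job. The helper variable na_after_first is our own name, introduced purely for illustration.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# First slice: creates the wclass column; rows that do not match are NA
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"

# How many rows are still unfilled after the first statement?
na_after_first <- sum(is.na(oldbirds$wclass))
na_after_first

# Remaining slices fill in the rest
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"

# Every row should now carry a label
anyNA(oldbirds$wclass)
table(oldbirds$wclass)
```

If the conditions ever fail to cover every possible value, the uncovered rows stay NA, so a check with anyNA() is a cheap safeguard.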
This same logic can be applied for a mathematical calculation, where you combine the results of multiple fields to create a new column. For example, let’s look at the average weight per day of age for our chickens.
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time
In this last example, the formula will be evaluated and applied to each row of the data frame, creating a new column with the calculated amount.
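Deleting a column works in a similar spirit. The sketch below adds the calculated column and then removes it again. It relies only on base R: assigning NULL drops a single column, and a logical test on names() keeps everything except the columns you list. The trimmed data frame and the choice to drop Diet are ours, purely for illustration.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Add a calculated column (note the capital T in Time; names are case sensitive)
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time
names(oldbirds)

# Delete a single column by assigning NULL to it
oldbirds$weightperday <- NULL
names(oldbirds)

# Alternatively, keep every column except the ones listed
trimmed <- oldbirds[, !(names(oldbirds) %in% c("Diet"))]
names(trimmed)
```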
Reader Update! [June 2018]
Apparently the ifelse function can be used more broadly than we showed in the example above: calls to ifelse can be nested to support multiple conditions. So we could implement the second example (the three-level flag) using the following code.
oldbirds$wclass <- ifelse(oldbirds$weight > 300, "Huge",
                   ifelse(oldbirds$weight > 200, "Typical", "Small"))
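For comparison, base R’s cut() function offers another route to the same kind of bracket. This is a different technique from the nested ifelse, shown only as a sketch; it uses the 300 and 200 cut-offs from the slice example, and the variable names nested and binned are our own.

```r
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# Nested ifelse(): conditions are checked in order, and the first match wins
nested <- ifelse(oldbirds$weight > 300, "Huge",
          ifelse(oldbirds$weight > 200, "Typical", "Small"))

# cut() bins a numeric vector; intervals are right-closed by default,
# so the bins here are (-Inf, 200], (200, 300] and (300, Inf)
binned <- cut(oldbirds$weight,
              breaks = c(-Inf, 200, 300, Inf),
              labels = c("Small", "Typical", "Huge"))

# Both approaches should assign the same label to every bird
all(nested == as.character(binned))
```

One difference worth knowing: ifelse returns a character vector here, while cut returns a factor whose levels follow the order of the labels.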
Next up, how to add and remove rows from an R data frame. Or if you want to skip ahead…