We’re going to walk through how to add and drop columns in an R data frame. This includes creating calculated fields. Knowing how to remove columns you no longer need is essential to good data analysis in R, so we will show you how to drop columns by name. Everything here uses base R, without needing the dplyr package.
This article continues the examples started in our data frame tutorial. We’re using the ChickWeight data frame, which is included in the standard R distribution. You can easily get to this by typing data(ChickWeight) in the R console. This data frame captures the weight of chickens that were fed different diets over a period of 21 days. If you can imagine someone walking around a research farm with a clipboard for an agricultural experiment, you’ve got the right idea.
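As a quick orientation, here is a minimal sketch of loading the data set and looking at its layout. It relies only on base R; the column names it prints (weight, Time, Chick, Diet) are the ones the later examples depend on.

```r
# Load the built-in ChickWeight data set (part of base R's datasets package)
data(ChickWeight)

# Show the column names and types, then the first few rows
str(ChickWeight)
head(ChickWeight)

# Column names are case-sensitive: note the capital T in Time
names(ChickWeight)
```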
This series has a couple of parts – feel free to skip ahead to the most relevant parts.
- Inspecting your data
- Ways to Select a Subset of Data From an R Data Frame
- Create an R Data Frame
- Sort an R Data Frame
- Add and Remove Columns
- Renaming Columns
- Add and Remove Rows
- Merge Two Data Frames
Adding and Deleting Columns From A Data Frame
Ever wanted to add a calculated field to your data? This could be something like a flag or value bracket indicator (hot, cold, just right) or even a separate calculation combining information from several existing fields. You may also want to remove columns you no longer need. We can show you how to do both in base R, without the dplyr package. The same techniques work on any data frame, even one imported from a CSV file.
Continuing our chicken farming example, let’s sort our chickens into groups. We’re going to analyze the birds that were measured on the final day and sort them into groups based on weight.
# add and delete column in r examples
# sort final-day birds into groups by weight (adding a column)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")
This little script will create a new field called weightclass and spin through the rows of our data frame, using a simple if-then conditional test to assess which rows represent “large” birds and which rows are “small” birds.
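One way to confirm the new column came out as intended is to tabulate it and spot-check a few rows. This is a sketch that rebuilds the same oldbirds subset, so it can be run on its own.

```r
# Rebuild the final-day subset and the two-level weight flag
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]
oldbirds$weightclass <- ifelse(oldbirds$weight > 250, "large", "small")

# Count how many birds landed in each class
table(oldbirds$weightclass)

# Spot-check the flag against the underlying weights
head(oldbirds[, c("Chick", "weight", "weightclass")])
```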
Got more than two outcomes? Here’s a way to code that version by “doing it in slices”….
# add and delete column in r examples
# adding column for categorical variable
oldbirds$wclass[oldbirds$weight > 300] <- "Huge"
oldbirds$wclass[oldbirds$weight > 200 & oldbirds$weight <= 300] <- "Typical"
oldbirds$wclass[oldbirds$weight <= 200] <- "Small"
R adds a new column to the data frame with the first assignment statement, creating a column titled “wclass” and populating the rows which meet the condition (weight > 300) with a value of “Huge”. The remaining rows are left as NA, eventually being filled with the other labels as the other statements execute.
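If you have many brackets, base R's cut() function is an alternative to writing one assignment per slice. This is a different technique from the slicing approach described above, shown here as a sketch; the column name wclass_cut is made up for illustration, and the break points assume the same 200 and 300 gram boundaries.

```r
# Rebuild the final-day subset
data(ChickWeight)
oldbirds <- ChickWeight[ChickWeight$Time == 21, ]

# cut() bins a numeric vector into labelled intervals.
# By default intervals are closed on the right: (-Inf,200], (200,300], (300,Inf]
oldbirds$wclass_cut <- cut(
  oldbirds$weight,
  breaks = c(-Inf, 200, 300, Inf),
  labels = c("Small", "Typical", "Huge")
)

# The result is a factor with one level per bracket
table(oldbirds$wclass_cut)
```

Note that cut() returns a factor rather than a character vector, which is usually what you want for a categorical variable.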
This same logic can be applied for a mathematical calculation, where you combine the results of multiple data frame columns to create a new column. For example, let’s look at the average weight per day of age for our chickens.
# add and delete columns in R examples
# add column for mathematical calculation (note the capital T in Time)
oldbirds$weightperday <- oldbirds$weight / oldbirds$Time
In this last example, the formula will be evaluated and applied to each row of the data frame, creating a new column with the calculated amount.
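To see the row-by-row behavior on something small, here is a sketch using a hypothetical three-row data frame (the values are invented for illustration). It also shows why capitalization matters: R column names are case-sensitive, and ChickWeight names its age column Time.

```r
# Hypothetical toy data: three birds, all measured on day 21
birds <- data.frame(weight = c(210, 305, 180), Time = c(21, 21, 21))

# Division is vectorized, so each row gets its own result
birds$weightperday <- birds$weight / birds$Time
birds

# A lower-case name does not match the Time column and returns NULL
is.null(birds$time)
```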
How to Remove a Column in R
Suppose you want to drop a column from an R data frame by name. You can accomplish this by the simple act of setting that specific column to NULL, as demonstrated by the code below.
# how to remove a column in r / delete column in R
# this version will remove column in r by name
dataframe$columntoremove <- NULL
Assigning NULL to a column deletes that column from the data frame and releases the space it used. A simple but efficient way to drop data frame columns.
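Here is a small runnable sketch of the same idea. The data frame df and its column names (id, score, qa_flag) are hypothetical, chosen only to make the before and after visible.

```r
# Hypothetical data frame with a temporary QA column
df <- data.frame(
  id = 1:3,
  score = c(90, 85, 77),
  qa_flag = c(TRUE, TRUE, FALSE)
)
names(df)   # columns before removal

# Assigning NULL removes the column by name
df$qa_flag <- NULL
names(df)   # the qa_flag column is gone; rows are untouched
```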
This is actually a very useful technique when working on project code that is potentially shared across multiple team members. It is good form to build checks and audits into your work. Sometimes you may want to incorporate additional calculations and flags into your data frame to validate data. However, over the course of a large project, these QA calculations can add significant overhead to a project (not to mention a huge mess you need to wade through in quality assurance for later steps). Inserting code to remove unwanted columns after you need them, before passing the information to the next step, makes life easier for everyone.
Another Approach: Remove Multiple Columns By Name
A twist on the prior example. If you need to remove several columns from a data frame, consider using the following snippet.
# delete multiple columns in r
# delete column in R by assigning NULL to them
dataset$firstcol <- dataset$nextcol <- dataset$anothercol <- NULL
This chained assignment can be used for removing unwanted columns in R, especially if you need to drop three to five at a time. Better yet, since the underlying operation (remove a column in R by name) is very transparent, it will be easy for others to understand your code. You may also want to look at changing column names to ensure the final results are easy to read.
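When the list of columns to drop is longer, or is built up elsewhere in a script, it may be easier to keep the names in a character vector. The sketch below is a different base R technique from the chained assignment; the data frame and the drop_names vector are hypothetical.

```r
# Hypothetical data frame with one column to keep and three to drop
dataset <- data.frame(
  keep = 1:3,
  firstcol = 4:6,
  nextcol = 7:9,
  anothercol = 10:12
)

# Names of the columns we no longer need
drop_names <- c("firstcol", "nextcol", "anothercol")

# Keep every column whose name is NOT in drop_names.
# drop = FALSE keeps the result a data frame even if one column remains.
dataset <- dataset[, !(names(dataset) %in% drop_names), drop = FALSE]
names(dataset)
```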
Dropping a column in R clears out data that isn’t essential to the specific statistical function you are trying to carry out. Whether you are importing a dataset from an outside source or using data that you collected, there may be a variety of statistical tasks, functions, or graphs that you want to create with different parts of your data frame. Removing columns helps with that by letting you focus on only a couple of columns of a large dataset at one time. You may want to drop the last column, or the first column. Either way, dropping a column from a data frame is quick and easy!
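Dropping the first or last column is usually done by position rather than by name. A minimal sketch using negative indexing, on a hypothetical four-column data frame:

```r
# Hypothetical data frame with four columns
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# A negative index excludes that column position
no_first <- df[, -1, drop = FALSE]         # drop the first column
no_last  <- df[, -ncol(df), drop = FALSE]  # drop the last column

names(no_first)
names(no_last)
```

Both calls return new data frames and leave df itself unchanged, so assign the result back to df if you want the removal to stick.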
[Reader Update! June 2018] Yet Another Way to Add Columns in R
Apparently ifelse() can be used more broadly than we showed in the example above: you can nest ifelse() calls to support multiple conditions. So we could implement the second example (three-level flag) using the following code. This gives us yet another way to add a categorical column in R.
# nested ifelse: three-level flag using the same 200 and 300 cutoffs as above
oldbirds$weightclass <- ifelse(oldbirds$weight > 300, "Huge", ifelse(oldbirds$weight > 200, "Typical", "Small"))
Next up, how to add and remove rows from an R data frame. Or if you want to skip ahead…