How To Replace Values In A Data Frame in R

In this tutorial we will show you how to replace values in a data frame in R. Almost everything uses base R; only the final example, which uses mutate(), needs the tidyverse.

When dealing with missing values, you might want to replace certain values with a missing value (NA). This is useful when you know the origin of the data and can be certain which values should be treated as missing. For example, you might know that all values of "N/A", "N A", and "Not Available", or -99, or -1 are supposed to be missing.
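As a minimal sketch of that idea, the snippet below converts such placeholder codes to NA. The data frame `survey` and the two vectors of placeholder codes are hypothetical and only serve as an illustration; the `%in%` indexing it relies on is explained step by step later in the tutorial.

```r
# Hypothetical data in which several different codes all mean "missing".
survey <- data.frame(
  answer = c("yes", "N/A", "no", "Not Available", "N A"),
  score  = c(10, -99, 7, -1, 3),
  stringsAsFactors = FALSE
)

# Assumed placeholder codes; adjust them to what your data source documents.
missing_text    <- c("N/A", "N A", "Not Available")
missing_numbers <- c(-99, -1)

# Overwrite every matching entry with NA, one column at a time.
survey$answer[survey$answer %in% missing_text]  <- NA
survey$score[survey$score %in% missing_numbers] <- NA

survey
```

Keeping the placeholder codes in named vectors makes it easy to extend the list when a new code shows up in the data.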

First, let's see how to replace non-missing values; we will cover missing values later.

I will define a data frame with somewhat inconvenient values, and we will see how to replace them.

persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c("544-755", "122-918", "coming soon", "not listed")
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)
df

  persons     numbers
1  Taylor     544-755
2    Rich     122-918
3  Thomas coming soon
4  Oliver  not listed

As a test, we are going to replace the values "coming soon" and "not listed".

The first thing you should do is check the class of the numbers column:

class(df$numbers)

[1] "character"
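The class matters because the same assignment behaves differently on a factor column, which is what data.frame() produced by default for strings before R 4.0. The sketch below uses two hypothetical vectors, `bad` and `fixed`, to show the problem and one possible way around it.

```r
# A factor only accepts values that are already among its levels.
f <- factor(c("544-755", "not listed"))

# Assigning a new value directly produces NA (R normally warns:
# "invalid factor level, NA generated"); the warning is silenced here.
bad <- f
suppressWarnings(bad[bad == "not listed"] <- "no number")
is.na(bad[2])

# Converting to character first makes the replacement work as intended.
fixed <- as.character(f)
fixed[fixed == "not listed"] <- "no number"
fixed
```

Passing stringsAsFactors = FALSE, as the example data frame does, avoids the issue on any R version.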

Now, let's change the "not listed" entries to "no number".

df$numbers[df$numbers == "not listed"] <- "no number"
df

  persons     numbers
1  Taylor     544-755
2    Rich     122-918
3  Thomas coming soon
4  Oliver   no number

Now, let's see how to change multiple values to one specific value. In our example data frame, let's change the values "coming soon" and "no number" to "Need number":

df$numbers[df$numbers %in% c("no number", "coming soon")] <- "Need number"
df

  persons     numbers
1  Taylor     544-755
2    Rich     122-918
3  Thomas Need number
4  Oliver Need number
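When several old values should each map to a different new value, one option is a named lookup vector. This is only a sketch: it rebuilds the original example data, and the names `lookup` and `hit` are hypothetical.

```r
persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c("544-755", "122-918", "coming soon", "not listed")
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)

# Names are the old values, elements are their replacements.
lookup <- c("coming soon" = "Need number", "not listed" = "no number")

# Only touch the entries that appear in the lookup table.
hit <- df$numbers %in% names(lookup)
df$numbers[hit] <- unname(lookup[df$numbers[hit]])

df
```

Entries that are not listed in the lookup vector, such as the real phone numbers, are left unchanged.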

That is how we replace character values. Replacing numeric values works exactly the same way:

persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c(544755, 122918, 333444, 777333)
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)
df

  persons numbers
1  Taylor  544755
2    Rich  122918
3  Thomas  333444
4  Oliver  777333

class(df$numbers)

[1] "numeric"

To replace the value 333444 with 133111:

df$numbers[df$numbers == 333444] <- 133111
df

  persons numbers
1  Taylor  544755
2    Rich  122918
3  Thomas  133111
4  Oliver  777333

To change the number of the person named Taylor, select the rows by a condition on one column and name the column to overwrite:

df[df$persons == "Taylor", "numbers"] <- 999999
df

  persons numbers
1  Taylor  999999
2    Rich  122918
3  Thomas  133111
4  Oliver  777333
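The same indexing pattern extends to combined conditions, and base R's replace() returns a modified copy instead of changing the column in place. The sketch below rebuilds the numeric example data; the threshold of 500000 and the name `capped` are arbitrary choices for illustration.

```r
persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c(544755, 122918, 333444, 777333)
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)

# Combine conditions with & (and) or | (or): only rows that satisfy
# both tests are overwritten.
df$numbers[df$persons %in% c("Rich", "Oliver") & df$numbers > 500000] <- 0

# replace(x, condition, value) leaves df untouched and returns a new vector.
capped <- replace(df$numbers, df$numbers > 500000, 500000)

df$numbers
capped
```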

Now let's see how to replace missing data in our datasets. Missing values in R are represented by NA, which stands for "not available". Let's first see how to detect missing data. I will define a vector:

vec <- c(1,2,3,NA,5,6)

is.na(vec)

[1] FALSE FALSE FALSE  TRUE FALSE FALSE

We see that the is.na() function returns a logical vector with TRUE for missing values and FALSE for non-missing values. We can go further and find the indexes of the missing values in a vector:

which(is.na(vec))

[1] 4
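Because is.na() returns a logical vector, it also combines with a few other base R functions to summarise missing data. The sketch below reuses the same vector; the data frame `scores` and the other variable names are hypothetical.

```r
vec <- c(1, 2, 3, NA, 5, 6)

# TRUE counts as 1, so the sum is the number of missing values.
n_missing <- sum(is.na(vec))

# anyNA() answers the yes/no question directly.
has_missing <- anyNA(vec)

# Most summary functions return NA unless told to skip missing values.
plain_mean <- mean(vec)
clean_mean <- mean(vec, na.rm = TRUE)

# For a data frame, colSums() counts missing values per column.
scores <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
per_column <- colSums(is.na(scores))
```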

Let's build a more realistic data frame:

Player.Name <- c("Andrew Wiggins", "Jabari Parker", "Joel Embild", "Aaron Gordon",
                 "Dante Exum", "Marcus Smart", "Julius Randle", "Nik Stauskas",
                 "Noah Vonelh", "Elfrid Payton")
Team <- c("T-Wolves", "Bucks", "76-ers", "Magic", "Jazz", "Celtics", "Lakers", "Kings",
          "Hornets", "Magic")
Status <- c("Active", "Injured", "Injured", "Active", "Active", "Active", "Injured",
            "Active", "Active", "Active")
PPG <- c(15.2, NA, NA, 5.9, 4.7, 6.8, NA, 3.4, 3.0, 7.9)
Salary <- c(4636843, 8431216, 2426176, 2459100, 2542100, 7513400, 2134300,
            1234500, 5248900, 5412400)
nba.rooks <- data.frame(Player.Name, Team, Status, PPG, Salary)
nba.rooks

      Player.Name     Team  Status  PPG  Salary
1  Andrew Wiggins T-Wolves  Active 15.2 4636843
2   Jabari Parker    Bucks Injured   NA 8431216
3     Joel Embild   76-ers Injured   NA 2426176
4    Aaron Gordon    Magic  Active  5.9 2459100
5      Dante Exum     Jazz  Active  4.7 2542100
6    Marcus Smart  Celtics  Active  6.8 7513400
7   Julius Randle   Lakers Injured   NA 2134300
8    Nik Stauskas    Kings  Active  3.4 1234500
9     Noah Vonelh  Hornets  Active  3.0 5248900
10  Elfrid Payton    Magic  Active  7.9 5412400

We can remove every row that contains at least one missing value. Note that na.omit() returns a new data frame and leaves nba.rooks itself unchanged:

na.omit(nba.rooks)

But that may or may not be what you want.
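One reason it may not be what you want is that na.omit() looks at every column. The sketch below, on a small hypothetical data frame `players`, contrasts it with dropping rows based on a single column or on a chosen subset of columns.

```r
# Hypothetical data with missing values in two different columns.
players <- data.frame(
  name   = c("A", "B", "C", "D"),
  ppg    = c(15.2, NA, 5.9, NA),
  salary = c(100, 200, NA, 400),
  stringsAsFactors = FALSE
)

# Drops every row with an NA in any column: only row 1 survives.
all_complete <- na.omit(players)

# Drops rows only when ppg is missing: rows 1 and 3 survive.
ppg_known <- players[!is.na(players$ppg), ]

# complete.cases() on selected columns: rows with a known salary survive.
salary_known <- players[complete.cases(players[, "salary", drop = FALSE]), ]
```

Checking nrow() before and after is a quick way to see how much data each approach discards.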

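By using the which(is.na()) combination we saw earlier, we can replace missing values with any number. The logical vector returned by is.na() also works as an index on its own, so which() is optional here.

Let's replace the missing values with 1, working on a copy so that the original data frame keeps its NA values: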

test <- nba.rooks
test$PPG[which(is.na(test$PPG))] <- 1
test

      Player.Name     Team  Status  PPG  Salary
1  Andrew Wiggins T-Wolves  Active 15.2 4636843
2   Jabari Parker    Bucks Injured  1.0 8431216
3     Joel Embild   76-ers Injured  1.0 2426176
4    Aaron Gordon    Magic  Active  5.9 2459100
5      Dante Exum     Jazz  Active  4.7 2542100
6    Marcus Smart  Celtics  Active  6.8 7513400
7   Julius Randle   Lakers Injured  1.0 2134300
8    Nik Stauskas    Kings  Active  3.4 1234500
9     Noah Vonelh  Hornets  Active  3.0 5248900
10  Elfrid Payton    Magic  Active  7.9 5412400

Let's look at another way of filling in missing values. Some players were injured and did not play, so they have no points per game. What we can do is estimate a projected points-per-game figure: what they might have scored if they had played.

Usually, in this kind of situation, we replace NA with the mean of the available values:

nba.rooks$PPG[which(is.na(nba.rooks$PPG))] <- mean(nba.rooks$PPG, na.rm = TRUE)
nba.rooks

      Player.Name     Team  Status  PPG  Salary
1  Andrew Wiggins T-Wolves  Active 15.2 4636843
2   Jabari Parker    Bucks Injured  6.7 8431216
3     Joel Embild   76-ers Injured  6.7 2426176
4    Aaron Gordon    Magic  Active  5.9 2459100
5      Dante Exum     Jazz  Active  4.7 2542100
6    Marcus Smart  Celtics  Active  6.8 7513400
7   Julius Randle   Lakers Injured  6.7 2134300
8    Nik Stauskas    Kings  Active  3.4 1234500
9     Noah Vonelh  Hornets  Active  3.0 5248900
10  Elfrid Payton    Magic  Active  7.9 5412400
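If several numeric columns need the same treatment, the replacement can be wrapped in a small helper and applied column by column. This is a sketch under assumptions: the function `impute_mean` and the data frame `stats` are hypothetical, and mean imputation is only one of several possible strategies.

```r
# Hypothetical data with missing values in two numeric columns.
stats <- data.frame(
  player   = c("A", "B", "C", "D"),
  ppg      = c(10, NA, 6, 8),
  rebounds = c(NA, 4, 6, 2),
  stringsAsFactors = FALSE
)

# Replace the NA entries of one vector with the mean of its known values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper to numeric columns only; text columns stay untouched.
numeric_cols <- sapply(stats, is.numeric)
stats[numeric_cols] <- lapply(stats[numeric_cols], impute_mean)

stats
```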

The last way of replacing values that I am going to show you uses the mutate() function from dplyr, which is loaded as part of the tidyverse.

The mutate() function can either create new variables or overwrite existing ones.

In the next example, we overwrite every value in the Team column with "Hawks":

library(tidyverse)
nba.rooks <- mutate(nba.rooks, Team = "Hawks")
nba.rooks

      Player.Name  Team  Status  PPG  Salary
1  Andrew Wiggins Hawks  Active 15.2 4636843
2   Jabari Parker Hawks Injured  6.7 8431216
3     Joel Embild Hawks Injured  6.7 2426176
4    Aaron Gordon Hawks  Active  5.9 2459100
5      Dante Exum Hawks  Active  4.7 2542100
6    Marcus Smart Hawks  Active  6.8 7513400
7   Julius Randle Hawks Injured  6.7 2134300
8    Nik Stauskas Hawks  Active  3.4 1234500
9     Noah Vonelh Hawks  Active  3.0 5248900
10  Elfrid Payton Hawks  Active  7.9 5412400
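The call above overwrites every row. To replace only some values, mutate() can be combined with base R's ifelse(). The following is a sketch that assumes the dplyr package is installed; the data frame `rooks` is a hypothetical, smaller stand-in for the data above.

```r
library(dplyr)

# Hypothetical three-row data frame with one missing PPG value.
rooks <- data.frame(
  Player = c("A", "B", "C"),
  Team   = c("Magic", "Jazz", "Magic"),
  PPG    = c(5.9, NA, 7.9),
  stringsAsFactors = FALSE
)

rooks <- mutate(
  rooks,
  # Rename one team and keep every other team as it is.
  Team = ifelse(Team == "Magic", "Hawks", Team),
  # Fill missing PPG with the mean of the known values.
  PPG  = ifelse(is.na(PPG), mean(PPG, na.rm = TRUE), PPG)
)

rooks
```

ifelse(test, yes, no) picks the replacement where the test is TRUE and the original value elsewhere, so rows that do not match are preserved.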

Here we covered the most common ways of replacing values in a data frame.

The mutate() example we have shown is only its most basic use, and mutate() is not intended primarily for replacing values. If you want to learn more about mutate() and other useful data transformation functions, check out the free e-book "R for Data Science" by Hadley Wickham and Garrett Grolemund.