In this tutorial we will show you how to replace values in R. Most of the examples use base R only; the last one uses mutate() from the tidyverse.
When dealing with messy data, you might want to replace certain values with a missing value (NA). This is useful when you know the origin of the data and can be certain which values should be missing. For example, you might know that all values of “N/A”, “N A”, and “Not Available”, or -99, or -1 are supposed to be missing.
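As a minimal sketch of that idea, the snippet below turns such placeholder codes into NA. The vectors `answers` and `scores` are made up for illustration, and the list of codes is an assumption you would adapt to your own data.

```r
# Hypothetical data; the placeholder codes are assumptions for illustration.
answers <- c("yes", "N/A", "no", "Not Available", "N A")
scores  <- c(12, -99, 7, -1, 30)

# Mark every known placeholder string as missing.
answers[answers %in% c("N/A", "N A", "Not Available")] <- NA

# Do the same for the numeric codes that stand for "missing".
scores[scores %in% c(-99, -1)] <- NA

answers
scores
```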
First, let's see how to replace non-missing values; we will cover missing values later.
I will define a data frame with somewhat inconvenient values, and we will see how to replace them.
persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c("544-755", "122-918", "coming soon", "not listed")
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)
df
persons numbers
1 Taylor 544-755
2 Rich 122-918
3 Thomas coming soon
4 Oliver not listed
As a test, we are going to replace the values ‘coming soon’ and ‘not listed’.
The first thing you should do is check the class of the ‘numbers’ column:
class(df$numbers)
[1] "character"
Now, let's change the “not listed” entries to “no number”:
df$numbers[df$numbers == "not listed"] <- "no number"
df
persons numbers
1 Taylor 544-755
2 Rich 122-918
3 Thomas coming soon
4 Oliver no number
Now, let's see how to change multiple values to one specific value. In our example data frame, let's change the values “coming soon” and “no number” to “Need number”:
df$numbers[df$numbers %in% c("no number", "coming soon")] <- "Need number"
df
persons numbers
1 Taylor 544-755
2 Rich 122-918
3 Thomas Need number
4 Oliver Need number
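The same replacement can also be written with base R's replace(), which returns a modified copy instead of changing the vector in place. This is only a sketch of an alternative; the name `cleaned` is chosen here for illustration.

```r
numbers <- c("544-755", "122-918", "coming soon", "not listed")

# replace(x, list, values) returns a modified copy; `numbers` itself is untouched.
cleaned <- replace(numbers,
                   numbers %in% c("coming soon", "not listed"),
                   "Need number")

cleaned
numbers
```

Keeping the original vector around can be handy when you want to compare the data before and after cleaning.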
That is how we replace character values. Replacing numeric values works the same way:
persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c(544755, 122918, 333444, 777333)
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)
df
persons numbers
1 Taylor 544755
2 Rich 122918
3 Thomas 333444
4 Oliver 777333
class(df$numbers)
[1] "numeric"
To change the number with the value 333444:
df$numbers[df$numbers == 333444] <- 133111
df
persons numbers
1 Taylor 544755
2 Rich 122918
3 Thomas 133111
4 Oliver 777333
To change the number of the person named Taylor:
df[df$persons == "Taylor", "numbers"] <- 999999
df
persons numbers
1 Taylor 999999
2 Rich 122918
3 Thomas 133111
4 Oliver 777333
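The condition inside the square brackets does not have to be an equality test. The sketch below uses comparison operators and combines conditions across columns; the threshold of 500000 and the replacement values are arbitrary choices for illustration.

```r
persons <- c("Taylor", "Rich", "Thomas", "Oliver")
numbers <- c(544755, 122918, 333444, 777333)
df <- data.frame(persons, numbers, stringsAsFactors = FALSE)

# Replace every value above an arbitrary threshold with 0.
df$numbers[df$numbers > 500000] <- 0

# Combine conditions across columns with & (and) or | (or).
df$numbers[df$persons == "Rich" & df$numbers < 200000] <- 111111

df
```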
Now let's see how to replace missing data in our datasets. Missing values in R are represented by NA, which stands for “not available”. Let's first see how to detect missing data. I will define a vector:
vec <- c(1,2,3,NA,5,6)
is.na(vec)
[1] FALSE FALSE FALSE TRUE FALSE FALSE
We see that the is.na() function returns a logical vector with TRUE for missing values and FALSE for non-missing values. We can go further and find the indexes of the missing values in a vector:
which(is.na(vec))
[1] 4
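Before replacing anything it often helps to know how many values are missing. The following sketch counts them with base R functions; the small data frame `df` is invented for illustration.

```r
vec <- c(1, 2, 3, NA, 5, 6)

anyNA(vec)        # is there at least one missing value?
sum(is.na(vec))   # how many values are missing?
mean(is.na(vec))  # what share of the values is missing?

# For a data frame, is.na() returns a logical matrix,
# so colSums() gives the number of NAs per column.
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"))
colSums(is.na(df))
```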
Let's build a more realistic data frame:
Player.Name <- c("Andrew Wiggins", "Jabari Parker", "Joel Embild", "Aaron Gordon",
                 "Dante Exum", "Marcus Smart", "Julius Randle", "Nik Stauskas",
                 "Noah Vonelh", "Elfrid Payton")
Team <- c("T-Wolves", "Bucks", "76-ers", "Magic", "Jazz", "Celtics", "Lakers", "Kings",
          "Hornets", "Magic")
Status <- c("Active", "Injured", "Injured", "Active", "Active", "Active", "Injured",
            "Active", "Active", "Active")
PPG <- c(15.2, NA, NA, 5.9, 4.7, 6.8, NA, 3.4, 3.0, 7.9)
Salary <- c(4636843, 8431216, 2426176, 2459100, 2542100, 7513400, 2134300,
1234500, 5248900, 5412400)
nba.rooks <- data.frame(Player.Name, Team, Status, PPG, Salary)
nba.rooks
Player.Name Team Status PPG Salary
1 Andrew Wiggins T-Wolves Active 15.2 4636843
2 Jabari Parker Bucks Injured NA 8431216
3 Joel Embild 76-ers Injured NA 2426176
4 Aaron Gordon Magic Active 5.9 2459100
5 Dante Exum Jazz Active 4.7 2542100
6 Marcus Smart Celtics Active 6.8 7513400
7 Julius Randle Lakers Injured NA 2134300
8 Nik Stauskas Kings Active 3.4 1234500
9 Noah Vonelh Hornets Active 3.0 5248900
10 Elfrid Payton Magic Active 7.9 5412400
We can remove every row that contains at least one missing value (via na.omit):
na.omit(nba.rooks)
But that may or may not be what you want.
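To see why, here is a small sketch comparing na.omit() with two gentler options. The data frame `scores` and its columns are hypothetical.

```r
# Hypothetical data with NAs in two different columns.
scores <- data.frame(
  name   = c("A", "B", "C", "D"),
  points = c(10, NA, 7, NA),
  games  = c(3, 5, NA, 2)
)

# TRUE only for rows that have no NA in any column.
complete.cases(scores)

# na.omit() keeps just those complete rows.
nrow(na.omit(scores))

# Drop rows only when `points` is missing; NAs in other columns are kept.
scores[!is.na(scores$points), ]
```

Filtering on a single column keeps more of the data when the other columns are not needed for the analysis at hand.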
Using which(is.na()), which we saw earlier, we can replace missing values with any number.
Let's replace the missing values with 1, working on a copy of the data frame:
test <- nba.rooks
test$PPG[which(is.na(test$PPG))] <- 1
test
Player.Name Team Status PPG Salary
1 Andrew Wiggins T-Wolves Active 15.2 4636843
2 Jabari Parker Bucks Injured 1.0 8431216
3 Joel Embild 76-ers Injured 1.0 2426176
4 Aaron Gordon Magic Active 5.9 2459100
5 Dante Exum Jazz Active 4.7 2542100
6 Marcus Smart Celtics Active 6.8 7513400
7 Julius Randle Lakers Injured 1.0 2134300
8 Nik Stauskas Kings Active 3.4 1234500
9 Noah Vonelh Hornets Active 3.0 5248900
10 Elfrid Payton Magic Active 7.9 5412400
Let's look at another way of filling in missing values. Some players were injured and did not play, so they have no points recorded. Instead of a fixed number, we can give them a projected points-per-game figure: an estimate of what they would have scored had they played.
A common approach in this kind of situation is to replace NA with the mean of the available values:
nba.rooks$PPG[which(is.na(nba.rooks$PPG))] <- mean(nba.rooks$PPG, na.rm = TRUE)
nba.rooks
Player.Name Team Status PPG Salary
1 Andrew Wiggins T-Wolves Active 15.2 4636843
2 Jabari Parker Bucks Injured 6.7 8431216
3 Joel Embild 76-ers Injured 6.7 2426176
4 Aaron Gordon Magic Active 5.9 2459100
5 Dante Exum Jazz Active 4.7 2542100
6 Marcus Smart Celtics Active 6.8 7513400
7 Julius Randle Lakers Injured 6.7 2134300
8 Nik Stauskas Kings Active 3.4 1234500
9 Noah Vonelh Hornets Active 3.0 5248900
10 Elfrid Payton Magic Active 7.9 5412400
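If you need this more than once, the pattern can be wrapped in a small function. The helper below, `fill_na`, is a hypothetical name and only a sketch; it indexes with is.na() directly, since which() is optional for this kind of assignment, and it lets you swap the mean for another summary such as the median.

```r
# Hypothetical helper: fill the NAs of a numeric vector
# with a summary (mean by default) of the observed values.
fill_na <- function(x, fun = mean) {
  x[is.na(x)] <- fun(x, na.rm = TRUE)
  x
}

ppg <- c(15.2, NA, NA, 5.9, 4.7, 6.8, NA, 3.4, 3.0, 7.9)

fill_na(ppg)          # mean imputation
fill_na(ppg, median)  # median imputation, less sensitive to large values like 15.2
```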
The last way of replacing values I am going to show you uses the mutate() function from the dplyr package, which is part of the tidyverse.
mutate() can either create new variables or overwrite the values of existing ones.
In the next example, we will overwrite every value in the Team column:
library(tidyverse)
nba.rooks <- mutate(nba.rooks, Team = "Hawks")
nba.rooks
Player.Name Team Status PPG Salary
1 Andrew Wiggins Hawks Active 15.2 4636843
2 Jabari Parker Hawks Injured 6.7 8431216
3 Joel Embild Hawks Injured 6.7 2426176
4 Aaron Gordon Hawks Active 5.9 2459100
5 Dante Exum Hawks Active 4.7 2542100
6 Marcus Smart Hawks Active 6.8 7513400
7 Julius Randle Hawks Injured 6.7 2134300
8 Nik Stauskas Hawks Active 3.4 1234500
9 Noah Vonelh Hawks Active 3.0 5248900
10 Elfrid Payton Hawks Active 7.9 5412400
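In practice you will rarely want to overwrite a whole column. A minimal sketch of a conditional replacement inside mutate() is shown below, using dplyr's if_else(). The data frame `rooks` is a made-up miniature of the one above, and the choice of 0 as the fill value is an assumption for illustration.

```r
library(dplyr)

# Made-up miniature data frame for illustration.
rooks <- data.frame(
  Player = c("A", "B", "C"),
  Team   = c("Magic", "Jazz", "Magic"),
  PPG    = c(5.9, NA, 7.9),
  stringsAsFactors = FALSE
)

rooks <- mutate(
  rooks,
  Team = if_else(Team == "Magic", "Hawks", Team),  # change only the matching rows
  PPG  = if_else(is.na(PPG), 0, PPG)               # fill missing values with 0
)

rooks
```

if_else() requires both outcomes to have the same type, which helps catch accidental type changes during replacement.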
Here we covered several common ways of replacing values in R: logical indexing, %in% for multiple values, is.na() for missing values, and mutate().
The mutate() example shows only its most basic use, and the function is not intended primarily for replacing values. If you want to learn more about mutate() and other useful data transformation functions, check out the free e-book ‘R for Data Science’ by Hadley Wickham and Garrett Grolemund.