Data Cleanup in R: Replacing NA values with 0

As the song says, you don’t always get what you want. This is particularly true in data science, where what you want is a full and complete data set and, unfortunately, a few values are missing. Since missing values can distort your calculations, you need to deal with them. We’re going to talk about how to replace missing (NA) values with zeroes. This is one of several ways to clean up your data using R.

Here are a couple of options for replacing the NA values in your data frame with zeros. This is only one of several ways of dealing with missing data – we profile other options here (removing NA rows).

R Vectors: Replacing NA with 0

The simplest case is replacing NA values in an R vector:

 
example <- c(3,4,5,NA,7,8,9,10)
example[is.na(example)] <- 0

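This code converts the NA values in the vector to zeroes. The call is.na(example) returns TRUE at each missing position, and the assignment writes 0 to exactly those positions.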
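If you would rather leave the original vector untouched, a minimal sketch along the following lines may help. It counts the missing values first and then uses base R's replace() function to return a modified copy. The names n_missing and cleaned are only illustrative and are not part of the original example.

```r
# Original vector with one missing value
example <- c(3, 4, 5, NA, 7, 8, 9, 10)

# Count the missing values before changing anything
n_missing <- sum(is.na(example))

# replace() returns a modified copy, so `example` itself is left as it was
cleaned <- replace(example, is.na(example), 0)

print(n_missing)
print(cleaned)
```

Keep in mind that the zero is then treated as a real observation: mean(cleaned) is lower than mean(example, na.rm = TRUE), because the zero counts toward the average.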

R Data Frame: Changing NA to Zeros

A similar approach works for a data frame. If you’re working with an R matrix instead of an R data frame, you can easily convert it using the as.data.frame function. In this example, we use the sample function to randomly generate a matrix of values, some of them NA, and then convert it to a data frame. Because the values are random, your output will differ from what is shown here.

 
> ex2 <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
> ex2
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    1    2    2    3
[2,]    1   NA   NA    1   NA
[3,]    4    1    1    1    2
[4,]    2    2    2   NA    3
[5,]    5   NA    5    3    5
> testdf <- as.data.frame(ex2)
> testdf[is.na(testdf)] <- 0
> testdf
  V1 V2 V3 V4 V5
1  0  1  2  2  3
2  1  0  0  1  0
3  4  1  1  1  2
4  2  2  2  0  3
5  5  0  5  3  5
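The assignment above applies to every column of the data frame. When a data frame mixes numeric and non-numeric columns, you may want to fill in zeros only where a zero makes sense. The following is a rough sketch under that assumption; the scores data frame and its column names are hypothetical and are not taken from the example above.

```r
# Hypothetical data frame: one character column, two numeric columns
scores <- data.frame(
  id    = c("a", NA, "c"),
  score = c(1.5, NA, 3.0),
  count = c(NA, 2, 5),
  stringsAsFactors = FALSE
)

# Flag the numeric columns (TRUE/FALSE per column)
num_cols <- vapply(scores, is.numeric, logical(1))

# Replace NA with 0 in the numeric columns only;
# the missing id stays NA
scores[num_cols] <- lapply(
  scores[num_cols],
  function(col) replace(col, is.na(col), 0)
)

print(scores)
```

Restricting the replacement this way keeps a missing label or category visibly missing instead of turning it into a zero.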

These are the “cleanest” two options for handling this situation, both easily expressed using base R operations.

For more help cleaning up data, check out our functions reference.