As the song says, you don’t always get what you want. This is particularly true in data science, where what you want is a full and complete data set and, unfortunately, a few values are missing. NA values can distort your analysis (many R functions, such as sum() and mean(), return NA by default when any input is NA), so you need to deal with them. We’re going to talk about how to replace NA with 0 in R. This is one of several data manipulation methods you can use to clean up your data in R.
Here are a couple of options for replacing the missing (NA) values in your data frame with zero. This is one of several ways of dealing with missing data; we profile other options here (removing NA rows).
R Vectors: Replacing NA with 0
Very simple case – replacing a missing value in an R Vector:
example <- c(3, 4, 5, NA, 7, 8, 9, 10)
example[is.na(example)] <- 0
This code will convert any NA value in the vector or selected column to zero.
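If you would rather inspect the missing positions first and keep the original vector intact, a minimal sketch along these lines may help. The names `missing` and `cleaned` are illustrative and not part of the original example.

```r
example <- c(3, 4, 5, NA, 7, 8, 9, 10)

# is.na() returns a logical vector marking the missing positions
missing <- is.na(example)
which(missing)  # index of each missing value
sum(missing)    # number of missing values

# replace() returns a modified copy; `example` itself is left unchanged
cleaned <- replace(example, missing, 0)
```

Working on a copy like this makes it easy to compare the cleaned vector with the original before committing to the change.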
R Dataframe: Changing NA to Zeros
A similar approach works for an entire data frame. If you’re working with an R matrix instead of an R data frame, you can easily convert it using the as.data.frame function. In this example, we randomly generate the values for the data frame with the sample function; each cell is either NA or an integer from 1 to 5, so there are no negative values. Because the values are random, your output will differ from the one shown here.
> ex2 <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
> ex2
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA    1    2    2    3
[2,]    1   NA   NA    1   NA
[3,]    4    1    1    1    2
[4,]    2    2    2   NA    3
[5,]    5   NA    5    3    5
> testdf <- as.data.frame(ex2)
> testdf[is.na(testdf)] <- 0
> testdf
  V1 V2 V3 V4 V5
1  0  1  2  2  3
2  1  0  0  1  0
3  4  1  1  1  2
4  2  2  2  0  3
5  5  0  5  3  5
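Even with multiple columns, is.na(testdf) flags every missing cell in the data frame, and the assignment replaces each of them with a single value of zero. This works cleanly for numeric columns. Be careful with other data types: in a character column the NA becomes the string "0", and in a factor column R warns about an invalid factor level and leaves the NA in place unless "0" is already one of the levels. Filling in the blank cells this way makes later data analysis and data manipulation easier, but check that zero is a sensible stand-in for every column you touch.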
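When a data frame mixes column types, you may want to limit the replacement to one column or to the numeric columns only. The following is a sketch under that assumption; the data frames `df` and `df2` and their column names are made up for illustration.

```r
# Hypothetical mixed-type data frame
df <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, NA),
  group = c("a", NA, "b", "b"),
  stringsAsFactors = FALSE
)

# Replace NA in a single column only; `group` keeps its NA
df$score[is.na(df$score)] <- 0

# Replace NA in every numeric column, leaving other columns untouched
df2 <- data.frame(x = c(1, NA), y = c(NA, 2.5), z = c("u", NA))
num_cols <- vapply(df2, is.numeric, logical(1))
df2[num_cols] <- lapply(df2[num_cols], function(col) replace(col, is.na(col), 0))
```

Restricting the replacement this way avoids turning a missing label into the text "0" by accident.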
These are the two “cleanest” options for handling this situation, both easily expressed with logical indexing in base R. The tidyverse offers functions that accomplish the same thing, such as replace_na() in tidyr and coalesce() in dplyr. Note that is.na() also returns TRUE for NaN, so the base R approach replaces NaN values as well as NA values with zero.
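For readers who prefer the tidyverse, here is a hedged sketch of the same replacement. It assumes the dplyr and tidyr packages are installed; the data frame `df` and its columns are hypothetical.

```r
# Assumes dplyr and tidyr are installed
library(dplyr)
library(tidyr)

# Hypothetical data frame with NA in a numeric and a character column
df <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, NA),
  label = c("a", NA, "c", "d")
)

# coalesce() takes the first non-missing value at each position
df_a <- df %>% mutate(score = coalesce(score, 0))

# replace_na() on a data frame takes a named list of per-column replacements
df_b <- df %>% replace_na(list(score = 0))
```

Both calls name the column explicitly, so the missing value in `label` is left alone.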
For more help cleaning up data, check out our functions reference.