R functions – is.na – cleaning up missing values

As we’ve noted elsewhere, missing values can be a significant annoyance in real-world data collection. Here we’re going to talk about how to use is.na to find and deal with missing data in R.

Surveys come back incomplete or illegible, meter readings are indeterminate, and tick sheets are lost. The variable in question might even occur sparsely, in combination with other factors. In any event, we’re going to need to identify and clean up missing values.

is.na in R – How to Find NA values in R

The first step of the process is detecting missing values in our data when they occur. This is accomplished using the function is.na in R.

# is.na in R example
test <- c(1,2,3,NA) 
is.na(test)

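This function returns a logical vector of TRUE / FALSE values, the same length as its input, with TRUE wherever the corresponding value is missing (NA). For the example above, the result is FALSE FALSE FALSE TRUE. That logical vector can then be used to filter or replace values.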
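To make the filter and replace steps concrete, here is a minimal sketch. The readings vector is made up for illustration, and filling gaps with the mean of the observed values is only one possible choice, not a recommendation for every data set.

```r
# Hypothetical example vector with two missing values
readings <- c(4.2, NA, 3.9, NA, 5.1)

# Filter: keep only the values that are not missing
observed <- readings[!is.na(readings)]

# Replace: overwrite the missing positions with the mean of the observed values
# (an illustrative choice; a median or a domain-specific default may suit better)
filled <- readings
filled[is.na(filled)] <- mean(observed)

print(observed)
print(filled)
```

Working on a copy (filled) leaves the original vector intact, so you can still check afterwards which positions were originally missing.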

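To work with entire rows of a data frame, consider using the complete.cases function (see the complete.cases function reference). It returns TRUE for each row that has no missing values, so negate it with ! to select the rows which include at least one missing value.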
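As a rough sketch of the row-level approach, assuming a small made-up data frame called survey:

```r
# Hypothetical survey data with gaps in two different columns
survey <- data.frame(
  id    = 1:4,
  age   = c(34, NA, 51, 28),
  score = c(7, 8, NA, 9)
)

# complete.cases() is TRUE for rows with no missing values at all
complete_rows <- survey[complete.cases(survey), ]

# Negating it selects rows with at least one missing value
incomplete_rows <- survey[!complete.cases(survey), ]

print(incomplete_rows)
```

The trailing comma inside the square brackets matters: it tells R we are selecting rows and keeping every column.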

How to Count Missing Values in R

You can use is.na to count missing values in R. Apply is.na to the vector you wish to inspect, then count the items that pass the test. Because TRUE counts as 1 and FALSE as 0, summing the result of is.na gives the number of missing values in the vector directly.
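Reusing the test vector from earlier, the count might look like this. Both forms below give the same answer; the sum version is the more common idiom.

```r
test <- c(1, 2, 3, NA)

# TRUE is treated as 1 and FALSE as 0, so sum() counts the missing values
n_missing <- sum(is.na(test))

# Equivalent, more literal version: filter first, then count what passed
n_missing_alt <- length(test[is.na(test)])

print(n_missing)
```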

If you want to get particularly creative, you can go up a level of abstraction and map this process across the columns of a data frame to find columns with NA in R. Simply apply the count to each column, then select the columns with a non-zero result.
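One way to sketch the column-by-column count uses sapply. The survey data frame here is hypothetical and only serves the example.

```r
# Hypothetical data frame: no gaps in id, one in age, two in score
survey <- data.frame(
  id    = 1:4,
  age   = c(34, NA, 51, 28),
  score = c(7, NA, NA, 9)
)

# Count the missing values in each column
na_per_column <- sapply(survey, function(col) sum(is.na(col)))

# Keep the names of the columns with a non-zero count
columns_with_na <- names(na_per_column)[na_per_column > 0]

print(na_per_column)
print(columns_with_na)
```

If you prefer a one-liner, colSums(is.na(survey)) should give the same per-column counts, since is.na on a data frame returns a logical matrix.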

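Naturally you can count the reverse. Insert a timely not operator: !is.na (is not NA in R) is TRUE for every value that is present, so counting those gives the number of non-missing values.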
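A quick sketch of the reverse count, again with the test vector from earlier:

```r
test <- c(1, 2, 3, NA)

# ! flips each TRUE / FALSE, so this counts the values that are present
n_present <- sum(!is.na(test))

# Sanity check: present plus missing should account for every element
n_total <- n_present + sum(is.na(test))

print(n_present)
```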

Conclusion – Why Detect Missing Values in R

While we’ve literally made much ado about nothing (the peril of allowing a humanities major to comment on mathematics), missing data in R can easily mess up your analysis. Using the is.na function in R gives you a way to find those gaps and clean up your data for a proper analysis.