One of the unfortunate aspects of data science is that real-world data is not always as clean as we would like it to be. Because of this, there are often cases where bits of data are missing in data sets. Thus, any program designed for data science needs to handle this eventuality.
Missing values in R
Handling missing values in R is straightforward. R represents a missing value with the special value NA, which acts as a placeholder. Many summary functions, such as mean() and sum(), accept a logical na.rm parameter; setting it to TRUE skips NA values in the calculation. R also has two functions for working with NA directly. The na.omit() function removes the rows of data that contain an NA value. The is.na() function returns TRUE for each data point that is NA.
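As a minimal sketch of the na.rm parameter and the na.omit() function described above, the following uses a small made-up vector and data frame. The names scores and df are hypothetical and chosen only for illustration.

```r
# Hypothetical sample data for illustration only
scores <- c(10, 20, NA, 40)

# Without na.rm, a single NA makes the whole result NA
mean_with_na <- mean(scores)

# With na.rm = TRUE, the NA is skipped and the mean uses 10, 20 and 40
mean_without_na <- mean(scores, na.rm = TRUE)

# na.omit() on a data frame drops every row that contains an NA
df <- data.frame(id = 1:4, score = scores)
clean_df <- na.omit(df)

print(mean_with_na)
print(mean_without_na)
print(clean_df)
```

Note that na.rm only affects the one calculation, while na.omit() returns a new object with the incomplete rows removed.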
Find missing values in R
To find missing values in R, check for NA using the is.na() function. This function returns TRUE or FALSE for each value in a data set: TRUE if the value is NA, and FALSE otherwise. This provides a quick and simple way of checking for NA values, and the result can be passed to other functions. It is possible to find NA values by writing your own code to check each value, but unless you have a special need for that, the is.na() function will do the job.
Using is.na() to check for NA in R is quite simple. The function has the form is.na(dataset), and it returns TRUE for each data point with an NA value and FALSE for all others.
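Because is.na() returns a logical vector, its result can feed directly into other operations. The sketch below assumes a hypothetical vector named readings and shows two common uses: counting the missing values and keeping only the observed ones.

```r
# Hypothetical sample vector for illustration only
readings <- c(5, NA, 8, NA, 3)

# Logical vector: TRUE where the value is NA
missing <- is.na(readings)

# TRUE counts as 1 in arithmetic, so sum() gives the number of NA values
n_missing <- sum(missing)

# Negating the logical vector keeps only the non-missing values
observed <- readings[!missing]

print(n_missing)
print(observed)
```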
# is.na in r example
> demo = c(1, 2, NA, 4, NA, 6, 7)
> is.na(demo)
[1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
> demo2 = c(1, 2, 3, 4, 5, 6, 7)
> is.na(demo2)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
This shows the output of the is.na() function for both demo and demo2.
# is.na in r - using any to test a vector
> any(is.na(demo))
[1] TRUE
> any(is.na(demo2))
[1] FALSE
Here are the results of passing them through the any() function, which shows whether any of the values are NA.
> which(is.na(demo))
[1] 3 5
> which(is.na(demo2))
integer(0)
Here are the results of passing them through the which() function, which returns the locations that have NA values. In the case of demo, these are positions 3 and 5; demo2 has none, so the result is the empty vector integer(0).
When combined with the any() and which() functions, the is.na() function is a powerful tool for dealing with missing data in a data set. For example, by finding the location of each data point with an NA value, you can change it.
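As one possible illustration of changing the values at those locations, the sketch below fills each NA in demo with the mean of the observed values. Using the mean as the replacement is an assumption made for this example, not a recommendation from the text; the right replacement depends on the data.

```r
demo <- c(1, 2, NA, 4, NA, 6, 7)

# Locations of the NA values (positions 3 and 5)
na_positions <- which(is.na(demo))

# Work on a copy so the original vector stays unchanged
filled <- demo

# Assumed replacement strategy: the mean of the non-missing values
filled[na_positions] <- mean(demo, na.rm = TRUE)

print(na_positions)
print(filled)
```

The same indexing pattern works with any replacement value, such as a fixed constant or a median.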
Dealing with missing data is an important part of data science. You are not always going to have nice neat data sets where every bit of data was acquired. When this happens, you need the tools to handle it and R provides those tools and more.