The Case for Complete Cases
A fair amount of my career has focused on quality engineering and process improvement. In that line of work we look at a lot of tick sheets and similar ad-hoc data collection efforts, and it is not uncommon for a few values to be… missing. Since many statistical procedures depend on a complete and/or balanced data set, you've got a decision to make about whether to fix or drop the records with missing values.
One option is simply setting missing values to zero. While this is a valid approach for certain studies, it can create additional problems. Zero-filling distorts summary statistics such as the mean, and it throws away signal, because there is often information in the pattern of missing data. A sloppy process operator will generally do a poor job of collecting quality samples. The converse is often true: rigorous records can indicate high attention to detail by the operating team. So we often want to split the data into records with complete cases (all values present) and records with missing values (in R, the negation of the complete cases).
We can accomplish this using the complete.cases() function.
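To make the argument above concrete, here is a minimal sketch on a small, made-up tick sheet (the `tick_sheet` data frame and its `operator` and `reading` columns are hypothetical). It compares a zero-filled mean with the mean of the recorded values, then uses `complete.cases()` to compute the share of incomplete rows per operator, treating missingness itself as a signal.

```r
# Hypothetical tick-sheet data: two operators, one reading left blank (NA)
tick_sheet <- data.frame(
  operator = c("A", "A", "A", "B", "B", "B"),
  reading  = c(10.2, NA, 9.8, 10.1, 10.0, 9.9)
)

# Zero-filling the blank drags the mean down
zero_filled <- tick_sheet$reading
zero_filled[is.na(zero_filled)] <- 0
zero_mean <- mean(zero_filled)

# Mean of the values that were actually recorded
recorded_mean <- mean(tick_sheet$reading, na.rm = TRUE)

# Missingness as a signal: fraction of incomplete rows for each operator
incomplete <- !complete.cases(tick_sheet)
missing_rate <- tapply(incomplete, tick_sheet$operator, mean)

print(c(zero_filled = zero_mean, recorded = recorded_mean))
print(missing_rate)
```

Here the zero-filled mean is about 8.33 against 10.0 for the recorded values, and operator A accounts for all of the missing data.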
complete.cases in R – Find the Rows With NA Values
Missing (NA) values can cause a whole world of trouble, breaking calculations and skewing anything you might do with your data. complete.cases() in R helps you find them.
The complete.cases() function examines a data frame and returns a logical vector with one element per row: TRUE for a complete case (no missing values) and FALSE for a row that contains at least one NA. Negating that vector with ! picks out the incomplete cases, so we can examine those records and purge them if we wish.
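A quick sketch of what that logical vector looks like, using a small, hypothetical `sampledata` frame (the columns `batch`, `weight` and `shift` are made up for illustration). `which()` turns the negated vector into the row numbers that hold at least one NA.

```r
# Hypothetical sample data: row 2 is missing a weight, row 3 a shift
sampledata <- data.frame(
  batch  = c(1, 2, 3, 4),
  weight = c(5.1, NA, 4.9, 5.0),
  shift  = c("day", "night", NA, "day")
)

ok <- complete.cases(sampledata)  # one TRUE/FALSE per row
print(ok)                         # TRUE FALSE FALSE TRUE

bad_rows <- which(!ok)            # row numbers of the incomplete cases
print(bad_rows)                   # 2 3
```

Note that the check spans every column, so a missing value in a character column such as `shift` marks the row as incomplete just like a missing number does.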
complete_records <- sampledata[complete.cases(sampledata), ]   # rows with no NA values
partial_records <- sampledata[!complete.cases(sampledata), ]   # rows with at least one NA
This technique lets us look at the rows with NA data before excluding them (na.omit() drops the same rows in a single step), or find an alternate way of dealing with the missing values. Using complete.cases() in R, we can clean up our data and make it easier to carry out statistical calculations such as finding the standard deviation or building a confidence interval. Finding complete cases is a breeze, and yet another invaluable skill for any good programmer.
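As a minimal sketch (again on a hypothetical `sampledata` frame with made-up values), the snippet below checks that subsetting with `complete.cases()` and calling `na.omit()` keep the same rows, then computes a standard deviation and a 95% confidence interval for the mean with `sd()` and `t.test()` on the cleaned data.

```r
# Hypothetical data with two missing weights
sampledata <- data.frame(
  batch  = 1:6,
  weight = c(5.1, NA, 4.9, 5.0, 5.2, NA)
)

# Keep only the complete cases, two ways
clean  <- sampledata[complete.cases(sampledata), ]
clean2 <- na.omit(sampledata)

# Same rows survive; compare row names because na.omit() also
# attaches an "na.action" attribute recording what it dropped
same_rows <- identical(rownames(clean), rownames(clean2))

# Statistics on the complete cases
weight_sd <- sd(clean$weight)
ci <- t.test(clean$weight)$conf.int  # 95% CI for the mean weight

print(same_rows)
print(weight_sd)
print(ci)
```

Comparing row names rather than the whole data frames is deliberate: the extra attribute means `identical(clean, clean2)` would report FALSE even though the rows match.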