While technically this is a warning message and not an actual R error, you should pay attention here: this warning can easily affect the accuracy of your results! It usually shows up when you are working with the mean() function, especially in version 3.0 of R or later.

Let’s start with why you see this warning: you fed the mean() function something, typically a data frame column, that isn’t numeric or logical. Given that the mean() function’s purpose in life is to add up a vector of numerical values and divide the total by the length of the vector, it has nothing to compute here, so it returns NA instead of a number.

## How This Error Occurs

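This problem usually occurs when using the mean() function, and it became more common with changes in R 3.0. Generally speaking, you have fed it a vector of the character data type (usually data that was never properly converted to numbers) or some other non-numeric object. Missing values (NA) inside a numeric vector do not trigger this warning by themselves: they make mean() return NA silently, which is a separate problem that the na.rm parameter handles. Since this isn’t a numeric column, there’s a penalty flag on the play.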

The mean() function returns the mean, aka the average, of the values in the vector or data frame column you pass to it.

> a = c(1,2,3,4,5)

> b = c(TRUE,FALSE,TRUE,TRUE,FALSE)

> c = c("a","b","c","d","e")

> mean(c)

[1] NA

Warning message:

In mean.default(c) : argument is not numeric or logical: returning NA

This example produces a warning message because the vector “c” contains characters and not numeric or logical values. The message itself is easy to read: it says that the argument must be either numeric or logical, and that NA is returned because it is neither.
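A related way to run into the same warning is to pass an entire data frame to mean() instead of a single column. The sketch below uses a made-up data frame (`df`, `height`, and `weight` are illustrative names, not from the original example) and assumes a current version of R, where a whole data frame is not treated as a numeric argument. This may well be the R 3.0 change mentioned above, though that link is an assumption on my part.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(height = c(150, 160, 170),
                 weight = c(50, 60, 70))

# A whole data frame is not a numeric vector, so mean() returns NA.
# Without suppressWarnings() you would see the warning discussed here.
whole <- suppressWarnings(mean(df))

# Selecting one column gives mean() a numeric vector to work with
one <- mean(df$height)      # 160

# colMeans() computes the mean of every column at once
all_cols <- colMeans(df)    # height 160, weight 60
```

If you only need one column, `df$height` is the simplest fix; `colMeans()` is handy when every column is numeric.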

## What is causing this problem?

This problem results from passing an argument that is neither numeric nor logical to the mean() function. In the example above, it is a vector of characters; however, it can happen with any non-numeric object. Keep in mind that an R vector holds a single data type, so even one string among your numbers converts the entire vector to character.
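To see how a single bad value can cause this, here is a small sketch with an illustrative vector `x` (not part of the original example). Because a vector can only hold one type, R quietly converts the numbers to strings as soon as one string is present.

```r
# One quoted value is enough to turn the whole vector into character
x <- c(1, 2, "3")
x_class <- class(x)                 # "character"

# mean() now refuses to average it and returns NA
# (suppressWarnings() hides the warning so the script runs quietly)
res <- suppressWarnings(mean(x))

# Checking the type before calling mean() catches the problem early
ok <- is.numeric(x) || is.logical(x)   # FALSE
```

Running `class()` or `is.numeric()` on a column before averaging it is a cheap way to confirm what mean() is actually receiving.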

> a = c(1,2,3,4,5)

> b = c(TRUE,FALSE,TRUE,TRUE,FALSE)

> c = c("a","b","c","d","e")

> mean(a)

[1] 3

In this example, the vector “a” contains numeric values. The mean here is 3 because the values in “a” are 1 through 5. Because these values are numeric, there is no warning message.

> a = c(1,2,3,4,5)

> b = c(TRUE,FALSE,TRUE,TRUE,FALSE)

> c = c("a","b","c","d","e")

> mean(b)

[1] 0.6

Here, “b” contains the logical values TRUE and FALSE, which makes it an acceptable argument. You get a mean of 0.6 because the mean() function treats TRUE and FALSE as the numeric values 1 and 0, respectively.
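The arithmetic behind that 0.6 can be spelled out by hand. This short sketch reuses the logical vector from the example and shows that the mean of a logical vector is simply the share of TRUE values.

```r
b <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# TRUE becomes 1 and FALSE becomes 0 when treated as numbers
as_numbers <- as.numeric(b)        # 1 0 1 1 0

# Three TRUE values out of five elements
by_hand <- sum(b) / length(b)      # 0.6

# Same result as calling mean() directly
same <- isTRUE(all.equal(by_hand, mean(b)))   # TRUE
```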

## How to fix this error

If you have complete control over the data, one solution is to make sure that the vector you pass to the mean() function contains only numeric or logical values. In such cases, this may mean manually removing bad values, converting text to numbers, and filtering missing data. Alternatively, you can select a specific column directly, or use an apply-family function such as sapply() to run mean() on one column at a time.
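As a rough sketch of that approach, assume a data frame whose `score` column was read in as text (the data frame and all column names here are invented for illustration). Converting the column with as.numeric() and then averaging only the numeric columns might look like this:

```r
# Hypothetical data: `score` arrived as character strings
df <- data.frame(name  = c("ann", "bob", "cy"),
                 score = c("10", "20", "30"),
                 age   = c(31, 42, 53),
                 stringsAsFactors = FALSE)

# Convert the text column to numbers, then average it
df$score <- as.numeric(df$score)
score_mean <- mean(df$score)                  # 20

# Find the numeric columns and take the mean of each one
numeric_cols <- sapply(df, is.numeric)        # name FALSE, score TRUE, age TRUE
col_means <- sapply(df[numeric_cols], mean)   # score 20, age 42
```

Filtering with `is.numeric` first means the `name` column never reaches mean(), so the warning does not appear.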

Otherwise, you will want to use some form of filter to find and eliminate any unwanted values. The point, in either case, is to make sure that only numeric or logical values reach the mean() function. You can also use the na.rm parameter to drop missing values from a specific mean calculation; note that na.rm only handles NA values and will not rescue a character vector.
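One possible way to combine the filter and na.rm is sketched below, assuming the unwanted values are text entries such as "n/a" mixed in with numbers stored as strings. The vector `raw` is made up for illustration.

```r
# Hypothetical messy input: numbers as text, a junk entry, and a missing value
raw <- c("12", "7", "n/a", "5", NA)

# as.numeric() turns anything it cannot parse into NA.
# It warns about that coercion, so the warning is suppressed here on purpose.
nums <- suppressWarnings(as.numeric(raw))

# With NA values still present, mean() returns NA (no warning this time)
with_na <- mean(nums)

# na.rm = TRUE drops the NA values before averaging
clean_mean <- mean(nums, na.rm = TRUE)   # (12 + 7 + 5) / 3 = 8

# Count how many values were discarded, so nothing is lost silently
dropped <- sum(is.na(nums))              # 2
```

Checking `dropped` is worthwhile: if a large share of the column turned into NA, the underlying data probably needs cleaning rather than filtering.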

The third thing to check, just to avoid any embarrassment, is your column names. A misspelled column name can set off a whole series of error-checking efforts that are easily prevented: selecting a column that doesn’t exist with $ returns NULL, and mean(NULL) produces this same warning. Make sure you are inspecting the right column.
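Here is a quick sketch of the misspelling case, using a made-up data frame with a `weight` column and a deliberately misspelled `wieght`. It shows what mean() actually receives and one simple way to check the name first.

```r
# Hypothetical data frame with a single numeric column
df <- data.frame(weight = c(50, 60, 70))

# The misspelled name does not exist, so `$` returns NULL
missing_col <- is.null(df$wieght)             # TRUE

# mean(NULL) returns NA along with the warning discussed in this article
bad <- suppressWarnings(mean(df$wieght))

# Confirm the column name before using it
name_ok <- "wieght" %in% names(df)            # FALSE
good <- mean(df$weight)                       # 60
```

Calling `names(df)` or `str(df)` at the console gives you the exact spelling and the type of every column in one glance.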