How to remove infinite values in R (from a vector, matrix, or data frame)

Sometimes when doing data science, your data set may contain infinite values. It is often better to replace these infinite values with missing values (NA) or some other value. In R you can use the is.infinite function to detect infinite values, which lets you swap them for something more usable.
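For context, the sketch below shows a few common ways infinite values appear in ordinary R arithmetic. The specific expressions are illustrative choices, not taken from any particular data set.

```r
# Dividing a nonzero number by zero gives a signed infinity
pos_inf <- 1 / 0     # Inf
neg_inf <- -1 / 0    # -Inf

# The logarithm of zero is negative infinity
log_zero <- log(0)   # -Inf

# Overflowing the range of double-precision numbers also gives Inf
overflow <- exp(1000)  # Inf

values <- c(pos_inf, neg_inf, log_zero, overflow, 2.5)
print(values)
print(is.infinite(values))  # TRUE for every infinity, FALSE for 2.5
```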

Description

The is.infinite function has the form is.infinite(x), where x is the object you are searching for infinite values. It returns a logical vector of the same length as its input, with TRUE at every position that holds Inf or -Inf and FALSE everywhere else (including positions that hold NA or NaN). For a matrix, the result keeps the same dimensions. When you use this result as the index inside the square brackets of a vector or matrix and assign a value, every TRUE position is set to that value. This gives you a simple way to detect and replace all the infinite values in the data you are working with. A data frame is different: is.infinite cannot be applied to a whole data frame at once, so you have to process its columns separately.
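A short sketch of how is.infinite behaves on a small, made-up vector: it flags both Inf and -Inf, it does not flag NA or NaN (is.na and is.nan cover those), and its result can be used directly as an index. The two-column data frame at the end is also hypothetical; it shows that calling is.infinite on the whole data frame raises an error, while calling it on a single column works.

```r
x <- c(1, Inf, -Inf, NA, NaN, 0)

# Logical vector, same length as x: TRUE only for Inf and -Inf
flags <- is.infinite(x)
print(flags)  # FALSE TRUE TRUE FALSE FALSE FALSE

# Assigning through the logical index replaces every flagged element
x[flags] <- NA
print(x)

# A data frame is a list of columns, so is.infinite() cannot take it directly
df <- data.frame(a = c(1, Inf), b = c(-Inf, 2))
err <- tryCatch(is.infinite(df), error = function(e) conditionMessage(e))
print(err)

# Applying it to a single column works as expected
print(is.infinite(df$a))  # FALSE TRUE
```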

Explanation

Conceptually, is.infinite does the same job as looping over a vector with a for-loop and checking each position individually. However, it is vectorized: a single call checks the entire vector. If you assign a value to a vector indexed by the result of is.infinite, every position that comes up TRUE is set to the new value. The process is simple and rarely produces an error, because is.infinite takes only one argument, the object you are checking, and that object is usually a vector or matrix. The main thing to get right is that the object you pass to is.infinite is the same one you index, as in x[is.infinite(x)], so that the logical index lines up with the right elements.
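To make the comparison with a for-loop concrete, the sketch below replaces infinities in a made-up vector both ways and checks that the results match. The loop is shown only for illustration; the one-line vectorized form is the one you would normally write.

```r
x <- c(20, 0, Inf, 18, -Inf, 1, Inf, 4, 2, 7)

# Loop version: visit each position and test it individually
x_loop <- x
for (i in seq_along(x_loop)) {
  if (is.infinite(x_loop[i])) {
    x_loop[i] <- NA
  }
}

# Vectorized version: one call tests every element at once
x_vec <- x
x_vec[is.infinite(x_vec)] <- NA

print(identical(x_loop, x_vec))  # TRUE
```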

Examples

Here we have three examples of converting infinite values to missing values. Each example uses a different data structure: the first uses a vector, the second a matrix, and the third a data frame.

> x = c(20, 0, Inf, 18, Inf, 1, Inf, 4, 2, 7)
> x
 [1]  20   0 Inf  18 Inf   1 Inf   4   2   7
> x[is.infinite(x)] = NA
> x
 [1] 20  0 NA 18 NA  1 NA  4  2  7

This example replaces the infinite values in a vector with missing values. The is.infinite function detects the infinite values, and the assignment then sets those positions in the vector to NA.
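If you would rather keep the original vector unchanged, base R's replace() function returns a modified copy instead of assigning in place. The second half sketches one possible substitute other than NA: capping each infinity at the largest or smallest finite value in the vector. Whether that is sensible depends entirely on your data, so treat it as an illustration, not a recommendation; the vector itself is made up.

```r
x <- c(20, 0, Inf, 18, -Inf, 1, Inf, 4, 2, 7)

# replace(x, list, values) returns a copy; x itself is not modified
x_na <- replace(x, is.infinite(x), NA)
print(x_na)

# One possible alternative substitute: cap infinities at the finite extremes
finite_vals <- x[is.finite(x)]
x_capped <- x
x_capped[x_capped == Inf] <- max(finite_vals)   # Inf  -> 20
x_capped[x_capped == -Inf] <- min(finite_vals)  # -Inf -> 0
print(x_capped)
```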

> m = matrix(c(13, 7, Inf, 8, Inf, 14, 7, Inf, 9, 11, 2, 8, Inf, 1, 1, 0), nrow = 4)
> m
     [,1] [,2] [,3] [,4]
[1,]   13  Inf    9  Inf
[2,]    7   14   11    1
[3,]  Inf    7    2    1
[4,]    8  Inf    8    0
> m[is.infinite(m)] = NA
> m
     [,1] [,2] [,3] [,4]
[1,]   13   NA    9   NA
[2,]    7   14   11    1
[3,]   NA    7    2    1
[4,]    8   NA    8    0

This example removes the infinite values from a matrix. The process works exactly the same way as it does with a vector, because is.infinite returns a logical matrix with the same dimensions as its input. Once again, is.infinite detects the infinite values so that they can be replaced with missing values.
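If you also want to know where the infinities were before replacing them, base R's which() with arr.ind = TRUE returns the row and column of every TRUE entry in a logical matrix. The sketch below reuses the matrix values from the printout above.

```r
m <- matrix(c(13, 7, Inf, 8, Inf, 14, 7, Inf, 9, 11, 2, 8, Inf, 1, 1, 0), nrow = 4)

# is.infinite() keeps the dim attribute, so flags is a 4 x 4 logical matrix
flags <- is.infinite(m)
print(dim(flags))  # 4 4

# Row/column positions of every infinite entry, in column-major order
positions <- which(flags, arr.ind = TRUE)
print(positions)

m[flags] <- NA
print(sum(is.na(m)))  # 4
```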

> df = data.frame(X = c(18, 8, Inf, 12, 20),
+                 Y = c(Inf, Inf, 3, 4, 2),
+                 Z = c(Inf, 2, Inf, 5, 2))
> df
    X   Y   Z
1  18 Inf Inf
2   8 Inf   2
3 Inf   3 Inf
4  12   4   5
5  20   2   2
> df$X[is.infinite(df$X)] = NA
> df$Y[is.infinite(df$Y)] = NA
> df$Z[is.infinite(df$Z)] = NA
> df
   X  Y  Z
1 18 NA NA
2  8 NA  2
3 NA  3 NA
4 12  4  5
5 20  2  2

In this final example, we remove the infinite values from a data frame, replacing them one column at a time with the is.infinite function. If you do not know the column names in advance, you can use the do.call function instead, but it is the more complicated of the two options.
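When the column names are not known in advance, one common approach is to apply is.infinite to every column with lapply, combine the results into a logical matrix with do.call(cbind, ...), and use that matrix as an index. A second option applies replace() to each column and writes the results back with df[] <-, which keeps the data frame structure. Both are sketches on a made-up data frame; the character column label is included only to show that is.infinite returns FALSE for non-numeric columns, so they are left alone.

```r
df <- data.frame(
  X = c(18, 8, Inf),
  Y = c(-Inf, 3, 4),
  label = c("a", "b", "c"),  # non-numeric column: is.infinite() is all FALSE here
  stringsAsFactors = FALSE
)

# Option 1: one logical column per data frame column, bound into a matrix
inf_matrix <- do.call(cbind, lapply(df, is.infinite))
df1 <- df
df1[inf_matrix] <- NA

# Option 2: replace() each column, assigning back with df[] to keep a data frame
df2 <- df
df2[] <- lapply(df2, function(col) replace(col, is.infinite(col), NA))

print(df1)
print(df2)
```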

Application

The main reason to remove infinite values is that some functions cannot process them. When you pass data containing infinite values to such a function, it stops with an error message; for example, lm cannot fit a linear model when the response variable contains Inf. Removing infinite values beforehand prevents these errors. In other situations, infinite values do not cause an error but produce misleading results. Taking an average is a good example: a single Inf makes the mean infinite, and a mix of Inf and -Inf makes it NaN.
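The sketch below illustrates both problems on made-up data. Note that once infinities have been converted to NA, mean() still needs na.rm = TRUE to skip them, while lm() drops rows containing NA under its default na.action. The exact wording of the lm() error is printed rather than assumed.

```r
x <- c(4, 8, Inf, 2)

print(mean(x))                # Inf: a single Inf dominates the average
print(mean(c(Inf, -Inf, 1)))  # NaN: Inf + (-Inf) is undefined

# After converting infinities to NA, exclude them explicitly with na.rm = TRUE
x[is.infinite(x)] <- NA
print(mean(x))                # NA
print(mean(x, na.rm = TRUE))  # 4.666667

# A model fit fails outright when the response contains Inf
d <- data.frame(x = 1:5, y = c(2.1, 3.9, Inf, 8.2, 9.8))
fit_err <- tryCatch(lm(y ~ x, data = d), error = function(e) conditionMessage(e))
print(fit_err)

# With Inf replaced by NA, the default na.action drops that row and the fit succeeds
d$y[is.infinite(d$y)] <- NA
fit <- lm(y ~ x, data = d)
print(coef(fit))
```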

Removing infinities from a data set is fairly simple because R has a built-in function that detects them. The is.infinite function is the key to removing infinite values from your data, which makes it a handy tool to have in your programming toolbox.