Natural Log in R – Transforming Your Data

The natural log function is frequently used to rescale data for statistical and graphical analysis. In R, this is done with the log() function, which is vectorized and can be applied directly to a vector or to the columns of a data frame. The transformation compresses the visual distance between observations that are orders of magnitude apart (e.g., 10, 100, and 1000 end up evenly spaced on a log scale).
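As a quick illustration of that rescaling, the sketch below takes the natural log of the three example values. The names x and log_x are chosen here for illustration only.

```r
# Values that are each ten times larger than the previous one
x <- c(10, 100, 1000)

# log() is vectorized, so it transforms every element at once
log_x <- log(x)
print(log_x)

# Gaps between consecutive values: uneven before, constant after
print(diff(x))      # 90 and 900
print(diff(log_x))  # both gaps equal log(10)
```

On the original scale the second gap is ten times the first; on the log scale the two gaps are the same size.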

This transformation is particularly common in economics and in parts of the natural and social sciences. It can give a clearer perspective on trends where the underlying data is subject to power-law effects and the Pareto principle (the 80/20 rule). Rescaling data through a natural log transformation reduces the influence that a few very large data points have when fitting a trend line through the sample.

Natural Log in R

To calculate the natural log in R, use the log() function. The default setting of this function is to return the natural logarithm of a value.

# natural log in r - example
> log(37)
[1] 3.610918
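
Since log() returns the natural logarithm by default, other bases have to be requested explicitly. The following is a minimal sketch using only base R functions (log() with its base argument, log2(), and exp()); the variable names are illustrative.

```r
# log() defaults to base e, so these two calls are equivalent
natural  <- log(37)
explicit <- log(37, base = exp(1))

# Other bases: pass base =, or use the log10() and log2() shortcuts
base10 <- log(1000, base = 10)  # same result as log10(1000)
base2  <- log2(8)

# exp() is the inverse of the natural log
round_trip <- exp(log(37))

print(c(natural, explicit, base10, base2, round_trip))
```

Undoing a natural log transformation with exp() is useful when results computed on the log scale need to be reported in the original units.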

Log transformation

We’re going to show you how to use the natural log in R to transform data, both vectors and data frame columns.

Transforming Data Frame Columns

Log transforming a data frame in R is a little trickier because you usually have to pick out the data you want first. Calling log() on an entire data frame gets you the log of every data point, and it only works when every column is numeric. In most cases, though, you need the log of just one column.

# natural log in R example - data frame column
> ChickWeight$logweight=log(ChickWeight$weight)
> head(ChickWeight)
   weight Time Chick Diet logweight
 1     42    0     1    1  3.737670
 2     51    2     1    1  3.931826
 3     59    4     1    1  4.077537
 4     64    6     1    1  4.158883
 5     76    8     1    1  4.330733
 6     93   10     1    1  4.532599

As you can see, the pattern for accessing an individual column's data is dataframe$column. The head() function returns a specified number of rows from the beginning of a data frame, with a default of 6. Plotting weight against time and log weight against time illustrates the difference a log transformation makes.
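
One way to draw that comparison is sketched below with base R graphics. This is not the article's original plotting code; the copy cw, the axis labels, and the panel titles are assumptions made for the example.

```r
# Work on a copy so the built-in dataset is left untouched
cw <- ChickWeight
cw$logweight <- log(cw$weight)

# Two panels side by side: raw weight and log weight against time
old_par <- par(mfrow = c(1, 2))
plot(cw$Time, cw$weight,
     xlab = "Time (days)", ylab = "Weight (g)", main = "Raw weight")
plot(cw$Time, cw$logweight,
     xlab = "Time (days)", ylab = "log(weight)", main = "Log weight")
par(old_par)  # restore the previous plotting layout
```

In the second panel the large late-stage weights are pulled closer to the rest, so the spread of points should look more even across the time range.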

Natural Log in R – Vectors

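Doing a log transformation in R on a vector is a simple matter of applying the log() function to it. In the example below, 1 is added to the vector first, a common adjustment that keeps the results non-negative and avoids -Inf if the data contain zeros (log(0) is -Inf in R). The result is a new vector that is less skewed than the original.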

# natural log in R - vector transformation 
> v = c(100,10,5,2,1,0.5,0.1,0.05,0.01,0.001,0.0001)
> q=log(v+1)
> q
 [1] 4.6151205168 2.3978952728 1.7917594692 1.0986122887 0.6931471806 0.4054651081
 [7] 0.0953101798 0.0487901642 0.0099503309 0.0009995003 0.0000999950

A close look at the numbers above shows that v is more skewed than q: the largest value of v (100) sits far from all the others, while q tops out at about 4.6. Plotting the two vectors makes the difference even more evident.
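
The difference in skew can also be checked numerically. The sketch below assumes a simple moment-based definition of skewness; the skewness() helper is hypothetical and written here for illustration, not a base R function.

```r
v <- c(100, 10, 5, 2, 1, 0.5, 0.1, 0.05, 0.01, 0.001, 0.0001)
q <- log(v + 1)

# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the second central moment to the power 1.5)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

print(skewness(v))
print(skewness(q))

# Base R's log1p(v) computes log(1 + v) and is more accurate near zero
print(all.equal(q, log1p(v)))
```

Both values should come out positive, with the one for q noticeably smaller. For data with many values close to zero, log1p() is generally the safer way to write log(v + 1).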

Summation

Log functions have numerous uses, and in data science they are often used to present data in a pattern that is easier to read. They are handy for reducing the skew in data so that more detail can be seen. In R, they can be applied to all sorts of data, from single numbers to vectors and data frame columns. The usefulness of the log function in R is another reason why R is an excellent tool for data science.