We are always looking for differences, so it should be no surprise that in data science we often need to determine the differences between data values. Any programming language designed with data science in mind therefore needs a way to compute those differences. R makes this easy.
Differences between elements of a vector
When dealing with the elements of a vector, you may need the difference between any two of them. In one situation you may need the difference between elements right next to each other, and in another the elements may be separated by two or three positions. The problem gets harder still if you need the differences between the differences. In most programming languages, the solution is to write code for each specific situation. In R, the diff() function handles all of these calculations with ease.
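To make the contrast concrete, here is a minimal sketch of the hand-written approach next to diff(). The vector v and the variable names are made up for illustration and are not part of the examples that follow.

```r
# Hypothetical example vector, chosen only for illustration.
v <- c(4, 9, 15, 22, 30)

# Hand-written lag-1 difference: drop the first element, drop the last
# element, and subtract the two shortened vectors.
manual_lag1 <- v[-1] - v[-length(v)]

# Hand-written lag-2 difference: the same idea, dropping two elements.
manual_lag2 <- tail(v, -2) - head(v, -2)

# The same calculations with diff().
builtin_lag1 <- diff(v)
builtin_lag2 <- diff(v, lag = 2)

print(manual_lag1)   # 5 6 7 8
print(builtin_lag1)  # 5 6 7 8
print(manual_lag2)   # 11 13 15
print(builtin_lag2)  # 11 13 15
```

The hand-written versions work, but each new lag needs its own indexing, which is the bookkeeping diff() takes care of.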
R diff function
The R diff function has the format diff(x, lag, differences). Here, x is the vector of values that diff() operates on. The lag is the spacing between the numbers being subtracted. For example, a lag of 1 means that the values are right next to each other, and a lag of 2 means that there is one value between them. The differences argument is the number of times the differencing is applied. A differences of 1 returns the differences between the values in the vector, while a differences of 2 returns the differences between those differences.
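As a small sketch of the calling convention, the second and third parameters can also be passed by name. In base R they are called lag and differences; the variable names on the left are only illustrative.

```r
x <- c(1, 2, 3, 5, 8, 13, 21)

# Positional form: the second argument is the lag, the third is the
# number of times differencing is applied.
by_position <- diff(x, 2, 1)

# Named form: same call, but the intent is easier to read.
by_name <- diff(x, lag = 2, differences = 1)

# Named arguments can be given in any order.
reordered <- diff(x, differences = 1, lag = 2)

print(by_position)  # 2 3 5 8 13
print(by_name)      # 2 3 5 8 13
print(reordered)    # 2 3 5 8 13
```

Naming the arguments is optional, but it avoids mixing up the lag and the number of differences when both are set.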
# diff in R examples
> x = c(1, 2, 3, 5, 8, 13, 21)
> diff(x)
[1] 1 1 2 3 5 8
> diff(x, 1)
[1] 1 1 2 3 5 8
> diff(x, 1, 1)
[1] 1 1 2 3 5 8
This illustrates that the R diff function defaults to 1 for both the second and third parameters when they are omitted, so all three calls produce the same result. Note the pattern: 1=2-1, 1=3-2, 2=5-3, 3=8-5, 5=13-8 and 8=21-13.
# diff in R - lag of 2; differences left at its default
> diff(x, 2)
[1] 2 3 5 8 13
> diff(x, 2, 1)
[1] 2 3 5 8 13
This further illustrates that diff in R defaults to 1 for the third parameter when it is omitted, so both calls produce the same result. With a lag of 2, each value is subtracted from the value two positions after it. Note the pattern: 2=3-1, 3=5-2, 5=8-3, 8=13-5 and 13=21-8.
# diff in R - higher order differences
> diff(x, 1, 2)
[1] 0 1 1 2 3
> diff(x, 2, 2)
[1] 3 5 8
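Here, we have an illustration of diff in R with 2 as the third parameter, which means the subtraction is applied to the first differences rather than to the original values. With a lag of 1, the first differences are 1, 1, 2, 3, 5, 8, and differencing them gives 0=1-1, 1=2-1, 1=3-2, 2=5-3 and 3=8-5. With a lag of 2, the first differences are 2, 3, 5, 8, 13, and differencing them with the same lag gives 3=5-2, 5=8-3 and 8=13-5. This is a straightforward illustration not only of the use of the diff() function but also of the math going on behind it.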
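One way to check this reading of the third parameter is to apply diff() twice by hand and compare. The sketch below reuses the same x; the variable names are only illustrative.

```r
x <- c(1, 2, 3, 5, 8, 13, 21)

# Second-order difference in a single call, lag 1.
one_call_lag1 <- diff(x, lag = 1, differences = 2)

# The same thing written as two separate lag-1 passes.
nested_lag1 <- diff(diff(x, lag = 1), lag = 1)

# With lag 2, the same lag is used on both passes.
one_call_lag2 <- diff(x, lag = 2, differences = 2)
nested_lag2 <- diff(diff(x, lag = 2), lag = 2)

print(one_call_lag1)  # 0 1 1 2 3
print(nested_lag1)    # 0 1 1 2 3
print(one_call_lag2)  # 3 5 8
print(nested_lag2)    # 3 5 8
```

Note that the lag is reused at every pass, which is why diff(x, 2, 2) subtracts values two positions apart in the first differences.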
# diff in R - higher order differences
> diff(x, 1, 3)
[1] 1 0 1 1
> diff(x, 2, 3)
[1] 5
> diff(x, 1, 4)
[1] -1 1 0
> diff(x, 1, 5)
[1] 2 -1
> diff(x, 1, 6)
[1] -3
To further your understanding of diff in R, look over the examples above, which extend the earlier ones, and see if you can figure out the math behind the results.
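One pattern worth noticing in these results is that each extra pass shortens the output. As a rough sketch, the result has length(x) minus lag times differences elements, and base R returns an empty vector once that product reaches the length of x. The loop and variable names below are only illustrative.

```r
x <- c(1, 2, 3, 5, 8, 13, 21)
n <- length(x)

# Each pass with lag 1 removes one element from the result.
for (d in 1:6) {
  result <- diff(x, lag = 1, differences = d)
  cat("differences =", d,
      " length =", length(result),
      " expected =", n - d, "\n")
}

# With lag 2, each pass removes two elements: 7 - 2 * 3 = 1.
lag2_order3 <- diff(x, lag = 2, differences = 3)
print(length(lag2_order3))  # 1

# Asking for more passes than the vector can support gives an empty result.
too_many <- diff(x, lag = 1, differences = 7)
print(length(too_many))     # 0
```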
Finding the differences between elements in a vector can help you find statistical relationships in the data. Data that appears to have no clear pattern may have one in its differences, or even in the differences of its differences. This is why this type of analysis tool is so important and why the diff() function makes R such a useful language for data science.
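As a small illustration of a pattern hiding in the differences, consider the square numbers. This sequence is an assumption chosen for the example and is not taken from the data above.

```r
# Square numbers: 1 4 9 16 25 36 49 64
y <- (1:8)^2

# First differences grow steadily, so the pattern is not yet constant.
first_diff <- diff(y)

# Second differences are all the same value.
second_diff <- diff(y, differences = 2)

print(first_diff)   # 3 5 7 9 11 13 15
print(second_diff)  # 2 2 2 2 2 2
```

A constant second difference is the signature of a quadratic relationship, which is much harder to spot in the raw values.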