The dist() function in R calculates a distance matrix, which shows the distances between the rows of a matrix or data frame.

This function uses the following basic syntax:

dist(x, method="euclidean")

where:

x: The name of the matrix or data frame.

method: The distance measure to use. The default is "euclidean"; the other options are "maximum", "manhattan", "canberra", "binary", and "minkowski".

The following examples show how to use this function with the following matrix:

#define four vectors
a <- c(2, 4, 4, 6)
b <- c(5, 5, 7, 8)
c <- c(9, 9, 9, 8)
d <- c(1, 2, 3, 3)

#row bind four vectors into matrix
mat <- rbind(a, b, c, d)

#view matrix
mat

  [,1] [,2] [,3] [,4]
a    2    4    4    6
b    5    5    7    8
c    9    9    9    8
d    1    2    3    3

Example 1: Use dist() to Calculate Euclidean Distance

The Euclidean distance between two vectors, A and B, is calculated as:

Euclidean distance = $\sqrt{\sum_i (A_i - B_i)^2}$

The following code shows how to calculate a distance matrix that shows the Euclidean distance between each pair of rows of a matrix in R:

#calculate Euclidean distance between each pair of rows in matrix
dist(mat)

          a         b         c
b  4.795832
c 10.148892  6.000000
d  3.872983  8.124038 13.190906

Here's how to interpret the output:

The Euclidean distance between row a and row b is 4.795832.

The Euclidean distance between row a and row c is 10.148892.

The Euclidean distance between row a and row d is 3.872983.

The Euclidean distance between row b and row c is 6.000000.

The Euclidean distance between row b and row d is 8.124038.

The Euclidean distance between row c and row d is 13.190906.
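
To check one of these values by hand, the following sketch applies the Euclidean formula directly to rows a and b and compares the result with the matching entry from dist(). The variable names manual_ab and dist_ab are chosen here for illustration only.

```r
# Rebuild the example matrix so the snippet runs on its own
mat <- rbind(a = c(2, 4, 4, 6),
             b = c(5, 5, 7, 8),
             c = c(9, 9, 9, 8),
             d = c(1, 2, 3, 3))

# Euclidean distance between rows a and b, computed from the formula
manual_ab <- sqrt(sum((mat["a", ] - mat["b", ])^2))

# Same entry taken from dist(); as.matrix() expands it to a square matrix
dist_ab <- as.matrix(dist(mat))["a", "b"]

manual_ab
dist_ab
```

Both values should agree with the 4.795832 shown for rows a and b above.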

Example 2: Use dist() to Compute Maximum Distance

The maximum distance between two vectors, A and B, is the largest absolute difference between any pair of corresponding elements.

The following code shows how to calculate a distance matrix that shows the maximum distance between each pair of rows of a matrix in R:

#calculate maximum distance between each pair of rows in matrix
dist(mat, method="maximum")

  a b c
b 3
c 7 4
d 3 5 8
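
As a rough check of the definition, the sketch below computes the maximum distance between rows a and c by hand and compares it with the dist() result. The names manual_ac and dist_ac are illustrative.

```r
# Rebuild the example matrix so the snippet runs on its own
mat <- rbind(a = c(2, 4, 4, 6),
             b = c(5, 5, 7, 8),
             c = c(9, 9, 9, 8),
             d = c(1, 2, 3, 3))

# Element-wise absolute differences between rows a and c: 7, 5, 5, 2
abs(mat["a", ] - mat["c", ])

# The maximum distance is the largest of those differences
manual_ac <- max(abs(mat["a", ] - mat["c", ]))

# Same entry taken from dist()
dist_ac <- as.matrix(dist(mat, method = "maximum"))["a", "c"]
```

Both manual_ac and dist_ac should equal 7, matching the output above.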

Example 3: Use dist() to Calculate Canberra Distance

The Canberra distance between two vectors, A and B, is calculated as:

Canberra distance = $\sum_i \frac{|A_i - B_i|}{|A_i| + |B_i|}$

The following code shows how to calculate a distance matrix that shows the Canberra distance between each pair of rows of a matrix in R:

#calculate Canberra distance between each pair of rows in matrix
dist(mat, method="canberra")

          a         b         c
b 0.9552670
c 1.5484515 0.6964286
d 1.1428571 1.9497835 2.3909091
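
The grouping in the Canberra formula matters: each absolute difference is divided by the sum of the two absolute values before the terms are added up. A minimal sketch for rows a and b, with illustrative variable names:

```r
# Rebuild the example matrix so the snippet runs on its own
mat <- rbind(a = c(2, 4, 4, 6),
             b = c(5, 5, 7, 8),
             c = c(9, 9, 9, 8),
             d = c(1, 2, 3, 3))

x <- mat["a", ]
y <- mat["b", ]

# Per-element terms: 3/7, 1/9, 3/11, 2/14
terms <- abs(x - y) / (abs(x) + abs(y))

# The Canberra distance is the sum of those terms
manual_ab <- sum(terms)

# Same entry taken from dist()
dist_ab <- as.matrix(dist(mat, method = "canberra"))["a", "b"]
```

Both values should agree with the 0.9552670 shown for rows a and b above.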

Example 4: Use dist() to Calculate Binary Distance

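The binary distance between two vectors, A and B, treats non-zero elements as "on" and zero elements as "off". The distance is the proportion of positions in which only one of the two vectors is on, among the positions in which at least one is on.

The following code shows how to calculate a distance matrix that shows the binary distance between each pair of rows of a matrix in R:

#calculate binary distance between each pair of rows in matrix
dist(mat, method="binary")

  a b c
b 0
c 0 0
d 0 0 0

Every element of mat is non-zero, so every pair of rows has a binary distance of 0.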
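
Because mat contains no zeros, it does not show the binary method doing anything interesting. The sketch below uses a small made-up matrix, bin, which is not part of the original example, to show a non-zero binary distance.

```r
# Hypothetical 0/1 data, introduced only for this illustration
bin <- rbind(x = c(1, 0, 1, 0),
             y = c(1, 1, 0, 0))

on_x <- bin["x", ] != 0
on_y <- bin["y", ] != 0

# Positions where exactly one vector is on: 2 (the second and third)
only_one <- sum(xor(on_x, on_y))

# Positions where at least one vector is on: 3 (the first three)
at_least_one <- sum(on_x | on_y)

# Binary distance from the definition
manual_xy <- only_one / at_least_one

# Same value taken from dist()
dist_xy <- as.matrix(dist(bin, method = "binary"))["x", "y"]
```

Here both manual_xy and dist_xy work out to 2/3. The fourth position, where both vectors are off, is left out of the calculation.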

Example 5: Use dist() to Calculate Minkowski Distance

The Minkowski distance between two vectors, A and B, is calculated as:

Minkowski distance = $\left(\sum_i |A_i - B_i|^p\right)^{1/p}$

where i indexes the elements of each vector and p is the power. Setting p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

The following code shows how to calculate a distance matrix that shows the Minkowski distance (using p=3) between each pair of rows of a matrix in R:

#calculate Minkowski distance between each pair of rows in matrix
dist(mat, method="minkowski", p=3)

          a         b         c
b  3.979057
c  8.439010  5.142563
d  3.332222  6.542133 10.614765
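
The following sketch checks the p = 3 value for rows a and b against the formula, and also confirms that the Minkowski method with p = 2 gives the same distances as the default Euclidean method. Variable names are illustrative.

```r
# Rebuild the example matrix so the snippet runs on its own
mat <- rbind(a = c(2, 4, 4, 6),
             b = c(5, 5, 7, 8),
             c = c(9, 9, 9, 8),
             d = c(1, 2, 3, 3))

p <- 3

# Minkowski distance between rows a and b from the formula: 63^(1/3)
manual_ab <- sum(abs(mat["a", ] - mat["b", ])^p)^(1 / p)

# Same entry taken from dist()
dist_ab <- as.matrix(dist(mat, method = "minkowski", p = p))["a", "b"]

# With p = 2 the Minkowski distance matches the Euclidean distance
same_as_euclidean <- all.equal(
  as.matrix(dist(mat, method = "minkowski", p = 2)),
  as.matrix(dist(mat))
)
```

The comparison goes through as.matrix() because two "dist" objects also store the call that created them, so comparing them directly would report a difference even when the distances are equal.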

Manhattan distance

Definition: The distance between two points measured along axes at right angles. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|.
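
The same idea extends to vectors with more than two elements: add up the absolute differences across all positions. A minimal sketch using the example matrix from the earlier sections, with illustrative variable names:

```r
# Rebuild the example matrix so the snippet runs on its own
mat <- rbind(a = c(2, 4, 4, 6),
             b = c(5, 5, 7, 8),
             c = c(9, 9, 9, 8),
             d = c(1, 2, 3, 3))

# Manhattan distance between rows a and b: 3 + 1 + 3 + 2
manual_ab <- sum(abs(mat["a", ] - mat["b", ]))

# Full Manhattan distance matrix from dist()
manhattan <- dist(mat, method = "manhattan")
manhattan

# Entry for rows a and b
dist_ab <- as.matrix(manhattan)["a", "b"]
```

Both manual_ab and dist_ab should equal 9.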

Distance Matrix Computation

Description

This function computes and returns the distance matrix obtained by using the specified distance measure to compute the distances between the rows of a data matrix.

Usage

dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

as.dist(m, diag = FALSE, upper = FALSE)

## Default S3 method:
as.dist(m, diag = FALSE, upper = FALSE)

## S3 method for class 'dist'
print(x, diag = NULL, upper = NULL,
      digits = getOption("digits"), justify = "none",
      right = TRUE, ...)

## S3 method for class 'dist'
as.matrix(x, ...)

Arguments

x

a numeric matrix, data frame or "dist" object.

method

the distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.

diag

logical value indicating whether the diagonal of the distance matrix should be printed by print.dist.

upper

logical value indicating whether the upper triangle of the distance matrix should be printed by print.dist.

p

The power of the Minkowski distance.

m

An object with distance information to be converted to a "dist" object. For the default method, a "dist" object, or a matrix (of distances) or an object that can be coerced to such a matrix using as.matrix(). (Only the lower triangle of the matrix is used, the rest is ignored).
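
To see how these arguments and conversion methods fit together, the sketch below prints a "dist" object with its diagonal and upper triangle, converts it to an ordinary matrix, and converts it back. It reuses the example matrix from the earlier sections; the names d, full, and back are illustrative.

```r
# Rebuild the example matrix so the snippet runs on its own
mat <- rbind(a = c(2, 4, 4, 6),
             b = c(5, 5, 7, 8),
             c = c(9, 9, 9, 8),
             d = c(1, 2, 3, 3))

d <- dist(mat)

# Print the full square layout, including the zero diagonal
print(d, diag = TRUE, upper = TRUE)

# Convert to an ordinary 4 x 4 numeric matrix, indexable by row name
full <- as.matrix(d)
full["c", "d"]

# Convert a square matrix of distances back into a "dist" object
back <- as.dist(full)
class(back)
```

Converting with as.matrix() is often the more convenient form when a single pairwise distance needs to be looked up by name, while the "dist" form stores only the lower triangle.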