Walking is a popular topic among programmers focused on data science and artificial intelligence. Not the physical activity itself, but walking as a theoretical topic that can be applied to computational processes. Different takes on the idea lead to different mathematical models. The traveling salesman problem is one of the best-known examples. It posits a salesman who needs to visit a series of cities by the most efficient route. This and similar walking-related questions focus on modeling optimal decision-making and combinatorial optimization.

Other variants of this idea focus more on the actual physical process involved with movement. For example, the random walk problem asks us to model the path of a randomly moving entity. And while this is still a difficult problem to solve, it’s one that’s especially applicable to modeling in data-focused programming languages like R. And you’ll soon discover exactly how to do so. But we need to begin by first looking at both the problem itself and the R programming language.

## Initial Considerations and Simulated Environments

Random walk is one of those concepts which is almost deceptive in its simplicity. On the surface, we can think of it in literal terms. This would imply a mathematical model used to chart random but constrained movement on a physical plane. Imagine someone strolling along a huge sidewalk that spans multiple rows and columns. Every ten seconds he’ll randomly move based on the results of a coin flip.

Our coin-flipping traveler would essentially move between the sidewalk’s expansion joints like a piece moving around a board game. Or, in other words, he’d move between squares on the sidewalk. This type of movement would describe a random walk. But the concept can be taken far beyond that example. Random walk is applicable to any movement, literal or figurative, within designated dimensions. We could even designate cube-shaped regions of three-dimensional space as points within our simulation. This could model the movement of birds, which can move along a z-axis instead of just x and y.
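The sidewalk picture can be sketched in a few lines of R. This is only an illustration under stated assumptions: the walker picks one of four equally likely directions per move (two coin flips’ worth of choice), and the names `nSteps`, `moves`, and `gridPath`, along with the seed, are hypothetical rather than part of the later examples.

```r
# A minimal sketch of a walk across a grid of sidewalk squares.
set.seed(42)                      # hypothetical seed, only for repeatability
nSteps <- 100                     # hypothetical number of moves

# The four possible moves: east, west, north, south
moves <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))

# Pick one move at random for every step
picks <- sample(nrow(moves), nSteps, replace = TRUE)
stepMatrix <- moves[picks, , drop = FALSE]

# Running totals of the moves give the walker's square after each step
gridPath <- cbind(x = cumsum(stepMatrix[, 1]), y = cumsum(stepMatrix[, 2]))

plot(gridPath, type = "l", xlab = "x", ylab = "y")
```

Adding a third column of moves along a z-axis would extend the same idea to the bird example.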

And we can even add additional variables to our equation. A traveler’s movement might be impacted by the underlying environment. Or we could add additional agents to the concept. For example, we could predict the likelihood that two friends on a stroll might randomly bump into each other when both are working along the same random walk equation. It’s also important to keep in mind that the term walking is metaphorical. While our walk can describe a physical process, it’s essentially a time series problem that tracks data points over specific intervals. It’s as much of a raw probability model as it is a model of theoretical physical movement.
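The two-friends question lends itself to a rough simulation. The sketch below is an assumption-laden illustration, not a definitive model: both friends move one square per step in one of four directions, they start two squares apart, and a “bump” means sharing a square after the same step. The helper names `walkPath` and `friendsMeet`, the seed, and the trial counts are all hypothetical.

```r
set.seed(2024)                    # hypothetical seed
nSteps <- 50                      # hypothetical length of each stroll
nTrials <- 2000                   # hypothetical number of simulated strolls

moves <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))

# Build one walker's path as an nSteps x 2 matrix, starting from `start`
walkPath <- function(start, nSteps) {
  picks <- sample(nrow(moves), nSteps, replace = TRUE)
  stepMatrix <- moves[picks, , drop = FALSE]
  cbind(start[1] + cumsum(stepMatrix[, 1]),
        start[2] + cumsum(stepMatrix[, 2]))
}

# TRUE when both friends occupy the same square after the same step
friendsMeet <- function() {
  a <- walkPath(c(0, 0), nSteps)
  b <- walkPath(c(2, 0), nSteps)  # second friend starts two squares away
  any(a[, 1] == b[, 1] & a[, 2] == b[, 2])
}

# Share of simulated strolls in which the friends bump into each other
meetRate <- mean(replicate(nTrials, friendsMeet()))
meetRate
```

The estimate depends heavily on the starting distance and the number of steps, so treat the output as a property of these particular assumptions.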

We also need to take a moment to note how R’s syntax and library impact a random walk implementation. R provides us with a wealth of options to create any sort of statistical model. As such, there’s really no one perfect way to implement a random walk process in R. But that’s also part of the fun. You can take the ideas found here and expand on them in a wide variety of different ways. But with that in mind, let’s begin with a simple implementation and move on from there.

## An Initial Implementation

We’ll begin by creating a walk solution that uses a basic loop. But before you run this code, keep in mind that R uses a pseudorandom number generator. Unless you set a seed, R creates one from the current time and the process ID, so you’ll see different results every time you run the code. Setting a seed value as the first line of the following examples makes the sequence of random numbers repeatable, which essentially derandomizes the process. For example, you might add this line.

set.seed(12345)
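As a quick check of what the seed does, the sketch below draws the same three values twice. The variable names `firstDraw` and `secondDraw` are hypothetical and only serve this illustration.

```r
# Resetting the seed restarts the pseudorandom sequence from the same point
set.seed(12345)
firstDraw <- rnorm(3)

set.seed(12345)
secondDraw <- rnorm(3)

identical(firstDraw, secondDraw)  # TRUE: both draws are the same numbers
```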

But with that in mind, take a look at the following code.

ourSteps <- 345

ourDrift <- 0.005

ourSets <- 1

ourStats <- numeric(ourSteps)

ourStats[1] <- rnorm(ourSets,ourDrift)

for (x in 2:ourSteps) {

ourNextStep <- rnorm(ourSets,ourDrift)

ourStats[x] <- ourStats[x-1] + ourNextStep

}

plot(ourStats)

We begin by assigning some variables. The ourSteps variable is how many virtual steps we’ll simulate within the model. The ourDrift variable refers to something called drift. This is essentially a constant push added to every step, and in our code it serves as the mean of the distribution each step is drawn from. You can think of it as indicative of trends. Some people might have a preference for consistent choices while others don’t. Or if we’re looking at social or economic data the drift would correlate with trends. Drift doesn’t shrink the randomness of an individual step. But the higher our drift parameter is relative to that randomness, the more the trend dominates the end result. A wood chip drifting with the wind on an otherwise still pond moves with a high level of randomness. But that same wood chip would move in a much more predictable pattern if it was drifting on the surface of a steadily moving river. The same is true when using our drift parameter.
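The pond and river comparison can be sketched by reusing the same random noise with two different drift values. The drift of 0.5 is a deliberately exaggerated, hypothetical value chosen so the effect is visible, and the names `noise`, `stillPond`, and `steadyRiver` are assumptions made for this illustration.

```r
set.seed(12345)
ourSteps <- 345
noise <- rnorm(ourSteps)           # shared randomness: mean 0, sd 1

# Adding a constant to every step has the same effect as raising rnorm's mean
stillPond <- cumsum(noise + 0)     # no drift
steadyRiver <- cumsum(noise + 0.5) # hypothetical strong drift

# Solid line: with drift. Dashed line: without drift.
plot(steadyRiver, type = "l", ylim = range(stillPond, steadyRiver))
lines(stillPond, lty = 2)
```

Both paths carry identical noise, so the gap between them after any step is simply that step number multiplied by the drift.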

The ourSets variable tells rnorm how many random values to draw per call. We set it to 1 because we’re working with a single set of data and need one new value per step. And speaking of data, we begin working with our in-progress values with ourStats. This is where we’ll store our steps as we calculate them.

The initial step, along with the subsequent calculations, is derived from running our variables through the rnorm function. The rnorm function supplies us with random variates from a normal distribution. We supply it with two arguments. The ourSets variable is the number of values to draw, which is 1 here because we’re only working with a single set of data. And the ourDrift variable sets the mean of the distribution, which is how much drift we’re factoring into the calculation. The rnorm function also takes a standard deviation. But because we’re only passing these two values we’ll be using the default of 1.
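To make the argument order explicit, the sketch below writes the same rnorm call twice: once positionally, as in the walk, and once with the argument names `n`, `mean`, and `sd` spelled out. The variable names `positional` and `named` are hypothetical.

```r
ourSets <- 1
ourDrift <- 0.005

# Positional form, as used in the walk
set.seed(12345)
positional <- rnorm(ourSets, ourDrift)

# Same call with argument names and the default standard deviation written out
set.seed(12345)
named <- rnorm(n = ourSets, mean = ourDrift, sd = 1)

identical(positional, named)      # TRUE: the two calls are equivalent
```

Passing a smaller `sd` would make individual steps less noisy, which is another way to tune how strongly the drift shows through.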

We proceed to loop through rnorm assignments to the ourStats container. Think of this as plotting a city map as a traveler walks along the sidewalks. Each step is a new point on the map. And, finally, we get to see that map at the end. When the loop finishes we pass ourStats to plot and the results should appear on a graph.
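The loop makes each step easy to follow, but the same walk can also be sketched without one. The version below is an alternative, not the approach used above: it draws every step in a single rnorm call and accumulates them with cumsum. The names `allSteps` and `ourStatsVectorized` are hypothetical. Under R’s default generator settings and the same seed, it should trace the same path as the loop, up to floating-point rounding.

```r
set.seed(12345)
ourSteps <- 345
ourDrift <- 0.005

# Draw every step at once instead of one per loop iteration
allSteps <- rnorm(ourSteps, ourDrift)

# The running total of the steps is the walker's position over time
ourStatsVectorized <- cumsum(allSteps)

plot(ourStatsVectorized, type = "l")
```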

## Additional Considerations

It’s important to keep in mind that while this is a functional random walk implementation it’s still best seen as a foundation for further efforts. We’ve touched on the point that random walk is widely applicable to different situations. And we can indeed apply this process to specific real-world models.

Some of these applications are obvious. For example, we can apply our walk to the random movement of entities in physical spaces. But it could also be used to chart the movement of items in water, diffusion, etc. We can even apply this function to economics or social movements. We can further refine this by changing prediction intervals and the drift model to best mirror different elements that impact the larger random walk process. For example, change the drift to the following and you’ll get a continually changing view of how drift influences patterns.

driftModifier <- runif(1, min=0.001, max=0.1)

ourDrift <- 0.005*driftModifier

The initial drift is now influenced by an additional element that could correlate with a number of different things. Over a longer period of steps, this could be used to show the impact of, for example, wind or water currents on movement. The more you play around with this concept the more you’ll typically find those seemingly random elements actually have some logic behind them. Noise and drift might seem like chaos at first. But they’re often easily replicated or compensated for when we just scale the numbers a bit. And we can also use this model to show how even a slight influence on random processes can create orderly movement. Even a slight weight on drift can have a big effect in the long term.
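The long-term pull of a small drift can be checked with a rough simulation. The sketch below repeats the walk many times and looks at where it ends. The number of repetitions, `nWalks`, and the name `endPositions` are hypothetical choices for this illustration. For normally distributed steps with a standard deviation of 1, the average end position should sit near the step count multiplied by the drift, while the spread should sit near the square root of the step count.

```r
set.seed(12345)
ourSteps <- 345
ourDrift <- 0.005
nWalks <- 5000                    # hypothetical number of repeated walks

# The final position of one walk is the sum of all its steps
endPositions <- replicate(nWalks, sum(rnorm(ourSteps, ourDrift)))

mean(endPositions)                # should land near ourSteps * ourDrift
ourSteps * ourDrift               # 1.725
sd(endPositions)                  # should land near sqrt(ourSteps)
sqrt(ourSteps)
```

Because the drift’s contribution grows in proportion to the number of steps while the random spread grows only with its square root, the drift eventually dominates if the walk runs long enough.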