R rbinom – How To Simulate Bernoulli trials in R

This article about R’s rbinom function is part of a series about generating random numbers using R. The rbinom function simulates the outcome of Bernoulli trials, the statistical term for pass/fail experiments such as coin flips. You can use it to simulate the number of successes in a set of pass/fail trials, each succeeding with probability p.

R and the Binomial Distribution

We’re going to start by introducing the rbinom function and then discuss how to use it.

Many statistical processes can be modeled as independent pass/fail trials. For example, how many times will a coin land heads in a series of coin flips? Each side has a 50% chance of landing face up. We can also estimate how often a standard six-sided die will show a value of 5 or more; this occurs one third of the time. Or, for a real-world example, consider the odds of a batter getting a hit in baseball.

R’s rbinom function simulates a series of Bernoulli trials and returns the results. The function takes three arguments:

  • Number of observations you want to see
  • Number of trials per observation
  • Probability of success for each trial

The expected syntax is:

rbinom(# observations, # trials/observation, probability of success)
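
In base R these three arguments are named n, size, and prob, so the call can also be written with explicit names. The following is a minimal sketch; the values and the seed are arbitrary and chosen only for illustration.

```r
# n    = number of observations to return
# size = number of trials per observation
# prob = probability of success on each trial
set.seed(42)  # arbitrary seed, used only so the run is repeatable

draws <- rbinom(n = 5, size = 20, prob = 0.3)
print(draws)  # five counts, each between 0 and 20
```

Naming the arguments makes it harder to swap the observation count and the trial count by accident.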

For this example, let’s assume we’re in charge of quality for a factory. We make 150 widgets per day. Defective widgets must be reworked, and we know that there is a 5% defect rate. Let’s estimate how many widgets we will need to fix each day this week.

rbinom(7, 150,.05)
[1] 10 12 10 2 5 5 14
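
Because the draws are random, your numbers will differ from the ones shown here on every run. One way to make the factory example repeatable, and to compare it with the long-run average of 150 × 0.05 = 7.5 defects per day, might look like the sketch below. The variable names and the seed are illustrative assumptions, not part of the original example.

```r
set.seed(123)  # arbitrary seed, chosen only for repeatability

widgets_per_day <- 150
defect_rate     <- 0.05

# One simulated defect count for each of the 7 days
defects <- rbinom(7, widgets_per_day, defect_rate)

# Theoretical mean of a binomial: size * prob
expected_defects <- widgets_per_day * defect_rate

print(defects)
print(mean(defects))      # sample average over the simulated week
print(expected_defects)   # long-run average per day
```

With only seven days the sample average can sit noticeably above or below the long-run value.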

We can model individual Bernoulli trials as well. We do this by setting the trials argument to one. Here is the outcome of 10 coin flips, where 1 is a success and 0 is a failure:

rbinom(10, 1,.5)
[1] 1 0 1 1 1 0 0 0 0 1

Or stepping it up a bit, here’s the outcome of 10 flips of 100 coins:

rbinom(10, 100,.5)
[1] 52 55 51 50 46 42 50 49 46 56
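
The two coin examples are closely related: adding up 100 single flips gives one draw of the same kind as a single observation with 100 trials. A rough sketch of that relationship follows; the seed and the 10,000 repetitions are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed for repeatability

# 100 individual Bernoulli trials (1 = heads, 0 = tails)
single_flips <- rbinom(100, 1, 0.5)
heads_from_singles <- sum(single_flips)  # a count between 0 and 100

# Many observations of 100 flips each
many <- rbinom(10000, 100, 0.5)

print(heads_from_singles)
print(mean(many))  # should land close to 100 * 0.5 = 50
```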

Using rbinom & The Binomial Distribution

Binomial probability is useful in business analysis and can easily be applied to a very broad range of problems. It can also be used in situations that don’t fit the normal distribution, which is common in certain logistics problems.

A great example of this last point is modeling demand for products sold to only a few customers. When the variance of demand exceeds the mean usage, a normal model puts real probability on negative usage. That is unlikely in the real world, since most customers don’t return products. Approaching the problem as a set of Bernoulli trials works better: the combination of the trials is your forecast. This works well for products with only a handful of customers.
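
As a hedged illustration of this idea, the sketch below treats each customer's weekly order as one Bernoulli trial. Everything specific in it is an assumption made up for the example: the five customers, their order probabilities, their order quantities, and the number of simulated weeks.

```r
set.seed(2024)  # arbitrary seed for repeatability

# Hypothetical inputs: chance each customer orders in a given week,
# and how many units that customer buys when it does order
order_prob <- c(0.10, 0.25, 0.05, 0.40, 0.15)
order_qty  <- c(200, 50, 500, 20, 100)

n_sims <- 10000  # number of simulated weeks

# One Bernoulli trial per customer per week.
# matrix() fills column by column, so column j holds customer j.
orders <- matrix(
  rbinom(n_sims * length(order_prob), 1, rep(order_prob, each = n_sims)),
  nrow = n_sims
)

# Weekly demand = sum of order quantities for customers who ordered
demand <- as.vector(orders %*% order_qty)

print(mean(demand))                       # average weekly demand
print(quantile(demand, c(0.5, 0.9, 0.99)))  # typical and high-demand weeks
print(min(demand))                        # never below zero
```

Unlike a normal approximation, the simulated demand can never go negative, and the quantiles show how lumpy demand is when a single large customer does or does not order.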

As sample sizes rise, the binomial distribution will start to converge on the normal distribution. This technique is most useful when working with smaller samples where there is considerable variation.
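
To get a feel for this convergence, one option is to compare the exact binomial cumulative probabilities with a normal approximation that uses the same mean and standard deviation. The sketch below uses base R's pbinom and pnorm; the helper name approx_gap, the 5% probability, the two sample sizes, and the 0.5 continuity correction are all choices made for this illustration.

```r
# Largest gap between the exact binomial CDF and its normal approximation
# (approx_gap is a hypothetical helper written for this example)
approx_gap <- function(size, prob) {
  k      <- 0:size
  exact  <- pbinom(k, size, prob)
  approx <- pnorm(k + 0.5,                       # 0.5 = continuity correction
                  mean = size * prob,
                  sd   = sqrt(size * prob * (1 - prob)))
  max(abs(exact - approx))
}

gap_small <- approx_gap(10, 0.05)    # small sample
gap_large <- approx_gap(1000, 0.05)  # large sample

print(gap_small)
print(gap_large)  # smaller gap: the normal curve fits better here
```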