This article about R’s rbinom function is part of a series about generating random numbers using R. The rbinom function can be used to simulate the outcome of Bernoulli trials. This is a fancy statistical term for flipping coins. You can use it to simulate the number of successes in a set of pass/fail trials, each with success probability p. Our earlier articles in this series dealt with:
- Random selections from lists of discrete values
- Simulating the uniform distribution
- Simulating a normal distribution
R and the Binomial Distribution
We’re going to start by introducing the rbinom function and then discuss how to use it.
Many statistical processes can be modeled as independent pass/fail trials. For example, how many times will a coin land heads in a series of coin flips? Each side has a 50/50 chance of landing face up. We can also estimate how often a standard six-sided die will show a value of 5 or more, which happens one third of the time. For a real-world example, consider the chance that a batter gets a hit in baseball.
R’s rbinom function simulates a series of Bernoulli trials and returns the results. The function takes three arguments:
- Number of observations you want to see (n)
- Number of trials per observation (size)
- Probability of success for each trial (prob)
The expected syntax is:
rbinom(# observations, # trials per observation, probability of success)
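The same call can be written with R's argument names, which makes each role explicit. The sketch below reuses the die example from earlier (a 5 or 6 has probability 1/3); the 60 rolls per observation and the seed are arbitrary choices for illustration. It also shows that prob is recycled across observations, so each draw can use its own success probability.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# n = observations, size = trials per observation, prob = per-trial success chance.
# Five observations, each counting how many of 60 die rolls show a 5 or 6.
high_rolls <- rbinom(n = 5, size = 60, prob = 1/3)
high_rolls

# prob is recycled across observations: here each of the 3 draws
# uses a different success probability
mixed <- rbinom(n = 3, size = 10, prob = c(0.1, 0.5, 0.9))
mixed
```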
For this example, let’s assume we’re in charge of quality for a factory. We make 150 widgets per day, and defective widgets must be reworked. We know that there is a 5% error rate. Let’s estimate how many widgets we will need to fix each day this week.
```r
# r binomial - binomial simulation in r
rbinom(7, 150, .05)
# sample output (random draws vary from run to run):
# [1] 10 12 10  2  5  5 14
```
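Any single week of simulated output is noisy. One way to sanity-check the simulation, sketched below, is to compare a long run of simulated days with the theoretical values: the expected number of defects per day is size × prob = 150 × 0.05 = 7.5, with a standard deviation of sqrt(150 × 0.05 × 0.95), about 2.67. The 10,000-day run length, the seed, and the 12-rework threshold are arbitrary choices for illustration.

```r
set.seed(2024)  # arbitrary seed

daily_output <- 150
defect_rate  <- 0.05

# Simulate a long run of production days
defects <- rbinom(10000, daily_output, defect_rate)

expected_mean <- daily_output * defect_rate                          # 7.5
expected_sd   <- sqrt(daily_output * defect_rate * (1 - defect_rate)) # about 2.67

mean(defects)  # should land close to expected_mean
sd(defects)    # should land close to expected_sd

# Share of simulated days that needed more than 12 reworks
mean(defects > 12)
```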
We can model individual Bernoulli trials as well. We do this by setting the number of trials per observation (the size argument) to one. Here is the outcome of 10 coin flips:
```r
# bernoulli distribution in r
rbinom(10, 1, .5)
# sample output:
# [1] 1 0 1 1 1 0 0 0 0 1
```
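The 0/1 results are easy to relabel, and over many flips the share of 1s should settle near the success probability. A short sketch, assuming 1 stands for heads; the seed and the 100,000-flip count are arbitrary.

```r
set.seed(7)  # arbitrary seed

# Ten Bernoulli trials: 1 = heads, 0 = tails (labeling is our assumption)
flips <- rbinom(10, 1, 0.5)

# Relabel the 0/1 results for readability
ifelse(flips == 1, "H", "T")

# Over many flips, the proportion of heads should approach 0.5
p_heads <- mean(rbinom(100000, 1, 0.5))
p_heads
```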
Or, stepping it up a bit, here’s the number of heads in each of 10 rounds of flipping 100 coins:
```r
# binomial simulation in r
rbinom(10, 100, .5)
# sample output:
# [1] 52 55 51 50 46 42 50 49 46 56
```
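A binomial count is just the sum of its Bernoulli trials. The sketch below builds the same kind of result as rbinom(10, 100, .5) by flipping single coins and adding up each round, then checks over many rounds that both approaches average about 50 heads. The seed and the 5,000-round comparison size are arbitrary.

```r
set.seed(99)  # arbitrary seed

# 10 rounds of 100 single-coin flips, one round per column
flip_matrix <- matrix(rbinom(100 * 10, 1, 0.5), nrow = 100, ncol = 10)

# Heads per round: the same kind of count rbinom(10, 100, 0.5) returns directly
heads_per_round <- colSums(flip_matrix)
heads_per_round

# Over many rounds, summing Bernoulli flips and drawing binomials directly
# should both average about 50 heads per 100 flips
many_rounds <- colSums(matrix(rbinom(100 * 5000, 1, 0.5), nrow = 100))
direct      <- rbinom(5000, 100, 0.5)
c(bernoulli_sum = mean(many_rounds), rbinom_direct = mean(direct))
```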
Using rbinom & The Binomial Distribution
Binomial probability is useful in business analysis, and these statistics can be applied to a broad range of problems. They can also be used in situations that don’t fit the normal distribution, which is common in certain logistics problems.
A great example of this last point is modeling demand for products sold to only a few customers. The variability of demand is large relative to the mean usage, so a normal model implies a real chance of negative usage. That is unlikely in the real world, since most customers don’t return products. Approaching the problem as a set of Bernoulli trials (for example, whether each customer places an order) works better, and the combination of the trials is your forecast. This works well for products with only a handful of customers.
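Here is a minimal sketch of that idea. The three customers, their weekly ordering probabilities, and their order sizes are hypothetical numbers chosen for illustration, and the 90th-percentile figure is just one possible way to read the forecast. Each customer's order is a Bernoulli trial scaled by order size; summing across customers gives simulated weekly demand, which is never negative, while a normal distribution with the same mean and standard deviation puts noticeable probability below zero.

```r
set.seed(123)  # arbitrary seed

# Hypothetical customers: weekly chance of placing an order, and order size
order_prob <- c(0.3, 0.2, 0.5)
order_qty  <- c(100, 250, 40)

weeks <- 10000

# For each customer, simulate order / no order (1/0) in each week and
# scale by order size; the result is a weeks x customers matrix
orders <- sapply(seq_along(order_prob),
                 function(i) rbinom(weeks, 1, order_prob[i]) * order_qty[i])
demand <- rowSums(orders)

# Simulated demand is never negative
min(demand)

# A normal model with the same mean and sd puts real weight below zero
mu    <- sum(order_prob * order_qty)                             # 100
sigma <- sqrt(sum(order_qty^2 * order_prob * (1 - order_prob)))  # about 112
pnorm(0, mean = mu, sd = sigma)

# One possible planning figure: the 90th percentile of simulated demand
quantile(demand, 0.9)
```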
As the number of trials rises, the binomial distribution starts to converge on the normal distribution. This technique is most useful when working with smaller samples where there is considerable variation.
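A rough way to see the convergence is to measure the largest gap between the exact binomial cumulative probabilities (pbinom, covered later in this article) and a normal approximation with a continuity correction. The helper name approx_error is our own, and p = 0.1 is an arbitrary, deliberately skewed choice; the gap should shrink markedly as the number of trials grows.

```r
# Hypothetical helper: largest gap between the exact binomial CDF and a
# continuity-corrected normal approximation, over every possible count
approx_error <- function(size, prob) {
  x     <- 0:size
  mu    <- size * prob
  sigma <- sqrt(size * prob * (1 - prob))
  max(abs(pbinom(x, size, prob) - pnorm(x + 0.5, mean = mu, sd = sigma)))
}

approx_error(10, 0.1)    # few trials: noticeable gap
approx_error(1000, 0.1)  # many trials: much smaller gap
```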
Related functions: pbinom, qbinom, dbinom
Need the probability function for the binomial distribution (R calls it the density, though for a discrete distribution it is a probability mass function)?
Example: If we flip a fair coin 10 times, what is the probability of getting exactly 5 heads?
You should use R’s dbinom function. You can use this to calculate the probability of getting X successes on n binomial trials. For example, if we have a fair coin (p(head)=.5), then we can use the dbinom function to calculate the probability of getting 5 heads in 10 trials.
```r
# dbinom r - calculate binomial probability in r
dbinom(5, size=10, prob=0.5)
# [1] 0.2460938
```
The example above indicates the probability of getting 5 heads in 10 coin flips is just under 25%.
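dbinom evaluates the standard binomial formula P(X = k) = choose(n, k) × p^k × (1 − p)^(n − k). A quick check that computing the formula by hand gives the same answer for 5 heads in 10 flips (252 / 1024):

```r
n <- 10
k <- 5
p <- 0.5

# Binomial formula: number of ways to choose k successes, times their probability
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)  # 252 / 1024
by_dbinom  <- dbinom(k, size = n, prob = p)

all.equal(by_formula, by_dbinom)  # TRUE
```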
What if we want to look at the cumulative probability of getting X successes?
Example: If we flip a fair coin 10 times, what is the probability of getting 5 or fewer heads?
Take a look at R’s pbinom function, which gives the cumulative probability of an event (the cumulative distribution function). This is a digital version of the table of probabilities included as an appendix in your favorite statistics book.
```r
# pbinom in r - binomial probability r
pbinom(5, 10, 0.5)
# [1] 0.6230469
```
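The cumulative probability is simply the running total of the individual probabilities. The sketch below adds up dbinom for 0 through 5 heads and compares the total with pbinom (638 / 1024):

```r
# P(X <= 5) is the sum of P(X = 0), P(X = 1), ..., P(X = 5)
cum_by_sum    <- sum(dbinom(0:5, size = 10, prob = 0.5))
cum_by_pbinom <- pbinom(5, size = 10, prob = 0.5)

all.equal(cum_by_sum, cum_by_pbinom)  # TRUE
```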
What’s the difference between pbinom and dbinom?
dbinom provides the probability of getting a result at that specific point on the binomial distribution. pbinom calculates the cumulative probability of getting a result equal to or below that point. In the coin example, dbinom gives the probability of getting exactly 5 heads, while pbinom gives the probability of getting 5 or fewer heads.
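The two functions are tied together: the point probability at 5 is the step the cumulative curve takes between 4 and 5. For the opposite question ("more than 5 heads"), pbinom's lower.tail argument returns the upper tail directly. A short sketch:

```r
size <- 10
p    <- 0.5

# A point probability is the jump in the cumulative curve at that point
step_at_5 <- pbinom(5, size, p) - pbinom(4, size, p)
all.equal(step_at_5, dbinom(5, size, p))  # TRUE

# "More than 5 heads" is the upper tail, 1 - P(X <= 5)
upper <- pbinom(5, size, p, lower.tail = FALSE)
upper  # 386/1024, about 0.377
```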
Need to set a cutoff score for a given point in the binomial distribution?
Take a look at R’s qbinom function, which calculates the inverse of the cumulative binomial distribution (the quantile function). It reverses the operation performed by pbinom: you provide a cumulative probability, and it returns the smallest number of successes whose cumulative probability is at or above that value.
```r
# r qbinom - inverse binomial distribution
qbinom(0.25, 10, .5)
# [1] 4
```
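To see why the answer is 4, check the cumulative probabilities on either side: 3 heads falls short of 0.25, and 4 heads is the first count to reach it. The second half applies the same idea to the widget example, as a sketch of picking a daily rework capacity that covers an assumed 95% of days (the 95% target is our own illustrative choice).

```r
# qbinom returns the smallest count q with P(X <= q) >= the requested probability
q <- qbinom(0.25, size = 10, prob = 0.5)  # 4
pbinom(q - 1, 10, 0.5)  # 0.171875: below 0.25
pbinom(q, 10, 0.5)      # 0.3769531: at or above 0.25

# Widget example: rework capacity that would cover about 95% of days
rework_cap <- qbinom(0.95, size = 150, prob = 0.05)
rework_cap
```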
Taken as a group, these functions let you simulate and work with the binomial distribution in R.
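As an illustration of using them together, the sketch below tabulates the full distribution for 10 fair coin flips with dbinom and pbinom, then adds the share of rbinom draws that landed on each count, which should sit close to the exact probabilities. The 20,000-draw sample size and the seed are arbitrary.

```r
set.seed(10)  # arbitrary seed

heads <- 0:10
dist <- data.frame(
  heads = heads,
  prob  = dbinom(heads, size = 10, prob = 0.5),  # P(X = heads)
  cum   = pbinom(heads, size = 10, prob = 0.5)   # P(X <= heads)
)

# Share of simulated rounds landing on each count (zeros kept for unseen counts)
sims <- rbinom(20000, size = 10, prob = 0.5)
dist$simulated <- as.vector(table(factor(sims, levels = heads))) / length(sims)
dist
```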
This is part of our series on sampling in R.