Many statistical and business analysis projects will require you to select a sample from a list of values. This is particularly true for simulation requests. To select a sample, R has the sample() function. This function can be used for combinatoric problems and statistical simulation.
Tempers flare a bit when you talk about random samples with certain audiences. This article is going to focus on the essentials of using sample() to select values from a list. We are also going to briefly discuss more advanced options for sampling and random number generation.
R Sample() – Random Selections From A List
R has a convenient function for handling sample selection: sample(). This function addresses the common cases:
- Picking from a finite set of values (sampling without replacement)
- Sampling with replacement
- Using all values (reordering) or a subset (selecting only some of the list)
By default, this function shuffles the values in a list and returns all of them to the user in random order. Sample code is below:
sample(vector_of_values)
sample(c(1:10))
This request returns the following:
 7 8 2 9 1 4 6 3 10 5
As you can see, we’ve shuffled the list of the first 10 numbers into a different order.
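Because the output changes on every call, it can help to fix the random seed while experimenting. The sketch below uses base R only; the seed value 42 and the variable names are arbitrary choices for illustration and are not part of the example above. It also checks that a shuffle is a permutation: the same values, only in a different order.

```r
# Fix the seed so the shuffle is reproducible (42 is an arbitrary choice).
set.seed(42)
values <- 1:10
shuffled <- sample(values)

# A shuffle keeps every value exactly once; only the order changes.
length(shuffled) == length(values)   # TRUE
identical(sort(shuffled), values)    # TRUE

# Resetting the same seed reproduces the same shuffle.
set.seed(42)
reshuffled <- sample(values)
identical(reshuffled, shuffled)      # TRUE
```

Setting a seed is useful when you want someone else to rerun your code and see the same ordering. Leave it out when you want a fresh shuffle each time.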
But what if a value can be selected multiple times? This is known as sampling with replacement. The sample() function supports this via an additional parameter: replace. It can be set to TRUE or FALSE (T and F are common shorthand). The default is FALSE, meaning no replacement. A code example looks like this:
sample(c(1:10), replace = TRUE)
This yields the following result. As you can see, certain values are picked more than once.
 4 7 10 9 4 6 6 4 3 4
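To see which values were repeated without reading the output by eye, one option is to tabulate the draws. This is a minimal sketch using base R; the seed and the variable name draws are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run can be repeated
draws <- sample(1:10, replace = TRUE)

# size defaults to the length of the input, so we still get 10 values back.
length(draws)

# table() counts how often each value was drawn; counts above 1 are repeats.
table(draws)

# TRUE when at least one value appears more than once.
any(duplicated(draws))
```

With replacement, some values from the original list will usually be missing from the result, since repeats take their place.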
We can add the size parameter to return only a few values. The following code will pick three values.
sample(c(1:10), size = 3)
This yields the following result.
 3 6 8
Here is the same call with replacement turned on (the output below was carefully selected):
sample(c(1:10), size = 3, replace = TRUE)
9 9 1
It took a couple of trials to get a selection that shows a repeated value.
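The size and replace arguments interact: without replacement, you cannot ask for more values than the list contains. The sketch below illustrates this with base R; the names population, picked, and too_many are hypothetical, and tryCatch() is used only so the script keeps running after the error.

```r
population <- 1:10

# Without replacement, the three values are always distinct.
picked <- sample(population, size = 3)
anyDuplicated(picked) == 0   # TRUE

# Asking for more values than the population holds fails without replacement.
too_many <- tryCatch(
  sample(population, size = 11),
  error = function(e) conditionMessage(e)
)
too_many  # holds the error message rather than a sample

# With replacement, the same request is fine.
oversized <- sample(population, size = 11, replace = TRUE)
length(oversized)            # 11
```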
As a practical use case, we can use this to figure out who will pick up the bar tab for an R meetup.
sample(c('Joe','Karl','Jack','Larry','Curly','Moe','Kim','Kathy','Sam','Jim'), size = 1)
"Kim"
Drinks are apparently on Kim this week.
The prior examples assume every item on the list is equally likely to be selected. But sample() also allows us to adjust the probability of each item being selected. We do this with the prob argument, which takes one weight per item.
Our next example imagines us on a factory floor. We make widgets, which have a certain chance of being defective. Our quality isn’t great, so there is a 25% chance of a widget being defective. We can simulate this using the following code.
sample(c('Good','Bad'), size = 6, replace = TRUE, prob = c(.75, .25))
"Bad" "Good" "Bad" "Good" "Good" "Bad"
As you can see, we stumbled upon a particularly bad sample, with more defects than expected. At an average defect rate of 25%, we would typically expect to find 1 to 2 defects in 6 trials (6 × 0.25 = 1.5 on average). Instead, we found three defects, a 50% defect rate. Indeed, our client should hire a quality consultant, ideally a consultant who knows R...
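How unusual is a batch with three defects? One way to find out is to repeat the six-widget inspection many times and compare the simulated results with the binomial distribution. This is a sketch under stated assumptions: count_defects() is a hypothetical helper written for this illustration, and the seed and the 10,000 repetitions are arbitrary choices.

```r
set.seed(2024)  # arbitrary seed for repeatability

# Hypothetical helper: count the defects in one batch of n widgets.
count_defects <- function(n = 6, defect_rate = 0.25) {
  batch <- sample(c("Good", "Bad"), size = n, replace = TRUE,
                  prob = c(1 - defect_rate, defect_rate))
  sum(batch == "Bad")
}

# Repeat the six-widget inspection many times.
defects <- replicate(10000, count_defects())

mean(defects)        # should land near 6 * 0.25 = 1.5
mean(defects >= 3)   # share of simulated batches with three or more defects

# Exact probability of three or more defects from the binomial distribution.
p_three_or_more <- 1 - pbinom(2, size = 6, prob = 0.25)
p_three_or_more      # about 0.169
```

So a batch with three or more defects should turn up roughly one time in six at a 25% defect rate. It is a bad draw, but not a rare one.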
Generating Random Numbers in R
Our examples up to this point have dealt with random selections from finite sets. But what if we need to generate a random number directly in R, rather than pick one from a list?
The next part of our tutorial will address generating floating point numbers and values from a specific statistical distribution.