Selecting Random Samples in R: The sample() Function

Many statistical and business analysis projects require you to select a sample from a list of values. This is particularly true for simulation work. To select a sample, R provides the sample() function, which is useful for both combinatorial problems and statistical simulation.

Tempers can flare when you talk about random samples with certain audiences. This article focuses on the essentials of using sample() to select values from a list. We also briefly discuss more advanced options for sampling and random number generation.

R sample() – Random Selections From A List

R has a convenient function for handling sample selection: sample(). This function addresses the common cases:

  • Picking from a finite set of values (sampling without replacement)
  • Sampling with replacement
  • Using all values (reordering) or a subset (select a list)

By default, sample() shuffles the values it is given and returns all of them in random order. Sample code is below:

sample(vector_of_values)
sample(1:10)

This request returns the following:

 [1] 7 8 2 9 1 4 6 3 10 5 

As you can see, we’ve shuffled the list of the first 10 numbers into a different order.
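Because the shuffle changes on every run, it can help to fix the random seed when a result needs to be repeatable. The following is a minimal sketch using base R only; the seed value 42 and the variable names are arbitrary choices for illustration.

```r
# Fix the seed so the shuffle is repeatable (42 is an arbitrary value)
set.seed(42)
first_shuffle <- sample(1:10)

# Resetting the same seed reproduces the same permutation
set.seed(42)
second_shuffle <- sample(1:10)

identical(first_shuffle, second_shuffle)  # TRUE
sort(first_shuffle)                       # each value from 1 to 10 appears exactly once
```

Without a call to set.seed(), each run of sample() is expected to return a different ordering.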

But what if a value can be selected multiple times? This is known as sampling with replacement. sample() supports this through an additional argument, replace, which can be TRUE or FALSE. The shorthand T and F also works, but the full names are safer because T and F are ordinary variables that can be reassigned. The default is FALSE, meaning no replacement. For example:

sample(1:10, replace = TRUE)

This yields the following result. As you can see, certain values are picked more than once.

 [1] 4 7 10 9 4 6 6 4 3 4 

We can add the size parameter to return only a few values. The following code will pick three values.

sample(1:10, size = 3)

This yields the following result.

[1] 3 6 8

The same call with replacement turned on (output carefully selected to show a repeated value):

sample(1:10, size = 3, replace = TRUE)
[1] 9 9 1

It took a couple of trials to get that random selection.
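The replace setting also determines how many values can be requested. Without replacement, sample() cannot return more values than the input contains; with replacement, it can. A short sketch of that behavior follows, with illustrative variable names and a placeholder error message of our own:

```r
values <- 1:10

# Asking for 15 values without replacement fails, so catch the error
result <- tryCatch(
  sample(values, size = 15),
  error = function(e) "error: sample larger than the population"
)
print(result)

# With replacement the same request succeeds
oversample <- sample(values, size = 15, replace = TRUE)
length(oversample)             # 15
anyDuplicated(oversample) > 0  # TRUE: 15 draws from 10 values must repeat
```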

As a practical use case, we can use this to figure out who will pick up the bar tab for an R meetup.

sample(c('Joe', 'Karl', 'Jack', 'Larry', 'Curly',
         'Moe', 'Kim', 'Kathy', 'Sam', 'Jim'), size = 1)
[1] "Kim"

Drinks are apparently on Kim this week.

Adjusting Probabilities

The prior examples assume every value on the list is equally likely to be selected. But sample() also lets us adjust the probability of each item being selected. We do this with the prob argument, a vector of weights with one entry per value.

Our next example imagines us on a factory floor. We make widgets, which have a certain chance of being defective. Our quality isn’t great, so there is a 25% chance of a widget being defective. We can simulate this using the following code.

sample(c('Good', 'Bad'), size = 6, replace = TRUE, prob = c(.75, .25))
[1] "Bad"  "Good" "Bad"  "Good" "Good" "Bad" 

As you can see, we stumbled upon a particularly bad sample, with more defects than expected. With an average defect rate of 25%, the expected count in 6 trials is 1.5, so we would typically find one or two defects. Instead, we find three: a 50% defect rate. Indeed, our client should hire a quality consultant, ideally one who knows R.
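To see how unusual three defects in six widgets really is, we can repeat the experiment many times. The sketch below is an illustration under assumptions: the seed, the count of 10,000 batches, and the variable names are arbitrary choices rather than part of the factory example.

```r
set.seed(123)  # arbitrary seed so the run is repeatable

# Simulate 10,000 batches of 6 widgets and count the defects in each
n_batches <- 10000
defects <- replicate(
  n_batches,
  sum(sample(c('Good', 'Bad'), size = 6, replace = TRUE,
             prob = c(.75, .25)) == 'Bad')
)

mean(defects)       # close to the expected 6 * 0.25 = 1.5 defects per batch
mean(defects >= 3)  # share of batches with three or more defects

# Exact binomial probability of three or more defects, for comparison
1 - pbinom(2, size = 6, prob = 0.25)  # about 0.169
```

Under this model roughly 17% of batches contain three or more defects, so a sample like this is unlucky but not extraordinary.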

Generating Random Numbers in R

Our examples up to this point have dealt with random selections from finite sets. But what if we need R to generate a random number itself, rather than pick one from a list?

The next part of our tutorial will address generating floating point numbers and values from a specific statistical distribution.