Читать книгу Probability - Robert P. Dobrow - Страница 39
R: SIMULATING THE PROBABILITY OF THREE HEADS IN THREE COIN TOSSES
Оглавление# CoinFlip.R # Trial > trial <- sample(0:1, 3, replace = TRUE) # Success > if (sum(trial) == 3) 1 else 0 # Replication > n <- 10000 # Number of repetitions > simlist <- numeric(n) # Initialize vector > for (i in 1:n) { trial <- sample(0:1, 3, replace = TRUE) success <- if (sum(trial) == 3) 1 else 0 simlist[i] <- success } > mean(simlist) # Proportion of trials with 3 heads [1] 0.1293
The script is divided into three parts to illustrate (i) coding the trial, (ii) determining success, and (iii) implementing the replication.
To simulate three coin flips, use the sample
command. Again letting 1 represent heads and 0 represent tails, the command
> trial <- sample(0:1, 3, replace = TRUE)
chooses a head or tails three times. The three results are stored as a three-element list (called a vector in R) in the variable trial
.
After flipping three coins, the routine must decide whether or not they are all heads. This is done by summing the outcomes. The sum will equal three if and only if all flips are heads. This is checked with the command
> if (sum(trial) == 3) 1 else 0
which returns a 1 for success, and 0, otherwise.
For the actual simulation, the commands are repeated times in a loop. The output from each trial is stored in the vector simlist
. This vector will consist of ones and zeros corresponding to success or failure for each trial, where success is flipping three heads.
Finally, after repeating trials, we find the proportion of successes in all the trials, which is the proportion of ones in simlist
. Given a list of zeros and ones, the average, or mean, of the list is precisely the proportion of ones in the list. The command mean(simlist)
finds this average giving the simulated probability of getting three heads.
Run the script via the script file or R supplement to see that the resulting estimate is fairly close to the exact solution Increase to 100,000 or even a million to get more precise estimates.
Reproducibility. If you and a classmate both ran the code above exactly as presented, you would get different estimates of the probability via the simulation. This is due to having different random numbers in your simulations. Some readers may know that computer random number generators are really only pseudorandom. The computer is using an algorithm that generates numbers which behave like random numbers would. It turns out that this is actually a plus when writing code or doing simulations in the sense that you can make the computer regenerate the same random numbers by “setting a seed.” You can think of a seed as telling the computer where to start the algorithm for generating the random numbers. Then, if anyone else runs the code (including the seed), they would get the same result that you did with that seed (provided they have the same software and version). In R this is accomplished with the command
> set.seed(360)
where any number can be used as the input. We will use a seed of 360 unless otherwise specified and will not show this command in the text (though it is shown in the supplements and scripts), but it is used before every script where random numbers are generated. Have fun and set seeds using numbers you enjoy!
The reason that setting seeds is important is that this makes the work reproducible. Reproducibility means that others can take the code and obtain the same results, lending credibility to the work. While writing simulations, you should aim to make sure all your code is reproducible.
Example 1.33 The script Divisible356.R simulates the divisibility problem in Example 1.30 that a random integer from is divisible by 3, 5, or 6. The problem, and the resulting code, is more complex.The function simdivis() simulates one trial. Inside the function, the expression num%%x==0 checks whether num is divisible by . The if statement checks whether num is divisible by 3, 5, or 6, returning 1 if it is, and 0, otherwise.After defining the function, typing simdivis() will simulate one trial. By repeatedly typing simdivis() on your computer, you get a feel for how this random experiment behaves over repeated trials.In this script, instead of writing a loop, we use the replicate command. This powerful R command is an alternative to writing loops for simple expressions. The syntax is replicate(n,expr) . The expression expr is repeated times creating an -element vector. Thus, the result of typing> simlist <- replicate(1000, simdivis())is a vector of 1000 ones and zeros stored in the variable simlist corresponding to success or failure in the divisibility experiment. The average mean(simlist) gives the simulated probability.Play with this script. Based on 1000 trials, you might guess that the true probability is between 0.45 and 0.49. Increase the number of trials to 10,000 and the estimates are roughly between 0.46 and 0.48. At 100,000, the estimates become even more precise between 0.465 and 0.468.We can actually quantify this increase in precision in Monte Carlo simulation as gets large. But that is a topic that will have to wait until Chapter 11.