Читать книгу Computational Statistics in Data Science - Группа авторов - Страница 133

7 Examples 7.1 Action Figure Collector Problem

Consider the general coupon collector problem [38] where the goal is to collect distinct objects (e.g., coupons, trading cards, and action figures). Specifically, independent draws of size are made from with replacement, and interest is in the number of draws necessary, say , to draw all objects at least once. The classical case where and all objects are equally likely yields a closed‐form solution (related to random sampling of digits). We consider a variation where and action figures appear in cereal boxes with probabilities in Table 1.

We estimate the expected number of boxes needed to collect all 15 action figures and the probability we needed to buy more than 100 and 200 total boxes. Denote these as , , and , respectively. Additionally, we implement an absolute precision sequential stopping rule to simulate until 95% confidence interval lengths for the three quantities of interest are below 1, 0.01, and 0.01, respectively. Specifically, we set and simulate an additional 100 Monte Carlo sample between checking the stopping rule. The sequential stopping rule terminates at with estimates of . We note that stopping is based on since its 95% confidence interval criteria is the most restrictive. The left panel of Figure 1 provides a histogram of the Monte Carlo samples along with vertical bold lines corresponding to 100 and 200 boxes.

Table 1 Probabilities for each action figure

Figures	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
Probability	0.2	0.1	0.1	0.1	0.1	0.1	0.05	0.05	0.05	0.05	0.02	0.02	0.02	0.02	0.02

Figure 1 Histograms of simulated boxes and mean number of boxes for two Monte Carlo sampling strategies in the collector problem.

A more efficient Monte Carlo experiment is available if we only wish to estimate . Suppose that is the set of all permutations of the set representing the order in which the action figures were collected. Then, for any , we can calculate and notice

This calculation is unavailable since there are over 3 trillion partitions in . However, we can simulate equally likely permutations from and estimate with

Using this sampler, we simulate until the 95% confidence interval length for is below 1. Again, we set and simulate an additional 100 Monte Carlo sample between checking the stopping rule. Now the sequential stopping rule terminates at with an estimate of 116.1, which is approximately 10 times more efficient than the naive Monte Carlo sampling. The right panel of Figure 1 provides a histogram of the Monte Carlo simulated means.

Computational Statistics in Data Science

Подняться наверх