Computational Statistics in Data Science

6 Workflow

We have presented tools for determining when to stop a Monte Carlo simulation. The workflow starts by identifying the target distribution and the quantities to be estimated, and then running a chosen sampler for a small number of iterations. Preliminary estimates of the quantities of interest and of the associated Monte Carlo variance are obtained, along with visualizations for assessing the quality of the sampler. The simulation continues until a chosen stopping rule, applied with a prespecified tolerance, indicates termination. In the following section, we present three examples that demonstrate this workflow.
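The workflow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the sampler is an AR(1) chain with an N(0, 1) stationary distribution standing in for an MCMC sampler, the variance estimator is batch means with batch size near the square root of the run length, and the stopping rule halts once the 95% confidence-interval half-width falls below a tolerance `eps`. The function names, the pilot length, and the check interval are all hypothetical choices.

```python
import math
import random


def ar1_sampler(rho, rng, x0=0.0):
    """Yield an AR(1) chain whose stationary distribution is N(0, 1).

    Assumption: this stands in for an arbitrary MCMC sampler.
    """
    x = x0
    scale = math.sqrt(1.0 - rho * rho)
    while True:
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        yield x


def batch_means_variance(draws):
    """Batch-means estimate of the asymptotic variance in the CLT."""
    n = len(draws)
    b = int(math.sqrt(n))            # batch size of roughly sqrt(n)
    a = n // b                       # number of full batches
    used = draws[n - a * b:]         # drop the earliest leftover draws
    overall = sum(used) / (a * b)
    means = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    return b * sum((m - overall) ** 2 for m in means) / (a - 1)


def run_until_precise(sampler, eps, pilot=1000, step=1000, max_n=200_000):
    """Pilot run, then extend the chain until the half-width is at most eps."""
    draws = [next(sampler) for _ in range(pilot)]
    while True:
        n = len(draws)
        estimate = sum(draws) / n
        sigma2 = batch_means_variance(draws)
        half_width = 1.96 * math.sqrt(sigma2 / n)   # 95% interval half-width
        if half_width <= eps or n >= max_n:         # max_n is a safety cap
            return estimate, half_width, n
        draws.extend(next(sampler) for _ in range(step))


rng = random.Random(1)
estimate, half_width, n = run_until_precise(ar1_sampler(0.5, rng), eps=0.05)
print(f"estimate={estimate:.3f}, half-width={half_width:.3f}, n={n}")
```

The rule is checked only every `step` draws rather than after every draw, which keeps the cost of re-estimating the variance small relative to sampling. In practice each check would be paired with trace or autocorrelation plots rather than trusted on its own.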

In our examples, we assume that a CLT (or asymptotic distribution) for the Monte Carlo estimators exists. However, extra care must be taken when working with a generic Monte Carlo procedure. In particular, importance sampling can often yield estimators with infinite variance, for which a CLT cannot hold; see Refs [3, 4] for more details. A CLT is particularly difficult to establish for MCMC because of serial correlation in the Markov chain. However, many individual Markov chains have been shown to be at least polynomially ergodic; for examples, see Jarner and Hansen [30], Roberts and Tweedie [31], Vats [32], Khare and Hobert [33], Tan et al. [34], Hobert and Geyer [35], and Jones and Hobert [36].
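As a small worked illustration of the importance sampling caveat (a constructed example, not one taken from Refs [3, 4]), suppose the target density p is N(0, 1) and the proposal density q is N(0, s^2). The symbols p, q, s, and w are introduced here only for this illustration. The second moment of the importance weight under the proposal is:

```latex
\[
w(x) = \frac{p(x)}{q(x)}, \qquad
\mathrm{E}_q\!\left[w(X)^2\right]
  = \int \frac{p(x)^2}{q(x)}\,dx
  = \frac{s}{\sqrt{2\pi}} \int_{-\infty}^{\infty}
    \exp\!\left\{-x^2\left(1 - \frac{1}{2s^2}\right)\right\} dx
  = \begin{cases}
      \dfrac{s^2}{\sqrt{2s^2 - 1}}, & s^2 > 1/2, \\[1ex]
      \infty, & s^2 \le 1/2.
    \end{cases}
\]
```

A proposal that is too concentrated relative to the target (here s^2 ≤ 1/2) therefore gives weights with infinite variance, and the usual CLT does not apply to the plain average of those weights. Setting s = 1 recovers the value 1, as expected when the proposal equals the target.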

A similar workflow can be adopted for embarrassingly parallel implementations of Monte Carlo samplers. Given the power of the modern personal computer, most Monte Carlo samplers can run on multiple cores simultaneously, producing more samples in the same wall-clock time. For IID Monte Carlo, averaging estimators across all independent runs is reasonable. However, when estimating the asymptotic variance in MCMC, estimation quality can be improved by sharing information across multiple runs at the end of the simulation; see Gupta and Vats [37] for more details.
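For the IID case, combining independent runs might look like the sketch below. It is illustrative only: the integrand g(U) = U^2 with U uniform on (0, 1), the seeds, and the function names are assumptions made for the example, and the runs are executed one after another here, whereas in a parallel implementation each call to `iid_run` would be dispatched to its own core. It does not implement the multiple-chain MCMC variance estimator of Gupta and Vats [37].

```python
import math
import random


def iid_run(seed, n):
    """One independent run: return the count, sum, and sum of squares of g(U)."""
    rng = random.Random(seed)        # each run gets its own generator
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = rng.random() ** 2        # g(U) = U^2, so E[g(U)] = 1/3
        total += y
        total_sq += y * y
    return n, total, total_sq


def combine(runs):
    """Pool the sufficient statistics from all runs into one estimate."""
    n = sum(r[0] for r in runs)
    total = sum(r[1] for r in runs)
    total_sq = sum(r[2] for r in runs)
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)   # pooled sample variance
    return mean, math.sqrt(var / n), n


runs = [iid_run(seed, 20_000) for seed in range(4)]
estimate, std_error, n_total = combine(runs)
print(f"estimate={estimate:.4f}, std. error={std_error:.4f}, n={n_total}")
```

Because the draws are independent, returning only the count, sum, and sum of squares from each run is enough to recover the pooled mean and its standard error; no run needs to ship its full sample back.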

Sequential stopping rules, particularly in MCMC, should not be implemented as a black-box procedure. Each implementation of a stopping rule must be accompanied by visualizations that give qualitative insight into the quality of the sampler. A better-quality sampler can significantly improve estimation and lead to shorter run times. We illustrate this point by comparing samplers in our examples.
