3.1.1 Continuous shrinkage: alleviating big $M$
Bayesian sparse regression, despite its desirable theoretical properties and flexibility to serve as a building block for richer statistical models, has always been relatively computationally intensive even before the advent of "big $N$ and big $P$" data [45, 52, 53]. A major source of its computational burden is severe posterior multimodality (big $M$) induced by the discrete binary nature of spike‐and‐slab priors (Section 2.3). The class of global–local continuous shrinkage priors is a more recent alternative that shrinks the $\beta_j$'s in a more continuous manner, thereby alleviating (if not eliminating) the multimodality issue [54, 55]. This class of priors is represented as a scale mixture of Gaussians:

$$\beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \tau^2 \lambda_j^2), \quad \lambda_j \sim \pi_{\mathrm{local}}(\cdot), \quad \tau \sim \pi_{\mathrm{global}}(\cdot)$$
The idea is that the global scale parameter $\tau \ll 1$ would shrink most $\beta_j$'s toward zero, while the local scale $\lambda_j$'s, with their heavy‐tailed prior $\pi_{\mathrm{local}}(\cdot)$, allow a small number of $\lambda_j$'s and hence $\beta_j$'s to be estimated away from zero. While motivated by two different conceptual frameworks, the spike‐and‐slab prior can be viewed as a subset of global–local priors in which $\pi_{\mathrm{local}}(\cdot)$ is chosen as a mixture of delta masses placed at $\lambda_j = 0$ and $\lambda_j = \sigma_{\mathrm{slab}}/\tau$. Continuous shrinkage mitigates the multimodality of spike‐and‐slab by smoothly bridging small and large values of $\lambda_j$.
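To make the scale‐mixture representation concrete, the following minimal sketch (ours, not part of the original text) draws coefficients from a global–local prior; it assumes half‐Cauchy densities for both $\pi_{\mathrm{local}}$ and $\pi_{\mathrm{global}}$, that is, the horseshoe member of this family, and the function name and `global_scale` argument are illustrative choices.

```python
import numpy as np

def sample_global_local_prior(P, rng, global_scale=1.0):
    """Draw beta from a global-local scale mixture of Gaussians.

    Assumes half-Cauchy priors for both the global and local scales
    (the horseshoe prior); other choices of pi_local / pi_global give
    other members of the global-local family.
    """
    tau = np.abs(global_scale * rng.standard_cauchy())  # global scale, tau ~ C+(0, global_scale)
    lam = np.abs(rng.standard_cauchy(P))                # local scales, lambda_j ~ C+(0, 1)
    beta = rng.standard_normal(P) * tau * lam           # beta_j | lambda_j, tau ~ N(0, tau^2 lambda_j^2)
    return beta, tau, lam

# Most beta_j concentrate near zero, while the heavy-tailed local
# scales let a few coefficients escape the global shrinkage.
rng = np.random.default_rng(1)
beta, tau, lam = sample_global_local_prior(P=20, rng=rng)
```

Under these assumed scale densities, a typical draw shows many near‐zero coefficients and a few large ones, mirroring the sparsity pattern the prior is designed to encode.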
On the other hand, the use of continuous shrinkage priors does not address the increasing computational burden from growing $N$ and $P$ in modern applications. Sparse regression posteriors under global–local priors are amenable to an effective Gibbs sampler, a popular class of MCMC we describe further in Section 4.1. Under the linear and logistic models, the computational bottleneck of this Gibbs sampler stems from the need for repeated updates of $\boldsymbol{\beta}$ from its conditional distribution

$$\boldsymbol{\beta} \mid \tau, \boldsymbol{\lambda}, \boldsymbol{\Omega}, \boldsymbol{y} \sim \mathcal{N}\!\left(\boldsymbol{\Phi}^{-1} \boldsymbol{X}^{\intercal} \boldsymbol{\Omega} \boldsymbol{y}, \; \boldsymbol{\Phi}^{-1}\right), \quad \boldsymbol{\Phi} = \boldsymbol{X}^{\intercal} \boldsymbol{\Omega} \boldsymbol{X} + \tau^{-2} \boldsymbol{\Lambda}^{-2}$$
where $\boldsymbol{\Omega}$ is an additional diagonal‐matrix parameter and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_j)$.⁵ Sampling from this high‐dimensional Gaussian distribution requires $O(NP^2 + P^3)$ operations with the standard approach [58]: $O(NP^2)$ for computing the term $\boldsymbol{X}^{\intercal} \boldsymbol{\Omega} \boldsymbol{X}$ and $O(P^3)$ for Cholesky factorization of $\boldsymbol{\Phi}$. While an alternative approach by Bhattacharya et al. [48] provides the complexity of $O(N^2 P + N^3)$, the computational cost remains problematic in the big $N$ and big $P$ regime at $O(\min\{N^2 P, N P^2\})$ after choosing the faster of the two.
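As an illustration of the two complexity regimes, the sketch below (ours, not the authors' implementation; it takes the linear‐model setup with $\boldsymbol{\Omega} = \operatorname{diag}(\omega)$ and the scales $\tau, \boldsymbol{\lambda}$ as given) draws from the conditional Gaussian first with the standard Cholesky‐based recipe at $O(NP^2 + P^3)$, and then in the spirit of Bhattacharya et al. [48] at $O(N^2 P + N^3)$.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

def sample_beta_cholesky(X, omega, y, tau, lam, rng):
    """Standard O(N P^2 + P^3) draw from N(Phi^{-1} X' Omega y, Phi^{-1}),
    where Phi = X' Omega X + (tau^2 Lambda^2)^{-1}."""
    N, P = X.shape
    Xt_Omega = X.T * omega                    # X' Omega, O(N P)
    Phi = Xt_Omega @ X                        # O(N P^2)
    Phi[np.diag_indices(P)] += 1.0 / (tau**2 * lam**2)
    L = np.linalg.cholesky(Phi)               # O(P^3)
    mean = cho_solve((L, True), Xt_Omega @ y)
    z = rng.standard_normal(P)
    return mean + solve_triangular(L.T, z, lower=False)  # covariance L^{-T} L^{-1} = Phi^{-1}

def sample_beta_bhattacharya(X, omega, y, tau, lam, rng):
    """O(N^2 P + N^3) draw from the same conditional, following the idea of
    Bhattacharya et al. [48]; advantageous when P >> N."""
    N, P = X.shape
    d = tau**2 * lam**2                       # prior variances, diagonal of D = tau^2 Lambda^2
    sqrt_omega = np.sqrt(omega)
    X_tilde = X * sqrt_omega[:, None]         # Omega^{1/2} X
    alpha = sqrt_omega * y                    # Omega^{1/2} y, so X_tilde' alpha = X' Omega y
    u = np.sqrt(d) * rng.standard_normal(P)   # u ~ N(0, D)
    v = X_tilde @ u + rng.standard_normal(N)  # v ~ N(0, X_tilde D X_tilde' + I)
    M = (X_tilde * d) @ X_tilde.T             # O(N^2 P)
    M[np.diag_indices(N)] += 1.0
    w = cho_solve(cho_factor(M), alpha - v)   # O(N^3)
    return u + d * (X_tilde.T @ w)            # has mean Phi^{-1} X' Omega y and covariance Phi^{-1}
```

Switching between the two routines at each Gibbs iteration, depending on whether $N$ or $P$ is smaller, recovers the $O(\min\{N^2 P, N P^2\})$ cost quoted above.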