References

1 Davenport, T.H. and Patil, D. (2012) Data scientist. Harvard Bus. Rev., 90, 70–76.

2 Google Trends (2020) Data source: Google Trends. https://trends.google.com/trends (accessed 12 July 2020).

3 American Statistical Association (2020) Statistics Degrees Total and By Gender, https://ww2.amstat.org/misc/StatTable1987-Current.pdf (accessed 01 June 2020).

4 Cleveland, W.S. (2001) Data science: an action plan for expanding the technical areas of the field of statistics. Int. Stat. Rev., 69, 21–26.

5 Donoho, D. (2017) 50 years of data science. J. Comput. Graph. Stat., 26, 745–766.

6 Fisher, R.A. (1936) Design of experiments. Br. Med. J., 1 (3923), 554.

7 Fisher, R.A. (1992) Statistical methods for research workers, in Breakthroughs in Statistics. Springer Series in Statistics (Perspectives in Statistics) (eds S. Kotz and N.L. Johnson), Springer, New York, NY (especially Section 21.02). doi: 10.1007/978-1-4612-4380-9_6.

8 Wald, A. and Wolfowitz, J. (1944) Statistical tests based on permutations of the observations. Ann. Math. Stat., 15, 358–372.

9 Efron, B. (1992) Bootstrap methods: another look at the jackknife, in Breakthroughs in Statistics. Springer Series in Statistics (Perspectives in Statistics) (eds S. Kotz and N.L. Johnson), Springer, New York, NY, pp. 569–593. doi: 10.1007/978-1-4612-4380-9_41.

10 Efron, B. and Tibshirani, R.J. (1994) An Introduction to the Bootstrap, CRC Press.

11 Bliss, C.I. (1935) The comparison of dosage-mortality data. Ann. Appl. Biol., 22, 307–333 (Fisher introduces his scoring method in the appendix).

12 McCullagh, P. and Nelder, J. (1989) Generalized Linear Models, 2nd edn, Chapman and Hall, London (a standard book on generalized linear models).

13 Tierney, L. (1994) Markov chains for exploring posterior distributions. Ann. Stat., 22, 1701–1728.

14 Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011) Handbook of Markov Chain Monte Carlo, CRC Press.

15 Chavan, V. and Phursule, R.N. (2014) Survey paper on big data. Int. J. Comput. Sci. Inf. Technol., 5, 7932–7939.

16 Williams, C.K. and Rasmussen, C.E. (1996) Gaussian processes for regression. Advances in Neural Information Processing Systems, pp. 514–520.

17 Williams, C.K. and Rasmussen, C.E. (2006) Gaussian Processes for Machine Learning, vol. 2, MIT Press, Cambridge, MA.

18 Gelman, A., Carlin, J.B., Stern, H.S. et al. (2013) Bayesian Data Analysis, CRC Press.

19 Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N. et al. (1953) Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1092.

20 Hastings, W.K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57 (1), 97–109. doi: 10.1093/biomet/57.1.97.

21 Holbrook, A.J., Lemey, P., Baele, G. et al. (2020) Massive parallelization boosts big Bayesian multidimensional scaling. J. Comput. Graph. Stat., 1–34.

22 Holbrook, A.J., Loeffler, C.E., Flaxman, S.R. et al. (2021) Scalable Bayesian inference for self-excitatory stochastic processes applied to big American gunfire data. Stat. Comput., 31, 4.

23 Seber, G.A. and Lee, A.J. (2012) Linear Regression Analysis, vol. 329, John Wiley & Sons.

24 Trefethen, L.N. and Bau, D. (1997) Numerical Linear Algebra, Society for Industrial and Applied Mathematics.

25 Gelman, A., Roberts, G.O., and Gilks, W.R. (1996) Efficient Metropolis jumping rules. Bayesian Stat., 5, 42.

26 Van Dyk, D.A. and Meng, X.-L. (2001) The art of data augmentation. J. Comput. Graph. Stat., 10, 1–50.

27 Neal, R.M. (2011) MCMC using Hamiltonian dynamics, in Handbook of Markov Chain Monte Carlo (eds S. Brooks, A. Gelman, G. Jones, and X.-L. Meng), Chapman and Hall/CRC Press, pp. 113–162.

28 Holbrook, A., Vandenberg-Rodes, A., Fortin, N., and Shahbaba, B. (2017) A Bayesian supervised dual-dimensionality reduction model for simultaneous decoding of LFP and spike train signals. Stat, 6, 53–67.

29 Bouchard-Côté, A., Vollmer, S.J., and Doucet, A. (2018) The bouncy particle sampler: a nonreversible rejection-free Markov chain Monte Carlo method. J. Am. Stat. Assoc., 113, 855–867.

30 Murty, K.G. and Kabadi, S.N. (1985) Some NP-Complete Problems in Quadratic and Nonlinear Programming. Tech. Rep.

31 Kennedy, J. and Eberhart, R. (1995) Particle Swarm Optimization. Proceedings of the ICNN'95 International Conference on Neural Networks, vol. 4, pp. 1942–1948, IEEE.

32 Davis, L. (1991) Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York.

33 Hunter, D.R. and Lange, K. (2004) A tutorial on MM algorithms. Am. Stat., 58, 30–37.

34 Boyd, S. and Vandenberghe, L. (2004) Convex Optimization, Cambridge University Press.

35 Fisher, R.A. (1922) On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. London, Ser. A, 222, 309–368.

36 Beale, E., Kendall, M., and Mann, D. (1967) The discarding of variables in multivariate analysis. Biometrika, 54, 357–366.

37 Hocking, R.R. and Leslie, R. (1967) Selection of the best subset in regression analysis. Technometrics, 9, 531–540.

38 Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B, 58, 267–288.

39 Geyer, C. (1991) Markov Chain Monte Carlo Maximum Likelihood. Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, Interface Foundation, Fairfax Station, pp. 156–163.

40 Tjelmeland, H. and Hegstad, B.K. (2001) Mode jumping proposals in MCMC. Scand. J. Stat., 28, 205–223.

41 Lan, S., Streets, J., and Shahbaba, B. (2014) Wormhole Hamiltonian Monte Carlo. Twenty-Eighth AAAI Conference on Artificial Intelligence.

42 Nishimura, A. and Dunson, D. (2016) Geometrically tempered Hamiltonian Monte Carlo. arXiv:1604.00872.

43 Mitchell, T.J. and Beauchamp, J.J. (1988) Bayesian variable selection in linear regression. J. Am. Stat. Assoc., 83, 1023–1032.

44 Madigan, D. and Raftery, A.E. (1994) Model selection and accounting for model uncertainty in graphical models using Occam's window. J. Am. Stat. Assoc., 89, 1535–1546.

45 George, E.I. and McCulloch, R.E. (1997) Approaches for Bayesian variable selection. Statistica Sinica, 7, 339–373.

46 Hastie, T., Tibshirani, R., and Wainwright, M. (2015) Statistical Learning with Sparsity: The Lasso and Generalizations, CRC Press.

47 Friedman, J., Hastie, T., and Tibshirani, R. (2010) Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw., 33, 1.

48 Bhattacharya, A., Chakraborty, A., and Mallick, B.K. (2016) Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika, 103, 985–991.

49 Suchard, M.A., Schuemie, M.J., Krumholz, H.M. et al. (2019) Comprehensive comparative effectiveness and safety of first-line antihypertensive drug classes: a systematic, multinational, large-scale analysis. The Lancet, 394, 1816–1826.

50 Passos, I.C., Mwangi, B., and Kapczinski, F. (2019) Personalized Psychiatry: Big Data Analytics in Mental Health, Springer.

51 Svensson, V., da Veiga Beltrame, E., and Pachter, L. (2019) A curated database reveals trends in single-cell transcriptomics. bioRxiv 742304.

52 Nott, D.J. and Kohn, R. (2005) Adaptive sampling for Bayesian variable selection. Biometrika, 92, 747–763.

53 Ghosh, J. and Clyde, M.A. (2011) Rao–Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: a novel data augmentation approach. J. Am. Stat. Assoc., 106, 1041–1052.

54 Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.

55 Polson, N.G. and Scott, J.G. (2010) Shrink globally, act locally: sparse Bayesian regularization and prediction. Bayesian Stat., 9, 501–538.

56 Polson, N.G., Scott, J.G., and Windle, J. (2013) Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Stat. Assoc., 108, 1339–1349.

57 Nishimura, A. and Suchard, M.A. (2018) Prior-preconditioned conjugate gradient for accelerated Gibbs sampling in "large n & large p" sparse Bayesian logistic regression models. arXiv:1810.12437.

58 Rue, H. and Held, L. (2005) Gaussian Markov Random Fields: Theory and Applications, CRC Press.

59 Hestenes, M.R. and Stiefel, E. (1952) Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand., 49, 409–436.

60 Lanczos, C. (1952) Solution of systems of linear equations by minimized iterations. J. Res. Nat. Bur. Stand., 49, 33–53.

61 Van der Vorst, H.A. (2003) Iterative Krylov Methods for Large Linear Systems, vol. 13, Cambridge University Press.

62 Cipra, B.A. (2000) The best of the 20th century: editors name top 10 algorithms. SIAM News, 33, 1–2.

63 Dongarra, J., Heroux, M.A., and Luszczek, P. (2016) High-performance conjugate-gradient benchmark: a new metric for ranking high-performance computing systems. Int. J. High Perform. Comput. Appl., 30, 3–10.

64 Zhang, L., Zhang, L., Datta, A., and Banerjee, S. (2019) Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments. Stat. Anal. Data Min., 12, 197–209.

65 Golub, G.H. and Van Loan, C.F. (2012) Matrix Computations, vol. 3, Johns Hopkins University Press.

66 Pybus, O.G., Tatem, A.J., and Lemey, P. (2015) Virus evolution and transmission in an ever more connected world. Proc. R. Soc. B: Biol. Sci., 282, 20142878.

67 Bloom, D.E., Black, S., and Rappuoli, R. (2017) Emerging infectious diseases: a proactive approach. Proc. Natl. Acad. Sci. U.S.A., 114, 4055–4059.

68 Pybus, O.G., Suchard, M.A., Lemey, P. et al. (2012) Unifying the spatial epidemiology and molecular evolution of emerging epidemics. Proc. Natl. Acad. Sci. U.S.A., 109, 15066–15071.

69 Nunes, M.R., Palacios, G., Faria, N.R. et al. (2014) Air travel is associated with intracontinental spread of dengue virus serotypes 1–3 in Brazil. PLoS Negl. Trop. Dis., 8, e2769.

70 Bletsa, M., Suchard, M.A., Ji, X. et al. (2019) Divergence dating using mixed effects clock modelling: an application to HIV-1. Virus Evol., 5, vez036.

71 Dudas, G., Carvalho, L.M., Bedford, T. et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature, 544, 309–315.

72 Elbe, S. and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID's innovative contribution to global health. Glob. Chall., 1, 33–46.

73 Ji, X., Zhang, Z., Holbrook, A. et al. (2020) Gradients do grow on trees: a linear-time O(N)-dimensional gradient for statistical phylogenetics. Mol. Biol. Evol., 37, 3047–3060.

74 Baum, L. (1972) An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities, 3, 1–8.

75 Suchard, M.A., Lemey, P., Baele, G. et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol., 4, vey016.

76 Gentle, J.E., Härdle, W.K., and Mori, Y. (2012) How computational statistics became the backbone of modern data science, in Handbook of Computational Statistics (eds J.E. Gentle, W.K. Härdle, and Y. Mori), Springer, pp. 3–16.

77 Lunn, D., Spiegelhalter, D., Thomas, A., and Best, N. (2009) The BUGS project: evolution, critique and future directions. Stat. Med., 28, 3049–3067.

78 Bergstra, J., Breuleux, O., Bastien, F. et al. (2010) Theano: A CPU and GPU Math Expression Compiler. Proceedings of the Python for Scientific Computing Conference (SciPy), Oral Presentation.

79 Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986) Learning representations by back-propagating errors. Nature, 323, 533–536.

80 Neal, R.M. (1996) Bayesian Learning for Neural Networks, Springer-Verlag.

81 Gelman, A. (2014) Petascale Hierarchical Modeling Via Parallel Execution. U.S. Department of Energy. Report No: DE-SC0002099.

82 Hoffman, M.D. and Gelman, A. (2014) The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15, 1593–1623.

83 Stan Development Team (2018) Stan Modeling Language Users Guide and Reference Manual, Version 2.18.0.

84 Livingstone, S. and Zanella, G. (2019) On the robustness of gradient-based MCMC algorithms. arXiv:1908.11812.

85 Mangoubi, O., Pillai, N.S., and Smith, A. (2018) Does Hamiltonian Monte Carlo mix faster than a random walk on multimodal densities? arXiv:1808.03230.

86 Livingstone, S., Faulkner, M.F., and Roberts, G.O. (2019) Kinetic energy choice in Hamiltonian/hybrid Monte Carlo. Biometrika, 106, 303–319.

87 Dinh, V., Bilge, A., Zhang, C., and Matsen IV, F.A. (2017) Probabilistic Path Hamiltonian Monte Carlo. Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1009–1018.

88 Nishimura, A., Dunson, D.B., and Lu, J. (2020) Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods. Biometrika, 107, 365–380.

89 Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-6, 721–741.

90 Gelfand, A.E. and Smith, A.F. (1990) Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc., 85, 398–409.

91 Monnahan, C.C., Thorson, J.T., and Branch, T.A. (2017) Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods Ecol. Evol., 8, 339–348.

92 Zhang, Z., Zhang, Z., Nishimura, A. et al. (2020) Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models. Ann. Appl. Stat.

93 Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B, 39, 1–22.

94 Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., and Saul, L.K. (1999) An introduction to variational methods for graphical models. Mach. Learn., 37, 183–233.

95 Wei, G.C. and Tanner, M.A. (1990) A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Am. Stat. Assoc., 85, 699–704.

96 Ranganath, R., Gerrish, S., and Blei, D.M. (2014) Black Box Variational Inference. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics.

97 Dagum, L. and Menon, R. (1998) OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng., 5, 46–55.

98 Warne, D.J., Sisson, S.A., and Drovandi, C. (2019) Acceleration of expensive computations in Bayesian statistics using vector operations. arXiv:1902.09046.

99 Bergstra, J., Bastien, F., Breuleux, O. et al. (2011) Theano: Deep Learning on GPUs with Python. NIPS 2011, BigLearning Workshop, Granada, Spain, vol. 3, pp. 1–48.

100 Nielsen, M.A. and Chuang, I. (2002) Quantum Computation and Quantum Information, Cambridge University Press.

101 Grover, L.K. (1996) A Fast Quantum Mechanical Algorithm for Database Search. Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pp. 212–219.

102 Boyer, M., Brassard, G., Høyer, P., and Tapp, A. (1998) Tight bounds on quantum searching. Fortschritte der Physik: Progress of Physics, 46, 493–505.

103 Jordan, S.P. (2005) Fast quantum algorithm for numerical gradient estimation. Phys. Rev. Lett., 95, 050501.

104 Harrow, A.W., Hassidim, A., and Lloyd, S. (2009) Quantum algorithm for linear systems of equations. Phys. Rev. Lett., 103, 150502.

105 Aaronson, S. (2015) Read the fine print. Nat. Phys., 11, 291–293.

106 COPSS (2020) Committee of Presidents of Statistical Societies, https://community.amstat.org/copss/awards/winners (accessed 31 August 2020).

107 Wickham, H. (2007) Reshaping data with the reshape package. J. Stat. Softw., 21, 1–20.

108 Wickham, H. (2011) The split-apply-combine strategy for data analysis. J. Stat. Softw., 40, 1–29.

109 Wickham, H. (2014) Tidy data. J. Stat. Softw., 59, 1–23.

110 Kahle, D. and Wickham, H. (2013) ggmap: spatial visualization with ggplot2. R J., 5, 144–161.

111 Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis, Springer.
