1.2 Informatics and Data Analytics

It is common to need to acquire a signal whose properties are not known, or where the signal is only suspected and not yet discovered, or whose properties are known but are too much trouble to fully enumerate. There is no common solution to the acquisition task, so the initial phases of acquisition methods unavoidably tend to be ad hoc. As with data dependency in non‐evolutionary search metaheuristics (where no search method is guaranteed to always work well), no optimal signal acquisition method is known in advance. In what follows, methods are described for bootstrap optimization in signal acquisition, to enable the most general‐use, almost “common,” solution possible. The bootstrap algorithmic method involves repeated passes over the data sequence, with improved priors and trained filters (among other things) yielding improved signal acquisition on each subsequent pass; the acquisition is guided by statistical measures that recognize anomalies, as sketched below. Informatics methods and information theory measures are central to the design of a good finite state automaton (FSA) acquisition method, and will be reviewed in the signal acquisition context in Chapters 2–4. Code examples are given in Python and C (with introductory Python described in Chapter 2 and Appendix A). Bootstrap acquisition methods may not automatically provide a common solution, but they appear to offer a process whereby a solution can be improved to some desirable level of general‐data applicability.
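To make the repeated‐pass idea concrete, the following is a minimal Python sketch of such a bootstrap loop, written under simple stated assumptions rather than as the book's implementation: the background noise is taken to be Gaussian, the “prior” is just the background mean and standard deviation re‐estimated over the not‐yet‐flagged regions, and the function name bootstrap_acquire and the cutoff z_cut are hypothetical.

    import numpy as np

    def bootstrap_acquire(sequence, n_passes=3, z_cut=3.0):
        """Repeated-pass acquisition: each pass re-estimates the background
        model (the 'prior') from regions not yet flagged as signal, so later
        passes score anomalies against a sharper noise model."""
        x = np.asarray(sequence, dtype=float)
        signal_mask = np.zeros(len(x), dtype=bool)
        for _ in range(n_passes):
            background = x[~signal_mask]        # re-train on non-signal data
            mu, sigma = background.mean(), background.std()
            if sigma == 0.0:                    # degenerate background; stop
                break
            z = np.abs(x - mu) / sigma          # anomaly score per sample
            signal_mask = z > z_cut             # acquire statistical outliers
        return signal_mask

    # Toy usage: a noisy baseline with one injected burst.
    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, 1000)
    data[400:420] += 8.0                        # the "signal" to acquire
    print(bootstrap_acquire(data).nonzero()[0]) # indices flagged as signal

Each pass shrinks the estimated background spread once the burst is masked out, which is the sense in which the priors “improve” from pass to pass.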

The signal analysis and pattern recognition methods described in this book are mainly applied to problems involving stochastic sequential data: power signals and genomic sequences in particular. The information modeling, feature selection/extraction, and feature‐vector discrimination, however, were each developed separately in a general‐use context. Details on the theoretical underpinnings are given in Chapter 3, including a collection of ab initio information theory tools to help “find your way around in the dark.” One of the main ab initio approaches is to search for statistical anomalies using information measures, so various information measures will be described in detail [103–115].
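As a small illustration of an information‐measure anomaly scan in this spirit (the measures themselves are developed in Chapter 3), the Python sketch below slides a window along a symbol sequence and scores each window by its Shannon entropy; windows of unusually low entropy (low complexity) stand out as candidate anomalies. The window length and the toy poly‐A insert are illustrative choices, not values from the book.

    import math
    import random
    from collections import Counter

    def window_entropy(seq, w=50):
        """Shannon entropy (bits) of the symbol frequencies in each
        length-w window along seq."""
        scores = []
        for i in range(len(seq) - w + 1):
            counts = Counter(seq[i:i + w])
            scores.append(-sum((c / w) * math.log2(c / w)
                               for c in counts.values()))
        return scores

    # Toy usage: a random DNA-like sequence with a low-complexity insert.
    random.seed(1)
    seq = ''.join(random.choice('ACGT') for _ in range(300))
    seq = seq[:150] + 'A' * 40 + seq[150:]   # entropy dips over the poly-A run
    scores = window_entropy(seq)
    print(min(range(len(scores)), key=scores.__getitem__))  # lowest-entropy window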

The background on information theory and variational/statistical modeling has significant roots in variational calculus. Chapter 3 describes information theory ideas and the information “calculus” description (and related anomaly detection methods). The involvement of variational calculus methods, and the possible parallels with the nascent development of a new (modern) “calculus of information,” motivate the detailed overview of the highly successful development and application of the calculus of variations in physics (Appendix B). Using variational calculus, for example, it is possible to establish a link between a choice of information measure and a statistical formalism (maximum entropy, Section 3.1). Maximizing the entropy of a distribution subject to moment constraints leads to the classic distributions seen in mathematics and nature (the Gaussian for fixed mean and variance, etc.; a short derivation follows). Not surprisingly, variational methods also help to establish and refine some of the main ML methods, including Neural Nets (NNs) (Chapters 9 and 13) and Support Vector Machines (SVMs) (Chapter 10). SVMs are the main tool presented for both classification (supervised learning) and clustering (unsupervised learning), and everything in between (such as bag learning).
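As a standard worked instance of that maximum‐entropy link (the full treatment is in Section 3.1 and Appendix B, not reproduced here), the constrained variational problem for fixed mean \mu and variance \sigma^2 is

    \max_{p}\; -\int p(x)\,\ln p(x)\,dx
    \quad \text{subject to} \quad
    \int p\,dx = 1, \quad \int x\,p\,dx = \mu, \quad \int (x-\mu)^2\,p\,dx = \sigma^2 .

Introducing Lagrange multipliers \lambda_0, \lambda_1, \lambda_2 and setting the functional derivative of the Lagrangian to zero gives

    -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2 = 0
    \;\Rightarrow\;
    p(x) = \exp\!\big(\lambda_0 - 1 + \lambda_1 x + \lambda_2 (x-\mu)^2\big),

an exponential of a quadratic in x; enforcing the three constraints fixes \lambda_1 = 0 and \lambda_2 = -1/(2\sigma^2), which yields the Gaussian p(x) = (2\pi\sigma^2)^{-1/2}\exp\!\big(-(x-\mu)^2/(2\sigma^2)\big).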
