Informatics and Machine Learning
Contents
Stephen Winters-Hilt. Informatics and Machine Learning
Table of Contents
List of Tables
List of Illustrations
Guide
Pages
Informatics and Machine Learning. From Martingales to Metaheuristics
Preface
1 Introduction
1.1 Data Science: Statistics, Probability, Calculus … Python (or Perl) and Linux
1.2 Informatics and Data Analytics
1.3 FSA‐Based Signal Acquisition and Bioinformatics
1.4 Feature Extraction and Language Analytics
1.5 Feature Extraction and Gene Structure Identification
1.5.1 HMMs for Analysis of Information Encoding Molecules
1.5.2 HMMs for Cheminformatics and Generic Signal Analysis
1.6 Theoretical Foundations for Learning
1.7 Classification and Clustering
1.8 Search
1.9 Stochastic Sequential Analysis (SSA) Protocol (Deep Learning Without NNs)
1.9.1 Stochastic Carrier Wave (SCW) Analysis – Nanoscope Signal Analysis
1.9.2 Nanoscope Cheminformatics – A Case Study for Device “Smartening”
1.10 Deep Learning using Neural Nets
1.11 Mathematical Specifics and Computational Implementations
2 Probabilistic Reasoning and Bioinformatics
2.1 Python Shell Scripting
2.1.1 Sample Size Complications
2.2 Counting, the Enumeration Problem, and Statistics
2.3 From Counts to Frequencies to Probabilities
2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics
2.5 Statistics, Conditional Probability, and Bayes' Rule
2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation
2.5.2 Bayes' Rule
2.5.3 Estimation Based on Maximal Conditional Probabilities
2.6 Emergent Distributions and Series
2.6.1 The Law of Large Numbers (LLN)
2.6.2 Distributions
2.6.2.1 The Geometric Distribution (Emergent Via Maxent)
2.6.2.2 The Gaussian (aka Normal) Distribution (Emergent Via LLN Relation and Maxent)
2.6.2.3 Significant Distributions That Are Not Gaussian or Geometric
2.6.3 Series
2.7 Exercises
3 Information Entropy and Statistical Measures
3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information
3.1.1 The Khinchin Derivation
3.1.2 Maximum Entropy Principle
3.1.3 Relative Entropy and Its Uniqueness
3.1.4 Mutual Information
3.1.5 Information Measures Recap
3.2 Codon Discovery from Mutual Information Anomaly
3.3 ORF Discovery from Long‐Tail Distribution Anomaly
3.3.1 Ab initio Learning with smORF’s, Holistic Modeling, and Bootstrap Learning
3.4 Sequential Processes and Markov Models
3.4.1 Markov Chains
3.5 Exercises
4 Ad Hoc, Ab Initio, and Bootstrap Signal Acquisition Methods
4.1 Signal Acquisition, or Scanning, at Linear Order Time‐Complexity
4.2 Genome Analytics: The Gene‐Finder
4.3 Objective Performance Evaluation: Sensitivity and Specificity
4.4 Signal Analytics: The Time‐Domain Finite State Automaton (tFSA)
4.4.1 tFSA Spike Detector
4.4.2 tFSA‐Based Channel Signal Acquisition Methods with Stable Baseline
4.4.3 tFSA‐Based Channel Signal Acquisition Methods Without Stable Baseline
4.5 Signal Statistics (Fast): Mean, Variance, and Boxcar Filter
4.5.1 Efficient Implementations for Statistical Tools (O(L))
4.6 Signal Spectrum: Nyquist Criterion, Gabor Limit, Power Spectrum
4.6.1 Nyquist Sampling Theorem
4.6.2 Fourier Transforms, and Other Classic Transforms
4.6.3 Power Spectral Density
4.6.4 Power‐Spectrum‐Based Feature Extraction
4.6.5 Cross‐Power Spectral Density
4.6.6 AM/FM/PM Communications Protocol
4.7 Exercises
5 Text Analytics
5.1 Words
5.1.1 Text Acquisition: Text Scraping and Associative Memory
5.1.1.1 Kiwix, Gutenberg Project, Wikipedia
5.1.1.2 Library of Babel
5.1.1.3 Weather Scraper
5.1.1.4 Stock Scraper – New‐Style with Cookies
5.1.2 Word Frequency Analysis: Machiavelli’s Polysemy on Fortuna and Virtu
5.1.3 Word Frequency Analysis: Coleridge’s Hidden Polysemy on Logos
5.1.4 Sentiment Analysis
5.2 Phrases – Short (Three Words)
5.2.1 Shakespearean Insult Generation – Phrase Generation
5.3 Phrases – Long (A Line or Sentence)
5.3.1 Iambic Phrase Analysis: Shakespeare
5.3.2 Natural Language Processing
5.3.3 Sentence and Story Generation: Tarot
5.4 Exercises
6 Analysis of Sequential Data Using HMMs
6.1 Hidden Markov Models (HMMs)
6.1.1 Background and Role in Stochastic Sequential Analysis (SSA)
6.1.2 When to Use a Hidden Markov Model (HMM)?
6.1.3 Hidden Markov Models (HMMs) – Standard Formulation and Terms
6.2 Graphical Models for Markov Models and Hidden Markov Models
6.2.1 Hidden Markov Models
6.2.2 Viterbi Path
6.2.2.1 The Most Probable State Sequence
6.2.3 Forward and Backward Probabilities
6.2.4 HMM: Maximum Likelihood discrimination
6.2.5 Expectation/Maximization (Baum–Welch)
6.2.5.1 Emission and Transition Expectations with Rescaling
6.3 Standard HMM Weaknesses and their GHMM Fixes
6.4 Generalized HMMs (GHMMs – “Gems”): Minor Viterbi Variants
6.4.1 The Generic HMM
6.4.2 pMM/SVM
6.4.3 EM and Feature Extraction via EVA Projection
6.4.4 Feature Extraction via Data Absorption (a.k.a. Emission Inversion)
6.4.5 Modified AdaBoost for Feature Selection and Data Fusion
The AdaBoost algorithm
6.4.5.1 The Modified Adaboost Algorithm for Feature Selection
6.4.5.2 Modified Adaboost in SSA Protocol
6.5 HMM Implementation for Viterbi (in C and Perl)
6.6 Exercises
7 Generalized HMMs (GHMMs): Major Viterbi Variants
7.1 GHMMs: Maximal Clique for Viterbi and Baum–Welch
7.2 GHMMs: Full Duration Model
7.2.1 HMM with Duration (HMMD)
7.2.2 Hidden Semi‐Markov Models (HSMM) with side‐information
7.2.3 HMM with Binned Duration (HMMBD)
7.3 GHMMs: Linear Memory Baum–Welch Algorithm
7.4 GHMMs: Distributable Viterbi and Baum–Welch Algorithms
7.4.1 Distributed HMM processing via “Viterbi‐overlap‐chunking” with GPU speedup
7.4.2 Relative Entropy and Viterbi Scoring
7.5 Martingales and the Feasibility of Statistical Learning (further details in Appendix)
7.6 Exercises
8 Neuromanifolds and the Uniqueness of Relative Entropy
8.1 Overview
8.2 Review of Differential Geometry [206, 207]
8.2.1 Differential Topology – Natural Manifold
8.2.2 Differential Geometry – Natural Geometric Structures
8.3 Amari’s Dually Flat Formulation [113–115]
8.3.1 Generalization of Pythagorean Theorem
8.3.2 Projection Theorem and Relation Between Divergence and Link Formalism
8.4 Neuromanifolds [113–115]
8.5 Exercises
9 Neural Net Learning and Loss Bounds Analysis
9.1 Brief Introduction to Neural Nets (NNs)
9.1.1 Single Neuron Discriminator
9.1.1.1 The Perceptron
9.1.1.2 Sigmoid Neurons
9.1.1.3 The Loss Function and Gradient Descent
9.1.2 Neural Net with Back‐Propagation
9.1.2.1 The Loss Function – General Activation in a General Neural Net
9.2 Variational Learning Formalism and Use in Loss Bounds Analysis
9.2.1 Variational Basis for Update Rule
9.2.2 Review and Generalization of GD Loss Bounds Analysis [213, 214]
9.2.3 Review of the EG Loss Bounds Analysis
9.3 The “sinh⁻¹(ω)” link algorithm (SA)
9.3.1 Motivation for “sinh⁻¹(ω)” link algorithm (SA)
9.3.2 Relation of sinh Link Algorithm to the Binary Exponentiated Gradient Algorithm
9.4 The Loss Bounds Analysis for sinh⁻¹(ω)
9.4.1 Loss Bounds Analysis Using the Taylor Series Approach
9.4.2 Loss Bounds Analysis Using Taylor Series for the sinh Link (SA) Algorithm
9.5 Exercises
10 Classification and Clustering
10.1 The SVM Classifier – An Overview
10.2 Introduction to Classification and Clustering
10.2.1 Sum of Squared Error (SSE) Scoring
10.2.2 K‐Means Clustering (Unsupervised Learning)
10.2.3 k‐Nearest Neighbors Classification (Supervised Learning)
10.2.4 The Perceptron Recap (See Chapter 9 for Details)
10.3 Lagrangian Optimization and Structural Risk Minimization (SRM)
10.3.1 Decision Boundary and SRM Construction Using Lagrangian
10.3.2 The Theory of Classification
10.3.3 The Mathematics of the Feasibility of Learning
10.3.3.1 The Hoeffding Inequality
10.3.3.2 Hoeffding Inequality is Related to Chebyshev Inequality
10.3.3.3 Sample Error
10.3.3.4 The Generalization Bound (Establishes First ML Law for |H| < ∞)
10.3.3.5 The VC Generalization Bound (Establishes First ML Law for |H| = ∞)
The VC Dimension and Generalization
VC Generalization Bound
10.3.4 Lagrangian Optimization
10.3.5 The Support Vector Machine (SVM) – Lagrangian with SRM
10.3.5.1 Kernel Modeling and Other Tuning
10.3.6 Kernel Construction Using Polarization
10.3.7 SVM Binary Classifier Derivation
10.4 SVM Binary Classifier Implementation
10.4.1 Sequential Minimal Optimization (SMO)
10.4.2 Alpha‐Selection Variants
10.4.3 Chunking on Large Datasets: O(N²) ➔ n O(N²/n²) = O(N²)/n
10.4.3.1 Distributed SVM Processing (Chunking)
10.4.3.2 SV/Non‐SV Pass‐Tuning on Train Subsets: An Outlier‐Management Heuristic
10.4.3.3 SV/Non‐SV Pass‐Tuning on (9AT,9TA) vs. (9CG,9GC)
10.4.4 Support Vector Reduction (SVR)
10.4.4.1 Multi‐Threaded Chunking with SVR
10.4.4.2 Multi‐Threaded Distributed Chunking with SVR
10.4.5 Code Examples (in OO Perl)
10.5 Kernel Selection and Tuning Metaheuristics
10.5.1 The “Stability” Kernels
10.5.1.1 The Mercer Test
10.5.1.2 The Positive Principal Minors Test
10.5.2 Derivation of “Stability” Kernels
10.5.3 Entropic and Gaussian Kernels Relate to Unique, Minimally Structured, Information Divergence and Geometric Distance Measures
10.5.4 Automated Kernel Selection and Tuning
10.6 SVM Multiclass from Decision Tree with SVM Binary Classifiers
10.7 SVM Multiclass Classifier Derivation (Multiple Decision Surface)
10.7.1 Decomposition Method to Solve the Dual
10.7.2 SVM Speedup via Differentiating BSVs and SVs
10.8 SVM Clustering
10.8.1 SVM‐External Clustering
10.8.1.1 Single‐Convergence Initialized SVM‐Clustering: Exploration on Sensitivity to Tuning
10.8.1.2 Single‐Convergence SVM‐Clustering: Hybrid Clustering
10.8.2 Single‐Convergence SVM‐Clustering: Comparative Analysis
10.8.2.1 SVM “Internal” Clustering
10.8.2.2 Solving the Dual (Based on Keerthi’s SMO [184] )
10.8.2.3 Keerthi Algorithm
10.8.3 Stabilized, Single‐Convergence Initialized, SVM‐External Clustering
10.8.4 Stabilized, Multiple‐Convergence, SVM‐External Clustering
10.8.5 SVM‐External Clustering – Algorithmic Variants
10.8.5.1 Multiple‐Convergence Initialized (Steepest Ascent) SVM‐Clustering
10.8.5.2 Projection Clustering – Clustering in Decision Space
10.8.5.3 SVM‐ABC
10.8.5.4 SVM‐Relabeler
10.8.5.5 SV‐Dropper
10.8.5.6 Rayleigh’s Criterion Clustering Algorithm
10.9 Exercises
11 Search Metaheuristics
11.1 Trajectory‐Based Search Metaheuristics
11.1.1 Optimal‐Fitness Configuration Trajectories – Fitness Function Known and Sufficiently Regular
11.1.1.1 Metaheuristic #1: Euler’s Method – First‐Order Gradient Ascent
11.1.1.2 Metaheuristic #2: Newton’s Method – Second‐Order Gradient Ascent
11.1.1.3 Metaheuristic #3: Gradient Ascent with (Random) Restart
11.1.2 Optimal‐Fitness Configuration Trajectories – Fitness Function not Known
11.1.2.1 Metaheuristic #4: (Blind) Hill Climbing
11.1.2.2 Metaheuristic #5: Steepest Ascent Hill Climbing
11.1.2.3 Metaheuristic #6: Steepest Ascent Hill Climbing with Restart
11.1.3 Fitness Configuration Trajectories with Nonoptimal Updates
Global Optimization
11.1.3.1 Metaheuristic #7: Simulated Annealing Hill Climbing
11.1.3.2 Metaheuristic #8: Simulated Annealing Hill Climbing with Random Restart
11.1.3.3 Metaheuristic #9: Taboo Search
11.1.3.4 Metaheuristic #10: Tabu Search with Restart
11.1.3.5 Metaheuristic #11: Component‐Based Tabu Search
11.1.3.6 Metaheuristic #12: Component‐Based Tabu Search with Restart
11.2 Population‐Based Search Metaheuristics
11.2.1 Population with Evolution
11.2.1.1 Metaheuristic #13: Evolutionary Optimization (Darwinian Evolution; Asexual Reproduction)
11.2.1.2 Metaheuristic #14: Genetic Algorithm (Darwinian Evolution; Sexual Reproduction – Binary Interaction)
11.2.1.3 Evolutionary Algorithm Parameters
11.2.2 Population with Group Interaction – Swarm Intelligence
11.2.2.1 Metaheuristic #15: Particle Swarm Optimization (PSO) (Lamarckian Evolution)
11.2.3 Population with Indirect Interaction via Artifact
11.2.3.1 Metaheuristic #16: Ant Colony Optimization (ACO) (Swarm Intelligence; Stigmergy; Have Coevolution with Artifact)
11.2.3.2 Other Population‐Based Search Metaheuristics
11.3 Exercises
12 Stochastic Sequential Analysis (SSA)
12.1 HMM and FSA‐Based Methods for Signal Acquisition and Feature Extraction
12.2 The Stochastic Sequential Analysis (SSA) Protocol
12.2.1 (Stage 1) Primitive Feature Identification
12.2.2 (Stage 2) Feature Identification and Feature Selection
12.2.2.1 Stochastic Carrier Wave Encoding/Decoding
12.2.3 (Stage 3) Classification
12.2.4 (Stage 4) Clustering
12.2.5 (All Stages) Database/Data‐Warehouse System Specification
12.2.6 (All Stages) Server‐Based Data Analysis System Specification
12.3 Channel Current Cheminformatics (CCC) Implementation of the Stochastic Sequential Analysis (SSA) Protocol
12.4 SCW for Detector Sensitivity Boosting
12.4.1 NTD with Multiple Channels (or High Noise)
12.4.2 Stochastic Carrier Wave
12.5 SSA for Deep Learning
12.6 Exercises
13 Deep Learning Tools – TensorFlow
13.1 Neural Nets Review
13.1.1 Summary of Single Neuron Discriminator
13.1.2 Summary of Neural Net Discriminator and Back‐Propagation
13.2 TensorFlow from Google
13.2.1 Installation/Setup
13.2.1.1 Tensors
13.2.2 Example: Character Recognition
13.2.2.1 Transfer Learning
13.2.2.2 Fine‐tuning
13.2.3 Example: Language Translation
13.2.4 TensorBoard and the TensorFlow Profiler
13.2.5 Tensor Cores
13.3 Exercises
14 Nanopore Detection – A Case Study
14.1 Standard Apparatus
14.1.1 Standard Operational and Physiological Buffer Conditions
14.1.2 α‐Hemolysin Channel Stability – Introduction of Chaotropes
14.2 Controlling Nanopore Noise Sources and Choice of Aperture
14.3 Length Resolution of Individual DNA Hairpins
14.4 Detection of Single Nucleotide Differences (Large Changes in Structure)
14.5 Blockade Mechanism for 9bphp
14.6 Conformational Kinetics on Model Biomolecules
14.7 Channel Current Cheminformatics
14.7.1 Power Spectra and Standard EE Signal Analysis
14.7.2 Channel Current Cheminformatics for Single‐Biomolecule/Mixture Identifications
14.7.3 Channel Current Cheminformatics: Feature Extraction by HMM
14.7.4 Bandwidth Limitations
14.8 Channel‐Based Detection Mechanisms
14.8.1 Partitioning and Translocation‐Based ND Biosensing Methods
14.8.2 Transduction Versus Translation
14.8.3 Single‐Molecule Versus Ensemble
14.8.4 Biosensing with High Sensitivity in Presence of Interference
14.8.5 Nanopore Transduction Detection Methods
14.8.5.1 Things to “Contact” with the Channel: Aptamers
14.8.5.2 Things to “Contact” with the Channel: Immunoglobulins
14.9 The NTD Nanoscope
14.9.1 Nanopore Transduction Detection (NTD)
14.9.1.1 Ponderable Media Flow Phenomenology and Related Information Flow Phenomenology
14.9.2 NTD: A Versatile Platform for Biosensing
14.9.3 NTD Platform
14.9.4 NTD Operation
14.9.5 Driven Modulations
14.9.6 Driven Modulations with Multichannel Augmentation
14.10 NTD Biosensing Methods
14.10.1 Model Biosensor Based on Streptavidin and Biotin
14.10.2 Model System Based on DNA Annealing
14.10.2.1 Linear DNA Annealing Test
14.10.2.2 “Y” DNA Annealing Test
14.10.3 Y‐Aptamer with Use of Chaotropes to Improve Signal Resolution
14.10.4 Pathogen Detection, miRNA Detection, and miRNA Haplotyping
14.10.5 SNP Detection
14.10.6 Aptamer‐Based Detection
14.10.6.1 NaDir SELEX
14.10.7 Antibody‐Based Detection
14.10.7.1 Managing Antibodies as Easily Identifiable Interference or Transducer
14.10.7.2 Small Target Antibody‐Based Detection (Linked Modulator)
14.10.7.3 Large Target Antibody‐Based Detection
14.11 Exercises
Appendix A Python and Perl System Programming in Linux
A.1 Getting Linux and Python in a Flash (Drive)
A.2 Linux and the Command Shell
A.3 Perl Review: I/O, Primitives, String Handling, Regex
Appendix B Physics
B.1 The Calculus of Variations
Appendix C Math
C.1 Martingales [102]
Martingale Definition
Induced Martingales with Markov Chains [102]
In HMM Learning Have Sequences of Likelihood Ratios, Which Is a Martingale, Proof
Supermartingales and Submartingales [102]
Martingale Convergence Theorems [102]
Theorem
“Maximal” Inequalities for Martingales [102]
Lemma 1
Lemma 2
Mean‐Square Convergence Theorem for Martingales [102]
Martingales w.r.t σ‐Field Formalism
Backwards Martingale Definition (w.r.t Sigma Sub‐fields)
Backwards Martingale Convergence Theorem
Strong Law of Large Numbers Proof
Stationary Processes
Strong Ergodic Theorem [102]
Asymptotic Equipartition Property (AEP)
De Finetti’s Theorem
C.2 Hoeffding Inequality
Hoeffding Lemma Proof
Hoeffding Inequality Proof (for Further Details, See [104] )
Chernoff Bounding Technique:
References
Index
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
y
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Stephen Winters‐Hilt
.....
In standard band‐limited (and not time‐limited) signal analysis with periodic waveforms, sampling is done at the Nyquist rate so that the signal is fully reproducible. If the sample information is needed elsewhere, it is then compressed (possibly lossily) and transmitted (a “smart” encoder). The received data is then decompressed and reconstructed (by simply summing wave components, i.e. a “simple” decoder). If the signal is sparse or compressible, then compressive sensing [190] can be used, where sampling and compression are combined into one efficient step to obtain compressive measurements (the “simple” encoding in [190], since a set of random projections is employed), which are then transmitted (general details on noise in this context are described in [191, 192]). On the receiving end, the decompression and reconstruction steps are likewise combined into an asymmetric “smart” decoding step.
This progression toward asymmetric compressive signal processing can be taken a step further if we consider signal sequences to be equivalent when they have the same stationary statistics. What is obtained is a method similar to compressive sensing, but involving stationary‐statistics generative‐projection sensing, where the signal processing is non‐lossy at the level of stationary‐statistics equivalence. In SCW signal analysis the signal source is generative in that it is describable via an HMM, and the HMM’s Viterbi‐derived generative projections are used to describe the sparse components contributing to the signal source. In SCW encoding the modulation of the stationary statistics can be man‐made or natural, with the latter, in many experimental situations, involving a flow phenomenology that has stationary statistics. If the signal is man‐made, the underlying stochastic process is usually still a natural source; it is the changes in the stationary statistics that are under the control of the man‐made encoding scheme. Transmission and reception are then followed by generative projection via Viterbi‐HMM template matching, or via Viterbi‐HMM feature extraction followed by separate classification (using an SVM).
So in the SCW approach the encoding is even simpler (possibly nonexistent, other than directly passing the quantized signal) and is applicable to any noise source with stationary statistics (e.g. a stationary signal with reproducible statistics, as is the case for many experimental observations). The decoding, on the other hand, must be even “smarter,” in that generalized Viterbi algorithms are used, and possibly other ML methods as well, SVMs in particular. An example of stationary‐statistics sensing with an ML‐based decoder is described in application to CCC studies in Chapter 14.
.....
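As a concrete, deliberately simplified illustration of the decoding side described in the excerpt above, the Python sketch below decodes a quantized signal with a toy two‐state HMM via Viterbi and then summarizes the decoded path by its stationary statistics (state occupancy and transition frequencies), the kind of feature vector that would be handed to an SVM classifier. The model parameters, function names, and the synthetic test signal are illustrative assumptions, not the book’s CCC/SSA implementation.

```python
# Minimal sketch (not the book's CCC/SSA code): decode a quantized signal with a
# small HMM via Viterbi, then summarize the decoded path by its stationary
# statistics for downstream classification (e.g. by an SVM). All parameters
# below are illustrative assumptions.
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most probable state path for integer observations obs."""
    n_states = log_pi.shape[0]
    T = len(obs)
    delta = np.zeros((T, n_states))           # best log-score ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A          # entry (i, j): from i to j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(n_states)] + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def stationary_features(path, n_states):
    """Occupancy and transition frequencies of the decoded path (the
    'stationary statistics' feature vector used for classification)."""
    occ = np.bincount(path, minlength=n_states) / len(path)
    trans = np.zeros((n_states, n_states))
    for a, b in zip(path[:-1], path[1:]):
        trans[a, b] += 1
    trans /= max(len(path) - 1, 1)
    return np.concatenate([occ, trans.ravel()])

# Illustrative 2-state, 4-level quantization model (assumed, not fitted).
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05],
              [0.05, 0.95]])
B = np.array([[0.4, 0.4, 0.1, 0.1],   # state 0 favors low quantization levels
              [0.1, 0.1, 0.4, 0.4]])  # state 1 favors high levels
rng = np.random.default_rng(0)
obs = rng.integers(0, 4, size=1000)   # stand-in for a quantized channel trace

path = viterbi(obs, np.log(pi), np.log(A), np.log(B))
features = stationary_features(path, n_states=2)
print(features)  # vectors like this would feed an SVM for template/class matching
```

In the full SCW/SSA setting the HMM would be a generalized, duration‐modeling HMM trained on channel current data, and the stationary‐statistics features would feed the SVM classification and clustering stages outlined in Chapters 10 and 12.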