Informatics and Machine Learning

Publisher: John Wiley & Sons Limited. ISBN: 9781119716761.


Book Description

Discover a thorough exploration of how to use computational, algorithmic, statistical, and informatics methods to analyze digital data. Informatics and Machine Learning: From Martingales to Metaheuristics delivers an interdisciplinary presentation on how to analyze any data captured in digital form. The book describes how readers can conduct analyses of text, general sequential data, experimental observations over time, stock market and econometric histories, or symbolic data, like genomes. It contains large amounts of sample code to demonstrate the concepts within and to assist with various levels of project work. The book offers a complete presentation of the mathematical underpinnings of a wide variety of forms of data analysis and provides extensive examples of programming implementations. It is based on two decades' worth of the distinguished author's teaching and industry experience.

The book also includes:

- A thorough introduction to probabilistic reasoning and bioinformatics, including Python shell scripting to obtain data counts, frequencies, probabilities, and anomalous statistics, or for use with Bayes' rule
- An exploration of information entropy and statistical measures, including Shannon entropy, relative entropy, maximum entropy (maxent), and mutual information
- A practical discussion of ad hoc, ab initio, and bootstrap signal acquisition methods, with examples from genome analytics and signal analytics

Perfect for undergraduate and graduate students in machine learning and data analytics programs, Informatics and Machine Learning: From Martingales to Metaheuristics will also earn a place in the libraries of mathematicians, engineers, computer scientists, and life scientists with an interest in those subjects.

Contents

Stephen Winters-Hilt. Informatics and Machine Learning

Table of Contents

List of Tables

List of Illustrations

Guide

Pages

Informatics and Machine Learning. From Martingales to Metaheuristics

Preface

1 Introduction

1.1 Data Science: Statistics, Probability, Calculus … Python (or Perl) and Linux

1.2 Informatics and Data Analytics

1.3 FSA‐Based Signal Acquisition and Bioinformatics

1.4 Feature Extraction and Language Analytics

1.5 Feature Extraction and Gene Structure Identification

1.5.1 HMMs for Analysis of Information Encoding Molecules

1.5.2 HMMs for Cheminformatics and Generic Signal Analysis

1.6 Theoretical Foundations for Learning

1.7 Classification and Clustering

1.8 Search

1.9 Stochastic Sequential Analysis (SSA) Protocol (Deep Learning Without NNs)

1.9.1 Stochastic Carrier Wave (SCW) Analysis – Nanoscope Signal Analysis

1.9.2 Nanoscope Cheminformatics – A Case Study for Device “Smartening”

1.10 Deep Learning using Neural Nets

1.11 Mathematical Specifics and Computational Implementations

2 Probabilistic Reasoning and Bioinformatics

2.1 Python Shell Scripting

2.1.1 Sample Size Complications

2.2 Counting, the Enumeration Problem, and Statistics

2.3 From Counts to Frequencies to Probabilities

2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics

2.5 Statistics, Conditional Probability, and Bayes' Rule

2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation

2.5.2 Bayes' Rule

2.5.3 Estimation Based on Maximal Conditional Probabilities

2.6 Emergent Distributions and Series

2.6.1 The Law of Large Numbers (LLN)

2.6.2 Distributions

2.6.2.1 The Geometric Distribution (Emergent Via Maxent)

2.6.2.2 The Gaussian (aka Normal) Distribution (Emergent Via LLN Relation and Maxent)

2.6.2.3 Significant Distributions That Are Not Gaussian or Geometric

2.6.3 Series

2.7 Exercises

3 Information Entropy and Statistical Measures

3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information

3.1.1 The Khinchin Derivation

3.1.2 Maximum Entropy Principle

3.1.3 Relative Entropy and Its Uniqueness

3.1.4 Mutual Information

3.1.5 Information Measures Recap

3.2 Codon Discovery from Mutual Information Anomaly

3.3 ORF Discovery from Long‐Tail Distribution Anomaly

3.3.1 Ab initio Learning with smORF’s, Holistic Modeling, and Bootstrap Learning

3.4 Sequential Processes and Markov Models

3.4.1 Markov Chains

3.5 Exercises

4 Ad Hoc, Ab Initio, and Bootstrap Signal Acquisition Methods

4.1 Signal Acquisition, or Scanning, at Linear Order Time‐Complexity

4.2 Genome Analytics: The Gene‐Finder

4.3 Objective Performance Evaluation: Sensitivity and Specificity

4.4 Signal Analytics: The Time‐Domain Finite State Automaton (tFSA)

4.4.1 tFSA Spike Detector

4.4.2 tFSA‐Based Channel Signal Acquisition Methods with Stable Baseline

4.4.3 tFSA‐Based Channel Signal Acquisition Methods Without Stable Baseline

4.5 Signal Statistics (Fast): Mean, Variance, and Boxcar Filter

4.5.1 Efficient Implementations for Statistical Tools (O(L))

4.6 Signal Spectrum: Nyquist Criterion, Gabor Limit, Power Spectrum

4.6.1 Nyquist Sampling Theorem

4.6.2 Fourier Transforms, and Other Classic Transforms

4.6.3 Power Spectral Density

4.6.4 Power‐Spectrum‐Based Feature Extraction

4.6.5 Cross‐Power Spectral Density

4.6.6 AM/FM/PM Communications Protocol

4.7 Exercises

5 Text Analytics

5.1 Words

5.1.1 Text Acquisition: Text Scraping and Associative Memory

5.1.1.1 Kiwix, Gutenberg Project, Wikipedia

5.1.1.2 Library of Babel

5.1.1.3 Weather Scraper

5.1.1.4 Stock Scraper – New‐Style with Cookies

5.1.2 Word Frequency Analysis: Machiavelli’s Polysemy on Fortuna and Virtu

5.1.3 Word Frequency Analysis: Coleridge’s Hidden Polysemy on Logos

5.1.4 Sentiment Analysis

5.2 Phrases – Short (Three Words)

5.2.1 Shakespearean Insult Generation – Phrase Generation

5.3 Phrases – Long (A Line or Sentence)

5.3.1 Iambic Phrase Analysis: Shakespeare

5.3.2 Natural Language Processing

5.3.3 Sentence and Story Generation: Tarot

5.4 Exercises

6 Analysis of Sequential Data Using HMMs

6.1 Hidden Markov Models (HMMs)

6.1.1 Background and Role in Stochastic Sequential Analysis (SSA)

6.1.2 When to Use a Hidden Markov Model (HMM)?

6.1.3 Hidden Markov Models (HMMs) – Standard Formulation and Terms

6.2 Graphical Models for Markov Models and Hidden Markov Models

6.2.1 Hidden Markov Models

6.2.2 Viterbi Path

6.2.2.1 The Most Probable State Sequence

6.2.3 Forward and Backward Probabilities

6.2.4 HMM: Maximum Likelihood Discrimination

6.2.5 Expectation/Maximization (Baum–Welch)

6.2.5.1 Emission and Transition Expectations with Rescaling

6.3 Standard HMM Weaknesses and their GHMM Fixes

6.4 Generalized HMMs (GHMMs – “Gems”): Minor Viterbi Variants

6.4.1 The Generic HMM

6.4.2 pMM/SVM

6.4.3 EM and Feature Extraction via EVA Projection

6.4.4 Feature Extraction via Data Absorption (a.k.a. Emission Inversion)

6.4.5 Modified AdaBoost for Feature Selection and Data Fusion

The AdaBoost algorithm

6.4.5.1 The Modified AdaBoost Algorithm for Feature Selection

6.4.5.2 Modified AdaBoost in SSA Protocol

6.5 HMM Implementation for Viterbi (in C and Perl)

6.6 Exercises

7 Generalized HMMs (GHMMs): Major Viterbi Variants

7.1 GHMMs: Maximal Clique for Viterbi and Baum–Welch

7.2 GHMMs: Full Duration Model

7.2.1 HMM with Duration (HMMD)

7.2.2 Hidden Semi‐Markov Models (HSMM) with sid‐information

7.2.3 HMM with Binned Duration (HMMBD)

7.3 GHMMs: Linear Memory Baum–Welch Algorithm

7.4 GHMMs: Distributable Viterbi and Baum–Welch Algorithms

7.4.1 Distributed HMM Processing via “Viterbi‐overlap‐chunking” with GPU Speedup

7.4.2 Relative Entropy and Viterbi Scoring

7.5 Martingales and the Feasibility of Statistical Learning (further details in Appendix)

7.6 Exercises

8 Neuromanifolds and the Uniqueness of Relative Entropy

8.1 Overview

8.2 Review of Differential Geometry [206, 207]

8.2.1 Differential Topology – Natural Manifold

8.2.2 Differential Geometry – Natural Geometric Structures

8.3 Amari’s Dually Flat Formulation [113–115]

8.3.1 Generalization of Pythagorean Theorem

8.3.2 Projection Theorem and Relation Between Divergence and Link Formalism

8.4 Neuromanifolds [113–115]

8.5 Exercises

9 Neural Net Learning and Loss Bounds Analysis

9.1 Brief Introduction to Neural Nets (NNs)

9.1.1 Single Neuron Discriminator

9.1.1.1 The Perceptron

9.1.1.2 Sigmoid Neurons

9.1.1.3 The Loss Function and Gradient Descent

9.1.2 Neural Net with Back‐Propagation

9.1.2.1 The Loss Function – General Activation in a General Neural Net

9.2 Variational Learning Formalism and Use in Loss Bounds Analysis

9.2.1 Variational Basis for Update Rule

9.2.2 Review and Generalization of GD Loss Bounds Analysis [213, 214]

9.2.3 Review of the EG Loss Bounds Analysis

9.3 The “sinh⁻¹(ω)” Link Algorithm (SA)

9.3.1 Motivation for the “sinh⁻¹(ω)” Link Algorithm (SA)

9.3.2 Relation of sinh Link Algorithm to the Binary Exponentiated Gradient Algorithm

9.4 The Loss Bounds Analysis for sinh⁻¹(ω)

9.4.1 Loss Bounds Analysis Using the Taylor Series Approach

9.4.2 Loss Bounds Analysis Using Taylor Series for the sinh Link (SA) Algorithm

9.5 Exercises

10 Classification and Clustering

10.1 The SVM Classifier – An Overview

10.2 Introduction to Classification and Clustering

10.2.1 Sum of Squared Error (SSE) Scoring

10.2.2 K‐Means Clustering (Unsupervised Learning)

10.2.3 k‐Nearest Neighbors Classification (Supervised Learning)

10.2.4 The Perceptron Recap (See Chapter 9 for Details)

10.3 Lagrangian Optimization and Structural Risk Minimization (SRM)

10.3.1 Decision Boundary and SRM Construction Using Lagrangian

10.3.2 The Theory of Classification

10.3.3 The Mathematics of the Feasibility of Learning

10.3.3.1 The Hoeffding Inequality

10.3.3.2 Hoeffding Inequality is Related to Chebyshev Inequality

10.3.3.3 Sample Error

10.3.3.4 The Generalization Bound (Establishes First ML Law for |H| < ∞)

10.3.3.5 The VC Generalization Bound (Establishes First ML Law for |H| = ∞)

The VC Dimension and Generalization

VC Generalization Bound

10.3.4 Lagrangian Optimization

10.3.5 The Support Vector Machine (SVM) – Lagrangian with SRM

10.3.5.1 Kernel Modeling and Other Tuning

10.3.6 Kernel Construction Using Polarization

10.3.7 SVM Binary Classifier Derivation

10.4 SVM Binary Classifier Implementation

10.4.1 Sequential Minimal Optimization (SMO)

10.4.2 Alpha‐Selection Variants

10.4.3 Chunking on Large Datasets: O(N²) ➔ n O(N²/n²) = O(N²)/n

10.4.3.1 Distributed SVM Processing (Chunking)

10.4.3.2 SV/Non‐SV Pass‐Tuning on Train Subsets: An Outlier‐Management Heuristic

10.4.3.3 SV/Non‐SV Pass‐Tuning on (9AT,9TA) vs. (9CG,9GC)

10.4.4 Support Vector Reduction (SVR)

10.4.4.1 Multi‐Threaded Chunking with SVR

10.4.4.2 Multi‐Threaded Distributed Chunking with SVR

10.4.5 Code Examples (in OO Perl)

10.5 Kernel Selection and Tuning Metaheuristics

10.5.1 The “Stability” Kernels

10.5.1.1 The Mercer Test

10.5.1.2 The Positive Principal Minors Test

10.5.2 Derivation of “Stability” Kernels

10.5.3 Entropic and Gaussian Kernels Relate to Unique, Minimally Structured, Information Divergence and Geometric Distance Measures

10.5.4 Automated Kernel Selection and Tuning

10.6 SVM Multiclass from Decision Tree with SVM Binary Classifiers

10.7 SVM Multiclass Classifier Derivation (Multiple Decision Surface)

10.7.1 Decomposition Method to Solve the Dual

10.7.2 SVM Speedup via Differentiating BSVs and SVs

10.8 SVM Clustering

10.8.1 SVM‐External Clustering

10.8.1.1 Single‐Convergence Initialized SVM‐Clustering: Exploration on Sensitivity to Tuning

10.8.1.2 Single‐Convergence SVM‐Clustering: Hybrid Clustering

10.8.2 Single‐Convergence SVM‐Clustering: Comparative Analysis

10.8.2.1 SVM “Internal” Clustering

10.8.2.2 Solving the Dual (Based on Keerthi’s SMO [184])

10.8.2.3 Keerthi Algorithm

10.8.3 Stabilized, Single‐Convergence Initialized, SVM‐External Clustering

10.8.4 Stabilized, Multiple‐Convergence, SVM‐External Clustering

10.8.5 SVM‐External Clustering – Algorithmic Variants

10.8.5.1 Multiple‐Convergence Initialized (Steepest Ascent) SVM‐Clustering

10.8.5.2 Projection Clustering – Clustering in Decision Space

10.8.5.3 SVM‐ABC

10.8.5.4 SVM‐Relabeler

10.8.5.5 SV‐Dropper

10.8.5.6 Rayleigh’s Criterion Clustering Algorithm

10.9 Exercises

11 Search Metaheuristics

11.1 Trajectory‐Based Search Metaheuristics

11.1.1 Optimal‐Fitness Configuration Trajectories – Fitness Function Known and Sufficiently Regular

11.1.1.1 Metaheuristic #1: Euler’s Method – First‐Order Gradient Ascent

11.1.1.2 Metaheuristic #2: Newton’s Method – Second‐Order Gradient Ascent

11.1.1.3 Metaheuristic #3: Gradient Ascent with (Random) Restart

11.1.2 Optimal‐Fitness Configuration Trajectories – Fitness Function not Known

11.1.2.1 Metaheuristic #4: (Blind) Hill Climbing

11.1.2.2 Metaheuristic #5: Steepest Ascent Hill Climbing

11.1.2.3 Metaheuristic #6: Steepest Ascent Hill Climbing with Restart

11.1.3 Fitness Configuration Trajectories with Nonoptimal Updates

Global Optimization

11.1.3.1 Metaheuristic #7: Simulated Annealing Hill Climbing

11.1.3.2 Metaheuristic #8: Simulated Annealing Hill Climbing with Random Restart

11.1.3.3 Metaheuristic #9: Taboo Search

11.1.3.4 Metaheuristic #10: Tabu Search with Restart

11.1.3.5 Metaheuristic #11: Component‐Based Tabu Search

11.1.3.6 Metaheuristic #12: Component‐Based Tabu Search with Restart

11.2 Population‐Based Search Metaheuristics

11.2.1 Population with Evolution

11.2.1.1 Metaheuristic #13: Evolutionary Optimization (Darwinian Evolution; Asexual Reproduction)

11.2.1.2 Metaheuristic #14: Genetic Algorithm (Darwinian Evolution; Sexual Reproduction – Binary Interaction)

11.2.1.3 Evolutionary Algorithm Parameters

11.2.2 Population with Group Interaction – Swarm Intelligence

11.2.2.1 Metaheuristic #15: Particle Swarm Optimization (PSO) (Lamarckian Evolution)

11.2.3 Population with Indirect Interaction via Artifact

11.2.3.1 Metaheuristic #16: Ant Colony Optimization (ACO) (Swarm Intelligence; Stigmergy; Have Coevolution with Artifact)

11.2.3.2 Other Population‐Based Search Metaheuristics

11.3 Exercises

12 Stochastic Sequential Analysis (SSA)

12.1 HMM and FSA‐Based Methods for Signal Acquisition and Feature Extraction

12.2 The Stochastic Sequential Analysis (SSA) Protocol

12.2.1 (Stage 1) Primitive Feature Identification

12.2.2 (Stage 2) Feature Identification and Feature Selection

12.2.2.1 Stochastic Carrier Wave Encoding/Decoding

12.2.3 (Stage 3) Classification

12.2.4 (Stage 4) Clustering

12.2.5 (All Stages) Database/Data‐Warehouse System Specification

12.2.6 (All Stages) Server‐Based Data Analysis System Specification

12.3 Channel Current Cheminformatics (CCC) Implementation of the Stochastic Sequential Analysis (SSA) Protocol

12.4 SCW for Detector Sensitivity Boosting

12.4.1 NTD with Multiple Channels (or High Noise)

12.4.2 Stochastic Carrier Wave

12.5 SSA for Deep Learning

12.6 Exercises

13 Deep Learning Tools – TensorFlow

13.1 Neural Nets Review

13.1.1 Summary of Single Neuron Discriminator

13.1.2 Summary of Neural Net Discriminator and Back‐Propagation

13.2 TensorFlow from Google

13.2.1 Installation/Setup

13.2.1.1 Tensors

13.2.2 Example: Character Recognition

13.2.2.1 Transfer Learning

13.2.2.2 Fine‐tuning

13.2.3 Example: Language Translation

13.2.4 TensorBoard and the TensorFlow Profiler

13.2.5 Tensor Cores

13.3 Exercises

14 Nanopore Detection – A Case Study

14.1 Standard Apparatus

14.1.1 Standard Operational and Physiological Buffer Conditions

14.1.2 α‐Hemolysin Channel Stability – Introduction of Chaotropes

14.2 Controlling Nanopore Noise Sources and Choice of Aperture

14.3 Length Resolution of Individual DNA Hairpins

14.4 Detection of Single Nucleotide Differences (Large Changes in Structure)

14.5 Blockade Mechanism for 9bphp

14.6 Conformational Kinetics on Model Biomolecules

14.7 Channel Current Cheminformatics

14.7.1 Power Spectra and Standard EE Signal Analysis

14.7.2 Channel Current Cheminformatics for Single‐Biomolecule/Mixture Identifications

14.7.3 Channel Current Cheminformatics: Feature Extraction by HMM

14.7.4 Bandwidth Limitations

14.8 Channel‐Based Detection Mechanisms

14.8.1 Partitioning and Translocation‐Based ND Biosensing Methods

14.8.2 Transduction Versus Translation

14.8.3 Single‐Molecule Versus Ensemble

14.8.4 Biosensing with High Sensitivity in Presence of Interference

14.8.5 Nanopore Transduction Detection Methods

14.8.5.1 Things to “Contact” with the Channel: Aptamers

14.8.5.2 Things to “Contact” with the Channel: Immunoglobulins

14.9 The NTD Nanoscope

14.9.1 Nanopore Transduction Detection (NTD)

14.9.1.1 Ponderable Media Flow Phenomenology and Related Information Flow Phenomenology

14.9.2 NTD: A Versatile Platform for Biosensing

14.9.3 NTD Platform

14.9.4 NTD Operation

14.9.5 Driven Modulations

14.9.6 Driven Modulations with Multichannel Augmentation

14.10 NTD Biosensing Methods

14.10.1 Model Biosensor Based on Streptavidin and Biotin

14.10.2 Model System Based on DNA Annealing

14.10.2.1 Linear DNA Annealing Test

14.10.2.2 “Y” DNA Annealing Test

14.10.3 Y‐Aptamer with Use of Chaotropes to Improve Signal Resolution

14.10.4 Pathogen Detection, miRNA Detection, and miRNA Haplotyping

14.10.5 SNP Detection

14.10.6 Aptamer‐Based Detection

14.10.6.1 NaDir SELEX

14.10.7 Antibody‐Based Detection

14.10.7.1 Managing Antibodies as Easily Identifiable Interference or Transducer

14.10.7.2 Small Target Antibody‐Based Detection (Linked Modulator)

14.10.7.3 Large Target Antibody‐Based Detection

14.11 Exercises

Appendix A Python and Perl System Programming in Linux

A.1 Getting Linux and Python in a Flash (Drive)

A.2 Linux and the Command Shell

A.3 Perl Review: I/O, Primitives, String Handling, Regex

Appendix B Physics

B.1 The Calculus of Variations

Appendix C Math

C.1 Martingales [102]

Martingale Definition

Induced Martingales with Markov Chains [102]

In HMM Learning Have Sequences of Likelihood Ratios, Which Is a Martingale, Proof

Supermartingales and Submartingales [102]

Martingale Convergence Theorems [102]

Theorem

“Maximal” Inequalities for Martingales [102]

Lemma 1

Lemma 2

Mean‐Square Convergence Theorem for Martingales [102]

Martingales w.r.t σ‐Field Formalism

Backwards Martingale Definition (w.r.t Sigma Sub‐fields)

Backwards Martingale Convergence Theorem

Strong Law of Large Numbers Proof

Stationary Processes

Strong Ergodic Theorem [102]

Asymptotic Equipartition Property (AEP)

De Finetti’s Theorem

C.2 Hoeffding Inequality

Hoeffding Lemma Proof

Hoeffding Inequality Proof (for Further Details, See [104])

Chernoff Bounding Technique:

References

Index

WILEY END USER LICENSE AGREEMENT

Excerpt from the Book

Stephen Winters‐Hilt

.....

In standard band‐limited (and not time‐limited) signal analysis with periodic waveforms, sampling is done at the Nyquist rate to have a fully reproducible signal. If the sample information is needed elsewhere, it is then compressed (possibly lossily) and transmitted (a “smart encoder”). The received data is then decompressed and reconstructed (by simply summing wave components, e.g. a “simple” decoder). If the signal is sparse or compressible, then compressive sensing [190] can be used, where sampling and compression are combined into one efficient step to obtain compressive measurements (the simple encoding in [190], since a set of random projections is employed), which are then transmitted (general details on noise in this context are described in [191, 192]). On the receiving end, the decompression and reconstruction steps are, likewise, combined using an asymmetric “smart” decoding step.

This progression toward asymmetric compressive signal processing can be taken a step further if we consider signal sequences to be equivalent if they have the same stationary statistics. What is obtained is a method similar to compressive sensing, but involving stationary‐statistics generative‐projection sensing, where the signal processing is non‐lossy at the level of stationary‐statistics equivalence. In SCW signal analysis the signal source is generative in that it is describable via use of an HMM, and the HMM’s Viterbi‐derived generative projections are used to describe the sparse components contributing to the signal source. In SCW encoding the modulation of stationary statistics can be man‐made or natural, with the latter, in many experimental situations, involving a flow phenomenology that has stationary statistics. If the signal is man‐made, the underlying stochastic process is usually still a natural source, where it is the changes in the stationary statistics that are under the control of the man‐made encoding scheme.

Transmission and reception are then followed by generative projection via Viterbi‐HMM template matching, or via Viterbi‐HMM feature extraction followed by separate classification (using an SVM). So in the SCW approach the encoding is even simpler (possibly non‐existent, other than directly passing the quantized signal) and is applicable to any noise source with stationary statistics (e.g. a stationary signal with reproducible statistics, as is the case for many experimental observations). The decoding must be even “smarter,” on the other hand, in that generalized Viterbi algorithms are used, and possibly other ML methods as well, SVMs in particular. An example of stationary‐statistics sensing with an ML‐based decoder is described, in application to CCC studies, in Chapter 14.
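
To make the generative‐projection step more concrete, here is a minimal sketch in Python (assuming NumPy) of the kind of computation the excerpt describes: a quantized signal is decoded with a toy two‐state HMM via the Viterbi algorithm, and the decoded state‐occupancy (stationary) statistics are compared against a stored template. The `viterbi` function, the HMM parameters, the random stand‐in signal, and the template profile are all hypothetical illustrations, not the book’s CCC implementation.

```python
# Minimal sketch (hypothetical parameters): Viterbi decoding of a quantized
# signal under a toy two-state HMM, then comparison of the decoded
# state-occupancy ("stationary") statistics against a stored template.
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most probable state path for an integer observation sequence obs."""
    n_states = log_pi.shape[0]
    T = len(obs)
    delta = np.zeros((T, n_states))            # best log-prob of a path ending in each state
    psi = np.zeros((T, n_states), dtype=int)   # back-pointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A          # scores[i, j]: prev state i -> current state j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(n_states)] + log_B[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):              # backtrack through the pointers
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy two-state HMM over a 4-level quantized signal (all values hypothetical).
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.95, 0.05],
                [0.05, 0.95]])
log_B = np.log([[0.40, 0.40, 0.10, 0.10],     # state 0 favors low quantization levels
                [0.10, 0.10, 0.40, 0.40]])    # state 1 favors high quantization levels

rng = np.random.default_rng(0)
signal = rng.integers(0, 4, size=500)         # stand-in for a quantized signal trace

path = viterbi(signal, log_pi, log_A, log_B)
occupancy = np.bincount(path, minlength=2) / len(path)

# "Template matching" at the level of stationary statistics: compare the decoded
# state-occupancy profile against a stored template profile (hypothetical).
template = np.array([0.5, 0.5])
print("state occupancy:", occupancy)
print("L1 distance to template:", np.abs(occupancy - template).sum())
```

In the fuller protocol described in the excerpt, this simple template‐matching step could instead be Viterbi‐HMM feature extraction followed by separate classification with an SVM.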

.....
