Algorithms in Bioinformatics
Table of Contents
Paul A. Gagniuc. Algorithms in Bioinformatics
Algorithms in Bioinformatics. Theory and Implementation
Preface
About the Companion Website
1 The Tree of Life (I)
1.1 Introduction
1.2 Emergence of Life
1.2.1 Timeline Disagreements
1.3 Classifications and Mechanisms
1.4 Chromatin Structure
1.5 Molecular Mechanisms
1.5.1 Precursor Messenger RNA
1.5.2 Precursor Messenger RNA to Messenger RNA
1.5.3 Classes of Introns
1.5.4 Messenger RNA
1.5.5 mRNA to Proteins
1.5.6 Transfer RNA
1.5.7 Small RNA
1.5.8 The Transcriptome
1.5.9 Gene Networks and Information Processing
1.5.10 Eukaryotic vs. Prokaryotic Regulation
1.5.11 What Is Life?
1.6 Known Species
1.7 Approaches for Compartmentalization
1.7.1 Two Main Approaches for Organism Formation
1.7.2 Size and Metabolism
1.8 Sizes in Eukaryotes
1.8.1 Sizes in Unicellular Eukaryotes
1.8.2 Sizes in Multicellular Eukaryotes
1.9 Sizes in Prokaryotes
1.10 Virus Sizes
1.10.1 Viruses vs. the Spark of Metabolism
1.11 The Diffusion Coefficient
1.12 The Origins of Eukaryotic Cells
1.12.1 Endosymbiosis Theory
1.12.2 DNA and Organelles
1.12.3 Membrane-bound Organelles with DNA
1.12.4 Membrane-bound Organelles Without DNA
1.12.5 Control and Division of Organelles
1.12.6 The Horizontal Gene Transfer
1.12.7 On the Mechanisms of Horizontal Gene Transfer
1.13 Origins of Eukaryotic Multicellularity
1.13.1 Colonies Inside an Early Unicellular Common Ancestor
1.13.2 Colonies of Early Unicellular Common Ancestors
1.13.3 Colonies of Inseparable Early Unicellular Common Ancestors
1.13.4 Chimerism and Mosaicism
1.14 Conclusions
2 Tree of Life: Genomes (II)
2.1 Introduction
2.2 Rules of Engagement
2.3 Genome Sizes in the Tree of Life
2.3.1 Alternative Methods
2.3.2 The Weaving of Scales
Additional algorithm 2.1 Note that the source code is in context and works with copy/paste
Additional algorithm 2.2 Note that the source code is in context and works with copy/paste
2.3.3 Computations on the Average Genome Size
2.3.4 Observations on Data
2.4 Organellar Genomes
2.4.1 Chloroplasts
2.4.2 Apicoplasts
2.4.3 Chromatophores
2.4.4 Cyanelles
2.4.5 Kinetoplasts
2.4.6 Mitochondria
2.5 Plasmids
2.6 Virus Genomes
2.7 Viroids and Their Implications
2.8 Genes vs. Proteins in the Tree of Life
2.9 Conclusions
3 Sequence Alignment (I)
3.1 Introduction
3.2 Style and Visualization
Additional algorithm 3.1 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 3.2 Note that the source code is out of context and is intended for explanation of the method
3.3 Initialization of the Score Matrix
Additional algorithm 3.3 Note that the source code is in context and works with copy/paste
3.4 Calculation of Scores
3.4.1 Initialization of the Score Matrix for Global Alignment
Additional algorithm 3.4 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 3.5 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 3.6 Note that the source code is in context and works with copy/paste
3.4.2 Initialization of the Score Matrix for Local Alignment
Additional algorithm 3.7 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 3.8 Note that the source code is in context and works with copy/paste
3.4.3 Optimization of the Initialization Steps
Additional algorithm 3.9 Note that the source code is out of context and is intended for explanation of the method
3.4.4 Curiosities
Additional algorithm 3.10 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 3.11 Note that the source code is in context and works with copy/paste
3.5 Traceback
Additional algorithm 3.12 Note that the source code is out of context and is intended for explanation of the method
3.6 Global Alignment
Additional algorithm 3.13 Note that the source code is in context and works with copy/paste
3.7 Local Alignment
Additional algorithm 3.14 Note that the source code is in context and works with copy/paste
3.8 Alignment Layout
Additional algorithm 3.15 Note that the source code is out of context and is intended for explanation of the method
3.9 Local Sequence Alignment – The Final Version
Additional algorithm 3.16 Note that the source code is in context and works with copy/paste
3.10 Complementarity
Additional algorithm 3.17 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 3.18 Note that the source code is in context and works with copy/paste
3.11 Conclusions
4 Forced Alignment (II)
4.1 Introduction
4.2 Global and Local Sequence Alignment
4.2.1 Short Notes
4.2.2 Understanding the Technology
4.2.3 Main Objectives
4.3 Experiments and Discussions
4.3.1 Alignment Layout
4.3.2 Forced Alignment Regime
4.3.3 Alignment Scores and Significance
4.3.4 Optimal Alignments
4.3.5 The Main Significance Scores
4.3.6 The Information Content
4.3.7 The Match Percentage
4.3.8 Significance vs. Chance
4.3.9 The Importance of Randomness
4.3.10 Sequence Quality and the Score Matrix
4.3.11 The Significance Threshold
4.3.12 Optimal Alignments by Numbers
4.3.13 Chaos Theory on Sequence Alignment
4.3.14 Image-Encoding Possibilities
4.4 Advanced Features and Methods
4.4.1 Sequence Detector
4.4.2 Parameters
4.4.3 Heatmap
Heatmap Area
Fractional Parts and Discretization
Top-Left Sequences
Heatmap Charts
Information Window
4.4.4 Text Visualization
4.4.5 Graphics for Manuscript Figures and Didactic Presentations
4.4.6 Dynamics
4.4.7 Independence
4.4.8 Limits
4.4.9 Local Storage
Web Storage API
Format and Limitations
Record Navigation
Local Storage Expansion Capabilities
Import and Export
Disk Operations
4.5 Conclusions
5 Self-Sequence Alignment (I)
5.1 Introduction
5.2 True Randomness
5.3 Information and Compression Algorithms
5.4 White Noise and Biological Sequences
5.5 The Mathematical Model
5.5.1 A Concrete Example
5.5.2 Model Dissection
5.5.3 Conditions for Maxima and Minima
5.6 Noise vs. Redundancy
5.7 Global and Local Information Content
5.8 Signal Sensitivity
5.9 Implementation
5.9.1 Global Self-Sequence Alignment
Additional algorithm 5.1 Note that the source code is in context and works with copy/paste
Additional algorithm 5.2 Note that the source code is in context and works with copy/paste
5.9.2 Local Self-Sequence Alignment
Additional algorithm 5.3 Note that the source code is in context and works with copy/paste
5.10 A Complete Scanner for Information Content
Additional algorithm 5.4 Note that the source code is in context and works with copy/paste
5.11 Conclusions
6 Frequencies and Percentages (II)
6.1 Introduction
6.2 Base Composition
6.3 Percentage of Nucleotide Combinations
6.4 Implementation
Additional algorithm 6.1 Note that the source code is in context and works with copy/paste
Additional algorithm 6.2 Note that the source code is in context and works with copy/paste
6.5 A Frequency Scanner
Additional algorithm 6.3 Note that the source code is in context and works with copy/paste
6.6 Examples of Known Significance
6.7 Observation vs. Expectation
6.8 A Frequency Scanner with a Threshold
Additional algorithm 6.4 Note that the source code is in context and works with copy/paste
6.9 Conclusions
7 Objective Digital Stains (III)
7.1 Introduction
7.2 Information and Frequency
Additional algorithm 7.1 Note that the source code is in context and works with copy/paste
7.3 The Objective Digital Stain
Additional algorithm 7.2 Note that the source code is in context and works with copy/paste
7.3.1 A 3D Representation Over a 2D Plane
Additional algorithm 7.3 Note that the source code is in context and works with copy/paste
7.3.2 ODSs Relative to the Background
Additional algorithm 7.4 Note that the source code is in context and works with copy/paste
7.4 Interpretation of ODSs
7.5 The Significance of the Areas in the ODS
7.6 Discussions
7.6.1 A Similarity Between Dissimilar Sequences
7.7 Conclusions
8 Detection of Motifs (I)
8.1 Introduction
8.2 DNA Motifs
8.2.1 DNA-binding Proteins vs. Motifs and Degeneracy
8.2.2 Concrete Examples of DNA Motifs
8.3 Major Functions of DNA Motifs
8.3.1 RNA Splicing and DNA Motifs
8.4 Conclusions
9 Representation of Motifs (II)
9.1 Introduction
9.2 The Training Data
9.3 A Visualization Function
Additional algorithm 9.1 Note that the source code is out of context and is intended for explanation of the method
9.4 The Alignment Matrix
Additional algorithm 9.2 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 9.3 Note that the source code is out of context and is intended for explanation of the method
9.5 Alphabet Detection
Additional algorithm 9.4 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 9.5 Note that the source code is out of context and is intended for explanation of the method
9.6 The Position-Specific Scoring Matrix (PSSM) Initialization
Additional algorithm 9.6 Note that the source code is out of context and is intended for explanation of the method
9.7 The Position Frequency Matrix (PFM)
Additional algorithm 9.7 Note that the source code is out of context and is intended for explanation of the method
9.8 The Position Probability Matrix (PPM)
Additional algorithm 9.8 Note that the source code is out of context and is intended for explanation of the method
9.8.1 A Kind of PPM Pseudo-Scanner
Additional algorithm 9.9 Note that the source code is out of context and is intended for explanation of the method
9.9 The Position Weight Matrix (PWM)
Additional algorithm 9.10 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 9.11 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 9.12 Note that the source code is out of context and is intended for explanation of the method
9.10 The Background Model
9.11 The Consensus Sequence
Additional algorithm 9.13 Note that the source code is out of context and is intended for explanation of the method
9.11.1 The Consensus – Not Necessarily Functional
9.12 Mutational Intolerance
9.13 From Motifs to PWMs
Additional algorithm 9.14 Note that the source code is in context and works with copy/paste
9.14 Pseudo-Counts and Negative Infinity
Additional algorithm 9.15 Note that the source code is in context and works with copy/paste
9.15 Conclusions
10 The Motif Scanner (III)
10.1 Introduction
10.2 Looking for Signals
Additional algorithm 10.1 Note that the source code is out of context and is intended for explanation of the method
10.3 A Functional Scanner
Additional algorithm 10.2 Note that the source code is in context and works with copy/paste
10.4 The Meaning of Scores
10.4.1 A Score Value Above Zero
10.4.2 A Score Value Below Zero
10.4.3 A Score Value of Zero
10.5 Conclusions
11 Understanding the Parameters (IV)
11.1 Introduction
11.2 Experimentation
11.2.1 A Scanner Implementation Based on Pseudo-Counts
Additional algorithm 11.1 Note that the source code is in context and works with copy/paste
11.2.2 A Scanner Implementation Based on Propagation of Zero Counts
Additional algorithm 11.2 Note that the source code is in context and works with copy/paste
11.3 Signal Discrimination
11.4 False-Positive Results
11.5 Sensitivity Adjustments
11.6 Beyond Bioinformatics
11.7 A Scanner That Uses a Known PWM
Additional algorithm 11.3 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 11.4 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 11.5 Note that the source code is in context and works with copy/paste
11.8 Signal Thresholds
Additional algorithm 11.6 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 11.7 Note that the source code is out of context and is intended for explanation of the method
11.8.1 Implementation and Filter Testing
Additional algorithm 11.8 Note that the source code is in context and works with copy/paste
11.9 Conclusions
12 Dynamic Backgrounds (V)
12.1 Introduction
12.2 Toward a Scanner with Two PFMs
12.2.1 The Implementation of Dynamic PWMs
Additional algorithm 12.1 Note that the source code is in context and works with copy/paste
12.2.2 Issues and Corrections for Dynamic PWMs
12.2.3 Solutions for Aberrant Positive Likelihood Values
Virtual Additions to the Background Set
Real Additions to the Background Set
Verification of the Two Methods
Additional algorithm 12.2 Note that the source code is out of context and is intended for explanation of the method
12.3 A Scanner with Two PFMs
Additional algorithm 12.3 Note that the source code is in context and works with copy/paste
12.4 Information and Background Frequencies on Score Values
12.5 Dynamic Background vs. Null Model
12.6 Conclusions
13 Markov Chains: The Machine (I)
13.1 Introduction
13.2 Transition Matrices
13.3 Discrete Probability Detector
13.3.1 Alphabet Detection
Additional algorithm 13.1 Note that the source code is out of context and is intended for explanation of the method
13.3.2 Matrix Initialization
Additional algorithm 13.2 Note that the source code is out of context and is intended for explanation of the method
13.3.3 Frequency Detection
Additional algorithm 13.3 Note that the source code is out of context and is intended for explanation of the method
13.3.4 Calculation of Transition Probabilities
Additional algorithm 13.4 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 13.5 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 13.6 Note that the source code is in context and works with copy/paste
Additional algorithm 13.7 Note that the source code is in context and works with copy/paste
13.3.5 Particularities in Calculating the Transition Probabilities
13.4 Markov Chains Generators
13.4.1 The Experiment
13.4.2 The Implementation
Additional algorithm 13.8 Note that the source code is in context and works with copy/paste
13.4.3 Simulation of Transition Probabilities
13.4.4 The Markov Machine
13.4.5 Result Verification
13.5 Conclusions
14 Markov Chains: Log Likelihood (II)
14.1 Introduction
14.2 The Log-Likelihood Matrix
14.2.1 A Log-Likelihood Matrix Based on the Null Model
Additional algorithm 14.1 Note that the source code is out of context and is intended for explanation of the method
Additional algorithm 14.2 Note that the source code is out of context and is intended for explanation of the method
14.2.2 A Log-Likelihood Matrix Based on Two Models
Additional algorithm 14.3 Note that the source code is out of context and is intended for explanation of the method
14.3 Interpretation and Use of the Log-Likelihood Matrix
Additional algorithm 14.4 Note that the source code is out of context and is intended for explanation of the method
14.4 Construction of a Markov Scanner
Additional algorithm 14.5 Note that the source code is in context and works with copy/paste
Additional algorithm 14.6 Note that the source code is out of context and is intended for explanation of the method
14.5 A Scanner That Uses a Known LLM
Additional algorithm 14.7 Note that the source code is in context and works with copy/paste
14.6 The Meaning of Scores
14.7 Beyond Bioinformatics
14.8 Conclusions
15 Spectral Forecast (I)
15.1 Introduction
15.2 The Spectral Forecast Model
15.3 The Spectral Forecast Equation
15.4 The Spectral Forecast Inner Workings
15.4.1 Each Part on a Single Matrix
15.4.2 Both Parts on a Single Matrix
15.4.3 Both Parts on Separate Matrices
15.4.4 Concrete Example 1
15.4.5 Concrete Example 2
15.4.6 Concrete Example 3
15.5 Implementations
15.5.1 Spectral Forecast for Signals
Additional algorithm 15.1 Note that the source code is in context and works with copy/paste
15.5.2 What Does the Value of d Mean?
Additional algorithm 15.2 Note that the source code is in context and works with copy/paste
15.5.3 Spectral Forecast for Matrices
Additional algorithm 15.3 Note that the source code is in context and works with copy/paste
15.6 The Spectral Forecast Model for Predictions
15.6.1 The Spectral Forecast Model for Signals
Additional algorithm 15.4 Note that the source code is in context and works with copy/paste
Additional algorithm 15.5 Note that the source code is in context and works with copy/paste
15.6.2 Experiments on the Similarity Index Values
15.6.3 The Spectral Forecast Model for Matrices
Additional algorithm 15.6 Note that the source code is in context and works with copy/paste
15.7 Conclusions
16 Entropy vs. Content (I)
16.1 Introduction
16.2 Information Entropy
16.3 Implementation
Additional algorithm 16.1 Note that the source code is in context and works with copy/paste
Additional algorithm 16.2 Note that the source code is in context and works with copy/paste
16.4 Information Content vs. Information Entropy
16.4.1 Implementation
Additional algorithm 16.3 Note that the source code is in context and works with copy/paste
Additional algorithm 16.4 Note that the source code is in context and works with copy/paste
16.4.2 Additional Considerations
16.5 Conclusions
17 Philosophical Transactions
17.1 Introduction
17.2 The Frame of Reference
17.2.1 The Fundamental Layer of Complexity
17.2.2 On the Complexity of Life
17.3 Random vs. Pseudo-random
Additional algorithm 17.1 Note that the source code is in context and works with copy/paste
Additional algorithm 17.2 Note that the source code is in context and works with copy/paste
17.4 Random Numbers and Noise
17.5 Determinism and Chaos
17.5.1 Chaos Without Noise
Additional algorithm 17.3 Note that the source code is in context and works with copy/paste
Additional algorithm 17.4 Note that the source code is in context and works with copy/paste
17.5.2 Chaos with Noise
Additional algorithm 17.5 Note that the source code is in context and works with copy/paste
17.5.3 Limits of Prediction
17.5.4 On the Wings of Chaos
17.6 Free Will and Determinism
17.6.1 The Greatest Disappointment
17.6.2 The Most Powerful Processor in Existence
17.6.3 Certainty vs. Interpretation
17.6.4 A Wisdom that Applies
17.7 Conclusions
Appendix A
A.1 Association of Numerical Values with Letters
Additional algorithm A.1 Note that the source code is in context and works with copy/paste
A.2 Sorting Values on Columns
Additional algorithm A.2 Note that the source code is in context and works with copy/paste
A.3 The Implementation of a Sequence Logo
Additional algorithm A.3 Note that the source code is in context and works with copy/paste
A.4 Sequence Logos Based on Maximum Values
Additional algorithm A.4 Note that the source code is in context and works with copy/paste
A.5 Using Logarithms to Build Sequence Logos
Additional algorithm A.5 Note that the source code is in context and works with copy/paste
A.6 From a Motif Set to a Sequence Logo
Additional algorithm A.6 Note that the source code is in context and works with copy/paste
References
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Paul A. Gagniuc
University Politehnica of Bucharest
.....
Life shows two main approaches to organism formation. The first forced cooperation between biochemical processes (virtual cells within one physical boundary). The second forced cooperation toward gradual specialization among individual cells of the same species (or even of different species). Interestingly, the second approach shows a lower entropy than the first. However, cooperation between individual cells did not rule out further specialization of the biochemical processes inside individual cells.
Competition and gravity preclude the emergence of unicellular organisms above a certain size. Moreover, gradient-based biochemical signaling and interactions would be inefficient over long distances inside large unicellular organisms. Multicellular organisms seem to have found a balance between the speed of response and the size of the cells. Small cells have a larger surface area relative to their volume, so each unit of volume can exchange gases and nutrients at a higher rate than in larger cells. The principle is the same as for salt: small granules dissolve faster in water than large ones. Cooperation toward cell specialization, leading to the formation of a circulatory system, ensured an optimal exchange with the outside environment and a fast response for the entire organism. In very large unicellular organisms, the response time to any stimulus may be dictated by the distances inside the cell and by the metabolic rate. For instance, a biochemical interaction between two points in the cytoplasm of such an organism would require time and large amounts of messenger molecules, which must diffuse through a large volume until the target is stochastically encountered. In other words, “time contracts” for giant unicellular organisms. It is likely that giant single-celled organisms existed in the distant past. However, competition with smaller unicellular organisms that respond faster may have eliminated them from the evolutionary chain.
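The surface-to-volume and diffusion arguments above can be made concrete with a small numerical sketch. It is not taken from the book: it assumes idealized spherical cells, uses the standard three-dimensional diffusion relation t ≈ x²/(6D), and picks an illustrative diffusion coefficient of 10 µm²/s. The constant `ASSUMED_D` and the function names are hypothetical, chosen only for this illustration.

```python
import math


def surface_to_volume(radius_um: float) -> float:
    """Surface-area-to-volume ratio of a sphere in 1/µm (reduces to 3 / r)."""
    area = 4.0 * math.pi * radius_um ** 2
    volume = (4.0 / 3.0) * math.pi * radius_um ** 3
    return area / volume


def diffusion_time(distance_um: float, d_um2_per_s: float) -> float:
    """Characteristic time in seconds to diffuse a distance x in 3D: x^2 / (6 D)."""
    return distance_um ** 2 / (6.0 * d_um2_per_s)


# Assumption: an illustrative diffusion coefficient for a messenger
# molecule in cytoplasm, not a value taken from the text.
ASSUMED_D = 10.0  # µm²/s

# Compare idealized spherical cells of increasing radius (in µm).
for radius in (1.0, 10.0, 100.0, 1000.0):
    ratio = surface_to_volume(radius)
    seconds = diffusion_time(radius, ASSUMED_D)
    print(f"r = {radius:7.1f} µm   S/V = {ratio:8.4f} 1/µm   "
          f"diffusion time ~ {seconds:12.3f} s")
```

Under these assumptions, a tenfold increase in radius cuts the surface-to-volume ratio tenfold and lengthens the characteristic diffusion time a hundredfold, which is the scaling behind the slower responses attributed to giant cells.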
.....