Bioinformatics
Contents
Multiple authors. Bioinformatics
Table of Contents
List of Tables
List of Illustrations
Guide
Pages
Bioinformatics
Foreword
Preface
Contributors
About the Companion Website
1 Biological Sequence Databases
Introduction
Nucleotide Sequence Databases
Nucleotide Sequence Flatfiles: A Dissection
The Header
Box 1.1 Functional Divisions in Nucleotide Databases
The Feature Table
Graphical Interfaces
RefSeq
Box 1.2 RefSeq
Protein Sequence Databases
The NCBI Protein Database
UniProt
Summary
Box 1.3 Ensuring the Continued Quality of Data in Public Sequence Databases
Acknowledgments
Internet Resources
Further Reading
References
2 Information Retrieval from Biological Databases
Introduction
Integrated Information Retrieval: The Entrez System
Relationships Between Database Entries: Neighboring
Hard Links
The Entrez Discovery Pathway
Medical Databases
Organismal Sequence Databases Beyond NCBI
Summary
Internet Resources
Further Reading
References
3 Assessing Pairwise Sequence Similarity: BLAST and FASTA
Introduction
Global Versus Local Sequence Alignments
Scoring Matrices
Box 3.1 Scoring Matrices and the Log Odds Ratio
PAM Matrices
BLOSUM Matrices
Which Matrices Should be Used When?
Nucleotide Scoring Matrices
Gaps and Gap Penalties
BLAST
The Algorithm
Box 3.2 The Karlin–Altschul Equation
Performing a BLAST Search
Understanding the BLAST Output
Suggested BLAST Cut-Offs
BLAST 2 Sequences
MegaBLAST
PSI-BLAST
The Method
Performing a PSI-BLAST Search
BLAT
FASTA
The Method
Running a FASTA Search
Statistical Significance of Results
Comparing FASTA and BLAST
Summary
Internet Resources
Further Reading
References
4 Genome Browsers
Introduction
The UCSC Genome Browser
Box 4.1 Common File Types for Genomic Data
Box 4.2 GENCODE
Box 4.3 Histone Marks
UCSC Table Browser
ENSEMBL Genome Browser
Box 4.4 Ensembl Stable IDs
Ensembl Biomart
JBrowse
Summary
Internet Resources
Further Reading
References
5 Genome Annotation
Introduction
Gene Prediction Methods
Ab Initio Gene Prediction in Prokaryotic Genomes
Box 5.1 Position-Specific Scoring Matrices
Box 5.2 Markov Models
Box 5.3 Hidden Markov Models in Gene Prediction
Ab Initio Gene Prediction in Eukaryotic Genomes
Predicting Exon-Defining Signals
Predicting and Scoring Exons
Exon Assembly
How Well Do Gene Predictors Work?
Box 5.4 Evaluating Binary Classifications or Predictions in Bioinformatics
Assessing Prokaryotic Gene Predictors
Assessing Eukaryotic Gene Predictors
Evidence Generation for Genome Annotation
Gene Annotation and Evidence Generation Using RNA-seq Data
Gene Annotation and Evidence Generation Using Protein Sequence Databases
Gene Annotation and Evidence Generation Using Comparative Gene Prediction
Evidence Generation for Non-Protein-Coding, Non-Coding, or Foreign Genes
tRNA and rRNA Gene Finding
Prophage Finding in Prokaryotes
Repetitive Sequence Finding/Masking in Eukaryotes
Finding and Removing Pseudogenes in Eukaryotes
Genome Annotation Pipelines
Prokaryotic Genome Annotation Pipelines
Eukaryotic Genome Annotation Pipelines
Visualization and Quality Control
Summary
Acknowledgments
Internet Resources
Further Reading
References
6 Predictive Methods Using RNA Sequences
Introduction
Overview of RNA Secondary Structure Prediction Using Thermodynamics
Box 6.1 Gibbs Free Energy
Dynamic Programming
Box 6.2 Algorithm Complexity
Accuracy of RNA Secondary Structure Prediction
Experimental Methods to Refine Secondary Structure Prediction
Predicting the Secondary Structure Common to Multiple RNA Sequences
Algorithms That Are Constrained by an Initial Alignment
Algorithms That Are Not Constrained by the Initial Alignment
Practical Introduction to Single-Sequence Methods
Using the Mfold Web Server
Using the RNAstructure Web Server
Practical Introduction to Multiple Sequence Methods. Using the RNAstructure Web Server to Predict a Common Structure for Multiple Sequences
Other Computational Methods to Study RNA Structure
Comparison of Methods
Predicting RNA Tertiary Structure
Summary
Internet Resources
Further Reading
References
7 Predictive Methods Using Protein Sequences
Introduction
One-Dimensional Prediction of Protein Structure. Synopsis
Secondary Structure and Solvent Accessibility
Box 7.1 Hidden Markov Models
Box 7.2 Neural Networks
Performance Assessment of Secondary Structure Prediction
Box 7.3 Secondary Structure Prediction Scoring Schemes and Receiver Operating Characteristic Curves
Transmembrane Alpha Helices and Beta Strands
Box 7.4 Scoring Schemes for Structural Protein Segments
Disordered Regions
Predicting Protein Function
Synopsis
Motifs and Domains
Databases
Gene Function Prediction Based on the Gene Ontology
Subcellular Localization
Protein Interaction Sites
Effect of Sequence Variants
Summary
Internet Resources
Further Reading
References
8 Multiple Sequence Alignments
Introduction
Measuring Multiple Alignment Quality
Making an Alignment: Practical Issues
Commonly Used Alignment Packages
Clustal Omega
ClustalW2
DIALIGN
Kalign
MAFFT
MUSCLE
PASTA
PRANK
T-Coffee
Viewing a Multiple Alignment
Clustal X
Jalview
SeaView
ProViz
Summary
Internet Resources
References
9 Molecular Evolution and Phylogenetic Analysis
Introduction
Early Classification Schemes
Sequences As Molecular Clocks
Background Terminology and the Basics
How to Construct a Tree
Multiple Sequence Alignment and Alignment Editing
Determining the Substitution Model
Tree Building
Tree Visualization
Marker-Based Evolution Studies
Phylogenetic Analysis and Data Integration
Box 9.1 Predicting Cancer Progression and Drug Response Using Phylogenetic Approaches
Future Challenges
Internet Resources
References
10 Expression Analysis
Introduction
Step 0: Choose an Expression Analysis Technology
DNA Microarrays
RNA-seq
The Choice is Yours
Step 1: Design the Experiment
Step 2: Collect and Manage the Data – and Metadata
Step 3: Data Pre-Processing
Step 4: Quality Control
Quality Control Tools
Screening for Misidentified Samples: PCA on Y Chromosome Expression
Step 5: Normalization and Batch Effects. The Importance of Normalizing and Batch-Correcting Data
FPKM and Count Data
Sample and Quantile Normalization
Additional Methods of Sample Normalization
Batch Correction
Step 6: Exploratory Data Analysis
Hierarchical Clustering
Principal Component Analysis
Non-Negative Matrix Factorization
Step 7: Differential Expression Analysis
Student's t-Test: The Father of Them All
Limma
Voom
Negative Binomial Models
Fold-Change
Correcting for Multiple Testing
Step 8: Exploring Mechanisms Through Functional Enrichment Analysis
List-Based Methods
Rank-Based Methods
Step 9: Developing a Classifier
Measuring Classifier Performance
Feature Selection
Classification Methods
Validation of Predictive Models
Single-Cell Sequencing
Summary
Internet Resources
Further Reading
References
11 Proteomics and Protein Identification by Mass Spectrometry
Introduction. What Is a Proteome?
Why Study Proteomes?
Mass Spectrometry
Ionization
Mass Analyzers
Box 11.1 Tandem Mass Spectrometry (Figure 11.4)
Ion Detectors
Box 11.2 The Mass Spectrum (Figure 11.5)
Tandem Mass Spectrometry for Peptide Identification
Sample Preparation
Box 11.3 Post-Translational Modification (Figure 11.7)
Bioinformatics Analysis for MS-based Proteomics
Proteomics Strategies
Box 11.4 Quantitative Proteomics (Figure 11.10)
Peptide Mass Fingerprinting
PMF on the Web. Mascot
Proteomics and Tandem MS
Peptide Spectral Matching
De Novo Peptide Sequencing
Spectral Library Searching
Hybrid Search
Top-Down (Intact Protein) MS
Database Search Models
PSM Software
SEQUEST
X! Tandem
MaxQuant (Andromeda)
PSM on the Web
Reporting Standards
Proteomics XML Formats
Proteomics Data Repositories
ProteomeXchange
PRIDE
PeptideAtlas
Global Proteome Machine + GPMdb
Protein/Proteomics Databases
UniProt
PTM Databases
Selected Applications of Proteomics
Differential Proteomics
Functional Proteomics
Structural Proteomics
Summary
Acknowledgments
Internet Resources
Further Reading
References
12 Protein Structure Prediction and Analysis
Introduction to Protein Structures
How Protein Structures are Determined
Box 12.1 The Meaning of RMSD
How Protein Structures are Described
Box 12.2 PDB Format
Protein Structure Databases
Other Structure Databases
MMDB
Proteopedia
Visualizing Proteins
Protein Structure Prediction
Homology Modeling
Threading
Ab Initio Structure Prediction
Protein Structure Evaluation
Protein Structure Comparison
Summary
Internet Resources
Further Reading
References
13 Biological Networks and Pathways
Introduction
Pathway and Molecular Interaction Mapping: Experiments and Predictions
Pathway and Molecular Interaction Databases: An Overview
Representing Biological Pathways and Interaction Networks in a Computer
Considerations for Pathway and Interaction Data Representation
Pathway Databases. Reactome
EcoCyc
KEGG
Molecular Interaction Databases. BioGRID
IntAct
Functional Interaction Databases
STRING
GeneMANIA
Strategies for Navigating Pathway and Interaction Databases
Standard Data Formats for Pathways and Molecular Interactions
BioPAX
PSI-MI
SBML
Pathway Visualization and Analysis
Network Visualization and Analysis
Box 13.1 Advanced Graph Theory Applications
Network Visualization
Network Analysis
Summary
Acknowledgments
Internet Resources
Further Reading
References
14 Metabolomics
Introduction
Data Formats
Chemical Representation and Exchange Formats
Spectral Representation and Exchange Formats
Molecular Editors
Spectral Viewers
Databases
Chemical Compound Databases
Spectral Databases
Metabolic Pathway Databases
Organism-Specific Metabolomic Databases
Bioinformatics for Metabolite Identification
Box 14.1 Targeted Versus Untargeted Metabolomics
Levels of Metabolite Identification
NMR-Based Compound Identification
GC-MS-Based Compound Identification
LC-MS-Based Compound Identification
Multivariate Statistics
Principal Component Analysis
Partial Least Squares Discriminant Analysis
Bioinformatics for Metabolite Interpretation
Summary
Internet Resources
Further Reading
References
15 Population Genetics
Introduction
Evolutionary Processes and Genetic Variation
Allele Frequencies and Population Variation
Box 15.1 Basic Definitions and Concepts
Display Methods
Demographic History Inference
Box 15.2 Inferring Demographic History
Admixture and Ancestry Estimation
Detection of Natural Selection
Other Applications
Summary
Internet Resources
References
16 Metagenomics and Microbial Community Analysis
Introduction
Why Study the Microbiome?
The Origins of Microbiome Analysis
Metagenomic Workflow
General Considerations in Marker-Gene and Metagenomic Data Analysis
Marker Genes
Box 16.1 Ribosomal RNA Genes
Quality Control
Grouping of Similar Sequences
Taxonomic Assignment
Box 16.2 Diversity at Different Taxonomic Ranks
Calculating and Comparing Diversity
Associations with Metadata
Metagenomic Data Analysis
Predicting Functional Information from Marker-Gene Data
Metagenomic Analysis Protocol
Quality Control and Merging of Paired-End Reads
Assembly
Gene Annotation and Homology Searching
Taxonomic Assignment and Profiling
Functional Predictions
Statistical Associations
Other Techniques to Characterize the Microbiome
Summary
Internet Resources
Further Reading
References
17 Translational Bioinformatics
Introduction
Databases Describing the Genetics of Human Health
Prediction and Characterization of Impactful Genetic Variants from Sequence
Box 17.1 Gene Testing of Hereditary Cancers
Characterizing Genetic Variants at the Protein Level
Characterizing Genetic Variants at the Genomic or Transcriptomic Level
Using Informatics to Prioritize Disease-Causing Genes
Translating Model Organism Data to Humans
Computing with Patient Phenotype Using Data in Electronic Health Records. Introduction to Electronic Health Records
Structured Clinical Data with Biomedical Ontologies
Common Data Models
Much of Electronic Health Record Data are Plaintext
Informatics and Precision Medicine. Describing Patient Phenotype
Box 17.2 Associating Clinical Phenotypes to Variants: The PheWAS Approach
Drug Repurposing
Clinical Marker Development from -omics Data
Box 17.3 The Markers of Aging
Integration of Heterogeneous Data Sources
Precision Medicine Initiatives
Community Challenges Solve Innovative Problems Collaboratively
Box 17.4 The CAGI Personal Genome Project Community Challenge
Electronic Health Record Systems can be Customized
Informatics for Prevention Policy
Ethical, Legal, and Social Implications of Translational Medicine
Protecting Patient Privacy
Summary
Internet Resources
References
18 Statistical Methods for Biologists
Introduction
Descriptive Representations of Data. Data vs. Information vs. Knowledge
Datasets and Data Schemas
Descriptive Statistics
The Right Graph Is the Most Descriptive Representation of a Dataset
Frequency and Probability Distributions
Statistical Inference and Statistical Hypothesis Testing. Statistical Inference
Statistical Hypothesis Testing
Type I and II Errors that Arise from Statistical Hypothesis Testing
Statistical Significance
Testing the Null Hypothesis with a Two-Sample t-Test
Statistical Power
Correcting for False Discovery due to Multiple Testing
The Global Problem with the Use of p Values
Common Statistical Tests Used in a Typical Statistical Inference Process
Summary
Acknowledgments
Internet Resources
Further Reading
References
Appendices
1.1 Example of a Flatfile Header in ENA Format
1.2 Example of a Flatfile Header in DDBJ/GenBank Format
1.3 Example of a Feature Table in ENA Format
1.4 Example of a Feature Table in GenBank/DDBJ Format
6.1 Dynamic Programming
Reference
Glossary
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Edited by
Andreas D. Baxevanis, Gary D. Bader, and David S. Wishart
.....
Figure 1.1 The landing page for ENA record U54469.1, providing a graphical view of biological features found within the sequence of the Drosophila melanogaster eukaryotic initiation factor 4E (eIF4E) gene. The tracks within the graphical view show the position of the gene, mRNAs, and coding regions (marked CDS) within the 2881 bp sequence reported in this record.
.....