Bioinformatics

Book ID: 1887585
Genre: Biology
Rights holder and/or publisher: John Wiley & Sons Limited
ISBN: 9781119335955
Format: e-book
Price: 13,106.2 RUB ($142.73)
Age restriction: 0+


Book description

Praise for the third edition of Bioinformatics

“This book is a gem to read and use in practice.” (Briefings in Bioinformatics)

“This volume has a distinctive, special value as it offers an unrivalled level of details and unique expert insights from the leading computational biologists, including the very creators of popular bioinformatics tools.” (ChemBioChem)

“A valuable survey of this fascinating field. . . I found it to be the most useful book on bioinformatics that I have seen and recommend it very highly.” (American Society for Microbiology News)

“This should be on the bookshelf of every molecular biologist.” (The Quarterly Review of Biology)

The field of bioinformatics is advancing at a remarkable rate. With the development of new analytical techniques that make use of the latest advances in machine learning and data science, today’s biologists are gaining fantastic new insights into the natural world’s most complex systems. These rapidly progressing innovations can, however, be difficult to keep pace with.

The expanded fourth edition of the best-selling Bioinformatics aims to remedy this by providing students and professionals alike with a comprehensive survey of the current field. Revised to reflect recent advances in computational biology, it offers practical instruction on the gathering, analysis, and interpretation of data, as well as explanations of the most powerful algorithms presently used for biological discovery. Bioinformatics, Fourth Edition offers the most readable, up-to-date, and thorough introduction to the field for biologists at all levels, covering both key concepts that have stood the test of time and the new and important developments driving this fast-moving discipline forwards.
This new edition features:

New chapters on metabolomics, population genetics, metagenomics and microbial community analysis, and translational bioinformatics

A thorough treatment of statistical methods as applied to biological data

Special topic boxes and appendices highlighting experimental strategies and advanced concepts

Annotated reference lists, comprehensive lists of relevant web resources, and an extensive glossary of commonly used terms in bioinformatics, genomics, and proteomics

Bioinformatics is an indispensable companion for researchers, instructors, and students of all levels in molecular biology and computational biology, as well as investigators involved in genomics, clinical research, proteomics, and related fields.

Contents

Multiple authors. Bioinformatics

Table of Contents

List of Tables

List of Illustrations

Guide

Pages

Bioinformatics

Foreword

Preface

Contributors

About the Companion Website

1 Biological Sequence Databases

Introduction

Nucleotide Sequence Databases

Nucleotide Sequence Flatfiles: A Dissection

The Header

Box 1.1 Functional Divisions in Nucleotide Databases

The Feature Table

Graphical Interfaces

RefSeq

Box 1.2 RefSeq

Protein Sequence Databases

The NCBI Protein Database

UniProt

Summary

Box 1.3 Ensuring the Continued Quality of Data in Public Sequence Databases

Acknowledgments

Internet Resources

Further Reading

References

2 Information Retrieval from Biological Databases

Introduction

Integrated Information Retrieval: The Entrez System

Relationships Between Database Entries: Neighboring

Hard Links

The Entrez Discovery Pathway

Medical Databases

Organismal Sequence Databases Beyond NCBI

Summary

Internet Resources

Further Reading

References

3 Assessing Pairwise Sequence Similarity: BLAST and FASTA

Introduction

Global Versus Local Sequence Alignments

Scoring Matrices

Box 3.1 Scoring Matrices and the Log Odds Ratio

PAM Matrices

BLOSUM Matrices

Which Matrices Should be Used When?

Nucleotide Scoring Matrices

Gaps and Gap Penalties

BLAST

The Algorithm

Box 3.2 The Karlin–Altschul Equation

Performing a BLAST Search

Understanding the BLAST Output

Suggested BLAST Cut-Offs

BLAST 2 Sequences

MegaBLAST

PSI-BLAST

The Method

Performing a PSI-BLAST Search

BLAT

FASTA

The Method

Running a FASTA Search

Statistical Significance of Results

Comparing FASTA and BLAST

Summary

Internet Resources

Further Reading

References

4 Genome Browsers

Introduction

The UCSC Genome Browser

Box 4.1 Common File Types for Genomic Data

Box 4.2 GENCODE

Box 4.3 Histone Marks

UCSC Table Browser

Ensembl Genome Browser

Box 4.4 Ensembl Stable IDs

Ensembl Biomart

JBrowse

Summary

Internet Resources

Further Reading

References

5 Genome Annotation

Introduction

Gene Prediction Methods

Ab Initio Gene Prediction in Prokaryotic Genomes

Box 5.1 Position-Specific Scoring Matrices

Box 5.2 Markov Models

Box 5.3 Hidden Markov Models in Gene Prediction

Ab Initio Gene Prediction in Eukaryotic Genomes

Predicting Exon-Defining Signals

Predicting and Scoring Exons

Exon Assembly

How Well Do Gene Predictors Work?

Box 5.4 Evaluating Binary Classifications or Predictions in Bioinformatics

Assessing Prokaryotic Gene Predictors

Assessing Eukaryotic Gene Predictors

Evidence Generation for Genome Annotation

Gene Annotation and Evidence Generation Using RNA-seq Data

Gene Annotation and Evidence Generation Using Protein Sequence Databases

Gene Annotation and Evidence Generation Using Comparative Gene Prediction

Evidence Generation for Non-Protein-Coding, Non-Coding, or Foreign Genes

tRNA and rRNA Gene Finding

Prophage Finding in Prokaryotes

Repetitive Sequence Finding/Masking in Eukaryotes

Finding and Removing Pseudogenes in Eukaryotes

Genome Annotation Pipelines

Prokaryotic Genome Annotation Pipelines

Eukaryotic Genome Annotation Pipelines

Visualization and Quality Control

Summary

Acknowledgments

Internet Resources

Further Reading

References

6 Predictive Methods Using RNA Sequences

Introduction

Overview of RNA Secondary Structure Prediction Using Thermodynamics

Box 6.1 Gibbs Free Energy

Dynamic Programming

Box 6.2 Algorithm Complexity

Accuracy of RNA Secondary Structure Prediction

Experimental Methods to Refine Secondary Structure Prediction

Predicting the Secondary Structure Common to Multiple RNA Sequences

Algorithms That Are Constrained by an Initial Alignment

Algorithms That Are Not Constrained by the Initial Alignment

Practical Introduction to Single-Sequence Methods

Using the Mfold Web Server

Using the RNAstructure Web Server

Practical Introduction to Multiple Sequence Methods

Using the RNAstructure Web Server to Predict a Common Structure for Multiple Sequences

Other Computational Methods to Study RNA Structure

Comparison of Methods

Predicting RNA Tertiary Structure

Summary

Internet Resources

Further Reading

References

7 Predictive Methods Using Protein Sequences

Introduction

One-Dimensional Prediction of Protein Structure

Synopsis

Secondary Structure and Solvent Accessibility

Box 7.1 Hidden Markov Models

Box 7.2 Neural Networks

Performance Assessment of Secondary Structure Prediction

Box 7.3 Secondary Structure Prediction Scoring Schemes and Receiver Operating Characteristic Curves

Transmembrane Alpha Helices and Beta Strands

Box 7.4 Scoring Schemes for Structural Protein Segments

Disordered Regions

Predicting Protein Function

Synopsis

Motifs and Domains

Databases

Gene Function Prediction Based on the Gene Ontology

Subcellular Localization

Protein Interaction Sites

Effect of Sequence Variants

Summary

Internet Resources

Further Reading

References

8 Multiple Sequence Alignments

Introduction

Measuring Multiple Alignment Quality

Making an Alignment: Practical Issues

Commonly Used Alignment Packages

Clustal Omega

ClustalW2

DIALIGN

Kalign

MAFFT

MUSCLE

PASTA

PRANK

T-Coffee

Viewing a Multiple Alignment

Clustal X

Jalview

SeaView

ProViz

Summary

Internet Resources

References

9 Molecular Evolution and Phylogenetic Analysis

Introduction

Early Classification Schemes

Sequences As Molecular Clocks

Background Terminology and the Basics

How to Construct a Tree

Multiple Sequence Alignment and Alignment Editing

Determining the Substitution Model

Tree Building

Tree Visualization

Marker-Based Evolution Studies

Phylogenetic Analysis and Data Integration

Box 9.1 Predicting Cancer Progression and Drug Response Using Phylogenetic Approaches

Future Challenges

Internet Resources

References

10 Expression Analysis

Introduction

Step 0: Choose an Expression Analysis Technology

DNA Microarrays

RNA-seq

The Choice is Yours

Step 1: Design the Experiment

Step 2: Collect and Manage the Data – and Metadata

Step 3: Data Pre-Processing

Step 4: Quality Control

Quality Control Tools

Screening for Misidentified Samples: PCA on Y Chromosome Expression

Step 5: Normalization and Batch Effects

The Importance of Normalizing and Batch-Correcting Data

FPKM and Count Data

Sample and Quantile Normalization

Additional Methods of Sample Normalization

Batch Correction

Step 6: Exploratory Data Analysis

Hierarchical Clustering

Principal Component Analysis

Non-Negative Matrix Factorization

Step 7: Differential Expression Analysis

Student's t-Test: The Father of Them All

Limma

Voom

Negative Binomial Models

Fold-Change

Correcting for Multiple Testing

Step 8: Exploring Mechanisms Through Functional Enrichment Analysis

List-Based Methods

Rank-Based Methods

Step 9: Developing a Classifier

Measuring Classifier Performance

Feature Selection

Classification Methods

Validation of Predictive Models

Single-Cell Sequencing

Summary

Internet Resources

Further Reading

References

11 Proteomics and Protein Identification by Mass Spectrometry

Introduction

What Is a Proteome?

Why Study Proteomes?

Mass Spectrometry

Ionization

Mass Analyzers

Box 11.1 Tandem Mass Spectrometry (Figure 11.4)

Ion Detectors

Box 11.2 The Mass Spectrum (Figure 11.5)

Tandem Mass Spectrometry for Peptide Identification

Sample Preparation

Box 11.3 Post-Translational Modification (Figure 11.7)

Bioinformatics Analysis for MS-based Proteomics

Proteomics Strategies

Box 11.4 Quantitative Proteomics (Figure 11.10)

Peptide Mass Fingerprinting

PMF on the Web

Mascot

Proteomics and Tandem MS

Peptide Spectral Matching

De Novo Peptide Sequencing

Spectral Library Searching

Hybrid Search

Top-Down (Intact Protein) MS

Database Search Models

PSM Software

SEQUEST

X! Tandem

MaxQuant (Andromeda)

PSM on the Web

Reporting Standards

Proteomics XML Formats

Proteomics Data Repositories

ProteomeXchange

PRIDE

PeptideAtlas

Global Proteome Machine + GPMdb

Protein/Proteomics Databases

UniProt

PTM Databases

Selected Applications of Proteomics

Differential Proteomics

Functional Proteomics

Structural Proteomics

Summary

Acknowledgments

Internet Resources

Further Reading

References

12 Protein Structure Prediction and Analysis

Introduction to Protein Structures

How Protein Structures are Determined

Box 12.1 The Meaning of RMSD

How Protein Structures are Described

Box 12.2 PDB Format

Protein Structure Databases

Other Structure Databases

MMDB

Proteopedia

Visualizing Proteins

Protein Structure Prediction

Homology Modeling

Threading

Ab Initio Structure Prediction

Protein Structure Evaluation

Protein Structure Comparison

Summary

Internet Resources

Further Reading

References

13 Biological Networks and Pathways

Introduction

Pathway and Molecular Interaction Mapping: Experiments and Predictions

Pathway and Molecular Interaction Databases: An Overview

Representing Biological Pathways and Interaction Networks in a Computer

Considerations for Pathway and Interaction Data Representation

Pathway Databases

Reactome

EcoCyc

KEGG

Molecular Interaction Databases

BioGRID

IntAct

Functional Interaction Databases

STRING

GeneMANIA

Strategies for Navigating Pathway and Interaction Databases

Standard Data Formats for Pathways and Molecular Interactions

BioPAX

PSI-MI

SBML

Pathway Visualization and Analysis

Network Visualization and Analysis

Box 13.1 Advanced Graph Theory Applications

Network Visualization

Network Analysis

Summary

Acknowledgments

Internet Resources

Further Reading

References

14 Metabolomics

Introduction

Data Formats

Chemical Representation and Exchange Formats

Spectral Representation and Exchange Formats

Molecular Editors

Spectral Viewers

Databases

Chemical Compound Databases

Spectral Databases

Metabolic Pathway Databases

Organism-Specific Metabolomic Databases

Bioinformatics for Metabolite Identification

Box 14.1 Targeted Versus Untargeted Metabolomics

Levels of Metabolite Identification

NMR-Based Compound Identification

GC-MS-Based Compound Identification

LC-MS-Based Compound Identification

Multivariate Statistics

Principal Component Analysis

Partial Least Squares Discriminant Analysis

Bioinformatics for Metabolite Interpretation

Summary

Internet Resources

Further Reading

References

15 Population Genetics

Introduction

Evolutionary Processes and Genetic Variation

Allele Frequencies and Population Variation

Box 15.1 Basic Definitions and Concepts

Display Methods

Demographic History Inference

Box 15.2 Inferring Demographic History

Admixture and Ancestry Estimation

Detection of Natural Selection

Other Applications

Summary

Internet Resources

References

16 Metagenomics and Microbial Community Analysis

Introduction

Why Study the Microbiome?

The Origins of Microbiome Analysis

Metagenomic Workflow

General Considerations in Marker-Gene and Metagenomic Data Analysis

Marker Genes

Box 16.1 Ribosomal RNA Genes

Quality Control

Grouping of Similar Sequences

Taxonomic Assignment

Box 16.2 Diversity at Different Taxonomic Ranks

Calculating and Comparing Diversity

Associations with Metadata

Metagenomic Data Analysis

Predicting Functional Information from Marker-Gene Data

Metagenomic Analysis Protocol

Quality Control and Merging of Paired-End Reads

Assembly

Gene Annotation and Homology Searching

Taxonomic Assignment and Profiling

Functional Predictions

Statistical Associations

Other Techniques to Characterize the Microbiome

Summary

Internet Resources

Further Reading

References

17 Translational Bioinformatics

Introduction

Databases Describing the Genetics of Human Health

Prediction and Characterization of Impactful Genetic Variants from Sequence

Box 17.1 Gene Testing of Hereditary Cancers

Characterizing Genetic Variants at the Protein Level

Characterizing Genetic Variants at the Genomic or Transcriptomic Level

Using Informatics to Prioritize Disease-Causing Genes

Translating Model Organism Data to Humans

Computing with Patient Phenotype Using Data in Electronic Health Records

Introduction to Electronic Health Records

Structured Clinical Data with Biomedical Ontologies

Common Data Models

Much of Electronic Health Record Data are Plaintext

Informatics and Precision Medicine

Describing Patient Phenotype

Box 17.2 Associating Clinical Phenotypes to Variants: The PheWAS Approach

Drug Repurposing

Clinical Marker Development from -omics Data

Box 17.3 The Markers of Aging

Integration of Heterogeneous Data Sources

Precision Medicine Initiatives

Community Challenges Solve Innovative Problems Collaboratively

Box 17.4 The CAGI Personal Genome Project Community Challenge

Electronic Health Record Systems can be Customized

Informatics for Prevention Policy

Ethical, Legal, and Social Implications of Translational Medicine

Protecting Patient Privacy

Summary

Internet Resources

References

18 Statistical Methods for Biologists

Introduction

Descriptive Representations of Data

Data vs. Information vs. Knowledge

Datasets and Data Schemas

Descriptive Statistics

The Right Graph Is the Most Descriptive Representation of a Dataset

Frequency and Probability Distributions

Statistical Inference and Statistical Hypothesis Testing

Statistical Inference

Statistical Hypothesis Testing

Type I and II Errors that Arise from Statistical Hypothesis Testing

Statistical Significance

Testing the Null Hypothesis with a Two-Sample t-Test

Statistical Power

Correcting for False Discovery due to Multiple Testing

The Global Problem with the Use of p Values

Common Statistical Tests Used in a Typical Statistical Inference Process

Summary

Acknowledgments

Internet Resources

Further Reading

References

Appendices

1.1 Example of a Flatfile Header in ENA Format

1.2 Example of a Flatfile Header in DDBJ/GenBank Format

1.3 Example of a Feature Table in ENA Format

1.4 Example of a Feature Table in GenBank/DDBJ Format

6.1 Dynamic Programming

Reference

Glossary

Index

WILEY END USER LICENSE AGREEMENT

Excerpt from the book

Edited by

Andreas D. Baxevanis, Gary D. Bader, and David S. Wishart

.....

Figure 1.1 The landing page for ENA record U54469.1, providing a graphical view of biological features found within the sequence of the Drosophila melanogaster eukaryotic initiation factor 4E (eIF4E) gene. The tracks within the graphical view show the position of the gene, mRNAs, and coding regions (marked CDS) within the 2881 bp sequence reported in this record.

.....
