Data Analytics in Bioinformatics

Book ID: 1907565. Price: 22,632.5 RUB ($246.6). Format: e-book. Genre: Software. Rights holder and/or publisher: John Wiley & Sons Limited. ISBN: 9781119785606. Age restriction: 0+.


Book description

Machine learning techniques are increasingly used to address problems in computational biology and bioinformatics. Novel machine learning techniques for analyzing high-throughput data in the form of sequences, gene and protein expression, pathways, and images are becoming vital for understanding diseases and for future drug discovery. Techniques such as Markov models, support vector machines, neural networks, and graphical models have been successful in analyzing life science data because they handle randomness, uncertainty, and noise in the data and generalize well. Machine Learning in Bioinformatics compiles recent machine learning methods and their applications to contemporary problems in bioinformatics, including classification and prediction of disease, feature selection, dimensionality reduction, gene selection, and classification of microarray data, among others.

Contents

Group of authors. Data Analytics in Bioinformatics

Table of Contents

List of Illustrations

List of Tables

Guide

Pages

Data Analytics in Bioinformatics. A Machine Learning Perspective

Preface

Acknowledgement

1. Introduction to Supervised Learning

1.1 Introduction

1.2 Learning Process & its Methodologies

1.2.1 Supervised Learning

1.2.2 Unsupervised Learning

1.2.3 Reinforcement Learning

1.3 Classification and its Types

1.4 Regression

1.4.1 Logistic Regression

1.4.2 Difference between Linear & Logistic Regression

1.5 Random Forest

1.6 K-Nearest Neighbor

1.7 Decision Trees

1.8 Support Vector Machines

1.9 Neural Networks

1.10 Comparison of Numerical Interpretation

1.11 Conclusion & Future Scope

References

2. Introduction to Unsupervised Learning in Bioinformatics

2.1 Introduction

2.2 Clustering in Unsupervised Learning

2.3 Clustering in Bioinformatics—Genetic Data

2.3.1 Microarray Analysis

2.3.2 Clustering Algorithms

2.3.3 Partition Algorithms

2.3.3.1 k-Means Clustering

2.3.3.2 Cluster Center Initialization Algorithm (CCIA)

2.3.3.3 Intelligent Kernel k-Mean (IKKM)

2.3.3.4 Clustering Large Applications (CLARA)

2.3.4 Hierarchical Clustering Algorithms

2.3.4.1 AGNES (Agglomerative Nesting)

2.3.4.2 DIANA (Divisive Analysis)

2.3.4.3 CURE (Clustering Using Representatives)

2.3.4.4 CHAMELEON

2.3.4.5 BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies)

2.3.5 Density-Based Approach

2.3.5.1 DBSCAN

2.3.6 Model-Based Approach

2.3.6.1 SOM (Self-Organizing Maps)

2.3.7 Grid-Based Clustering

2.3.7.1 STING (Statistical Information Grid-Based Algorithm)

2.3.8 Soft Clustering

2.3.8.1 FCM (Fuzzy Class Membership)

2.4 Conclusion

References

3. A Critical Review on the Application of Artificial Neural Network in Bioinformatics

3.1 Introduction

3.1.1 Different Areas of Application of Bioinformatics

3.1.2 Bioinformatics in Real World

3.1.3 Issues with Bioinformatics

3.1.3.1 Issues Related to Structure

3.1.3.2 Sequence Analysis

3.2 Biological Datasets

3.3 Building Computational Model

3.3.1 Data Pre-Processing and its Necessity

3.3.2 Biological Data Classification

3.3.3 ML in Bioinformatics

3.3.4 Introduction to ANN

3.3.5 Application of ANN in Bioinformatics

3.3.6 Broadly Used Supervised Machine Learning Techniques

3.4 Literature Review

3.4.1 Comparative Analysis of ANN With Broadly Used Traditional ML Algorithms

3.5 Critical Analysis

3.6 Conclusion

References

4. Dimensionality Reduction Techniques: Principles, Benefits, and Limitations

4.1 Introduction

4.2 The Benefits and Limitations of Dimension Reduction Methods

4.3 Components of Dimension Reduction

4.3.1 Feature Selection

4.3.2 Feature Reduction

4.4 Methods of Dimensionality Reduction

4.4.1 Principal Component Analysis (PCA)

4.4.2 Missing Values Ratio (MVR)

4.4.3 Linear Discriminant Analysis (LDA)

4.4.4 Backward Feature Elimination (BFE)

4.4.5 Forward Feature Construction (FFC)

4.4.6 Independent Component Analysis (ICA)

4.4.7 Low Variance Filter (LVF)

4.4.8 High Correlation Filter

4.4.9 Random Forests (RF)/Ensemble Trees

4.4.10 t-Distributed Stochastic Neighbor Embedding (t-SNE)

4.4.11 Autoencoder

4.4.12 Factor Analysis (FA)

4.4.13 Uniform Manifold Approximation and Projection (UMAP)

4.4.14 Information Gain (IG)

4.4.15 Vector Quantization (VQ)

4.5 Conclusion

References

5. Plant Disease Detection Using Machine Learning Tools With an Overview on Dimensionality Reduction

5.1 Introduction

5.2 Flowchart

5.3 Machine Learning (ML) in Rapid Stress Phenotyping

5.4 Dimensionality Reduction

5.4.1 Feature Extraction

5.4.1.1 PCA (Principal Component Analysis)

5.4.1.2 LDA (Linear Discriminant Analysis)

5.4.1.3 SIFT (Scale Invariant Feature Transform)

5.4.1.4 SURF (Speeded Up Robust Features)

5.4.1.5 ORB (Oriented FAST and Rotated BRIEF)

5.5 Literature Survey

5.6 Types of Plant Stress

5.6.1 Biotic Stress

5.6.1.1 Fungal Pathogen

5.6.1.2 Bacterial Pathogen

5.7 Implementation I: Numerical Dataset

5.7.1 Dataset Description

5.7.2 Results

5.7.3 Discussion

5.8 Implementation II: Image Dataset

5.8.1 Dataset Description

5.8.2 Method Used

5.8.3 Results

5.8.3.1 Results of ORB Feature Extraction and Brute Force Matching

5.8.3.1.3 Bacterial Leaf Blight on Rice Leaves

5.8.3.2 Color Histogram Comparison: Using Correlation Method

5.8.4 Discussions

5.9 Conclusion

References

6. Gene Selection Using Integrative Analysis of Multi-Level Omics Data: A Systematic Review

6.1 Introduction

6.2 Approaches for Gene Selection

6.3 Multi-Level Omics Data Integration

6.3.1 Horizontal Integration

6.3.2 Vertical Integration

6.4 Machine Learning Approaches for Multi-Level Data Integration

6.4.1 Unsupervised Integration of Omics Data

6.4.2 Supervised Integration of Omics Data

6.5 Critical Observation

6.6 Conclusion

References

7. Random Forest Algorithm in Imbalance Genomics Classification

7.1 Introduction

7.2 Methodological Issues

7.2.1 Decision Tree (DT) Classifier

7.2.2 Ensemble Techniques

7.2.3 Mathematical Formulation of Ensemble Technique

7.2.4 Bagging

7.2.5 Bagging Pseudocode

7.2.6 Random Forest

7.3 Biological Terminologies

7.3.1 DNA

7.3.2 Genomics

7.3.3 Proteins

7.4 Proposed Model

7.4.1 Balancing the Data

Algorithm 1: Clustering based on similarity measure

7.4.2 Ensembling of Trees

Algorithm 2: Ensembling of DTs

7.5 Experimental Analysis

7.6 Current and Future Scope of ML in Genomics

7.6.1 Gene Sequencing

7.6.2 Services to Consumer

7.6.3 Gene Editing

7.6.4 Pharmacy Genomics

7.6.5 Newborn Genetic Screening

7.7 Conclusion

References

8. Feature Selection and Random Forest Classification for Breast Cancer Disease

8.1 Introduction

8.2 Literature Survey

8.3 Machine Learning

8.4 Feature Engineering

8.5 Methodology

8.5.1 Dataset Collection

8.5.2 Proposed Work

8.5.2.1 Selection of Feature by Means of Correlation and Accuracy Calculation Using Random Forest Classification

8.5.2.2 Feature Selection Using one Variety and Accuracy Calculation Using Random Forest Classification

8.5.2.3 Feature Elimination Using RFE and Classification Using Random Forest

8.6 Result Analysis

8.7 Conclusion

References

9. A Comprehensive Study on the Application of Grey Wolf Optimization for Microarray Data

9.1 Introduction

9.2 Microarray Data

9.3 Grey Wolf Optimization (GWO) Algorithm

9.3.1 Principle of GWO

9.3.2 Mathematical Model of GWO

9.3.2.1 The Encircling

9.3.2.2 Hunting

9.3.2.3 Attacking Prey: (Exploitation)

9.3.2.4 Search for Prey: (Exploration)

9.3.3 Algorithm and Flow Chart of GWO

9.4 Studies on GWO Variants

9.4.1 Hybridization

9.4.2 Extensions

9.4.3 Modification

9.5 Application of GWO in Medical Domain

9.6 Application of GWO in Microarray Data

9.7 Conclusion and Future Work

References

10. The Cluster Analysis and Feature Selection: Perspective of Machine Learning and Image Processing

10.1 Introduction

10.2 Various Image Segmentation Techniques

10.2.1 Clustering

10.2.2 Thresholding

10.2.3 Edge-Based Segmentation

10.2.4 Region-Based Image Segmentation

10.2.5 Watershed

10.3 How to Deal With Image Dataset

10.3.1 Introduction

10.3.2 Image Acquisition

10.3.3 Image Pre-Processing

10.3.4 Image Enhancement

10.3.5 Image Segmentation

10.3.6 K-Mean Clustering

10.3.6.1 Euclidean Distance

10.3.6.2 Clustering

10.3.7 Density-Based Spatial Clustering of Applications With Noise (DBSCAN)

10.3.8 SVM Classifier

10.4 Class Imbalance Problem

10.4.1 Resampling Approaches

10.5 Optimization of Hyperparameter

10.6 Case Study

10.6.1 Pancreatic and Lung Tumor Prediction in the Machine Learning Era: Unique Supervised and Unsupervised Methodologies

10.6.2 Pancreatic Cysts (IPMN)

10.7 Using AI to Detect Coronavirus

10.7.1 BlueDot AI Technology

10.8 Using Artificial Intelligence (AI), CT Scan and X-Ray

10.9 Conclusion

References

11. Artificial Intelligence and Machine Learning for Healthcare Solutions

11.1 Introduction

11.2 Using Machine Learning Approaches for Different Purposes

11.3 Various Resources of Medical Dataset for Research

11.4 Deep Learning in Healthcare

11.5 Various Projects in Medical Imaging and Diagnostics

11.6 Conclusion

12. Forecasting of Novel Corona Virus Disease (Covid-19) Using LSTM and XG Boosting Algorithms

12.1 Introduction

12.2 Machine Learning Algorithms for Forecasting

12.3 Proposed Method

12.3.1 LSTM (Long Short-Term Memory)

12.3.2 XG Boost (eXtreme Gradient Boosting) Algorithm

12.3.3 Polynomial Regression

12.3.4 Performance Metrics

12.4 Implementation

12.4.1 The Main Python Code for LSTM

12.4.2 The Main Python Code for Polynomial Regression

12.4.3 The Main Python Code for XG Boosting Algorithm

12.4.4 Libraries or Methods Used in the Proposed Work

12.5 Results and Discussion

12.6 Conclusion and Future Work

References

13. An Innovative Machine Learning Approach to Diagnose Cancer at an Early Stage

13.1 Introduction

13.1.1 Multiscale Cancer Detection

13.2 Related Work

13.3 Materials and Methods

13.4 System Design

13.4.1 Artificial Neural Network

13.4.2 Back Propagation Network (BPN)

13.4.3 Support Vector Machine (SVM)

13.4.4 Pre-Processing

13.4.5 Feature Extraction

13.4.6 Database Updation

13.4.7 Classification

13.4.8 Clustering

13.4.9 Segmentation Using FCM Clustering

13.5 Results and Discussion

13.6 Conclusion

References

14. A Study of Human Sleep Staging Behavior Based on Polysomnography Using Machine Learning Techniques

14.1 Introduction

14.2 Polysomnography Signal Analysis

14.3 Case Study on Automated Sleep Stage Scoring

14.3.1 Experimental Data

14.3.2 The Methodology

14.3.3 Experimental Results and Discussion

14.4 Summary and Conclusion

References

15. Detection of Schizophrenia Using EEG Signals

15.1 Introduction

15.1.1 The Human Brain

15.1.2 Schizophrenia

15.1.2.1 DSM-V Definition and Diagnosis Criteria of Schizophrenia

15.1.2.2 Types of Schizophrenia

15.1.2.3 Causes of Schizophrenia

15.1.2.4 Symptoms of Schizophrenia

15.1.3 Electroencephalograph (EEG)

15.1.3.1 Characterizations of EEG Signals

15.2 Methodology

15.2.1 EEG Signal Processing

15.2.2 Removing the Artifacts

15.2.3 Feature Extraction

15.2.4 Normalization

15.2.5 Feature Selection/Reduction

15.2.6 Feature Classification

15.3 Literature Review

15.4 Discussion

15.5 Conclusion

References

16. Performance Analysis of Signal Processing Techniques in Bioinformatics for Medical Applications Using Machine Learning Concepts

16.1 Introduction

16.1.1 Role of Machine Learning in Bioinformatics

16.1.2 Machine Learning Applications for Bioinformatics

16.1.3 Recent Trends in Bioinformatics

16.1.4 Data Analytics in Bioinformatics

16.1.5 Machine Learning Algorithms

16.2 Basic Definition of Anatomy and Cell at Micro Level

16.2.1 Biological Cells

16.2.2 DNA, RNA and Proteins

16.3 Signal Processing—Genome Signal Processing

16.3.1 Identification of Hotspots

16.3.2 Advantages of Computational Hotspot Identification Techniques Over Alanine-Scanning Mutagenesis

16.3.3 Overview of Protein Sequences

16.3.4 EIIP & CPNR-Based Mapping

16.3.5 Feature Extraction Technique

16.3.5.1 RRM—Resonant Recognition Model

16.3.5.2 Discrete Wavelet Transforms

16.4 Hotspots Identification Algorithm

16.5 Results—Experimental Investigations

16.6 Analysis Using Machine Learning Metrics

16.6.1 Theoretical Details of Performance Metrics

16.6.2 Comparative Analysis of the Protein Sequence Representation

16.6.3 Visual Analysis

16.7 Conclusion

Appendix

A.1 Hotspot Identification Code

A.2 Performance Metrics Code

17. Survey of Various Statistical Numerical and Machine Learning Ontological Models on Infectious Disease Ontology

17.1 Introduction

17.2 Disease Ontology

17.3 Infectious Disease Ontology

17.4 Biomedical Ontologies on IDO

17.5 Various Methods on IDO

17.6 Machine Learning-Based Ontology for IDO

17.7 Recommendation or Suggestions for Future Study

17.8 Conclusions

References

18. An Efficient Model for Predicting Liver Disease Using Machine Learning

18.1 Introduction

18.2 Related Works

18.3 Proposed Model

18.3.1 Elements of Experimental Methodology

18.3.1.1 Experimental Dataset

18.3.1.2 Overview of Data & Analysis

18.3.1.3 Data Preprocessing

18.3.1.4 Standardization

18.3.1.5 Label Encoding

18.3.2 Model Building

18.3.2.1 Support Vector Machines (SVM)

18.3.2.2 Logistic Regression

18.3.2.3 Naïve Bayes

18.3.2.4 Random Forests

18.3.2.5 Gradient Boosting

18.3.3 Performance Evaluation

18.3.4 Performance Optimization

18.3.4.1 N-Fold Cross Validation

18.4 Results and Analysis

18.5 Conclusion

References

19. A Novel Approach for Prediction of Stock Market Behavior Using Bioinformatics Techniques

19.1 Introduction

19.2 Literature Review

19.3 Proposed Work

19.3.1 Encoding of Stock Market Price Behavior to Binary String

19.3.2 Mapping Binary String Into DNA Sequence

19.3.3 Sequence Alignment Using BLAST

19.3.4 Prediction Method

19.3.5 Decoding Predicted DNA Sequence Into Binary String

19.3.6 Mismatching Analysis of Stock Market Behavior

19.4 Experimental Study

19.4.1 Data Analysis

19.4.2 Results

19.5 Conclusion and Future Work

20. Stock Market Price Behavior Prediction Using Markov Models: A Bioinformatics Approach

20.1 Introduction

20.2 Literature Survey

20.3 Proposed Work

20.3.1 Encoding of Stock Market Price Behavior to Binary Sequence

20.3.2 Conversion Between Binary Sequences to Nucleotide Sequence

20.3.3 Zero-Order Markov Model

20.3.4 First-Order Markov Model

20.3.5 Second-Order Markov Model

20.3.6 Hidden Markov Model

20.3.6.1 HMM for Stock Market Behavior Prediction

20.3.7 Decoding Predicted DNA Sequence into a Binary String

20.3.8 Mismatching Analysis of Stock Market Behavior

20.4 Experimental Work

20.4.1 Dataset Preparation

20.4.2 Results and Analysis

20.4.3 Performance Comparison Between Different Orders Markov Models

20.5 Conclusions and Future Work

References

Index

WILEY END USER LICENSE AGREEMENT

Excerpt from the book

Scrivener Publishing

100 Cummings Center, Suite 541J

.....

3. Géron, A., Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media, United States of America, 2019.

4. Alshemali, B. and Kalita, J., Improving the reliability of deep neural networks in NLP: A review. Knowl.-Based Syst., 191, 105210, 2020.

.....
