Data Analytics in Bioinformatics

Contents
Group of authors. Data Analytics in Bioinformatics
Table of Contents
List of Illustrations
List of Tables
Data Analytics in Bioinformatics. A Machine Learning Perspective
Preface
Acknowledgement
1. Introduction to Supervised Learning
1.1 Introduction
1.2 Learning Process & its Methodologies
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.2.3 Reinforcement Learning
1.3 Classification and its Types
1.4 Regression
1.4.1 Logistic Regression
1.4.2 Difference between Linear & Logistic Regression
1.5 Random Forest
1.6 K-Nearest Neighbor
1.7 Decision Trees
1.8 Support Vector Machines
1.9 Neural Networks
1.10 Comparison of Numerical Interpretation
1.11 Conclusion & Future Scope
References
2. Introduction to Unsupervised Learning in Bioinformatics
2.1 Introduction
2.2 Clustering in Unsupervised Learning
2.3 Clustering in Bioinformatics—Genetic Data
2.3.1 Microarray Analysis
2.3.2 Clustering Algorithms
2.3.3 Partition Algorithms
2.3.3.1 k-Means Clustering
2.3.3.2 Cluster Center Initialization Algorithm (CCIA)
2.3.3.3 Intelligent Kernel k-Means (IKKM)
2.3.3.4 Clustering Large Applications (CLARA)
2.3.4 Hierarchical Clustering Algorithms
2.3.4.1 AGNES (Agglomerative Nesting)
2.3.4.2 DIANA (Divisive Analysis)
2.3.4.3 CURE (Clustering Using Representatives)
2.3.4.4 CHAMELEON
2.3.4.5 BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies)
2.3.5 Density-Based Approach
2.3.5.1 DBSCAN
2.3.6 Model-Based Approach
2.3.6.1 SOM (Self-Organizing Maps)
2.3.7 Grid-Based Clustering
2.3.7.1 STING (Statistical Information Grid-Based Algorithm)
2.3.8 Soft Clustering
2.3.8.1 FCM (Fuzzy C-Means)
2.4 Conclusion
References
3. A Critical Review on the Application of Artificial Neural Network in Bioinformatics
3.1 Introduction
3.1.1 Different Areas of Application of Bioinformatics
3.1.2 Bioinformatics in Real World
3.1.3 Issues with Bioinformatics
3.1.3.1 Issues Related to Structure
3.1.3.2 Sequence Analysis
3.2 Biological Datasets
3.3 Building Computational Model
3.3.1 Data Pre-Processing and its Necessity
3.3.2 Biological Data Classification
3.3.3 ML in Bioinformatics
3.3.4 Introduction to ANN
3.3.5 Application of ANN in Bioinformatics
3.3.6 Broadly Used Supervised Machine Learning Techniques
3.4 Literature Review
3.4.1 Comparative Analysis of ANN With Broadly Used Traditional ML Algorithms
3.5 Critical Analysis
3.6 Conclusion
References
4. Dimensionality Reduction Techniques: Principles, Benefits, and Limitations
4.1 Introduction
4.2 The Benefits and Limitations of Dimension Reduction Methods
4.3 Components of Dimension Reduction
4.3.1 Feature Selection
4.3.2 Feature Reduction
4.4 Methods of Dimensionality Reduction
4.4.1 Principal Component Analysis (PCA)
4.4.2 Missing Values Ratio (MVR)
4.4.3 Linear Discriminant Analysis (LDA)
4.4.4 Backward Feature Elimination (BFE)
4.4.5 Forward Feature Construction (FFC)
4.4.6 Independent Component Analysis (ICA)
4.4.7 Low Variance Filter (LVF)
4.4.8 High Correlation Filter
4.4.9 Random Forests (RF)/Ensemble Trees
4.4.10 t-Distributed Stochastic Neighbor Embedding (t-SNE)
4.4.11 Autoencoder
4.4.12 Factor Analysis (FA)
4.4.13 Uniform Manifold Approximation and Projection (UMAP)
4.4.14 Information Gain (IG)
4.4.15 Vector Quantization (VQ)
4.5 Conclusion
References
5. Plant Disease Detection Using Machine Learning Tools With an Overview on Dimensionality Reduction
5.1 Introduction
5.2 Flowchart
5.3 Machine Learning (ML) in Rapid Stress Phenotyping
5.4 Dimensionality Reduction
5.4.1 Feature Extraction
5.4.1.1 PCA (Principal Component Analysis)
5.4.1.2 LDA (Linear Discriminant Analysis)
5.4.1.3 SIFT (Scale Invariant Feature Transform)
5.4.1.4 SURF (Speeded Up Robust Features)
5.4.1.5 ORB (Oriented FAST and Rotated BRIEF)
5.5 Literature Survey
5.6 Types of Plant Stress
5.6.1 Biotic Stress
5.6.1.1 Fungal Pathogen
5.6.1.2 Bacterial Pathogen
5.7 Implementation I: Numerical Dataset
5.7.1 Dataset Description
5.7.2 Results
5.7.3 Discussion
5.8 Implementation II: Image Dataset
5.8.1 Dataset Description
5.8.2 Method Used
5.8.3 Results
5.8.3.1 Results of ORB Feature Extraction and Brute Force Matching
5.8.3.1.3 Bacterial Leaf Blight on Rice Leaves
5.8.3.2 Color Histogram Comparison: Using Correlation Method
5.8.4 Discussions
5.9 Conclusion
References
6. Gene Selection Using Integrative Analysis of Multi-Level Omics Data: A Systematic Review
6.1 Introduction
6.2 Approaches for Gene Selection
6.3 Multi-Level Omics Data Integration
6.3.1 Horizontal Integration
6.3.2 Vertical Integration
6.4 Machine Learning Approaches for Multi-Level Data Integration
6.4.1 Unsupervised Integration of Omics Data
6.4.2 Supervised Integration of Omics Data
6.5 Critical Observation
6.6 Conclusion
References
7. Random Forest Algorithm in Imbalance Genomics Classification
7.1 Introduction
7.2 Methodological Issues
7.2.1 Decision Tree (DT) Classifier
7.2.2 Ensemble Techniques
7.2.3 Mathematical Formulation of Ensemble Technique
7.2.4 Bagging
7.2.5 Bagging Pseudocode
7.2.6 Random Forest
7.3 Biological Terminologies
7.3.1 DNA
7.3.2 Genomics
7.3.3 Proteins
7.4 Proposed Model
7.4.1 Balancing the Data
Algorithm 1: Clustering based on similarity measure
7.4.2 Ensembling of Trees
Algorithm 2: Ensembling of DTs
7.5 Experimental Analysis
7.6 Current and Future Scope of ML in Genomics
7.6.1 Gene Sequencing
7.6.2 Services to Consumer
7.6.3 Gene Editing
7.6.4 Pharmacy Genomics
7.6.5 Newborn Genetic Screening
7.7 Conclusion
References
8. Feature Selection and Random Forest Classification for Breast Cancer Disease
8.1 Introduction
8.2 Literature Survey
8.3 Machine Learning
8.4 Feature Engineering
8.5 Methodology
8.5.1 Dataset Collection
8.5.2 Proposed Work
8.5.2.1 Selection of Feature by Means of Correlation and Accuracy Calculation Using Random Forest Classification
8.5.2.2 Feature Selection Using one Variety and Accuracy Calculation Using Random Forest Classification
8.5.2.3 Feature Elimination Using RFE and Classification Using Random Forest
8.6 Result Analysis
8.7 Conclusion
References
9. A Comprehensive Study on the Application of Grey Wolf Optimization for Microarray Data
9.1 Introduction
9.2 Microarray Data
9.3 Grey Wolf Optimization (GWO) Algorithm
9.3.1 Principle of GWO
9.3.2 Mathematical Model of GWO
9.3.2.1 The Encircling
9.3.2.2 Hunting
9.3.2.3 Attacking Prey: (Exploitation)
9.3.2.4 Search for Prey: (Exploration)
9.3.3 Algorithm and Flow Chart of GWO
9.4 Studies on GWO Variants
9.4.1 Hybridization
9.4.2 Extensions
9.4.3 Modification
9.5 Application of GWO in Medical Domain
9.6 Application of GWO in Microarray Data
9.7 Conclusion and Future Work
References
10. The Cluster Analysis and Feature Selection: Perspective of Machine Learning and Image Processing
10.1 Introduction
10.2 Various Image Segmentation Techniques
10.2.1 Clustering
10.2.2 Thresholding
10.2.3 Edge-Based Segmentation
10.2.4 Region-Based Image Segmentation
10.2.5 Watershed
10.3 How to Deal With Image Dataset
10.3.1 Introduction
10.3.2 Image Acquisition
10.3.3 Image Pre-Processing
10.3.4 Image Enhancement
10.3.5 Image Segmentation
10.3.6 K-Means Clustering
10.3.6.1 Euclidean Distance
10.3.6.2 Clustering
10.3.7 Density-Based Spatial Clustering of Application With Noise (DBSCAN)
10.3.8 SVM Classifier
10.4 Class Imbalance Problem
10.4.1 Resampling Approaches
10.5 Optimization of Hyperparameter
10.6 Case Study
10.6.1 Pancreatic and Lung Tumor Prediction in the Machine Learning Era: Unique Supervised and Unsupervised Methodologies
10.6.2 Pancreatic Cysts (IPMN)
10.7 Using AI to Detect Coronavirus
10.7.1 BlueDot AI Technology
10.8 Using Artificial Intelligence (AI), CT Scan and X-Ray
10.9 Conclusion
References
11. Artificial Intelligence and Machine Learning for Healthcare Solutions
11.1 Introduction
11.2 Using Machine Learning Approaches for Different Purposes
11.3 Various Resources of Medical Dataset for Research
11.4 Deep Learning in Healthcare
11.5 Various Projects in Medical Imaging and Diagnostics
11.6 Conclusion
12. Forecasting of Novel Corona Virus Disease (Covid-19) Using LSTM and XG Boosting Algorithms
12.1 Introduction
12.2 Machine Learning Algorithms for Forecasting
12.3 Proposed Method
12.3.1 LSTM (Long Short-Term Memory)
12.3.2 XG Boost (eXtreme Gradient Boosting) Algorithm
12.3.3 Polynomial Regression
12.3.4 Performance Metrics
12.4 Implementation
12.4.1 The Main Python Code for LSTM
12.4.2 The Main Python Code for Polynomial Regression
12.4.3 The Main Python Code for XG Boosting Algorithm
12.4.4 Libraries or Methods Used in the Proposed Work
12.5 Results and Discussion
12.6 Conclusion and Future Work
References
13. An Innovative Machine Learning Approach to Diagnose Cancer at an Early Stage
13.1 Introduction
13.1.1 Multiscale Cancer Detection
13.2 Related Work
13.3 Materials and Methods
13.4 System Design
13.4.1 Artificial Neural Network
13.4.2 Back Propagation Network (BPN)
13.4.3 Support Vector Machine (SVM)
13.4.4 Pre-Processing
13.4.5 Feature Extraction
13.4.6 Database Updation
13.4.7 Classification
13.4.8 Clustering
13.4.9 Segmentation Using FCM Clustering
13.5 Results and Discussion
13.6 Conclusion
References
14. A Study of Human Sleep Staging Behavior Based on Polysomnography Using Machine Learning Techniques
14.1 Introduction
14.2 Polysomnography Signal Analysis
14.3 Case Study on Automated Sleep Stage Scoring
14.3.1 Experimental Data
14.3.2 The Methodology
14.3.3 Experimental Results and Discussion
14.4 Summary and Conclusion
References
15. Detection of Schizophrenia Using EEG Signals
15.1 Introduction
15.1.1 The Human Brain
15.1.2 Schizophrenia
15.1.2.1 DSM-V Definition and Diagnosis Criteria of Schizophrenia
15.1.2.2 Types of Schizophrenia
15.1.2.3 Causes of Schizophrenia
15.1.2.4 Symptoms of Schizophrenia
15.1.3 Electroencephalograph (EEG)
15.1.3.1 Characterizations of EEG Signals
15.2 Methodology
15.2.1 EEG Signal Processing
15.2.2 Removing the Artifacts
15.2.3 Feature Extraction
15.2.4 Normalization
15.2.5 Feature Selection/Reduction
15.2.6 Feature Classification
15.3 Literature Review
15.4 Discussion
15.5 Conclusion
References
16. Performance Analysis of Signal Processing Techniques in Bioinformatics for Medical Applications Using Machine Learning Concepts
16.1 Introduction
16.1.1 Role of Machine Learning in Bioinformatics
16.1.2 Machine Learning Applications for Bioinformatics
16.1.3 Recent Trends in Bioinformatics
16.1.4 Data Analytics in Bioinformatics
16.1.5 Machine Learning Algorithms
16.2 Basic Definition of Anatomy and Cell at Micro Level
16.2.1 Biological Cells
16.2.2 DNA, RNA and Proteins
16.3 Signal Processing—Genome Signal Processing
16.3.1 Identification of Hotspots
16.3.2 Advantages of Computational Hotspot Identification Techniques Over Alanine-Scanning Mutagenesis
16.3.3 Overview of Protein Sequences
16.3.4 EIIP & CPNR-Based Mapping
16.3.5 Feature Extraction Technique
16.3.5.1 RRM—Resonant Recognition Model
16.3.5.2 Discrete Wavelet Transforms
16.4 Hotspots Identification Algorithm
16.5 Results—Experimental Investigations
16.6 Analysis Using Machine Learning Metrics
16.6.1 Theoretical Details of Performance Metrics
16.6.2 Comparative Analysis of the Protein Sequence Representation
16.6.3 Visual Analysis
16.7 Conclusion
Appendix
A.1 Hotspot Identification Code
A.2 Performance Metrics Code
17. Survey of Various Statistical Numerical and Machine Learning Ontological Models on Infectious Disease Ontology
17.1 Introduction
17.2 Disease Ontology
17.3 Infectious Disease Ontology
17.4 Biomedical Ontologies on IDO
17.5 Various Methods on IDO
17.6 Machine Learning-Based Ontology for IDO
17.7 Recommendation or Suggestions for Future Study
17.8 Conclusions
References
18. An Efficient Model for Predicting Liver Disease Using Machine Learning
18.1 Introduction
18.2 Related Works
18.3 Proposed Model
18.3.1 Elements of Experimental Methodology
18.3.1.1 Experimental Dataset
18.3.1.2 Overview of Data & Analysis
18.3.1.3 Data Preprocessing
18.3.1.4 Standardization
18.3.1.5 Label Encoding
18.3.2 Model Building
18.3.2.1 Support Vector Machines (SVM)
18.3.2.2 Logistic Regression
18.3.2.3 Naïve Bayes
18.3.2.4 Random Forests
18.3.2.5 Gradient Boosting
18.3.3 Performance Evaluation
18.3.4 Performance Optimization
18.3.4.1 N-Fold Cross Validation
18.4 Results and Analysis
18.5 Conclusion
References
19. A Novel Approach for Prediction of Stock Market Behavior Using Bioinformatics Techniques
19.1 Introduction
19.2 Literature Review
19.3 Proposed Work
19.3.1 Encoding of Stock Market Price Behavior to Binary String
19.3.2 Mapping Binary String Into DNA Sequence
19.3.3 Sequence Alignment Using BLAST
19.3.4 Prediction Method
19.3.5 Decoding Predicted DNA Sequence Into Binary String
19.3.6 Mismatching Analysis of Stock Market Behavior
19.4 Experimental Study
19.4.1 Data Analysis
19.4.2 Results
19.5 Conclusion and Future Work
20. Stock Market Price Behavior Prediction Using Markov Models: A Bioinformatics Approach
20.1 Introduction
20.2 Literature Survey
20.3 Proposed Work
20.3.1 Encoding of Stock Market Price Behavior to Binary Sequence
20.3.2 Conversion Between Binary Sequences to Nucleotide Sequence
20.3.3 Zero-Order Markov Model
20.3.4 First-Order Markov Model
20.3.5 Second-Order Markov Model
20.3.6 Hidden Markov Model
20.3.6.1 HMM for Stock Market Behavior Prediction
20.3.7 Decoding Predicted DNA Sequence into a Binary String
20.3.8 Mismatching Analysis of Stock Market Behavior
20.4 Experimental Work
20.4.1 Dataset Preparation
20.4.2 Results and Analysis
20.4.3 Performance Comparison Between Different Orders Markov Models
20.5 Conclusions and Future Work
References
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Scrivener Publishing
100 Cummings Center, Suite 541J
.....
3. Géron, A., Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media, United States of America, 2019.
4. Alshemali, B. and Kalita, J., Improving the reliability of deep neural networks in NLP: A review. Knowl.-Based Syst., 191, 105210, 2020.
.....