Biological Language Model

Genre: Medicine. Rights holder and/or publisher: Ingram. ISBN: 9789811212963. Age restriction: 0+.

Book description

Conceived as a cross between natural language processing methods and biological sequences in DNA, RNA and protein, the biological language model is a new research topic in bioinformatics that the authors have studied extensively. This book presents the basic theory and applications of the model and is intended as a reference for graduate students and researchers.

Contents:
- East China Normal University Scientific Reports
- Preface
- Acknowledgments
- Introduction
- Linguistic Feature Analysis of Protein Sequences
- Amino Acid Encoding for Protein Sequence
- Remote Homology Detection
- Structure Prediction
- Function Prediction
- Summary and Future Perspectives
- Index

Readership: Graduate and research level students in the cross disciplines of bioinformatics/computational biology.

Keywords: Biological Language; Protein; Bioinformatics

Key Features:
- To the best of our knowledge, this is the first book about the biological language model

Table of contents

Qiwen Dong. Biological Language Model

Excerpt from the book

East China Normal University Scientific Reports

Subseries on Data Science and Engineering

.....

Amino acid encoding is the first step of protein structure and function prediction, and it is one of the foundations for success in those studies. In this chapter, we propose a systematic classification of amino acid encoding methods and review the methods in each category. According to their information sources and information extraction methodologies, the methods are grouped into five categories: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding and machine-learning encoding. To benchmark and compare the encoding methods, we first selected 16 representative methods from those five categories. Then, based on two representative protein-related tasks, protein secondary structure prediction and protein fold recognition, we constructed three machine learning models with reference to state-of-the-art studies. Finally, for each encoding method, we encoded the protein sequences and ran the same training and test phases on the benchmark datasets. The performance of each encoding method is taken as an indicator of its potential in protein structure and function studies.
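As an illustration of the binary encoding category, the following is a minimal sketch of one-hot encoding in Python. It is not taken from the book: the function name `one_hot_encode`, the alphabetical ordering of the 20 standard residues, and the all-zero vector for non-standard letters are assumptions made for this example.

```python
# Standard 20 amino acids, ordered alphabetically by one-letter code.
# The ordering is an arbitrary choice made for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue.

    Residues outside the 20 standard letters (e.g. 'X') are mapped to an
    all-zero vector; this is an assumption of the sketch.
    """
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        position = INDEX.get(residue)
        if position is not None:
            vector[position] = 1
        encoded.append(vector)
    return encoded


if __name__ == "__main__":
    # Each residue becomes a row with a single 1 among 20 entries.
    for residue, row in zip("MKV", one_hot_encode("MKV")):
        print(residue, row)
```

Each residue contributes exactly one non-zero entry out of 20, so the encoding is position-independent and carries no information beyond residue identity.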

The assessment results show that the evolution-based, position-dependent encoding method PSSM consistently achieves the best performance on both the protein secondary structure prediction and the protein fold recognition tasks, suggesting its important role in protein structure and function prediction. However, another evolution-based, position-dependent encoding method, HMM, does not perform well; the main reason could be that remote homologous sequences provide only limited evolutionary information for the target residue. The one-hot encoding is highly sparse and leads to complex machine learning models, while its two compressed representations, one-hot (6-bit) encoding and binary 5-bit encoding, lose some valuable information and cannot be widely used in related research. More reasonable strategies for reducing the dimension of one-hot encoding need to be developed. For the physicochemical property encodings, the variety of properties and the extraction methodologies are two important factors in constructing a valuable encoding. Structure-based encodings and machine-learning encodings achieve comparable or even better performance than other widely used encodings, suggesting that more attention should be paid to these two categories.
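To make "position-dependent" concrete, here is a toy sketch of how a position-specific scoring matrix can be derived from aligned homologous sequences. It is a deliberate simplification and not the procedure used in the book: the function name `toy_pssm`, the uniform background frequency, the flat pseudocount, and the absence of sequence weighting are all assumptions of this example. Profiles produced by tools such as PSI-BLAST involve considerably more machinery.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(alignment, pseudocount=1.0):
    """Build a (length x 20) matrix of log2-odds scores from an alignment
    of equal-length sequences.

    Assumptions of this sketch: uniform background frequency (1/20),
    a flat pseudocount for every residue, and no sequence weighting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in range(length):
        # Start every residue at the pseudocount so no frequency is zero.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            residue = seq[column].upper()
            if residue in counts:  # ignore non-standard letters
                counts[residue] += 1.0
        total = sum(counts.values())
        row = [math.log2((counts[aa] / total) / background)
               for aa in AMINO_ACIDS]
        pssm.append(row)
    return pssm


if __name__ == "__main__":
    # Three hypothetical aligned sequences, invented for illustration.
    for row in toy_pssm(["MKV", "MRV", "MKI"]):
        print([round(score, 2) for score in row])
```

Unlike one-hot encoding, the same residue receives a different 20-dimensional vector at each position, depending on which residues the aligned sequences show in that column. With few or very divergent sequences the columns stay close to the pseudocount-dominated background, which is one way to picture an encoding that carries little evolutionary signal.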

.....
