
Protein Sequence Databases

With the availability of myriad complete genome sequences from both prokaryotes and eukaryotes, significant effort is being dedicated to the identification and functional analysis of the proteins encoded by these genomes. The large-scale analysis of these proteins continues to generate huge amounts of data, including data from proteomic methods (Chapter 11) and protein structure analysis (Chapter 12), among others. These and other methods make it possible to identify large numbers of proteins quickly, to map their interactions (Chapter 13), to determine their location within the cell, and to analyze their biological activities. This ever-increasing “information space” reinforces the central role that protein sequence databases play as a resource for storing the data generated by these efforts and making those data freely available to the life sciences community.

As most sequence data in protein databases are derived from the translation of nucleotide sequences, these databases can, in large part, be thought of as “secondary databases.” Universal protein sequence databases cover proteins from all species, whereas specialized protein sequence databases concentrate on particular protein families, groups of proteins, or proteins from a specific organism. Representative model organism databases include the Mouse Genome Database (MGD; Smith et al. 2018) and WormBase (Lee et al. 2018), among others (Baxevanis and Bateman 2015; Rigden and Fernández 2018). Organismal sequence databases are discussed in greater detail in Chapter 2.
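To make the idea of a protein record being "derived from the translation of nucleotide sequences" concrete, the following is a minimal Python sketch of conceptual translation. It assumes the standard genetic code, a coding sequence that already starts in frame, and no introns; the function name `translate` and the use of `X` for unrecognized codons are illustrative choices, not a description of how any particular database produces its entries.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# for each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence (DNA or RNA) into protein.

    Translation stops at the first stop codon. Codons containing
    characters other than A, C, G, T/U are reported as "X" (an
    illustrative convention assumed here).
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCAAATGA")` reads the codons ATG, GCC, and AAA and then stops at TGA. Real annotation pipelines must also handle alternative genetic codes, reading-frame selection, and splicing, all of which this sketch leaves out.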

Universal protein databases can be divided further into two broad categories: sequence repositories, where the data are stored with little or no manual intervention, and curated databases, in which experts enhance the original data through biocuration. Ensuring interoperability, creating and implementing standards, and adopting best practices aimed at accurately representing the biological knowledge found within the sequence databases are all of paramount importance. Indeed, these curation goals are so important that there is an organization called the International Society for Biocuration, the primary mission of which is to advance these central tenets.
