Integrated Information Retrieval: The Entrez System
One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. For example, a paper in PubMed may describe the sequencing of a gene whose sequence appears in GenBank. The nucleotide sequence, in turn, may code for a protein product whose sequence is stored in NCBI's Protein database. The three-dimensional structure of that protein may be known, and the coordinates for that structure may appear in NCBI's Structure database. Finally, there may be allelic or structural variants documented for the gene of interest, cataloged in databases such as the Single Nucleotide Polymorphism Database (called dbSNP) or the Database of Genomic Structural Variation (called dbVar), respectively. The existence of such natural connections, all having a biological underpinning, motivated the development of a method through which all of the information about a particular biological entity could be found without having to visit and query individual databases one by one.
Entrez, to be clear, is not a database itself. Rather, it is the interface through which its component databases can be accessed and traversed – an integrated information retrieval system. The Entrez information space includes PubMed records, nucleotide and protein sequence data, information on conserved protein domains, three-dimensional structure information, and genomic variation data with potential clinical relevance, a good number of which will be touched upon in this chapter. The strength of Entrez lies in the fact that all of this information, across a large number of component databases, can be accessed by issuing one – and only one – query. This very powerful, integrated approach is made possible through the use of two general types of connections between database entries: neighboring and hard links.
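To make the idea of a single interface over many component databases more concrete, the following is a minimal sketch that assumes programmatic access through NCBI's E-utilities, which the text above does not discuss. It only builds request URLs and makes no network calls. The helper names (`esearch_url`, `elink_url`, `fan_out`), the search term, and the record identifier are hypothetical and chosen for illustration; the database identifiers are the Entrez names for the resources mentioned above. Sending the same term to each database only approximates the single-query behavior of the Entrez web interface, and the ELink request is offered as a rough analogue of following a connection from one record to a related record in another database.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities (an assumption: not named in the text above).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Entrez identifiers for the component databases mentioned in the text.
COMPONENT_DATABASES = [
    "pubmed",     # literature records
    "nuccore",    # nucleotide sequences (GenBank)
    "protein",    # protein sequences
    "cdd",        # conserved protein domains
    "structure",  # three-dimensional structures
    "snp",        # dbSNP
    "dbvar",      # dbVar
]


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL that looks up `term` in one Entrez database."""
    return f"{EUTILS_BASE}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build an ELink URL that follows links from a record in `dbfrom` to `db`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return f"{EUTILS_BASE}/elink.fcgi?" + urlencode(params)


def fan_out(term: str) -> dict:
    """Map each component database to the ESearch URL for the same term."""
    return {db: esearch_url(db, term) for db in COMPONENT_DATABASES}


if __name__ == "__main__":
    # "BRCA1" is an illustrative search term only.
    for db, url in fan_out("BRCA1").items():
        print(f"{db:10s} {url}")
    # "12345" is a placeholder UID, not a real record of interest.
    print(elink_url("nuccore", "protein", "12345"))
```

Keeping URL construction separate from any network code makes the sketch easy to inspect; an actual client would also need to send the requests, parse the responses, and respect NCBI's usage guidelines.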