Organismal Sequence Databases Beyond NCBI
Although it may appear from this discussion that NCBI is the center of the sequence universe, many specialized genomic databases throughout the world serve specific groups in the scientific community. Often, these databases provide additional information not available elsewhere, such as phenotypes, experimental conditions, strain crosses, and map features. These data are of great importance to these communities, not only because they influence experimental design and interpretation of experimental results but also because the kinds of data they contain do not always fit neatly within the confines of the NCBI data model. Development of specialized databases necessarily ensued (and continues), and these databases are intended to be used as an important adjunct to GenBank and similar global databases. It is impossible to discuss the wide variety of such value-added databases here but, to emphasize the sheer number of such databases that exist, the journal Nucleic Acids Research devotes its first issue every year to papers describing these databases (Galperin et al. 2017).
Figure 2.17 A clickable map showing where actively recruiting clinical trials relating to colorectal neoplasms are being conducted. This map-based view of the information presented in Figure 2.15 is useful in identifying trials that are within a reasonable distance of a potential study participant's home.
An excellent representative example of a specialized organismal database is the Mouse Genome Database (MGD; Bult et al. 2016). Housed at the Jackson Laboratory in Bar Harbor, ME, the MGD provides a curated, comprehensive knowledgebase on the laboratory mouse and is an integral part of the laboratory's overall Mouse Genome Informatics (MGI) resource. The MGD provides information on genes, genetic markers, mutant alleles and phenotypes, and homologies to other organisms, as well as extensive linkage, cytogenetic, genetic, and physical mapping data. A cross-section of these data is shown in Figure 2.18, which provides information on the Dcc gene in mouse, the ortholog to the human DCC gene from the examples discussed earlier in this chapter. This page can be reached either by searching directly for the gene name or, as in this case, by following links in the Animal Model section of the OMIM entry for DCC. That section discusses seminal discoveries made using mouse models that, in turn, informed our understanding of the effect of DCC mutations in humans.
Figure 2.18 The Mouse Genome Informatics (MGI) entry for the Dcc gene in mouse. The entry provides information on the ortholog to the human DCC gene, including data on mutant alleles and phenotypes, mapping data, single-nucleotide polymorphisms, and expression data. In the Mutations, Alleles, and Phenotypes section, the phenotype overview uses blue squares to indicate which phenotypes are due to mutations in the Dcc gene. In the Expression section, blue squares indicate expression in wild-type mice in the designated tissues, organs, or systems.
Another long-standing resource devoted to a specific organism is the Zebrafish Model Organism Database, or Zebrafish Information Network (ZFIN; Howe et al. 2012). Zebrafish are a particularly attractive animal model given their experimental tractability in studying a wide variety of questions focused on vertebrate development, regeneration, inflammation, infectious disease, and drug discovery, to name a few. ZFIN provides a very simple search interface that allows free-text searches using any term. Using DCC once again as our search term brings the user to the summary page for the zebrafish dcc gene (Figure 2.19), which provides information on zebrafish mutants, sequence targeting reagents, transgenic constructs, orthology to other organisms, protein domains found within the Dcc protein product, and annotated gene expression and phenotype data derived from the literature or from direct submissions by members of the zebrafish community. Here, by following the link to the 19 figures in the Gene Expression section, one can examine full-size images illustrating expression patterns for dcc under various experimental conditions (Figure 2.20).
Figure 2.19 The Zebrafish Information Network (ZFIN) gene page for the dcc gene in zebrafish. This entry provides information on the ortholog to the human DCC gene. See text for details.
While MGD and ZFIN are excellent examples of model organism databases, every major model organism community maintains such a resource. These groups also collaborate to develop central portals to ease information retrieval across many of these resources through the Alliance of Genome Resources.
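The examples in this section write the same gene as DCC in human, Dcc in mouse, and dcc in zebrafish. The short Python sketch below illustrates that capitalization pattern and nothing more. The function name `format_symbol` and the `SYMBOL_STYLE` table are hypothetical and are not part of MGD, ZFIN, or any Alliance tool. The sketch also assumes that orthologs share a root symbol, which is not always the case, so the databases themselves remain the authority on any given gene's official symbol.

```python
# Hypothetical helper for illustration only; not an MGD, ZFIN, or Alliance API.
# Assumed conventions: human symbols in upper case, mouse symbols with an
# initial capital, zebrafish symbols in lower case. Real nomenclature has
# exceptions, and orthologs do not always share a root symbol.
SYMBOL_STYLE = {
    "human": str.upper,
    "mouse": str.capitalize,
    "zebrafish": str.lower,
}


def format_symbol(symbol: str, organism: str) -> str:
    """Return `symbol` written in the assumed style for `organism`."""
    try:
        style = SYMBOL_STYLE[organism]
    except KeyError:
        raise ValueError(f"no convention recorded for {organism!r}") from None
    return style(symbol)


if __name__ == "__main__":
    # Print the gene used throughout this section in each organism's style.
    for organism in SYMBOL_STYLE:
        print(organism, format_symbol("DCC", organism))
```

A helper like this is useful only for normalizing a search term before querying an organism-specific resource; it does not establish orthology, which must still be confirmed in the database entry itself.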
Figure 2.20 An example of gene expression data available through the Zebrafish Information Network (ZFIN), here showing the expression patterns for the zebrafish dcc gene under various experimental conditions. The inset displays a full-size image of data from Gao et al. (2012), showing the expression pattern for dcc (panel A) and the co-expression of dcc and the Lim homeobox 5 gene (lhx5; panel B).