Bioinformatics (multiple authors), page 85

JBrowse

While the UCSC and Ensembl Genome Browsers provide user-friendly interfaces for viewing genomic data from well-characterized organisms, there are fewer applications for displaying genome assemblies and annotations for newly sequenced organisms or non-standard assemblies. The source code and executables for the UCSC Genome Browser are freely available for academic, non-profit, and personal use, and can be set up to display custom data, not just those provided by UCSC. Thus, one option is for researchers to host their own UCSC Genome Browser and use it to share custom genomes with the bioinformatics community. An alternative method for sharing novel genome assemblies is to set up an Assembly Hub. Researchers host the specially formatted genomic sequence and data tracks on their own web site, and anyone with the URL can view the assembly through the UCSC Genome Browser.
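To make the Assembly Hub idea more concrete, the following is a minimal sketch of the two small text files that typically sit at the top of a hub directory. The setting names (`hub`, `shortLabel`, `longLabel`, `genomesFile`, `email`, `genome`, `trackDb`, `twoBitPath`, `organism`, `defaultPos`) are taken from the UCSC track hub format as commonly documented and should be checked against the current UCSC documentation. Every value, file name, and the helper `render_stanza` is a hypothetical placeholder.

```python
# Sketch only: builds the text of hub.txt and genomes.txt for an Assembly Hub.
# All values below are hypothetical placeholders, not a real hub.

def render_stanza(settings: dict) -> str:
    """Render one stanza as 'key value' lines, one setting per line."""
    return "\n".join(f"{key} {value}" for key, value in settings.items()) + "\n"

# hub.txt: describes the hub and points to the genomes file.
hub_txt = render_stanza({
    "hub": "exampleHub",
    "shortLabel": "Example assembly",
    "longLabel": "Example assembly hub for a newly sequenced genome",
    "genomesFile": "genomes.txt",
    "email": "curator@example.org",
})

# genomes.txt: one stanza per assembly, pointing to the sequence (2bit)
# and to the track definitions for that assembly.
genomes_txt = render_stanza({
    "genome": "exampleAsm1",
    "trackDb": "exampleAsm1/trackDb.txt",
    "twoBitPath": "exampleAsm1/exampleAsm1.2bit",
    "organism": "Example organism",
    "defaultPos": "scaffold1:1-10000",
})

print(hub_txt)
print(genomes_txt)
```

In practice, these files, the 2bit sequence file, and the track data would be placed on a publicly reachable web server, and the URL of `hub.txt` would be shared with other users.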

Another way to share novel genome assemblies is to use JBrowse (Buels et al. 2016), a web-based genome browser that is part of the Generic Model Organism Database (GMOD) project, a suite of tools for generating genomic databases. JBrowse can handle data in a variety of formats, and is relatively easy to install on a Linux- or Mac OS X-based web server (Skinner and Holmes 2010). JBrowse browsers support plant genomes (e.g. Phytozome), animal genomes (e.g. the Rat Genome Database), and disease-related databases of human data (e.g. the COSMIC Genome Browser).

An example of using JBrowse to view a customized genome assembly and associated annotations is at the Mnemiopsis Genome Project (MGP) Portal at the National Human Genome Research Institute (NHGRI) of the US National Institutes of Health (NIH). Mnemiopsis leidyi is a ctenophore, or comb jelly, a member of a phylum of gelatinous zooplankton found in all the world's seas. The members of this phylum are called comb jellies because of their highly ciliated comb rows, which provide their primary means of locomotion. These early branching metazoans have proven to be important model organisms for understanding the diversity and complexity seen in the early evolution of animals. The Mnemiopsis data featured in this portal are the first set of whole genome sequencing data on any ctenophore species to be published and made available to the scientific community (Moreland et al. 2014). The portal provides not only genomic and protein model sequence data, but also a BLAST search interface, pathway and protein domain analysis, and a customized genome browser, implemented in JBrowse, to display the annotation data.

The Mnemiopsis genome was assembled into 5100 scaffolds using next-generation sequencing data from the Roche 454 and Illumina GA-II platforms (Ryan et al. 2013). The Mnemiopsis protein-coding gene models were predicted by integrating the results of ab initio gene prediction programs with RNA-seq transcript data and sequence similarity to other protein datasets. A view of one of those scaffolds is shown in Figure 4.23. As with the UCSC and Ensembl Genome Browsers, data are organized in horizontal tracks, and exons are shown as colored boxes. The first track, SCF, is the scaffold. The gene model track, labeled 2.2, displays the exons of the predicted gene models. The next track, called PFAM2.2, highlights Pfam domains found in the gene model. The Mnemiopsis RNA-seq reads were assembled into transcripts using the Cufflinks program (Trapnell et al. 2010), and the CL2 track shows the alignment of those transcripts to the genomic scaffold. The MASK track highlights repetitive regions. The EST and GBNT tracks show, respectively, the alignments of publicly available Mnemiopsis ESTs and of other Mnemiopsis RNA sequences from GenBank. These two tracks are empty in this region, indicating that the gene in the gene model track is a novel gene prediction. The overlap between the exons on the Pfam and gene model tracks shows that the predicted gene contains known protein domains. The CL2 track lends further support to the gene prediction, as the exons of the experimentally derived Mnemiopsis transcripts overlap the exons on the gene model track.
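The reasoning applied by eye to these tracks can be sketched as a simple interval-overlap check. The example below is only an illustration: the coordinates are invented, the helper names `overlaps` and `supported_exons` are hypothetical, and neither JBrowse nor the MGP Portal is claimed to perform this computation. It shows how empty EST and GBNT tracks, together with overlapping CL2 and PFAM2.2 features, would lead to the conclusion described above.

```python
from typing import Dict, List, Tuple

# (start, end) in 1-based, inclusive scaffold coordinates
Interval = Tuple[int, int]

def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def supported_exons(exons: List[Interval], features: List[Interval]) -> List[Interval]:
    """Return the exons overlapped by at least one feature from another track."""
    return [exon for exon in exons if any(overlaps(exon, f) for f in features)]

# Hypothetical coordinates, for illustration only.
gene_model_exons: List[Interval] = [(1000, 1200), (1500, 1650), (2000, 2300)]
tracks: Dict[str, List[Interval]] = {
    "PFAM2.2": [(1050, 1200), (1500, 1600)],            # Pfam domain hits
    "CL2": [(990, 1200), (1500, 1650), (2000, 2310)],   # Cufflinks transcript exons
    "EST": [],                                          # no EST alignments here
    "GBNT": [],                                         # no GenBank RNA alignments here
}

support = {name: supported_exons(gene_model_exons, feats) for name, feats in tracks.items()}

# No previously deposited EST or RNA evidence: treat the model as a novel prediction.
is_novel = not support["EST"] and not support["GBNT"]
# Every exon of the model is overlapped by an assembled RNA-seq transcript.
has_transcript_support = len(support["CL2"]) == len(gene_model_exons)
# At least one exon coincides with a known protein domain.
has_known_domain = len(support["PFAM2.2"]) > 0

print(is_novel, has_transcript_support, has_known_domain)
```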

Navigation in JBrowse is fairly straightforward, especially for those already accustomed to using the UCSC or Ensembl Genome Browsers. Tracks can be added to or removed from the display by using the checkboxes on the left side of the window. In the display window, click on a track name and drag it to move the track up or down. To shift the focus of the display window upstream or downstream, click on the display and drag it to the left or right. The left and right arrows at the top of the page also move the display window. JBrowse provides multiple ways to zoom in and out. One option is to use the plus and minus magnifying glasses at the top of the page. Alternatively, place the mouse over the sequence coordinates above the top track, then click and drag to highlight a region and zoom in on it. Double-clicking on a region also zooms in. Clicking on a track feature opens a window with additional information about that feature. For example, on the MGP Portal, clicking on a gene model in the 2.2 track opens the Gene Wiki for that model, a detailed page that includes nucleotide and protein sequences, pre-computed BLAST searches, and annotated Pfam domains. Note that although the general look and feel of JBrowse remains similar across different genomes, individual JBrowse developers create tracks and customizations that are specific to their genome project.


Figure 4.23 JBrowse display of a predicted Mnemiopsis gene (ML05372a) from the Mnemiopsis Genome Project Portal at the National Human Genome Research Institute. Seven tracks are shown on this display: SCF, assembled genomic regions are solid black and intermittent gaps are shaded bright pink; 2.2, consensus Mnemiopsis gene models; PFAM2.2, non-redundant Mnemiopsis protein domains derived from Pfam; CL2, RNA-seq reads derived from Mnemiopsis embryos, assembled into transcripts using Cufflinks (Trapnell et al. 2010); MASK, genomic regions that have been repeat-masked using VMatch are shaded in light blue; EST, Mnemiopsis expressed sequence tags (ESTs) from GenBank; GBNT, Mnemiopsis mRNAs and other non-EST RNAs from GenBank.
