The UCSC Genome Browser
After starting in 2000 with just a display of an early draft of the human genome assembly, the UCSC Genome Browser now provides access to assemblies and annotations from over 100 organisms (Haeussler et al. 2019). The majority of assemblies are of mammalian genomes, but other vertebrates, insects, nematodes, deuterostomes, and the Ebola virus are also included. The assemblies from some organisms, including human and mouse, are available in multiple versions. New organisms and assembly versions are added regularly.
The UCSC Browser presents genomic annotation in the form of tracks. Each track provides a different type of feature, from genes to SNPs to predicted gene regulatory regions to expression data. Each organism has its own set of tracks, some created by the UCSC Genome Bioinformatics team and others provided by members of the bioinformatics community. Over 200 tracks are available for the GRCh37 version of the human genome assembly. The newer human genome assembly, GRCh38, has fewer tracks, as not all the data have been remapped from the older assembly. Other genomes are not as well annotated as human; for example, fewer than 20 tracks are available for the sea hare. Some tracks, such as those created from NCBI transcript data, are updated weekly, while others, such as the SNP tracks created from NCBI variant data (Sayers et al. 2019), are updated less frequently, depending on the release schedule of the underlying data. For ease of use, tracks are organized into subsections. For example, depending on the organism, the Genes and Gene Predictions section may include evidence-based gene predictions, ab initio gene predictions, and/or alignment of protein sequences from other species.
The home page of the UCSC Genome Browser provides a stepping-off point for many of the resources developed by the Genome Bioinformatics group at UCSC, including the Genome Browser, BLAT, and the Table Browser, which will be described in detail later in this chapter. The Tools menu provides a link to liftOver, a widely used tool that converts genomic coordinates from one assembly to another. Using this tool, it is possible to update annotation files so that old data can be integrated into a new genome assembly. The Download menu provides an option to download all the sequence and annotation data for each genome assembly hosted by UCSC, as well as some of the source code. The What's New section provides updates on new genome assemblies, as well as new tools and features. Finally, there is an extensive Help menu, with detailed documentation as well as videos. Users may also submit questions to a mailing list, and most queries are answered within a day.
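The idea behind converting coordinates from one assembly to another, as liftOver does, can be pictured with a small sketch. This is only a toy model and not the liftOver implementation: it assumes the two assemblies are related by a handful of ungapped, same-strand aligned blocks. The `Block` type, the `lift_position` function, and the example coordinates are hypothetical names and values introduced here for illustration. The real tool works from chain files and also handles strand changes and partially mapped intervals.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block shared by the old and new assembly (hypothetical)."""
    old_start: int  # 0-based start of the block in the old assembly
    new_start: int  # 0-based start of the same block in the new assembly
    size: int       # block length in bases


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position from the old assembly to the new one.

    Returns None when the position falls outside every aligned block,
    which corresponds to a coordinate that cannot be converted.
    """
    for block in blocks:
        if block.old_start <= pos < block.old_start + block.size:
            # The offset within the block is preserved across assemblies.
            return block.new_start + (pos - block.old_start)
    return None


# Toy alignment: the new assembly gained 200 bases after old position 1000,
# so everything in the second block shifts by 200.
blocks = [
    Block(old_start=0, new_start=0, size=1000),
    Block(old_start=1000, new_start=1200, size=500),
]

inside_first = lift_position(999, blocks)    # 999, unchanged
inside_second = lift_position(1100, blocks)  # 1300, shifted by 200
unmapped = lift_position(1600, blocks)       # None, no block covers it
```

Positions that return `None` here play the role of the records that a real conversion would report as unmapped. An annotation file updated this way should always be checked for such losses before it is merged with data on the new assembly.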
The UCSC Genome Browser provides multiple ways for both individual users and larger genome centers to share data with collaborators or even the entire bioinformatics community. These sharing options are available on the My Data link on the home page. Custom Tracks allow users to display their own data as a separate annotation track in the browser. User data must be formatted in a standard data structure in order to be interpreted correctly by the browser. Many commonly used file formats are supported, including Browser Extensible Data (BED), Binary Alignment/Map (BAM), and Variant Call Format (VCF; Box 4.1). Small data files can be uploaded or pasted into the Genome Browser for personal use. Larger files must be saved on the user's web server and accessed by URL through the Genome Browser. As anyone with the URL can access the data, this method can be used to share data with collaborators. Alternatively, Custom Tracks, along with track configurations and settings, can be shared with selected collaborators using a named Session. Some groups choose to make their Sessions available to the world at large in My Data → Public Sessions. Finally, groups with very large datasets can host their data in the form of a Track Hub so that it can be viewed on the UCSC Genome Browser. When a Track Hub is paired with an Assembly Hub, it can be used to create a browser for a genome assembly not already hosted by UCSC.
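As an illustration of the formatting requirement for Custom Tracks, the following sketch builds a small BED-formatted track that could be pasted into the browser. It assumes the simplest four-column BED layout (chromosome, start, end, name) preceded by a `track` line. The function name `bed_custom_track`, the track name, and the example regions are made up for this sketch. BED start coordinates are 0-based and end coordinates are exclusive, which the function checks in a basic way.

```python
from typing import Iterable, Tuple

Feature = Tuple[str, int, int, str]  # (chromosome, start, end, name)


def bed_custom_track(name: str, description: str,
                     features: Iterable[Feature]) -> str:
    """Return custom track text: one 'track' line followed by BED4 records."""
    lines = [f'track name="{name}" description="{description}" visibility=pack']
    for chrom, start, end, label in features:
        # BED intervals are 0-based and half-open, so start must be below end.
        if not 0 <= start < end:
            raise ValueError(f"invalid BED interval: {chrom} {start} {end}")
        # BED fields are whitespace-separated; tabs are used here.
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical regions of interest on chromosome 7.
track_text = bed_custom_track(
    "myRegions",
    "Example regions of interest",
    [
        ("chr7", 127471196, 127472363, "region1"),
        ("chr7", 127472363, 127473530, "region2"),
    ],
)
```

The resulting `track_text` could be pasted into the Custom Tracks input box for a quick look. For a larger dataset, the same text could be saved to a file on a web server and supplied to the browser by URL, as described above.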