Bioinformatics (collective of authors), page 81

UCSC Table Browser


The Table Browser tool provides users a text-based interface with which to query, intersect, filter, and download the data that are displayed graphically in the Genome Browser. These data can then be saved in a spreadsheet for further analysis, or used as input into a different program. Using a web-based interface, users select a genome assembly, track, and position, then choose how to manipulate that track data and what fields to return. This example will demonstrate how to retrieve a list of all NCBI mRNA reference sequences that overlap with an SNP from the Genome-Wide Association Study (GWAS) Catalog track, which identifies genetic loci associated with common diseases or traits. The GWAS Catalog is a manually curated collection of published genome-wide association studies that assayed at least 100 000 SNPs, in which all SNP-trait associations have p values of <1 × 10⁻⁵ (Buniello et al. 2019).

The Table Browser landing page is accessible from either the UCSC Genome Browser home page or the Tools pull-down menu. First, reset all user cart settings by clicking on the click here link at the bottom of the Table Browser settings section.

Then, select the NCBI RefSeq track on the GRCh38 genome assembly (Figure 4.12a). Create a filter to limit the search to curated mRNA reference sequences in the NM_ accession series (Box 1.2; Figure 4.12b). Next, intersect the RefSeq track with variants from the GWAS Catalog (Figure 4.12c). Finally, on the Table Browser form, change the output format to hyperlinks to Genome Browser, then click get output. The output is a list of more than 3000 RefSeq mRNAs, each of which overlaps at least one variant from the GWAS Catalog (Figure 4.12d). Clicking on the first link in the results list opens the Genome Browser view of one of these transcripts, from the gene arginine–glutamic acid dipeptide (RE) repeats (RERE), together with the GWAS Catalog SNPs that it overlaps (Figure 4.12e).
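The filter and intersection steps can be pictured as two simple operations on tables of genomic intervals. The following is a minimal sketch of that logic, not the Table Browser's actual implementation. The records, accession numbers, rsIDs, and the function names `overlaps` and `filter_and_intersect` are all hypothetical, and coordinates are assumed to be 0-based and half-open.

```python
from fnmatch import fnmatchcase

# Hypothetical toy records; real tracks hold many more rows and fields.
# Coordinates are assumed to be 0-based, half-open intervals [start, end).
transcripts = [
    {"name": "NM_000001.1", "chrom": "chr1", "start": 100, "end": 500},
    {"name": "NR_000002.1", "chrom": "chr1", "start": 150, "end": 450},
    {"name": "NM_000003.2", "chrom": "chr2", "start": 100, "end": 500},
    {"name": "XM_000004.1", "chrom": "chr1", "start": 600, "end": 900},
]
snps = [
    {"name": "rs0001", "chrom": "chr1", "start": 200, "end": 201},
    {"name": "rs0002", "chrom": "chr1", "start": 700, "end": 701},
    {"name": "rs0003", "chrom": "chr2", "start": 900, "end": 901},
]


def overlaps(a, b):
    """Return True if two intervals share a chromosome and at least one base."""
    return (
        a["chrom"] == b["chrom"]
        and a["start"] < b["end"]
        and b["start"] < a["end"]
    )


def filter_and_intersect(transcripts, snps, pattern="NM_*"):
    """Keep transcripts whose name matches the pattern and that overlap any SNP."""
    # Step 1: the name filter (the wildcard * matches any run of characters).
    curated = [t for t in transcripts if fnmatchcase(t["name"], pattern)]
    # Step 2: the intersection (any overlap with at least one SNP).
    return [t["name"] for t in curated if any(overlaps(t, s) for s in snps)]


result = filter_and_intersect(transcripts, snps)
print(result)
```

In this toy data set only one transcript survives both steps: the NR_ and XM_ records are removed by the name filter, and the second NM_ record overlaps no SNP on its chromosome.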

Figure 4.12 Configuring the UCSC Table Browser. The link to the Table Browser is in the Tools menu at the top of each page. (a) On the Table Browser home page, first reset all previous selections by clicking on the reset button at the bottom of the window. Next, select the track called NCBI RefSeq in the group Genes and Gene Predictions on the human GRCh38 genome assembly. The region should be set to genome and the output format to hyperlinks to Genome Browser. (b) Create a filter to limit the search to curated mRNA reference sequences in the NM_ accession series (see Box 1.2). Click on the filter button shown in Figure 4.12a and enter the term NM_* in the name field. The asterisk is a wildcard character that matches any text. Thus, this setting will limit the results to those curated RefSeqs whose name begins with NM_. (c) Create an intersection between the RefSeq track and the variants from the GWAS Catalog. Click on the intersection button shown in Figure 4.12a and select the appropriate track. The group is Phenotype and Literature and the track is called GWAS Catalog. Leave other selections set to the default. (d) Click on the get output button shown in Figure 4.12a. The output is a list of more than 3000 RefSeq mRNAs that overlap with a variant from the GWAS Catalog. Each RefSeq is hyperlinked to the Genome Browser. (e) The first link is to NM_001042682.1, a transcript of the gene arginine–glutamic acid dipeptide (RE) repeats (RERE). The genomic context of RERE shows the SNPs from the GWAS Catalog that it overlaps.
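Each entry in the hyperlinks to Genome Browser output points to a Genome Browser view of one feature. As a rough sketch of how such a link can be assembled by hand, the snippet below builds an `hgTracks` URL from the `db` and `position` parameters. The helper name `genome_browser_link` and the example interval are hypothetical, and the input is assumed to be a 0-based, half-open interval that must be shifted to the 1-based coordinates shown in the browser's position box.

```python
from urllib.parse import urlencode


def genome_browser_link(db, chrom, start, end):
    """Build a Genome Browser URL from a 0-based, half-open interval.

    Assumption: the browser's position string is 1-based and inclusive,
    so the start coordinate is shifted by one and the end is kept.
    """
    position = f"{chrom}:{start + 1}-{end}"
    query = urlencode({"db": db, "position": position})
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


# Illustrative interval only; these are not the coordinates of a real gene.
link = genome_browser_link("hg38", "chr1", 999, 2000)
print(link)
```

The colon in the position string is percent-encoded by `urlencode`, which the browser decodes back to `chr1:1000-2000` when the page loads.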

UCSC also provides a related tool called the Data Integrator. The Data Integrator has a more sophisticated intersection function than does the Table Browser, as it can intersect data from up to five separate tracks and output fields from both the selected tracks and related tables. Thus, for example, output from the Data Integrator could include the gene symbol in addition to the accession number for each transcript on the RefSeq track, along with the dbSNP identifier for the variants in the GWAS Catalog. However, the Data Integrator does not allow for filtering, so it is not possible to restrict the output to curated RefSeq mRNAs alone.
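To make the difference concrete, the following sketch imitates the kind of combined output described here: one row per overlapping transcript and variant pair, carrying the accession, a gene symbol looked up in a related table, and the variant identifier. It is only an illustration under stated assumptions. The records, the `gene_symbols` lookup table, and the function `integrate` are hypothetical and do not reflect the Data Integrator's real schema or code.

```python
# Hypothetical toy tables; coordinates are assumed 0-based and half-open.
transcripts = [
    {"accession": "NM_000001.1", "chrom": "chr1", "start": 100, "end": 500},
    {"accession": "NM_000003.2", "chrom": "chr2", "start": 100, "end": 500},
]
# Stand-in for a related table that maps each accession to a gene symbol.
gene_symbols = {"NM_000001.1": "GENE1", "NM_000003.2": "GENE3"}
variants = [
    {"rsid": "rs0001", "chrom": "chr1", "start": 200, "end": 201},
    {"rsid": "rs0002", "chrom": "chr1", "start": 450, "end": 451},
    {"rsid": "rs0003", "chrom": "chr2", "start": 900, "end": 901},
]


def integrate(transcripts, variants, gene_symbols):
    """Return (accession, gene symbol, rsID) for every overlapping pair."""
    rows = []
    for t in transcripts:
        for v in variants:
            same_chrom = t["chrom"] == v["chrom"]
            if same_chrom and t["start"] < v["end"] and v["start"] < t["end"]:
                symbol = gene_symbols.get(t["accession"], "")
                rows.append((t["accession"], symbol, v["rsid"]))
    return rows


rows = integrate(transcripts, variants, gene_symbols)
for row in rows:
    print("\t".join(row))
```

No name filter is applied in this sketch, mirroring the limitation noted above: every transcript in the input is a candidate for the output.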

