Practical Genomic Data Examples
This page collects a few non-trivial genomic examples that are useful as reference material.
Example specifications
The example specs below are self-contained and focus on a single track or layout.
Chromosome ideogram from cytobands
This example shows a chromosome ideogram built from UCSC cytoband data. The rectangles encode the band intervals and their Giemsa staining categories (gieStain), the labels use ranged text to squeeze band names into the available space, and the dashed separators mark chromosome boundaries. The spec loads a gzipped copy of UCSC's cytoband file directly. Because the file has no header line, the TSV columns are declared explicitly, and a filter drops the alternative and unplaced contigs, whose names contain an underscore.
{
"description": "An ideogram track example showing cytobands across the human genome, based on UCSC's hg38 cytoBand track.",
"assembly": "hg38",
"name": "ideogram-track",
"view": { "stroke": "black" },
"title": {
"text": "Chromosome Ideogram",
"style": "track-title"
},
"height": 24,
"data": {
"url": "https://data.genomespy.app/genomes/hg38/cytoBand.txt.gz",
"format": {
"type": "tsv",
"columns": ["chrom", "chromStart", "chromEnd", "name", "gieStain"]
}
},
"transform": [{ "type": "filter", "expr": "!test(/_/, datum.chrom)" }],
"encoding": {
"x": {
"chrom": "chrom",
"pos": "chromStart",
"type": "locus"
},
"x2": { "chrom": "chrom", "pos": "chromEnd" }
},
"resolve": {
"scale": {
"color": "independent"
}
},
"layer": [
{
"title": "Cytoband",
"mark": "rect",
"encoding": {
"color": {
"field": "gieStain",
"type": "nominal",
"scale": {
"domain": [
"gneg",
"gpos25",
"gpos50",
"gpos75",
"gpos100",
"acen",
"stalk",
"gvar"
],
"range": [
"#f0f0f0",
"#e0e0e0",
"#d0d0d0",
"#c0c0c0",
"#a0a0a0",
"#cc4444",
"#338833",
"#000000"
]
}
}
}
},
{
"mark": {
"type": "text",
"align": "center",
"baseline": "middle",
"paddingX": 4,
"tooltip": null
},
"encoding": {
"color": {
"field": "gieStain",
"type": "nominal",
"scale": {
"domain": [
"gneg",
"gpos25",
"gpos50",
"gpos75",
"gpos100",
"acen",
"stalk",
"gvar"
],
"range": [
"black",
"black",
"black",
"black",
"black",
"black",
"white",
"white"
]
}
},
"text": {
"field": "name",
"type": "nominal"
}
}
},
{
"transform": [
{
"type": "filter",
"expr": "datum.chromStart == 0 && datum.chrom != 'chr1'"
}
],
"encoding": {
"x2": null
},
"mark": {
"type": "rule",
"color": "#a0a0a0",
"strokeDash": [3, 3],
"strokeDashOffset": 2
}
}
]
}
The visualization uses a mirrored copy of UCSC's hg38 cytoband track, distributed as cytoBand.txt.gz. UCSC states that its downloadable data files and database tables are freely available for public and commercial use.
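The data handling in this spec can be pictured with a minimal TypeScript sketch of what the columns declaration and the two filter expressions do to the headerless rows. It only illustrates the logic and is not GenomeSpy's loader; the function names and the sample rows (illustrative values in the five-column cytoBand layout) are hypothetical.

```typescript
// Hypothetical sketch: how the declared columns and the two filter
// expressions in the ideogram spec treat the headerless cytoBand rows.
interface Cytoband {
  chrom: string;
  chromStart: number;
  chromEnd: number;
  name: string;
  gieStain: string;
}

// Assign the declared column names by position, because the file has no header.
function parseCytobands(tsv: string): Cytoband[] {
  return tsv
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [chrom, start, end, name, gieStain] = line.split("\t");
      return {
        chrom,
        chromStart: Number(start),
        chromEnd: Number(end),
        name,
        gieStain,
      };
    });
}

// Same test as the top-level filter "!test(/_/, datum.chrom)":
// drops contigs whose names contain an underscore.
function dropContigs(bands: Cytoband[]): Cytoband[] {
  return bands.filter((b) => !/_/.test(b.chrom));
}

// Same test as the rule layer's filter: the first band of each chromosome
// other than chr1 sits on a boundary between two chromosomes.
function boundaries(bands: Cytoband[]): Cytoband[] {
  return bands.filter((b) => b.chromStart === 0 && b.chrom !== "chr1");
}

// Illustrative rows in the five-column cytoBand layout.
const sample = [
  "chr1\t0\t2300000\tp36.33\tgneg",
  "chr1\t2300000\t5300000\tp36.32\tgpos25",
  "chr1_KI270706v1_random\t0\t175055\t\tgneg",
  "chr2\t0\t4400000\tp25.3\tgneg",
].join("\n");

const bands = dropContigs(parseCytobands(sample));
console.log(bands.length); // 3
console.log(boundaries(bands).map((b) => b.chrom)); // ["chr2"]
```

Only one separator is produced for the two chromosomes, because chr1 starts at the left edge of the genome and needs no boundary line.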
RefSeq gene annotations with scored labels
This example shows a RefSeq gene annotation track for hg38. Transcript bodies and exons are packed into at most three lanes to reduce overlap, and the gene symbols use measureText and filterScoredLabels to keep the most useful gene names visible as the view changes. Of the roughly 30,000 gene symbols, only the highest-scoring ones in the visible genomic region are shown, and only when there is room for them. These familiar symbols act as landmarks for navigating the genome. At high zoom levels, small arrows next to the labels indicate transcript direction. The labels use a custom tooltip handler (refseqgene) that shows RefSeq gene summaries.
{
"description": "A track showing RefSeq gene annotations across the human genome, with labels scored and filtered to avoid overlap.",
"assembly": "hg38",
"name": "refseq-track",
"title": {
"text": "RefSeq Gene annotation",
"orient": "none"
},
"height": { "step": 23 },
"data": {
"url": "https://data.genomespy.app/genomes/hg38/refSeqGenes-hg38-release232.tsv.gz",
"format": {
"type": "tsv",
"parse": {
"symbol": "string",
"chrom": "string",
"start": "integer",
"length": "integer",
"strand": "string",
"score": "integer",
"exons": "string"
}
}
},
"transform": [
{
"type": "linearizeGenomicCoordinate",
"chrom": "chrom",
"pos": "start",
"as": "_start"
},
{
"type": "formula",
"expr": "datum._start + datum.length",
"as": "_end"
},
{
"type": "formula",
"expr": "datum._start + datum.length / 2",
"as": "_centroid"
},
{
"type": "collect",
"sort": { "field": ["_start"] }
},
{
"type": "pileup",
"start": "_start",
"end": "_end",
"as": "_lane",
"preference": "strand",
"preferredOrder": ["-", "+"]
},
{
"type": "filter",
"expr": "datum._lane < 3"
}
],
"encoding": {
"y": {
"field": "_lane",
"type": "ordinal",
"scale": {
"type": "index",
"align": 0,
"paddingInner": 0.4,
"paddingOuter": 0.2,
"domain": [0, 3],
"reverse": true,
"zoom": false
},
"axis": null
}
},
"layer": [
{
"name": "transcripts",
"opacity": {
"unitsPerPixel": [100000, 40000],
"values": [0, 1]
},
"encoding": {
"color": { "value": "#909090" }
},
"layer": [
{
"name": "exons",
"transform": [
{ "type": "project", "fields": ["_lane", "_start", "exons"] },
{ "type": "flattenCompressedExons", "start": "_start" }
],
"mark": {
"type": "rect",
"minOpacity": 0.2,
"minWidth": 0.5,
"tooltip": null
},
"encoding": {
"x": { "field": "exonStart", "type": "locus" },
"x2": { "field": "exonEnd" }
}
},
{
"name": "bodies",
"title": "Gene annotations",
"mark": {
"type": "rule",
"minLength": 0.5,
"size": 1,
"tooltip": null
},
"encoding": {
"x": {
"field": "_start",
"type": "locus",
"axis": { "title": null }
},
"x2": { "field": "_end" },
"search": { "field": "symbol" }
}
}
]
},
{
"name": "symbols",
"transform": [
{
"type": "measureText",
"fontSize": 11,
"field": "symbol",
"as": "_textWidth"
},
{
"type": "filterScoredLabels",
"lane": "_lane",
"score": "score",
"width": "_textWidth",
"pos": "_centroid",
"padding": 5
}
],
"layer": [
{
"name": "labels",
"mark": {
"type": "text",
"size": 11,
"yOffset": 7,
"tooltip": {
"handler": "refseqgene"
}
},
"encoding": {
"x": {
"field": "_centroid",
"type": "locus"
},
"text": { "field": "symbol" }
}
},
{
"name": "arrows",
"opacity": {
"unitsPerPixel": [100000, 40000],
"values": [0, 1]
},
"mark": {
"type": "point",
"yOffset": 7,
"size": 50,
"tooltip": null
},
"encoding": {
"x": {
"field": "_centroid",
"type": "locus"
},
"dx": {
"expr": "(datum._textWidth / 2 + 5) * (datum.strand == '-' ? -1 : 1)",
"type": "quantitative",
"scale": null
},
"color": { "value": "black" },
"shape": {
"field": "strand",
"type": "nominal",
"scale": {
"domain": ["-", "+"],
"range": ["triangle-left", "triangle-right"]
}
}
}
}
]
}
]
}
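The transform chain in this spec first places every gene on a single linear axis and then packs overlapping genes into lanes. The TypeScript sketch below approximates those two steps under simplifying assumptions: each chromosome's offset is the summed size of the chromosomes before it, lanes are assigned greedily in order of start position, and the strand preference of the real pileup transform is ignored. The function names, chromosome sizes, and genes are hypothetical.

```typescript
// Hypothetical sketch of two steps in the RefSeq transform chain:
// linearizing chrom/pos pairs and packing overlapping genes into lanes.
interface Gene {
  symbol: string;
  chrom: string;
  start: number;
  length: number;
  strand: string;
}

interface PlacedGene extends Gene {
  _start: number;
  _end: number;
  _centroid: number;
  _lane: number;
}

// Offset of each chromosome on the linear axis: the summed sizes of the
// chromosomes that precede it in assembly order.
function chromOffsets(sizes: Array<[string, number]>): Map<string, number> {
  const offsets = new Map<string, number>();
  let total = 0;
  for (const [chrom, size] of sizes) {
    offsets.set(chrom, total);
    total += size;
  }
  return offsets;
}

// Greedy pileup: visit genes by linear start and put each one into the
// lowest lane whose previous gene has already ended. The spec's strand
// preference is not modeled here.
function pileup(genes: Gene[], offsets: Map<string, number>, maxLanes: number): PlacedGene[] {
  const placed: PlacedGene[] = genes
    .filter((g) => offsets.has(g.chrom))
    .map((g) => {
      const _start = offsets.get(g.chrom)! + g.start;
      return {
        ...g,
        _start,
        _end: _start + g.length, // formula: _start + length
        _centroid: _start + g.length / 2, // formula: _start + length / 2
        _lane: -1,
      };
    })
    .sort((a, b) => a._start - b._start); // collect, sorted by _start

  const laneEnds: number[] = []; // end of the last gene placed in each lane
  for (const g of placed) {
    let lane = laneEnds.findIndex((end) => end <= g._start);
    if (lane === -1) {
      lane = laneEnds.length;
      laneEnds.push(g._end);
    } else {
      laneEnds[lane] = g._end;
    }
    g._lane = lane;
  }
  // Mirrors the filter "datum._lane < 3" when maxLanes is 3.
  return placed.filter((g) => g._lane < maxLanes);
}

// Hypothetical chromosome sizes and genes.
const offsets = chromOffsets([["chr1", 1000], ["chr2", 800]]);
const genes: Gene[] = [
  { symbol: "A", chrom: "chr1", start: 100, length: 300, strand: "+" },
  { symbol: "B", chrom: "chr1", start: 200, length: 100, strand: "-" },
  { symbol: "C", chrom: "chr1", start: 450, length: 50, strand: "+" },
  { symbol: "D", chrom: "chr2", start: 0, length: 100, strand: "-" },
];
const lanes = pileup(genes, offsets, 3);
console.log(lanes.map((g) => `${g.symbol}:${g._lane}`).join(" ")); // A:0 B:1 C:0 D:0
```

Gene B overlaps A and is pushed to lane 1, while C starts after A has ended and reuses lane 0. Genes that would need a fourth lane are simply dropped, which keeps the track height fixed.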
The gene annotation track was inspired by HiGlass. The genes are scored by their citation counts, overlapping isoforms are merged into a single virtual isoform that includes all exons, and the annotations were preprocessed with compressGeneAnnotations.py.
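The label selection can be pictured as a greedy pass over the visible genes in descending score order: a label is kept only if its measured width, plus padding, does not collide with a label already kept in the same lane. The following TypeScript sketch illustrates that idea. It is not GenomeSpy's filterScoredLabels implementation, and the interface, view width, and sample labels are hypothetical.

```typescript
// Hypothetical sketch of score-prioritized label filtering; the real
// filterScoredLabels transform may differ in its details.
interface Label {
  symbol: string;
  lane: number;
  score: number;
  pos: number; // label center on the linear genomic axis (e.g. _centroid)
  width: number; // rendered text width in pixels (e.g. from measureText)
}

function filterScoredLabels(
  labels: Label[],
  domain: [number, number], // visible genomic interval
  viewWidthPx: number, // width of the view in pixels
  padding: number // extra pixels on each side of a label
): Label[] {
  const pxPerUnit = viewWidthPx / (domain[1] - domain[0]);
  const toPx = (pos: number) => (pos - domain[0]) * pxPerUnit;

  const occupied = new Map<number, Array<[number, number]>>(); // per lane
  const kept: Label[] = [];

  // Only labels centered in the visible region compete, highest score first.
  const candidates = labels
    .filter((l) => l.pos >= domain[0] && l.pos <= domain[1])
    .sort((a, b) => b.score - a.score);

  for (const label of candidates) {
    const center = toPx(label.pos);
    const left = center - label.width / 2 - padding;
    const right = center + label.width / 2 + padding;
    const taken = occupied.get(label.lane) ?? [];
    // Keep the label only if it overlaps nothing already kept in its lane.
    if (taken.every(([a, b]) => right <= a || left >= b)) {
      taken.push([left, right]);
      occupied.set(label.lane, taken);
      kept.push(label);
    }
  }
  return kept;
}

// Hypothetical labels on a 1000-unit axis shown in a 500-pixel view.
const labels: Label[] = [
  { symbol: "BIG", lane: 0, score: 100, pos: 500, width: 40 },
  { symbol: "SMALL", lane: 0, score: 10, pos: 540, width: 40 },
  { symbol: "OTHER", lane: 1, score: 5, pos: 500, width: 40 },
  { symbol: "FAR", lane: 0, score: 1, pos: 900, width: 40 },
];

const overview = filterScoredLabels(labels, [0, 1000], 500, 5).map((l) => l.symbol);
const zoomed = filterScoredLabels(labels, [400, 600], 500, 5).map((l) => l.symbol);
console.log(overview); // ["BIG", "OTHER", "FAR"]
console.log(zoomed); // ["BIG", "SMALL", "OTHER"]
```

In the zoomed call the same genes are spread over more pixels, so SMALL no longer collides with BIG, while FAR falls outside the view. This is how the set of visible gene symbols changes as the user zooms.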
ASCAT Copy-Number Segmentation
The ASCAT Copy-Number Segmentation page presents a more complex GenomeSpy visualization built from ASCAT's simulated example data. It combines vertically concatenated views of allele-specific copy numbers, LogR, and B-allele frequency with ideogram and RefSeq gene annotation tracks.
ASCAT Algorithm in GenomeSpy
The ASCAT Algorithm in GenomeSpy page visualizes the core ASCAT fit. It starts from the segmented logRMean and bafMean values, estimates raw major and minor copy numbers from the current values of rho (tumor purity) and psi (ploidy), rounds them to integers, and shows how the fit changes when you adjust the parameters.
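The fit follows the ASCAT model of Van Loo et al. (2010), in which a segment's LogR r and BAF b are predicted from its allele-specific copy numbers nA and nB, the purity rho, and the ploidy psi. Solving those two equations for nA and nB gives the raw estimates that are then rounded. The TypeScript sketch below assumes gamma = 0.55, the LogR compaction value ASCAT documents for Illumina arrays (1 for sequencing data); the interfaces and function name are hypothetical, and the page itself may compute the values differently.

```typescript
// Sketch of ASCAT's allele-specific copy-number equations (Van Loo et al., 2010).
// Model, for a segment with copy numbers nA and nB:
//   r = gamma * log2((2 * (1 - rho) + rho * (nA + nB)) / (2 * (1 - rho) + rho * psi))
//   b = (1 - rho + rho * nB) / (2 * (1 - rho) + rho * (nA + nB))
// Solving for nA and nB gives the raw estimates computed below.
interface Segment {
  logRMean: number; // segmented LogR
  bafMean: number; // segmented B-allele frequency
}

interface AlleleCopyNumbers {
  rawMajor: number;
  rawMinor: number;
  major: number;
  minor: number;
}

// gamma = 0.55 is an assumption (ASCAT's value for Illumina arrays; 1 for sequencing).
function fitSegment(seg: Segment, rho: number, psi: number, gamma = 0.55): AlleleCopyNumbers {
  const r = seg.logRMean;
  const b = seg.bafMean;
  // Total signal 2(1 - rho) + rho(nA + nB) implied by the observed LogR.
  const total = Math.pow(2, r / gamma) * (2 * (1 - rho) + rho * psi);
  const nA = (rho - 1 - (b - 1) * total) / rho;
  const nB = (rho - 1 + b * total) / rho;
  const rawMajor = Math.max(nA, nB);
  const rawMinor = Math.min(nA, nB);
  return { rawMajor, rawMinor, major: Math.round(rawMajor), minor: Math.round(rawMinor) };
}

// A pure (rho = 1) diploid tumor with a heterozygous, copy-neutral segment.
console.log(fitSegment({ logRMean: 0, bafMean: 0.5 }, 1, 2)); // major 1, minor 1
// The same tumor with copy-neutral loss of heterozygosity (BAF = 1).
console.log(fitSegment({ logRMean: 0, bafMean: 1 }, 1, 2)); // major 2, minor 0
// 50% purity, one-copy loss: r and b generated from the model with nA = 1, nB = 0.
const loss = fitSegment({ logRMean: 0.55 * Math.log2(0.75), bafMean: 1 / 3 }, 0.5, 2);
console.log(loss.major, loss.minor);
```

The third case shows why purity matters: the observed LogR and BAF are diluted by normal cells, yet with the correct rho and psi the equations still recover one major and zero minor copies. Adjusting rho or psi away from the generating values moves the raw estimates off integers, which is what the page lets you explore.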
Sashimi plot from splice junctions
The Sashimi plot from splice junctions page shows IGV's splice-junction demo data as a lazy BigWig coverage track plus dome-shaped splice arcs from a BED file. The slider filters the arcs by uniquely mapped read count, and the arc labels show the junction scores.
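The slider and the arc geometry can be sketched as two small functions: one keeps the junctions whose uniquely mapped read count reaches the slider's threshold, and the other samples a dome-shaped path between a junction's two ends. This TypeScript sketch is only an illustration; the Junction fields, function names, and sample junctions are hypothetical, and the page may draw its arcs with a dedicated mark rather than explicit points.

```typescript
// Hypothetical sketch of the splice-junction filtering and dome geometry.
interface Junction {
  start: number; // end of the upstream exon (donor side)
  end: number; // start of the downstream exon (acceptor side)
  uniqueReads: number; // uniquely mapped reads spanning the junction
  score: number; // value shown in the arc label
}

// What the slider controls: drop junctions with too few unique reads.
function filterJunctions(junctions: Junction[], minUniqueReads: number): Junction[] {
  return junctions.filter((j) => j.uniqueReads >= minUniqueReads);
}

// Sample a half-ellipse from start to end: y is 0 at both ends and
// `height` at the midpoint, which gives the dome shape.
function domePoints(j: Junction, height: number, steps = 16): Array<[number, number]> {
  const mid = (j.start + j.end) / 2;
  const halfSpan = (j.end - j.start) / 2;
  const points: Array<[number, number]> = [];
  for (let i = 0; i <= steps; i++) {
    const theta = Math.PI - (Math.PI * i) / steps; // from pi down to 0, left to right
    points.push([mid + halfSpan * Math.cos(theta), height * Math.sin(theta)]);
  }
  return points;
}

// Hypothetical junctions and slider value.
const junctions: Junction[] = [
  { start: 100, end: 300, uniqueReads: 12, score: 15 },
  { start: 150, end: 400, uniqueReads: 2, score: 3 },
];
const kept = filterJunctions(junctions, 5);
const arc = domePoints(kept[0], 10);
console.log(kept.length, arc.length); // 1 17
```

The apex of each dome, at the junction midpoint, is a natural place for the score label.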
More examples
For many more examples of visualizing genomic data, see Lazy data sources.
Observable notebooks
The Annotation Tracks notebook explains how to implement a chromosome ideogram and a gene annotation track.
Website examples
The genomespy.app main page showcases several examples, some of which focus on genomic data.