Lazy Data Sources¶
Lazy data sources load data on demand in response to user interactions. Unlike eager sources, most lazy data sources support indexing, which allows data to be retrieved and loaded partially and incrementally as users navigate the genome. This is especially useful for very large datasets that are infeasible to load in their entirety.
How it works
Lazy data sources observe the scale domains of the view where the data
source is specified. When the domain changes as a result of a user interaction,
the data source invokes a request to fetch a new subset of the data. Lazy
sources need the visual channel to be specified, which is used to determine the
scale to observe. For genomic data sources, the channel defaults to "x".
Lazy data sources are specified using the lazy property of the data object.
Unlike in eager data, the type of the data source must be specified explicitly:
{
"data": {
"lazy": {
"type": "bigwig",
"url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw"
}
},
...
}
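The mechanism described in this section can be pictured with a small Python sketch. It is not GenomeSpy's implementation: the class and method names (Scale, LazySource, add_listener, fetch) are hypothetical and only illustrate how a source that observes the scale of one channel reacts when that scale's domain changes.

```python
from typing import Callable, Dict, List, Tuple

Domain = Tuple[float, float]


class Scale:
    """A toy scale that notifies its listeners when the domain changes."""

    def __init__(self, domain: Domain) -> None:
        self.domain = domain
        self._listeners: List[Callable[[Domain], None]] = []

    def add_listener(self, listener: Callable[[Domain], None]) -> None:
        self._listeners.append(listener)

    def set_domain(self, domain: Domain) -> None:
        # In this sketch, zooming and panning end up here.
        if domain != self.domain:
            self.domain = domain
            for listener in self._listeners:
                listener(domain)


class LazySource:
    """Hypothetical lazy source: observes one channel and fetches on demand."""

    def __init__(self, scales: Dict[str, Scale], channel: str = "x") -> None:
        self.requests: List[Domain] = []
        # The channel determines which scale is observed.
        scales[channel].add_listener(self.fetch)

    def fetch(self, domain: Domain) -> None:
        # A real source would request only the data overlapping the domain.
        self.requests.append(domain)


scales = {"x": Scale((0, 1000)), "y": Scale((0, 1))}
source = LazySource(scales)  # the channel defaults to "x"
scales["x"].set_domain((200, 400))  # observed: triggers a fetch
scales["y"].set_domain((0, 2))  # not observed: no fetch
print(source.requests)
```

Only the change on the observed channel results in a request, which is why the channel has to be known to the data source.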
Indexed FASTA¶
The "indexedFasta" source enables fast random access to a reference sequence.
It loads the sequence as three consecutive chunks that cover and flank the
currently visible region (domain), allowing the user to rapidly pan the view.
The chunks are provided as data objects with the following fields: chrom
(string), start (integer), and sequence (a string of bases).
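One way to picture the three chunks is the sketch below. It assumes that chunks are aligned to multiples of the window size and that the middle chunk is the one containing the midpoint of the visible domain; both are assumptions made for illustration, and the function name flanking_chunks is hypothetical. Coordinates are plain integers rather than genomic loci.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


def flanking_chunks(
    domain: Interval, window_size: int = 7000
) -> Optional[List[Interval]]:
    """Return up to three consecutive windows that cover and flank the domain.

    Returns None when the domain is not smaller than the window size,
    in which case nothing would be fetched.
    """
    start, end = domain
    if end - start >= window_size:
        return None
    # Index of the window that contains the midpoint of the domain.
    center = ((start + end) // 2) // window_size
    chunks = []
    for index in (center - 1, center, center + 1):
        if index >= 0:  # there is nothing to the left of position zero
            chunks.append((index * window_size, (index + 1) * window_size))
    return chunks


print(flanking_chunks((20003500, 20003540)))
# [(19992000, 19999000), (19999000, 20006000), (20006000, 20013000)]
```

Because the domain is shorter than one window, the three windows always cover it, and the flanking windows let the user pan some distance before new data must be requested.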
Parameters¶
channel
Type: "x" | "y"
Which channel's scale domain to monitor.
Default value: "x"
debounce
Type: number | ExprRef
Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.
Default value: 200
debounceMode
Type: string
The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by the changes to the visible window (or tile) will be debounced. If your data is small, "window" is better as it will start fetching data while the user is still panning around, resulting in a shorter perceived latency.
Default value: "window"
indexUrl
Type: string
URL of the index file.
Default value: url + ".fai"
url (required)
Type: string
URL of the FASTA file.
windowSize
Type: number
Size of each chunk when fetching the FASTA file. Data is only fetched when the length of the visible domain is smaller than the window size.
Default value: 7000
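The effect of the debounce parameter can be illustrated with a minimal, generic debouncer. The sketch uses explicit timestamps instead of real timers so that it stays deterministic; the Debouncer class and its push and poll methods are hypothetical names and do not reflect how GenomeSpy implements debouncing. Depending on debounceMode, the debounced values would be either domain changes or window fetches.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]


class Debouncer:
    """Keeps only the latest value and releases it after a quiet period."""

    def __init__(self, delay_ms: float = 200) -> None:
        self.delay_ms = delay_ms
        self._pending: Optional[Interval] = None
        self._last_event_ms = 0.0

    def push(self, now_ms: float, value: Interval) -> None:
        # Every new event replaces the pending one and restarts the timer.
        self._pending = value
        self._last_event_ms = now_ms

    def poll(self, now_ms: float) -> Optional[Interval]:
        # Release the pending value once no event has arrived for delay_ms.
        quiet_for = now_ms - self._last_event_ms
        if self._pending is not None and quiet_for >= self.delay_ms:
            value, self._pending = self._pending, None
            return value
        return None


debouncer = Debouncer(delay_ms=200)
fetched: List[Interval] = []

# Simulated panning: three changes in quick succession, none is released.
for t, domain in [(0, (0, 100)), (50, (10, 110)), (120, (20, 120))]:
    debouncer.push(t, domain)
    released = debouncer.poll(t)
    if released is not None:
        fetched.append(released)

# The user stops. Only the last value is released, 200 ms after the last event.
for t in (200, 320):
    released = debouncer.poll(t)
    if released is not None:
        fetched.append(released)

print(fetched)
```

Three rapid changes result in a single update, which is the point of debouncing: intermediate states that the user merely passes through never trigger work.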
Example¶
The example below shows how to specify a sequence track using an indexed FASTA
file. The sequence chunks are split into separate data objects using the
"flattenSequence" transform, and the final position of each nucleotide is
computed using the "formula" transform. Please note that new data are
fetched only when the user zooms into a region smaller than the window size
(default: 7000 bp).
{
"genome": { "name": "hg38" },
"data": {
"lazy": {
"type": "indexedFasta",
"url": "https://data.genomespy.app/genomes/hg38/hg38.fa"
}
},
"transform": [
{
"type": "flattenSequence",
"field": "sequence",
"as": ["rawPos", "base"]
},
{ "type": "formula", "expr": "datum.rawPos + datum.start", "as": "pos" }
],
"encoding": {
"x": {
"chrom": "chrom",
"pos": "pos",
"type": "locus",
"scale": {
"domain": [
{ "chrom": "chr7", "pos": 20003500 },
{ "chrom": "chr7", "pos": 20003540 }
]
}
},
"color": {
"field": "base",
"type": "nominal",
"scale": {
"domain": ["A", "C", "T", "G", "a", "c", "t", "g", "N"],
"range": [
"#7BD56C",
"#FF9B9B",
"#86BBF1",
"#FFC56C",
"#7BD56C",
"#FF9B9B",
"#86BBF1",
"#FFC56C",
"#E0E0E0"
]
}
}
},
"layer": [
{
"mark": "rect"
},
{
"mark": {
"type": "text",
"size": 13,
"fitToBand": true,
"paddingX": 1.5,
"paddingY": 1,
"opacity": 0.7,
"flushX": false,
"tooltip": null
},
"encoding": {
"color": { "value": "black" },
"text": { "field": "base" }
}
}
]
}
The data source is based on GMOD's indexedfasta-js library.
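The two transforms used in the example above can be mimicked in a few lines of Python. This is only a sketch of the idea: the function name flatten_sequence is hypothetical, and it assumes that the position produced by "flattenSequence" (rawPos in the example) is zero-based within the chunk.

```python
from typing import Dict, List, Union

Datum = Dict[str, Union[str, int]]


def flatten_sequence(chunk: Datum) -> List[Datum]:
    """Split one sequence chunk into one data object per base."""
    rows: List[Datum] = []
    start = int(chunk["start"])
    for raw_pos, base in enumerate(str(chunk["sequence"])):
        rows.append(
            {
                "chrom": chunk["chrom"],
                "start": start,
                "rawPos": raw_pos,  # assumed to be zero-based
                "base": base,
                # Same as the formula expression: datum.rawPos + datum.start
                "pos": raw_pos + start,
            }
        )
    return rows


chunk = {"chrom": "chr7", "start": 20003500, "sequence": "ACGT"}
for row in flatten_sequence(chunk):
    print(row["chrom"], row["pos"], row["base"])
```

Each base ends up as its own data object with an absolute position, which is what the "x" encoding of the example needs.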
BigWig¶
The "bigwig" source enables the retrieval of dense, continuous data, such as
coverage or other signal data stored in BigWig files. It behaves similarly to
the indexed FASTA source, loading the data in chunks that cover and flank the
currently visible region. However, the window size automatically adapts to the
zoom level, and data are fetched in higher resolution when zooming in. The data
source provides data objects with the following fields: chrom (string),
start (integer), end (integer), and score (number).
Parameters¶
channel
Type: "x" | "y"
Which channel's scale domain to monitor.
Default value: "x"
debounce
Type: number | ExprRef
Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.
Default value: 200
debounceMode
Type: string
The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by the changes to the visible window (or tile) will be debounced. If your data is small, "window" is better as it will start fetching data while the user is still panning around, resulting in a shorter perceived latency.
Default value: "window"
pixelsPerBin
Type: number | ExprRef
The approximate minimum width of each data bin, in pixels.
Default value: 2
url (required)
Type: string | ExprRef
URL of the BigWig file.
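The relationship between pixelsPerBin and the resolution of the fetched data can be sketched with simple arithmetic. The code below is an illustration under assumptions: the function names are hypothetical, and the list of reduction levels stands in for the pre-computed zoom levels that a BigWig file may contain. It does not describe how GenomeSpy or bbi-js actually choose a level.

```python
from typing import Sequence, Tuple


def bases_per_bin(
    domain: Tuple[float, float], width_px: float, pixels_per_bin: float = 2
) -> float:
    """Number of bases spanned by one bin of the requested pixel width."""
    bin_count = max(width_px / pixels_per_bin, 1)
    return (domain[1] - domain[0]) / bin_count


def pick_reduction(target: float, reductions: Sequence[int]) -> int:
    """Pick the coarsest reduction level that is not coarser than the target.

    Falls back to 1 (full resolution) when every level is too coarse.
    """
    usable = [r for r in reductions if r <= target]
    return max(usable, default=1)


levels = [10, 40, 160, 640, 2560]  # hypothetical reduction levels (bp per bin)

zoomed_out = bases_per_bin((0, 1_000_000), width_px=1000)
print(zoomed_out, pick_reduction(zoomed_out, levels))  # 2000.0 640

zoomed_in = bases_per_bin((0, 1000), width_px=1000)
print(zoomed_in, pick_reduction(zoomed_in, levels))  # 2.0 1
```

A smaller pixelsPerBin asks for more bins across the same view, so finer data are requested at every zoom level.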
Example¶
The example below shows the GC content of the human genome in 5-base windows. When you zoom in, the resolution of the data automatically increases.
{
"genome": { "name": "hg38" },
"view": { "stroke": "lightgray" },
"data": {
"lazy": {
"type": "bigwig",
"url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw"
}
},
"encoding": {
"y": {
"field": "score",
"type": "quantitative",
"scale": { "domain": [0, 100] },
"axis": { "title": "GC (%)", "grid": true, "gridDash": [2, 2] }
},
"x": { "chrom": "chrom", "pos": "start", "type": "locus" },
"x2": { "chrom": "chrom", "pos": "end" }
},
"mark": "rect"
}
The data source is based on GMOD's bbi-js library.
BigBed¶
The "bigbed" source enables the retrieval of segmented data, such as annotated
genomic regions stored in BigBed files.
Parameters¶
channel
Type: "x" | "y"
Which channel's scale domain to monitor.
Default value: "x"
debounce
Type: number | ExprRef
Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.
Default value: 200
debounceMode
Type: string
The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by the changes to the visible window (or tile) will be debounced. If your data is small, "window" is better as it will start fetching data while the user is still panning around, resulting in a shorter perceived latency.
Default value: "window"
url (required)
Type: string | ExprRef
URL of the BigBed file.
windowSize
Type: number | ExprRef
Size of each chunk when fetching the BigBed file. Data is only fetched when the length of the visible domain is smaller than the window size.
Default value: 1000000
Example¶
The example below displays the "ENCODE Candidate Cis-Regulatory Elements (cCREs) combined from all cell types" dataset for the hg38 genome.
{
"genome": { "name": "hg38" },
"view": { "stroke": "lightgray" },
"data": {
"lazy": {
"type": "bigbed",
"url": "https://data.genomespy.app/sample-data/encodeCcreCombined.hg38.bb"
}
},
"encoding": {
"x": {
"chrom": "chrom",
"pos": "chromStart",
"type": "locus",
"scale": {
"domain": [
{ "chrom": "chr7", "pos": 66600000 },
{ "chrom": "chr7", "pos": 66800000 }
]
}
},
"x2": {
"chrom": "chrom",
"pos": "chromEnd"
},
"color": {
"field": "ucscLabel",
"type": "nominal",
"scale": {
"domain": ["prom", "enhP", "enhD", "K4m3", "CTCF"],
"range": ["#FF0000", "#FFA700", "#FFCD00", "#FFAAAA", "#00B0F0"]
}
}
},
"mark": "rect"
}
The data source is based on GMOD's bbi-js library.
VCF¶
The tabix-based "vcf" source enables the retrieval of variant data stored in
VCF files. The object format GenomeSpy uses is described in vcf-js's
documentation.
Parameters¶
addChrPrefix
Type: boolean | string
Add a chr (boolean) or custom (string) prefix to the chromosome names in the Tabix file.
Default value: false
channel
Type: "x" | "y"
Which channel's scale domain to monitor.
Default value: "x"
debounce
Type: number | ExprRef
Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.
Default value: 200
debounceMode
Type: string
The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by the changes to the visible window (or tile) will be debounced. If your data is small, "window" is better as it will start fetching data while the user is still panning around, resulting in a shorter perceived latency.
Default value: "window"
indexUrl
Type: string
URL of the tabix index file.
Default value: url + ".tbi"
url (required)
Type: string
URL of the bgzip-compressed file.
windowSize
Type: number
Size of each chunk when fetching the Tabix file. Data is only fetched when the length of the visible domain is smaller than the window size.
Default value: 30000000
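The defaults of indexUrl and addChrPrefix listed above can be summarized as two small functions. This is a sketch of the documented behavior only; the function names and the example.org URL are hypothetical.

```python
from typing import Optional, Union


def default_index_url(url: str, index_url: Optional[str] = None) -> str:
    """Use the given index URL, or fall back to url + ".tbi"."""
    return index_url if index_url is not None else url + ".tbi"


def prefix_chrom(name: str, add_chr_prefix: Union[bool, str] = False) -> str:
    """Apply addChrPrefix to a chromosome name read from the Tabix file."""
    if add_chr_prefix is True:
        return "chr" + name  # boolean true: the standard "chr" prefix
    if isinstance(add_chr_prefix, str):
        return add_chr_prefix + name  # string: a custom prefix
    return name  # false (the default): names are used as they are


print(default_index_url("https://example.org/variants.vcf.gz"))
print(prefix_chrom("7", True))
print(prefix_chrom("7", "Chr"))
print(prefix_chrom("7"))
```

Prefixing is useful when the file names its chromosomes "1", "2", and so on, while the configured genome uses "chr1", "chr2", and so on.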
Example¶
TODO
The data source is based on GMOD's vcf-js library.
GFF3¶
The tabix-based "gff3" source enables the retrieval of hierarchical data, such
as genomic annotations stored in GFF3 files. The object format GenomeSpy uses
is described in gff-js's documentation. The flatten and project transforms are
useful when extracting the child features and attributes from the hierarchical
data structure. See the example below.
Parameters¶
addChrPrefix
Type: boolean | string
Add a chr (boolean) or custom (string) prefix to the chromosome names in the Tabix file.
Default value: false
channel
Type: "x" | "y"
Which channel's scale domain to monitor.
Default value: "x"
debounce
Type: number | ExprRef
Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.
Default value: 200
debounceMode
Type: string
The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by the changes to the visible window (or tile) will be debounced. If your data is small, "window" is better as it will start fetching data while the user is still panning around, resulting in a shorter perceived latency.
Default value: "window"
indexUrl
Type: string
URL of the tabix index file.
Default value: url + ".tbi"
url (required)
Type: string
URL of the bgzip-compressed file.
windowSize
Type: number
Size of each chunk when fetching the Tabix file. Data is only fetched when the length of the visible domain is smaller than the window size.
Default value: 30000000
Example¶
The example below displays the human (GRCh38.p13) GENCODE v43 annotation dataset. Please note that the example shows a maximum of ten overlapping features per locus as vertical scrolling is currently not supported properly.
{
"$schema": "https://unpkg.com/@genome-spy/core/dist/schema.json",
"genome": { "name": "hg38" },
"height": { "step": 28 },
"viewportHeight": "container",
"view": { "stroke": "lightgray" },
"data": {
"lazy": {
"type": "gff3",
"url": "https://data.genomespy.app/sample-data/gencode.v43.annotation.sorted.gff3.gz",
"windowSize": 2000000,
"debounce": 300,
"debounceMode": "domain"
}
},
"transform": [
{
"type": "flatten"
},
{
"type": "formula",
"expr": "datum.attributes.gene_name",
"as": "gene_name"
},
{
"type": "flatten",
"fields": ["child_features"]
},
{
"type": "flatten",
"fields": ["child_features"],
"as": ["child_feature"]
},
{
"type": "project",
"fields": [
"gene_name",
"child_feature.type",
"child_feature.strand",
"child_feature.seq_id",
"child_feature.start",
"child_feature.end",
"child_feature.attributes.gene_type",
"child_feature.attributes.transcript_type",
"child_feature.attributes.gene_id",
"child_feature.attributes.transcript_id",
"child_feature.attributes.transcript_name",
"child_feature.attributes.tag",
"source",
"child_feature.child_features"
],
"as": [
"gene_name",
"type",
"strand",
"seq_id",
"start",
"end",
"gene_type",
"transcript_type",
"gene_id",
"transcript_id",
"transcript_name",
"tag",
"source",
"_child_features"
]
},
{
"type": "collect",
"sort": {
"field": ["seq_id", "start", "transcript_id"]
}
},
{
"type": "pileup",
"start": "start",
"end": "end",
"as": "_lane"
}
],
"encoding": {
"x": {
"chrom": "seq_id",
"pos": "start",
"offset": 1,
"type": "locus",
"scale": {
"domain": [
{ "chrom": "chr5", "pos": 177482500 },
{ "chrom": "chr5", "pos": 177518000 }
]
}
},
"x2": {
"chrom": "seq_id",
"pos": "end"
},
"y": {
"field": "_lane",
"type": "index",
"scale": {
"zoom": false,
"reverse": true,
"domain": [0, 40],
"padding": 0.5
},
"axis": null
}
},
"layer": [
{
"name": "gencode-transcript",
"layer": [
{
"name": "gencode-tooltip-trap",
"title": "GENCODE transcript",
"mark": {
"type": "rule",
"color": "#b0b0b0",
"opacity": 0,
"size": 7
}
},
{
"name": "gencode-transcript-body",
"mark": {
"type": "rule",
"color": "#b0b0b0",
"tooltip": null
}
}
]
},
{
"name": "gencode-exons",
"transform": [
{
"type": "flatten",
"fields": ["_child_features"]
},
{
"type": "flatten",
"fields": ["_child_features"],
"as": ["child_feature"]
},
{
"type": "project",
"fields": [
"gene_name",
"_lane",
"child_feature.type",
"child_feature.seq_id",
"child_feature.start",
"child_feature.end",
"child_feature.attributes.exon_number",
"child_feature.attributes.exon_id"
],
"as": [
"gene_name",
"_lane",
"type",
"seq_id",
"start",
"end",
"exon_number",
"exon_id"
]
}
],
"layer": [
{
"title": "GENCODE exon",
"transform": [{ "type": "filter", "expr": "datum.type == 'exon'" }],
"mark": {
"type": "rect",
"minWidth": 0.5,
"minOpacity": 0.5,
"stroke": "#505050",
"fill": "#fafafa",
"strokeWidth": 1.0
}
},
{
"title": "GENCODE exon",
"transform": [
{
"type": "filter",
"expr": "datum.type != 'exon' && datum.type != 'start_codon' && datum.type != 'stop_codon'"
}
],
"mark": {
"type": "rect",
"minWidth": 0.5,
"minOpacity": 0,
"strokeWidth": 1.0,
"strokeOpacity": 0.0,
"stroke": "gray"
},
"encoding": {
"fill": {
"field": "type",
"type": "nominal",
"scale": {
"domain": ["five_prime_UTR", "CDS", "three_prime_UTR"],
"range": ["#83bcb6", "#ffbf79", "#d6a5c9"]
}
}
}
},
{
"transform": [
{
"type": "filter",
"expr": "datum.type == 'three_prime_UTR' || datum.type == 'five_prime_UTR'"
},
{
"type": "formula",
"expr": "datum.type == 'three_prime_UTR' ? \"3'\" : \"5'\"",
"as": "label"
}
],
"mark": {
"type": "text",
"color": "black",
"size": 11,
"opacity": 0.7,
"paddingX": 2,
"paddingY": 1.5,
"tooltip": null
},
"encoding": {
"text": {
"field": "label"
}
}
}
]
},
{
"name": "gencode-transcript-labels",
"transform": [
{
"type": "formula",
"expr": "(datum.strand == '-' ? '< ' : '') + datum.transcript_name + ' - ' + datum.transcript_id + (datum.strand == '+' ? ' >' : '')",
"as": "label"
}
],
"mark": {
"type": "text",
"size": 10,
"yOffset": 12,
"tooltip": null,
"color": "#505050"
},
"encoding": {
"text": {
"field": "label"
}
}
}
]
}
The data source is based on GMOD's tabix-js and gff-js libraries.
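The flatten and project steps used in the example above can be pictured with a short Python sketch. The functions are hypothetical stand-ins for the transforms, and the gene object is heavily simplified and invented for illustration: it is not the exact gff-js object format, which nests child features more deeply (this is why the example above flattens twice).

```python
from typing import Any, Dict, List

Datum = Dict[str, Any]


def flatten(rows: List[Datum], field: str, as_: str) -> List[Datum]:
    """Emit one row per element of the array stored in `field`."""
    out: List[Datum] = []
    for row in rows:
        for item in row.get(field) or []:
            new_row = dict(row)
            new_row[as_] = item
            out.append(new_row)
    return out


def get_path(datum: Datum, path: str) -> Any:
    """Follow a dotted path such as "child_feature.attributes.gene_id"."""
    value: Any = datum
    for key in path.split("."):
        value = value.get(key) if isinstance(value, dict) else None
    return value


def project(rows: List[Datum], fields: List[str], as_: List[str]) -> List[Datum]:
    """Keep only the listed (possibly nested) fields under new names."""
    return [
        {name: get_path(row, path) for path, name in zip(fields, as_)}
        for row in rows
    ]


# Hypothetical, simplified gene with two transcripts.
gene = {
    "seq_id": "chr5",
    "attributes": {"gene_name": "GENE1"},
    "child_features": [
        {"type": "transcript", "start": 100, "end": 900},
        {"type": "transcript", "start": 150, "end": 700},
    ],
}

rows = flatten([gene], "child_features", "child_feature")
rows = project(
    rows,
    ["attributes.gene_name", "child_feature.start", "child_feature.end"],
    ["gene_name", "start", "end"],
)
print(rows)
```

After the two steps, each transcript is a flat data object that carries the name of its parent gene, which is the shape that marks and encodings can work with.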
BAM¶
The "bam" source is very much a work in progress but has a low priority. It
currently exposes the reads but provides no handling for variant alleles,
CIGARs, etc. Please send a message to GitHub Discussions if you are
interested in this feature.
Parameters¶
channel
Type: "x" | "y"
Which channel's scale domain to monitor.
Default value: "x"
debounce
Type: number | ExprRef
Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.
Default value: 200
debounceMode
Type: string
The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by the changes to the visible window (or tile) will be debounced. If your data is small, "window" is better as it will start fetching data while the user is still panning around, resulting in a shorter perceived latency.
Default value: "window"
indexUrl
Type: string
URL of the index file.
Default value: url + ".bai"
url (required)
Type: string
URL of the BAM file.
windowSize
Type: number
Size of each chunk when fetching the BAM file. Data is only fetched when the length of the visible domain is smaller than the window size.
Default value: 10000
Example¶
{
"genome": { "name": "hg18" },
"data": {
"lazy": {
"type": "bam",
"url": "https://data.genomespy.app/sample-data/bamExample.bam",
"windowSize": 30000
}
},
"resolve": { "scale": { "x": "shared" } },
"spacing": 5,
"vconcat": [
{
"view": { "stroke": "lightgray" },
"height": 40,
"transform": [
{
"type": "coverage",
"start": "start",
"end": "end",
"as": "coverage",
"chrom": "chrom"
}
],
"mark": "rect",
"encoding": {
"x": {
"chrom": "chrom",
"pos": "start",
"type": "locus",
"axis": null
},
"x2": { "chrom": "chrom", "pos": "end" },
"y": { "field": "coverage", "type": "quantitative" }
}
},
{
"view": { "stroke": "lightgray" },
"transform": [
{
"type": "pileup",
"start": "start",
"end": "end",
"as": "_lane"
}
],
"encoding": {
"x": {
"chrom": "chrom",
"pos": "start",
"type": "locus",
"axis": {},
"scale": {
"domain": [
{ "chrom": "chr21", "pos": 33037317 },
{ "chrom": "chr21", "pos": 33039137 }
]
}
},
"x2": {
"chrom": "chrom",
"pos": "end"
},
"y": {
"field": "_lane",
"type": "index",
"scale": {
"domain": [0, 60],
"padding": 0.3,
"reverse": true,
"zoom": false
}
},
"color": {
"field": "strand",
"type": "nominal",
"scale": {
"domain": ["+", "-"],
"range": ["crimson", "orange"]
}
}
},
"mark": "rect"
}
]
}
The data source is based on GMOD's bam-js library.
Axis ticks¶
The "axisTicks" data source generates a set of ticks for the specified channel.
While GenomeSpy internally uses this data source for generating axis ticks, you
can also use it to create fully customized axes. The data source generates data
objects with value and label fields.
Parameters¶
axis
Type: Axis
Optional axis properties.
channel (required)
Type: "x" | "y"
Which channel's scale domain to listen to.
Example¶
The example below generates approximately three ticks for the x axis.
{
"data": {
"lazy": {
"type": "axisTicks",
"channel": "x",
"axis": {
"tickCount": 3
}
}
},
"mark": {
"type": "text",
"size": 20,
"clip": false
},
"encoding": {
"x": {
"field": "value",
"type": "quantitative",
"scale": {
"domain": [0, 10],
"zoom": true
}
},
"text": {
"field": "label"
}
}
}
Axis genome¶
The "axisGenome" data source, in fact, does not dynamically update data.
However, it provides convenient access to the genome (chromosomes) of the
given channel, allowing creation of customized chromosome ticks or annotations.
The data source generates data objects with the following fields: name,
size (in bp), continuousStart (linearized coordinate), continuousEnd,
odd (boolean), and number (1-based index).
Parameters¶
channel (required)
Type: "x" | "y"
Which channel's scale domain to use.
Example¶
{
"genome": { "name": "hg38" },
"data": {
"lazy": {
"type": "axisGenome",
"channel": "x"
}
},
"encoding": {
"x": {
"field": "continuousStart",
"type": "locus"
},
"x2": {
"field": "continuousEnd"
},
"text": {
"field": "name"
}
},
"layer": [
{
"transform": [
{
"type": "filter",
"expr": "datum.odd"
}
],
"mark": {
"type": "rect",
"fill": "#f0f0f0"
}
},
{
"mark": {
"type": "text",
"size": 16,
"angle": -90,
"align": "right",
"baseline": "top",
"paddingX": 3,
"paddingY": 5,
"y": 1
}
}
]
}
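The fields that the "axisGenome" source provides can be derived from a list of chromosome names and sizes, as the sketch below shows. The function name is hypothetical, the chromosome sizes are toy values, and the meaning of odd (true for the first, third, and so on chromosome) is an assumption.

```python
from typing import Dict, List, Tuple, Union

Datum = Dict[str, Union[str, int, bool]]


def genome_axis_data(chromosomes: List[Tuple[str, int]]) -> List[Datum]:
    """Build one data object per chromosome with linearized coordinates."""
    rows: List[Datum] = []
    offset = 0  # start of the current chromosome on the concatenated axis
    for number, (name, size) in enumerate(chromosomes, start=1):
        rows.append(
            {
                "name": name,
                "size": size,
                "continuousStart": offset,
                "continuousEnd": offset + size,
                "odd": number % 2 == 1,  # assumed meaning of the field
                "number": number,  # 1-based index
            }
        )
        offset += size
    return rows


# Toy genome with made-up sizes.
for row in genome_axis_data([("chrA", 100), ("chrB", 50), ("chrC", 75)]):
    print(row["name"], row["continuousStart"], row["continuousEnd"], row["odd"])
```

Filtering on odd, as the example above does, yields every other chromosome, which is a common way to draw alternating background bands.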