Lazy Data Sources

Lazy data sources load data on demand in response to user interactions. Unlike eager sources, most lazy data sources support indexing, which lets them retrieve and load data partially and incrementally as users navigate the genome. This is especially useful for very large datasets that are infeasible to load in their entirety.

How it works

Lazy data sources observe the scale domains of the view where the data source is specified. When a domain changes as a result of a user interaction, the data source issues a request to fetch a new subset of the data. Lazy sources need the visual channel to be specified, as it determines which scale to observe. For genomic data sources, the channel defaults to "x".

Lazy data sources are specified using the lazy property of the data object. Unlike in eager data, the type of the data source must be specified explicitly:

Example: Specifying a lazy data source

{
  "data": {
    "lazy": {
      "type": "bigwig",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw"
    }
  },
  ...
}
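
The observe-and-fetch cycle described above can be illustrated with a small Python sketch. It is not GenomeSpy's implementation: the `Scale` and `LazySource` classes and their methods are hypothetical names, and the "fetch" merely records the requested domain.

```python
from typing import Callable, Dict, List, Tuple

Domain = Tuple[float, float]


class Scale:
    """Hypothetical stand-in for a view's scale; notifies listeners on domain changes."""

    def __init__(self, domain: Domain) -> None:
        self.domain = domain
        self._listeners: List[Callable[[Domain], None]] = []

    def add_listener(self, listener: Callable[[Domain], None]) -> None:
        self._listeners.append(listener)

    def set_domain(self, domain: Domain) -> None:
        # Called when the user pans or zooms.
        self.domain = domain
        for listener in self._listeners:
            listener(domain)


class LazySource:
    """Hypothetical lazy source that requests data when the observed domain changes."""

    def __init__(self, scales: Dict[str, Scale], channel: str = "x") -> None:
        self.requests: List[Domain] = []
        # The channel selects which scale to observe.
        scales[channel].add_listener(self.on_domain_change)

    def on_domain_change(self, domain: Domain) -> None:
        # A real source would fetch from a file here; this sketch only records the request.
        self.requests.append(domain)


scales = {"x": Scale((0, 1000)), "y": Scale((0, 1))}
source = LazySource(scales)
scales["x"].set_domain((200, 400))  # triggers a request
scales["y"].set_domain((0, 2))      # ignored: the source observes "x"
print(source.requests)
```

Only the change on the observed channel results in a request, which is why the channel must be known to the data source.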

Indexed FASTA

The "indexedFasta" source enables fast random access to a reference sequence. It loads the sequence as three consecutive chunks that cover and flank the currently visible region (domain), allowing the user to rapidly pan the view. The chunks are provided as data objects with the following fields: chrom (string), start (integer), and sequence (a string of bases).
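
As a rough illustration of the three-chunk idea, the Python sketch below computes chunk boundaries for a visible domain. The function name `chunk_intervals` is hypothetical, and the alignment of chunks to multiples of the window size is an assumption; the actual boundaries GenomeSpy uses may differ.

```python
from typing import List, Optional, Tuple


def chunk_intervals(
    domain: Tuple[int, int], window_size: int = 7000
) -> Optional[List[Tuple[int, int]]]:
    """Return three consecutive (start, end) chunks around the visible domain.

    Returns None when the domain is not shorter than the window size, in which
    case nothing would be fetched.
    """
    start, end = domain
    if end - start >= window_size:
        return None
    # ASSUMPTION: chunks are aligned to multiples of window_size, and the chunk
    # containing the midpoint of the domain is treated as the central one.
    center = ((start + end) // 2) // window_size
    first = max(center - 1, 0)
    return [(i * window_size, (i + 1) * window_size) for i in range(first, first + 3)]


print(chunk_intervals((20003500, 20003540)))
print(chunk_intervals((0, 10000)))
```

Because the flanking chunks are already loaded, a small pan does not require a new request until the domain moves into a different central chunk.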

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) will be debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in a shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the index file.

Default value: url + ".fai".

url Required

Type: string

URL of the FASTA file.

windowSize

Type: number

Size of each chunk when fetching the FASTA file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 7000
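
To make the effect of the debounce parameter concrete, the following Python sketch simulates debouncing deterministically on a list of event timestamps. It assumes trailing-edge behavior (the update runs once the events have paused for the debounce time), which the parameter description does not state explicitly; `debounced_fire_times` is a hypothetical helper.

```python
from typing import List


def debounced_fire_times(event_times_ms: List[float], delay_ms: float = 200) -> List[float]:
    """Return the times at which a trailing-edge debounced update would run.

    Each event restarts the timer; the update runs only after delay_ms has
    passed without a new event.
    """
    times = sorted(event_times_ms)
    fire_times: List[float] = []
    # Pair each event with the one that follows it (infinity after the last one).
    for current, following in zip(times, times[1:] + [float("inf")]):
        if following - current >= delay_ms:
            fire_times.append(current + delay_ms)
    return fire_times


# Five pan/zoom events in two bursts result in only two data updates.
print(debounced_fire_times([0, 50, 120, 600, 650]))
```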

Example

The example below shows how to specify a sequence track using an indexed FASTA file. The sequence chunks are split into separate data objects using the "flattenSequence" transform, and the final position of each nucleotide is computed using the "formula" transform. Please note that new data are fetched only when the user zooms into a region smaller than the window size (default: 7000 bp).

{
  "genome": { "name": "hg38" },

  "data": {
    "lazy": {
      "type": "indexedFasta",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.fa"
    }
  },

  "transform": [
    {
      "type": "flattenSequence",
      "field": "sequence",
      "as": ["rawPos", "base"]
    },
    { "type": "formula", "expr": "datum.rawPos + datum.start", "as": "pos" }
  ],

  "encoding": {
    "x": {
      "chrom": "chrom",
      "pos": "pos",
      "type": "locus",
      "scale": {
        "domain": [
          { "chrom": "chr7", "pos": 20003500 },
          { "chrom": "chr7", "pos": 20003540 }
        ]
      }
    },
    "color": {
      "field": "base",
      "type": "nominal",
      "scale": {
        "domain": ["A", "C", "T", "G", "a", "c", "t", "g", "N"],
        "range": [
          "#7BD56C",
          "#FF9B9B",
          "#86BBF1",
          "#FFC56C",
          "#7BD56C",
          "#FF9B9B",
          "#86BBF1",
          "#FFC56C",
          "#E0E0E0"
        ]
      }
    }
  },
  "layer": [
    {
      "mark": "rect"
    },
    {
      "mark": {
        "type": "text",
        "size": 13,
        "fitToBand": true,
        "paddingX": 1.5,
        "paddingY": 1,
        "opacity": 0.7,
        "flushX": false,
        "tooltip": null
      },
      "encoding": {
        "color": { "value": "black" },
        "text": { "field": "base" }
      }
    }
  ]
}
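
The effect of the two transforms in the example can be mimicked in plain Python. This is a sketch under assumptions: `flatten_sequence` is a hypothetical function, rawPos is taken to be a zero-based offset within the chunk, and the original sequence field is simply dropped from the output.

```python
from typing import Dict, Iterator, List


def flatten_sequence(chunks: List[Dict], field: str = "sequence") -> Iterator[Dict]:
    """Emit one object per base, mimicking "flattenSequence" followed by "formula"."""
    for chunk in chunks:
        for raw_pos, base in enumerate(chunk[field]):
            # Copy the other fields (chrom, start) to every per-base object.
            row = {k: v for k, v in chunk.items() if k != field}
            # ASSUMPTION: rawPos is a zero-based offset within the chunk.
            row["rawPos"] = raw_pos
            row["base"] = base
            # Equivalent of the formula "datum.rawPos + datum.start".
            row["pos"] = raw_pos + chunk["start"]
            yield row


chunk = {"chrom": "chr7", "start": 20003500, "sequence": "ACGT"}
rows = list(flatten_sequence([chunk]))
for row in rows:
    print(row["chrom"], row["pos"], row["base"])
```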

The data source is based on GMOD's indexedfasta-js library.

BigWig

The "bigwig" source enables the retrieval of dense, continuous data, such as coverage or other signal data stored in BigWig files. It behaves similarly to the indexed FASTA source, loading the data in chunks that cover and flank the currently visible region. However, the window size automatically adapts to the zoom level, and data are fetched in higher resolution when zooming in. The data source provides data objects with the following fields: chrom (string), start (integer), end (integer), and score (number).

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) will be debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in a shorter perceived latency.

Default value: "window"

pixelsPerBin

Type: number | ExprRef

The approximate minimum width of each data bin, in pixels.

Default value: 2

url Required

Type: string | ExprRef

URL of the BigWig file.
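
The relationship between pixelsPerBin and the fetched resolution can be sketched as follows. BigWig files store precomputed summary (zoom) levels; how GenomeSpy chooses among them is not described here, so the selection rule, the function names, and the reduction levels in the example are all assumptions made for illustration.

```python
from typing import List


def target_bases_per_bin(
    domain_length_bp: float, view_width_px: float, pixels_per_bin: float = 2
) -> float:
    """Number of bases one bin should span to be about pixels_per_bin wide."""
    bases_per_pixel = domain_length_bp / view_width_px
    return bases_per_pixel * pixels_per_bin


def pick_reduction_level(target: float, reduction_levels: List[int]) -> int:
    """Pick the coarsest summary level that is not coarser than the target.

    ASSUMPTION: falls back to 1 (full resolution) when no summary level fits.
    """
    fitting = [level for level in reduction_levels if level <= target]
    return max(fitting, default=1)


levels = [20, 80, 320, 1280, 5120]  # hypothetical bases-per-bin summary levels
for domain_length in (1_000_000, 10_000, 1_000):
    target = target_bases_per_bin(domain_length, view_width_px=1000)
    print(domain_length, target, pick_reduction_level(target, levels))
```

Zooming in shrinks the domain, lowers the target bin size, and therefore selects a finer level.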

Example

The example below shows the GC content of the human genome in 5-base windows. When you zoom in, the resolution of the data automatically increases.

{
  "genome": { "name": "hg38" },
  "view": { "stroke": "lightgray" },

  "data": {
    "lazy": {
      "type": "bigwig",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw"
    }
  },

  "encoding": {
    "y": {
      "field": "score",
      "type": "quantitative",
      "scale": { "domain": [0, 100] },
      "axis": { "title": "GC (%)", "grid": true, "gridDash": [2, 2] }
    },
    "x": { "chrom": "chrom", "pos": "start", "type": "locus" },
    "x2": { "chrom": "chrom", "pos": "end" }
  },

  "mark": "rect"
}

The data source is based on GMOD's bbi-js library.

BigBed

The "bigbed" source enables the retrieval of segmented data, such as annotated genomic regions stored in BigBed files.

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) will be debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in a shorter perceived latency.

Default value: "window"

url Required

Type: string | ExprRef

URL of the BigBed file.

windowSize

Type: number | ExprRef

Size of each chunk when fetching the BigBed file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 1000000

Example

The example below displays the "ENCODE Candidate Cis-Regulatory Elements (cCREs) combined from all cell types" dataset for the hg38 genome.

{
  "genome": { "name": "hg38" },
  "view": { "stroke": "lightgray" },

  "data": {
    "lazy": {
      "type": "bigbed",
      "url": "https://data.genomespy.app/sample-data/encodeCcreCombined.hg38.bb"
    }
  },

  "encoding": {
    "x": {
      "chrom": "chrom",
      "pos": "chromStart",
      "type": "locus",
      "scale": {
        "domain": [
          { "chrom": "chr7", "pos": 66600000 },
          { "chrom": "chr7", "pos": 66800000 }
        ]
      }
    },
    "x2": {
      "chrom": "chrom",
      "pos": "chromEnd"
    },
    "color": {
      "field": "ucscLabel",
      "type": "nominal",
      "scale": {
        "domain": ["prom", "enhP", "enhD", "K4m3", "CTCF"],
        "range": ["#FF0000", "#FFA700", "#FFCD00", "#FFAAAA", "#00B0F0"]
      }
    }
  },

  "mark": "rect"
}

The data source is based on GMOD's bbi-js library.

GFF3

The tabix-based "gff3" source enables the retrieval of hierarchical data, such as genomic annotations stored in GFF3 files. The object format GenomeSpy uses is described in gff-js's documentation. The flatten and project transforms are useful when extracting the child features and attributes from the hierarchical data structure. See the example below.

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) will be debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in a shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the tabix index file.

Default value: url + ".tbi".

url Required

Type: string

URL of the bgzip-compressed file.

windowSize

Type: number

Size of each chunk when fetching the tabix-indexed file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 30000000

Example

The example below displays the human (GRCh38.p13) GENCODE v43 annotation dataset. Please note that the example shows a maximum of ten overlapping features per locus as vertical scrolling is currently not supported properly.

{
  "$schema": "https://unpkg.com/@genome-spy/core/dist/schema.json",

  "genome": { "name": "hg38" },

  "height": { "step": 28 },
  "viewportHeight": "container",

  "view": { "stroke": "lightgray" },

  "data": {
    "lazy": {
      "type": "gff3",
      "url": "https://data.genomespy.app/sample-data/gencode.v43.annotation.sorted.gff3.gz",
      "windowSize": 2000000,
      "debounceDomainChange": 300
    }
  },

  "transform": [
    {
      "type": "flatten"
    },
    {
      "type": "formula",
      "expr": "datum.attributes.gene_name",
      "as": "gene_name"
    },
    {
      "type": "flatten",
      "fields": ["child_features"]
    },
    {
      "type": "flatten",
      "fields": ["child_features"],
      "as": ["child_feature"]
    },
    {
      "type": "project",
      "fields": [
        "gene_name",
        "child_feature.type",
        "child_feature.strand",
        "child_feature.seq_id",
        "child_feature.start",
        "child_feature.end",
        "child_feature.attributes.gene_type",
        "child_feature.attributes.transcript_type",
        "child_feature.attributes.gene_id",
        "child_feature.attributes.transcript_id",
        "child_feature.attributes.transcript_name",
        "child_feature.attributes.tag",
        "source",
        "child_feature.child_features"
      ],
      "as": [
        "gene_name",
        "type",
        "strand",
        "seq_id",
        "start",
        "end",
        "gene_type",
        "transcript_type",
        "gene_id",
        "transcript_id",
        "transcript_name",
        "tag",
        "source",
        "_child_features"
      ]
    },
    {
      "type": "collect",
      "sort": {
        "field": ["seq_id", "start", "transcript_id"]
      }
    },
    {
      "type": "pileup",
      "start": "start",
      "end": "end",
      "as": "_lane"
    }
  ],

  "encoding": {
    "x": {
      "chrom": "seq_id",
      "pos": "start",
      "offset": 1,
      "type": "locus",
      "scale": {
        "domain": [
          { "chrom": "chr5", "pos": 177482500 },
          { "chrom": "chr5", "pos": 177518000 }
        ]
      }
    },
    "x2": {
      "chrom": "seq_id",
      "pos": "end"
    },
    "y": {
      "field": "_lane",
      "type": "index",
      "scale": {
        "zoom": false,
        "reverse": true,
        "domain": [0, 40],
        "padding": 0.5
      },
      "axis": null
    }
  },

  "layer": [
    {
      "name": "gencode-transcript",

      "layer": [
        {
          "name": "gencode-tooltip-trap",
          "title": "GENCODE transcript",
          "mark": {
            "type": "rule",
            "color": "#b0b0b0",
            "opacity": 0,
            "size": 7
          }
        },
        {
          "name": "gencode-transcript-body",
          "mark": {
            "type": "rule",
            "color": "#b0b0b0",
            "tooltip": null
          }
        }
      ]
    },
    {
      "name": "gencode-exons",

      "transform": [
        {
          "type": "flatten",
          "fields": ["_child_features"]
        },
        {
          "type": "flatten",
          "fields": ["_child_features"],
          "as": ["child_feature"]
        },
        {
          "type": "project",
          "fields": [
            "gene_name",
            "_lane",
            "child_feature.type",
            "child_feature.seq_id",
            "child_feature.start",
            "child_feature.end",
            "child_feature.attributes.exon_number",
            "child_feature.attributes.exon_id"
          ],
          "as": [
            "gene_name",
            "_lane",
            "type",
            "seq_id",
            "start",
            "end",
            "exon_number",
            "exon_id"
          ]
        }
      ],

      "layer": [
        {
          "title": "GENCODE exon",

          "transform": [{ "type": "filter", "expr": "datum.type == 'exon'" }],

          "mark": {
            "type": "rect",
            "minWidth": 0.5,
            "minOpacity": 0.5,
            "stroke": "#505050",
            "fill": "#fafafa",
            "strokeWidth": 1.0
          }
        },
        {
          "title": "GENCODE exon",

          "transform": [
            {
              "type": "filter",
              "expr": "datum.type != 'exon' && datum.type != 'start_codon' && datum.type != 'stop_codon'"
            }
          ],

          "mark": {
            "type": "rect",
            "minWidth": 0.5,
            "minOpacity": 0,
            "strokeWidth": 1.0,
            "strokeOpacity": 0.0,
            "stroke": "gray"
          },
          "encoding": {
            "fill": {
              "field": "type",
              "type": "nominal",
              "scale": {
                "domain": ["five_prime_UTR", "CDS", "three_prime_UTR"],
                "range": ["#83bcb6", "#ffbf79", "#d6a5c9"]
              }
            }
          }
        },
        {
          "transform": [
            {
              "type": "filter",
              "expr": "datum.type == 'three_prime_UTR' || datum.type == 'five_prime_UTR'"
            },
            {
              "type": "formula",
              "expr": "datum.type == 'three_prime_UTR' ? \"3'\" : \"5'\"",
              "as": "label"
            }
          ],

          "mark": {
            "type": "text",
            "color": "black",
            "size": 11,
            "opacity": 0.7,
            "paddingX": 2,
            "paddingY": 1.5,
            "tooltip": null
          },

          "encoding": {
            "text": {
              "field": "label"
            }
          }
        }
      ]
    },
    {
      "name": "gencode-transcript-labels",

      "transform": [
        {
          "type": "formula",
          "expr": "(datum.strand == '-' ? '< ' : '') + datum.transcript_name + ' - ' + datum.transcript_id + (datum.strand == '+' ? ' >' : '')",
          "as": "label"
        }
      ],

      "mark": {
        "type": "text",
        "size": 10,
        "yOffset": 12,
        "tooltip": null,
        "color": "#505050"
      },

      "encoding": {
        "text": {
          "field": "label"
        }
      }
    }
  ]
}
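
The flatten and project steps in the example can be hard to follow, so here is a simplified Python sketch of the idea. The helper functions and the sample record are hypothetical, and the record uses a single level of nesting, whereas the objects produced by gff-js are nested more deeply (hence the repeated flatten steps in the example).

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def flatten(rows: Iterable[Dict], field: str, as_: Optional[str] = None) -> Iterator[Dict]:
    """Emit one copy of each row per element of the array stored in `field`."""
    for row in rows:
        for element in row[field]:
            out = dict(row)
            out[as_ or field] = element
            yield out


def get_path(row: Dict, path: str) -> Any:
    """Resolve a dotted path such as "child_feature.attributes.transcript_id"."""
    value: Any = row
    for key in path.split("."):
        value = value[key]
    return value


def project(rows: Iterable[Dict], fields: List[str], as_: List[str]) -> Iterator[Dict]:
    """Keep only the given (possibly nested) fields, renamed to the names in `as_`."""
    for row in rows:
        yield {name: get_path(row, path) for path, name in zip(fields, as_)}


# Hypothetical, simplified gene record with two child transcripts.
gene = {
    "gene_name": "GENE1",
    "child_features": [
        {"type": "transcript", "start": 100, "attributes": {"transcript_id": "T1"}},
        {"type": "transcript", "start": 150, "attributes": {"transcript_id": "T2"}},
    ],
}

rows = flatten([gene], "child_features", "child_feature")
rows = list(
    project(
        rows,
        ["gene_name", "child_feature.start", "child_feature.attributes.transcript_id"],
        ["gene_name", "start", "transcript_id"],
    )
)
print(rows)
```

Each child feature becomes its own flat data object that still carries the parent's gene_name, which is what the marks in the example rely on.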

The data source is based on GMOD's tabix-js and gff-js libraries.

BAM

The "bam" source is very much a work in progress but has a low priority. It currently exposes the reads but provides no handling for variant alleles, CIGARs, etc. Please post a message on GitHub Discussions if you are interested in this feature.

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) will be debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) will be debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in a shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the index file.

Default value: url + ".bai".

url Required

Type: string

URL of the BAM file.

windowSize

Type: number

Size of each chunk when fetching the BAM file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 10000

Example

{
  "genome": { "name": "hg18" },

  "data": {
    "lazy": {
      "type": "bam",
      "url": "https://data.genomespy.app/sample-data/bamExample.bam",
      "windowSize": 30000
    }
  },

  "resolve": { "scale": { "x": "shared" } },

  "spacing": 5,

  "vconcat": [
    {
      "view": { "stroke": "lightgray" },
      "height": 40,

      "transform": [
        {
          "type": "coverage",
          "start": "start",
          "end": "end",
          "as": "coverage",
          "chrom": "chrom"
        }
      ],
      "mark": "rect",
      "encoding": {
        "x": {
          "chrom": "chrom",
          "pos": "start",
          "type": "locus",
          "axis": null
        },
        "x2": { "chrom": "chrom", "pos": "end" },
        "y": { "field": "coverage", "type": "quantitative" }
      }
    },
    {
      "view": { "stroke": "lightgray" },

      "transform": [
        {
          "type": "pileup",
          "start": "start",
          "end": "end",
          "as": "_lane"
        }
      ],

      "encoding": {
        "x": {
          "chrom": "chrom",
          "pos": "start",
          "type": "locus",
          "axis": {},
          "scale": {
            "domain": [
              { "chrom": "chr21", "pos": 33037317 },
              { "chrom": "chr21", "pos": 33039137 }
            ]
          }
        },
        "x2": {
          "chrom": "chrom",
          "pos": "end"
        },
        "y": {
          "field": "_lane",
          "type": "index",
          "scale": {
            "domain": [0, 60],
            "padding": 0.3,
            "reverse": true,
            "zoom": false
          }
        },
        "color": {
          "field": "strand",
          "type": "nominal",
          "scale": {
            "domain": ["+", "-"],
            "range": ["crimson", "orange"]
          }
        }
      },

      "mark": "rect"
    }
  ]
}

The data source is based on GMOD's bam-js library.

Axis ticks

The "axisTicks" data source generates a set of ticks for the specified channel. While GenomeSpy uses this data source internally for generating axis ticks, you can also use it to create fully customized axes. The data source generates data objects with value and label fields.

Parameters

axis

Type: Axis

Optional axis properties

channel Required

Type: "x" | "y"

Which channel's scale domain to listen to

Example

The example below generates approximately three ticks for the x axis.

{
  "data": {
    "lazy": {
      "type": "axisTicks",
      "channel": "x",
      "axis": {
        "tickCount": 3
      }
    }
  },

  "mark": {
    "type": "text",
    "size": 20,
    "clip": false
  },

  "encoding": {
    "x": {
      "field": "value",
      "type": "quantitative",
      "scale": {
        "domain": [0, 10],
        "zoom": true
      }
    },
    "text": {
      "field": "label"
    }
  }
}
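
To give a feel for why the tick count is only approximate, the Python sketch below generates value and label objects using a common 1-2-5 step heuristic. This is an assumption made for illustration; GenomeSpy's actual tick placement and label formatting may differ, and `tick_step` and `axis_ticks` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def tick_step(start: float, stop: float, count: int) -> float:
    """Choose a "nice" step (1, 2 or 5 times a power of ten) giving about `count` ticks.

    Assumes stop > start.
    """
    raw_step = (stop - start) / max(count, 1)
    power = 10 ** math.floor(math.log10(raw_step))
    error = raw_step / power
    if error >= math.sqrt(50):
        return 10 * power
    if error >= math.sqrt(10):
        return 5 * power
    if error >= math.sqrt(2):
        return 2 * power
    return power


def axis_ticks(domain: Tuple[float, float], count: int) -> List[Dict]:
    """Return tick objects with "value" and "label" fields for the given domain."""
    start, stop = domain
    step = tick_step(start, stop, count)
    first = math.ceil(start / step)
    last = math.floor(stop / step)
    # Multiply integer indices by the step to avoid accumulating rounding errors.
    return [{"value": i * step, "label": f"{i * step:g}"} for i in range(first, last + 1)]


ticks = axis_ticks((0, 10), 3)
print(ticks)
```

Because steps are restricted to round numbers, the resulting number of ticks can deviate from the requested count as the domain changes during zooming.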

Axis genome

The "axisGenome" data source does not, in fact, update data dynamically. However, it provides convenient access to the genome (chromosomes) of the given channel, allowing the creation of customized chromosome ticks or annotations. The data source generates data objects with the following fields: name, size (in bp), continuousStart (linearized coordinate), continuousEnd, odd (boolean), and number (1-based index).
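
The fields listed above can be derived from a list of chromosome sizes, as the following Python sketch shows. The chromosome sizes are made up for brevity, `genome_records` is a hypothetical function, and the interpretation of odd as "odd-numbered chromosome" is an assumption.

```python
from typing import Dict, List, Tuple


def genome_records(chrom_sizes: List[Tuple[str, int]]) -> List[Dict]:
    """Build one record per chromosome with linearized (concatenated) coordinates."""
    records: List[Dict] = []
    offset = 0
    for index, (name, size) in enumerate(chrom_sizes):
        number = index + 1  # 1-based index
        records.append(
            {
                "name": name,
                "size": size,
                "continuousStart": offset,
                "continuousEnd": offset + size,
                # ASSUMPTION: "odd" flags odd-numbered chromosomes (1st, 3rd, ...).
                "odd": number % 2 == 1,
                "number": number,
            }
        )
        # The next chromosome starts where this one ends.
        offset += size
    return records


# Made-up sizes, not those of a real genome assembly.
records = genome_records([("chr1", 1000), ("chr2", 800), ("chr3", 500)])
for record in records:
    print(record)
```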

Parameters

channel Required

Type: "x" | "y"

Which channel's scale domain to use

Example

{
  "genome": { "name": "hg38" },

  "data": {
    "lazy": {
      "type": "axisGenome",
      "channel": "x"
    }
  },

  "encoding": {
    "x": {
      "field": "continuousStart",
      "type": "locus"
    },
    "x2": {
      "field": "continuousEnd"
    },
    "text": {
      "field": "name"
    }
  },

  "layer": [
    {
      "transform": [
        {
          "type": "filter",
          "expr": "datum.odd"
        }
      ],
      "mark": {
        "type": "rect",
        "fill": "#f0f0f0"
      }
    },
    {
      "mark": {
        "type": "text",
        "size": 16,
        "angle": -90,
        "align": "right",
        "baseline": "top",
        "paddingX": 3,
        "paddingY": 5,
        "y": 1
      }
    }
  ]
}