Lazy Data Sources

Lazy data sources load data on demand in response to user interactions. Unlike eager sources, most lazy data sources support indexing, which allows them to retrieve and load data partially and incrementally as users navigate the genome. This is especially useful for very large datasets that are infeasible to load in their entirety.

How it works

Lazy data sources observe the scale domains of the view in which the data source is specified. When the domain changes as a result of a user interaction, the data source requests a new subset of the data. Lazy sources need a visual channel to be specified, which determines the scale to observe. For genomic data sources, the channel defaults to "x".

Lazy data sources are specified using the lazy property of the data object. Unlike with eager data sources, the type of the data source must be specified explicitly:

Example: Specifying a lazy data source

{
  "data": {
    "lazy": {
      "type": "bigwig",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw"
    }
  },
  ...
}

Indexed FASTA

The "indexedFasta" source enables fast random access to a reference sequence. It loads the sequence as three consecutive chunks that cover and flank the currently visible region (domain), allowing the user to pan the view rapidly. The chunks are provided as data objects with the following fields: chrom (string), start (integer), and sequence (a string of bases).

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceDomainChange

Type: number | ExprRef

Debounce time for scale-domain driven data updates, in milliseconds.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) are debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) are debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the index file.

Default value: url + ".fai".

url Required

Type: string

URL of the FASTA file.

windowSize

Type: number

Size of each chunk when fetching the FASTA file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 7000
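As a sketch only, the data fragment below overrides two of the defaults listed above. The example.org URLs are hypothetical placeholders, and the values are illustrative assumptions rather than recommendations. Here, indexUrl points to an index stored somewhere other than the default url + ".fai" location. The larger windowSize makes the source start fetching once less than 10,000 bp are visible, instead of 7,000 bp.

```json
{
  "description": "Hypothetical indexed FASTA source with a separately hosted index and a larger chunk size (illustrative values).",
  "data": {
    "lazy": {
      "type": "indexedFasta",
      "url": "https://example.org/genomes/mygenome.fa",
      "indexUrl": "https://example.org/indexes/mygenome.fa.fai",
      "windowSize": 10000
    }
  }
}
```

A complete spec would add the transforms, encodings, and marks, as in the example that follows.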

Example

The visualization below shows how to specify a sequence track using an indexed FASTA file. The sequence chunks are split into separate data objects using the "flattenSequence" transform, and the final position of each nucleotide is computed using the "formula" transform. Please note that new data are fetched only when the user zooms into a region smaller than the window size (default: 7000 bp).

{
  "description": [
    "Indexed FASTA sequence track example.",
    "Data source: UCSC hg38 / GRCh38 reference FASTA (`goldenPath/hg38/bigZips/latest/hg38.fa.gz`). Terms: UCSC downloadable data files are freely reusable."
  ],

  "assembly": "hg38",

  "data": {
    "lazy": {
      "type": "indexedFasta",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.fa"
    }
  },

  "transform": [
    {
      "type": "flattenSequence",
      "field": "sequence",
      "as": ["rawPos", "base"]
    },
    { "type": "formula", "expr": "datum.rawPos + datum.start", "as": "pos" }
  ],

  "encoding": {
    "x": {
      "chrom": "chrom",
      "pos": "pos",
      "type": "locus",
      "scale": {
        "domain": [
          { "chrom": "chr7", "pos": 20003500 },
          { "chrom": "chr7", "pos": 20003540 }
        ]
      }
    },
    "color": {
      "field": "base",
      "type": "nominal",
      "scale": {
        "domain": ["A", "C", "T", "G", "a", "c", "t", "g", "N"],
        "range": [
          "#7BD56C",
          "#FF9B9B",
          "#86BBF1",
          "#FFC56C",
          "#7BD56C",
          "#FF9B9B",
          "#86BBF1",
          "#FFC56C",
          "#E0E0E0"
        ]
      }
    }
  },

  "layer": [
    { "mark": "rect" },
    {
      "mark": {
        "type": "text",
        "size": 13,
        "fitToBand": true,
        "paddingX": 1.5,
        "paddingY": 1,
        "opacity": 0.7,
        "flushX": false,
        "tooltip": null
      },
      "encoding": {
        "color": { "value": "black" },
        "text": { "field": "base" }
      }
    }
  ]
}

The visualization uses a mirrored, indexed copy of UCSC's hg38 / GRCh38 reference FASTA from goldenPath/hg38/bigZips/latest/hg38.fa.gz. UCSC states that its downloadable data files and database tables are freely available for public and commercial use, subject to any upstream restrictions noted for the original assembly data.

The data source is based on GMOD's indexedfasta-js library.

BigWig

The "bigwig" source enables the retrieval of dense, continuous data, such as coverage or other signal data stored in BigWig files. It behaves similarly to the indexed FASTA source, loading the data in chunks that cover and flank the currently visible region. However, the window size automatically adapts to the zoom level, and data are fetched at a higher resolution when zooming in. The data source provides data objects with the following fields: chrom (string), start (integer), end (integer), and score (number).

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceDomainChange

Type: number | ExprRef

Debounce time for scale-domain driven data updates, in milliseconds.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) are debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) are debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in shorter perceived latency.

Default value: "window"

pixelsPerBin

Type: number | ExprRef

The approximate minimum width of each data bin, in pixels.

Default value: 2

url Required

Type: string | ExprRef

URL of the BigWig file.
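The fragment below is a hedged sketch of a BigWig source that trades resolution for fewer data points. It reuses the GC content file from the example below. The chosen values are assumptions for illustration. A pixelsPerBin of 4 requests bins roughly twice as wide (in pixels) as the default of 2. Setting debounceMode to "domain" debounces the panning and zooming events themselves rather than the window fetches.

```json
{
  "description": "Illustrative settings only: coarser bins and domain-level debouncing.",
  "data": {
    "lazy": {
      "type": "bigwig",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw",
      "pixelsPerBin": 4,
      "debounceMode": "domain"
    }
  }
}
```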

Example

The visualization below shows the GC content of the human genome in 5-base windows. When you zoom in, the resolution of the data automatically increases.

{
  "description": [
    "BigWig GC content example.",
    "Data source: UCSC hg38 GC Percent in 5-base windows (`goldenPath/hg38/bigZips/latest/hg38.gc5Base.bw`). Terms: UCSC downloadable data files are freely reusable."
  ],

  "assembly": "hg38",

  "view": { "stroke": "lightgray" },

  "data": {
    "lazy": {
      "type": "bigwig",
      "url": "https://data.genomespy.app/genomes/hg38/hg38.gc5Base.bw"
    }
  },

  "encoding": {
    "y": {
      "field": "score",
      "type": "quantitative",
      "scale": { "domain": [0, 100] },
      "axis": { "title": "GC (%)", "grid": true, "gridDash": [2, 2] }
    },
    "x": { "chrom": "chrom", "pos": "start", "type": "locus" },
    "x2": { "chrom": "chrom", "pos": "end" }
  },

  "mark": "rect"
}

The visualization uses UCSC's hg38 GC Percent in 5-base windows track, distributed as goldenPath/hg38/bigZips/latest/hg38.gc5Base.bw. UCSC states that its downloadable data files and database tables are freely available for public and commercial use.

The data source is based on GMOD's bbi-js library.

BigBed

The "bigbed" source enables the retrieval of segmented data, such as annotated genomic regions stored in BigBed files.

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceDomainChange

Type: number | ExprRef

Debounce time for scale-domain driven data updates, in milliseconds.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) are debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) are debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in shorter perceived latency.

Default value: "window"

url Required

Type: string | ExprRef

URL of the BigBed file.

windowSize

Type: number | ExprRef

Size of each chunk when fetching the BigBed file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 1000000
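BigBed data are fetched only when the visible region is shorter than windowSize (1 Mb by default). Raising this limit lets sparse annotations appear at a wider zoom level. The fragment below is a sketch with a hypothetical URL. A larger window also means larger individual requests, so the value of 5000000 is an assumption and should be tuned for each dataset.

```json
{
  "description": "Hypothetical sparse BigBed annotation track that starts loading once less than 5 Mb is visible.",
  "data": {
    "lazy": {
      "type": "bigbed",
      "url": "https://example.org/annotations/sparse-regions.bb",
      "windowSize": 5000000
    }
  }
}
```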

Example

The visualization below displays the "ENCODE Candidate Cis-Regulatory Elements (cCREs) combined from all cell types" dataset for the hg38 genome.

{
  "description": [
    "BigBed cCRE track example.",
    "Data source: ENCODE Registry of candidate cis-Regulatory Elements, distributed by UCSC as `gbdb/hg38/encode3/ccre/encodeCcreCombined.bb`. Terms: ENCODE data are unrestricted and UCSC downloadable data files are freely reusable."
  ],

  "assembly": "hg38",

  "view": { "stroke": "lightgray" },

  "data": {
    "lazy": {
      "type": "bigbed",
      "url": "https://data.genomespy.app/sample-data/encodeCcreCombined.hg38.bb"
    }
  },

  "encoding": {
    "x": {
      "chrom": "chrom",
      "pos": "chromStart",
      "type": "locus",
      "scale": {
        "domain": [
          { "chrom": "chr7", "pos": 66600000 },
          { "chrom": "chr7", "pos": 66800000 }
        ]
      }
    },
    "x2": { "chrom": "chrom", "pos": "chromEnd" },
    "color": {
      "field": "ucscLabel",
      "type": "nominal",
      "scale": {
        "domain": ["prom", "enhP", "enhD", "K4m3", "CTCF"],
        "range": ["#FF0000", "#FFA700", "#FFCD00", "#FFAAAA", "#00B0F0"]
      }
    }
  },

  "mark": "rect"
}

The visualization uses the ENCODE Registry of candidate cis-Regulatory Elements (cCREs), distributed by UCSC as gbdb/hg38/encode3/ccre/encodeCcreCombined.bb; see ENCODE Encyclopedia Version 2: Genomic and Transcriptomic Annotations. ENCODE data may be freely downloaded, analyzed, and published without restriction, and UCSC also states that its downloadable data files are freely available for public and commercial use.

The data source is based on GMOD's bbi-js library.

VCF

The tabix-based "vcf" source enables the retrieval of variant data stored in VCF files. The object format GenomeSpy uses is described in vcf-js's documentation.

Parameters

addChrPrefix

Type: boolean | string

Adds a chr prefix (if true) or a custom prefix (if a string) to the chromosome names in the Tabix file.

Default value: false

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceDomainChange

Type: number | ExprRef

Debounce time for scale-domain driven data updates, in milliseconds.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) are debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) are debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the tabix index file.

Default value: url + ".tbi".

url Required

Type: string

URL of the bgzip-compressed file.

windowSize

Type: number

Size of each chunk when fetching the Tabix file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 30000000
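As a sketch, the fragment below reads a hypothetical VCF whose chromosomes are named 1, 2, and so on rather than chr1, chr2. Setting addChrPrefix to true makes these names match an hg38-style assembly, as in the ClinVar example below. The tabix index is assumed, hypothetically, to be hosted at a separate URL instead of the default url + ".tbi".

```json
{
  "description": "Hypothetical VCF with unprefixed chromosome names and a separately hosted tabix index.",
  "assembly": "hg38",
  "data": {
    "lazy": {
      "type": "vcf",
      "url": "https://example.org/variants/calls.vcf.gz",
      "indexUrl": "https://example.org/indexes/calls.vcf.gz.tbi",
      "addChrPrefix": true
    }
  }
}
```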

Example

The visualization below replicates the small-variant classification view described in NCBI's "New ClinVar graphical display" post. It places ClinVar variants by genomic position and germline classification and uses color to distinguish the classification categories.

{
  "description": [
    "ClinVar variants track example.",
    "Replicates the small-variant classification view described in NCBI's \"New ClinVar graphical display\" post.",
    "Data source: mirrored copy of the ClinVar GRCh38 VCF release from NCBI FTP (`/pub/clinvar/vcf_GRCh38/`). Terms: attribute ClinVar as the data source."
  ],

  "assembly": "hg38",

  "name": "clinvar",
  "title": {
    "text": "ClinVar Variants",
    "style": "overlay"
  },

  "height": { "step": 13 },

  "view": { "fill": "#f8f8f8" },

  "layer": [
    {
      "name": "baseline",
      "data": {
        "values": [{}]
      },
      "mark": {
        "type": "rule",
        "color": "lightgray"
      },
      "encoding": {
        "y": { "datum": "Uncertain significance", "type": "ordinal" }
      }
    },
    {
      "data": {
        "lazy": {
          "type": "vcf",
          "url": "https://data.genomespy.app/sample-data/clinvar_20241215.vcf.gz",
          "addChrPrefix": true,
          "windowSize": 1000000
        }
      },
      "transform": [
        {
          "type": "formula",
          "expr": "replace(datum.INFO['CLNSIG'], /_/g, ' ')",
          "as": "Germline classification"
        },
        {
          "type": "regexExtract",
          "field": "Germline classification",
          "regex": "^([^/]+)",
          "as": "Germline classification"
        },
        {
          "type": "formula",
          "expr": "replace(datum['Germline classification'], /^Conflicting.*/g, 'Conflicting')",
          "as": "Germline classification"
        },
        {
          "type": "filter",
          "expr": "datum['Germline classification'] == 'Pathogenic' || datum['Germline classification'] == 'Likely pathogenic' || datum['Germline classification'] == 'Uncertain significance' || datum['Germline classification'] == 'Likely benign' || datum['Germline classification'] == 'Benign' || datum['Germline classification'] == 'Conflicting'"
        }
      ],
      "layer": [
        {
          "name": "sticks",
          "mark": {
            "type": "rule",
            "tooltip": false
          },
          "encoding": {
            "y2": { "datum": "Uncertain significance" }
          }
        },
        {
          "name": "balls",
          "mark": {
            "type": "point",
            "size": 80,
            "geometricZoomBound": 13
          }
        }
      ],
      "encoding": {
        "x": {
          "chrom": "CHROM",
          "pos": "POS",
          "type": "locus",
          "offset": 1,
          "axis": null,
          "scale": {
            "domain": [
              { "chrom": "chr18", "pos": 31524101 },
              { "chrom": "chr18", "pos": 31525003 }
            ]
          }
        },
        "y": {
          "field": "Germline classification",
          "type": "ordinal",
          "scale": {
            "domain": [
              "Pathogenic",
              "Likely pathogenic",
              "Uncertain significance",
              "Likely benign",
              "Benign",
              "Conflicting"
            ]
          },
          "axis": {
            "title": "Classification"
          }
        },
        "color": {
          "field": "Germline classification",
          "type": "ordinal",
          "scale": {
            "domain": [
              "Pathogenic",
              "Likely pathogenic",
              "Uncertain significance",
              "Likely benign",
              "Benign",
              "Conflicting"
            ],
            "range": [
              "firebrick",
              "orange",
              "#f0f000",
              "#00a000",
              "darkgreen",
              "gray"
            ]
          }
        }
      }
    }
  ]
}

The visualization uses a mirrored copy of the ClinVar GRCh38 VCF release from NCBI's ClinVar FTP downloads. ClinVar asks that redistributed data be attributed to ClinVar as the data source.

The data source is based on GMOD's vcf-js library.

GFF3

The tabix-based "gff3" source enables the retrieval of hierarchical data, such as genomic annotations stored in GFF3 files. The object format GenomeSpy uses is described in gff-js's documentation. The flatten and project transforms are useful when extracting the child features and attributes from the hierarchical data structure. See the visualization below.

Parameters

addChrPrefix

Type: boolean | string

Adds a chr prefix (if true) or a custom prefix (if a string) to the chromosome names in the Tabix file.

Default value: false

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceDomainChange

Type: number | ExprRef

Debounce time for scale-domain driven data updates, in milliseconds.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) are debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) are debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the tabix index file.

Default value: url + ".tbi".

url Required

Type: string

URL of the bgzip-compressed file.

windowSize

Type: number

Size of each chunk when fetching the Tabix file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 30000000

Example

The visualization below displays the human (GRCh38.p13) GENCODE v43 annotation dataset. Please note that the example shows a maximum of ten overlapping features per locus, as vertical scrolling is currently not properly supported.

{
  "description": [
    "GFF3 gene annotation example.",
    "Data source: sorted and bgzip-compressed copy of the GENCODE human release 43 (GRCh38.p13) comprehensive gene annotation GFF3. Terms: GENCODE project data are open access."
  ],

  "assembly": "hg38",

  "height": { "step": 28 },

  "viewportHeight": "container",

  "view": { "stroke": "lightgray" },

  "data": {
    "lazy": {
      "type": "gff3",
      "url": "https://data.genomespy.app/sample-data/gencode.v43.annotation.sorted.gff3.gz",
      "windowSize": 2000000,
      "debounceDomainChange": 300
    }
  },

  "transform": [
    { "type": "flatten" },
    {
      "type": "formula",
      "expr": "datum.attributes.gene_name",
      "as": "gene_name"
    },
    { "type": "flatten", "fields": ["child_features"] },
    {
      "type": "flatten",
      "fields": ["child_features"],
      "as": ["child_feature"]
    },
    {
      "type": "project",
      "fields": [
        "gene_name",
        "child_feature.type",
        "child_feature.strand",
        "child_feature.seq_id",
        "child_feature.start",
        "child_feature.end",
        "child_feature.attributes.gene_type",
        "child_feature.attributes.transcript_type",
        "child_feature.attributes.gene_id",
        "child_feature.attributes.transcript_id",
        "child_feature.attributes.transcript_name",
        "child_feature.attributes.tag",
        "source",
        "child_feature.child_features"
      ],
      "as": [
        "gene_name",
        "type",
        "strand",
        "seq_id",
        "start",
        "end",
        "gene_type",
        "transcript_type",
        "gene_id",
        "transcript_id",
        "transcript_name",
        "tag",
        "source",
        "_child_features"
      ]
    },
    {
      "type": "collect",
      "sort": { "field": ["seq_id", "start", "transcript_id"] }
    },
    { "type": "pileup", "start": "start", "end": "end", "as": "_lane" }
  ],

  "encoding": {
    "x": {
      "chrom": "seq_id",
      "pos": "start",
      "offset": 1,
      "type": "locus",
      "scale": {
        "domain": [
          { "chrom": "chr5", "pos": 177482500 },
          { "chrom": "chr5", "pos": 177518000 }
        ]
      }
    },
    "x2": { "chrom": "seq_id", "pos": "end" },
    "y": {
      "field": "_lane",
      "type": "index",
      "scale": {
        "zoom": false,
        "reverse": true,
        "domain": [0, 40],
        "padding": 0.5
      },
      "axis": null
    }
  },

  "layer": [
    {
      "name": "gencode-transcript",
      "layer": [
        {
          "name": "gencode-tooltip-trap",
          "title": "GENCODE transcript",
          "mark": {
            "type": "rule",
            "color": "#b0b0b0",
            "opacity": 0,
            "size": 7
          }
        },
        {
          "name": "gencode-transcript-body",
          "mark": { "type": "rule", "color": "#b0b0b0", "tooltip": null }
        }
      ]
    },
    {
      "name": "gencode-exons",
      "transform": [
        { "type": "flatten", "fields": ["_child_features"] },
        {
          "type": "flatten",
          "fields": ["_child_features"],
          "as": ["child_feature"]
        },
        {
          "type": "project",
          "fields": [
            "gene_name",
            "_lane",
            "child_feature.type",
            "child_feature.seq_id",
            "child_feature.start",
            "child_feature.end",
            "child_feature.attributes.exon_number",
            "child_feature.attributes.exon_id"
          ],
          "as": [
            "gene_name",
            "_lane",
            "type",
            "seq_id",
            "start",
            "end",
            "exon_number",
            "exon_id"
          ]
        }
      ],
      "layer": [
        {
          "title": "GENCODE exon",
          "transform": [{ "type": "filter", "expr": "datum.type == 'exon'" }],
          "mark": {
            "type": "rect",
            "minWidth": 0.5,
            "minOpacity": 0.5,
            "stroke": "#505050",
            "fill": "#fafafa",
            "strokeWidth": 1
          }
        },
        {
          "title": "GENCODE exon",
          "transform": [
            {
              "type": "filter",
              "expr": "datum.type != 'exon' && datum.type != 'start_codon' && datum.type != 'stop_codon'"
            }
          ],
          "mark": {
            "type": "rect",
            "minWidth": 0.5,
            "minOpacity": 0,
            "strokeWidth": 1,
            "strokeOpacity": 0,
            "stroke": "gray"
          },
          "encoding": {
            "fill": {
              "field": "type",
              "type": "nominal",
              "scale": {
                "domain": ["five_prime_UTR", "CDS", "three_prime_UTR"],
                "range": ["#83bcb6", "#ffbf79", "#d6a5c9"]
              }
            }
          }
        },
        {
          "transform": [
            {
              "type": "filter",
              "expr": "datum.type == 'three_prime_UTR' || datum.type == 'five_prime_UTR'"
            },
            {
              "type": "formula",
              "expr": "datum.type == 'three_prime_UTR' ? \"3'\" : \"5'\"",
              "as": "label"
            }
          ],
          "mark": {
            "type": "text",
            "color": "black",
            "size": 11,
            "opacity": 0.7,
            "paddingX": 2,
            "paddingY": 1.5,
            "tooltip": null
          },
          "encoding": {
            "text": { "field": "label" }
          }
        }
      ]
    },
    {
      "name": "gencode-transcript-labels",
      "transform": [
        {
          "type": "formula",
          "expr": "(datum.strand == '-' ? '< ' : '') + datum.transcript_name + ' - ' + datum.transcript_id + (datum.strand == '+' ? ' >' : '')",
          "as": "label"
        }
      ],
      "mark": {
        "type": "text",
        "size": 10,
        "yOffset": 12,
        "tooltip": null,
        "color": "#505050"
      },
      "encoding": {
        "text": { "field": "label" }
      }
    }
  ]
}

The visualization uses a sorted and bgzip-compressed copy of the GENCODE human release 43 (GRCh38.p13) comprehensive gene annotation GFF3. GENCODE states that all project data are open access.

The data source is based on GMOD's tabix-js and gff-js libraries.

BAM

The "bam" source is very much a work in progress and has low priority. It currently exposes the reads but does not handle variant alleles, CIGARs, etc. If you are interested in this feature, please post a message in GitHub Discussions.

Parameters

channel

Type: "x" | "y"

Which channel's scale domain to monitor.

Default value: "x"

debounce

Type: number | ExprRef

Debounce time for data updates, in milliseconds. Debouncing prevents excessive data updates when the user is zooming or panning around.

Default value: 200

debounceDomainChange

Type: number | ExprRef

Debounce time for scale-domain driven data updates, in milliseconds.

Default value: 200

debounceMode

Type: string

The debounce mode for data updates. If set to "domain", domain change events (panning and zooming) are debounced. If set to "window", the data fetches initiated by changes to the visible window (or tile) are debounced. If your data is small, "window" is the better choice, as it starts fetching data while the user is still panning around, resulting in shorter perceived latency.

Default value: "window"

indexUrl

Type: string

URL of the index file.

Default value: url + ".bai".

url Required

Type: string

URL of the BAM file.

windowSize

Type: number

Size of each chunk when fetching the BAM file. Data is only fetched when the length of the visible domain is smaller than the window size.

Default value: 10000
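The fragment below is a minimal sketch of a BAM source whose index is not stored next to the alignment file. The URLs are hypothetical. The windowSize of 30000, taken from the example below, means that reads are fetched only when less than 30 kb is visible, instead of the default 10 kb.

```json
{
  "description": "Hypothetical BAM source with a separately hosted .bai index and a 30 kb window.",
  "data": {
    "lazy": {
      "type": "bam",
      "url": "https://example.org/alignments/sample.bam",
      "indexUrl": "https://example.org/indexes/sample.bam.bai",
      "windowSize": 30000
    }
  }
}
```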

Example

{
  "description": [
    "BAM read alignment example.",
    "Data source: mirrored copy of UCSC's `bamExample.bam`, described by UCSC as 1000 Genomes NA12878 read alignments for hg18. Terms: UCSC downloadable data files are freely reusable."
  ],

  "assembly": "hg18",

  "data": {
    "lazy": {
      "type": "bam",
      "url": "https://data.genomespy.app/sample-data/bamExample.bam",
      "windowSize": 30000
    }
  },

  "resolve": {
    "scale": { "x": "shared" }
  },

  "spacing": 5,

  "vconcat": [
    {
      "height": 40,
      "transform": [
        {
          "type": "coverage",
          "start": "start",
          "end": "end",
          "as": "coverage",
          "chrom": "chrom"
        }
      ],
      "mark": "rect",
      "encoding": {
        "x": {
          "chrom": "chrom",
          "pos": "start",
          "type": "locus",
          "axis": null
        },
        "x2": { "chrom": "chrom", "pos": "end" },
        "y": { "field": "coverage", "type": "quantitative" }
      }
    },
    {
      "transform": [
        { "type": "pileup", "start": "start", "end": "end", "as": "_lane" }
      ],
      "encoding": {
        "x": {
          "chrom": "chrom",
          "pos": "start",
          "type": "locus",
          "axis": {},
          "scale": {
            "domain": [
              { "chrom": "chr21", "pos": 33037317 },
              { "chrom": "chr21", "pos": 33039137 }
            ]
          }
        },
        "x2": { "chrom": "chrom", "pos": "end" },
        "y": {
          "field": "_lane",
          "type": "index",
          "scale": {
            "domain": [0, 60],
            "padding": 0.3,
            "reverse": true,
            "zoom": false
          }
        },
        "color": {
          "field": "strand",
          "type": "nominal",
          "scale": { "domain": ["+", "-"], "range": ["crimson", "orange"] }
        }
      },
      "mark": "rect"
    }
  ],

  "config": {
    "view": { "stroke": "lightgray" }
  }
}

The visualization uses a mirrored copy of UCSC's bamExample.bam, which the UCSC BAM format documentation describes as 1000 Genomes read alignments for individual NA12878 on hg18. UCSC states that its downloadable data files are freely available for public and commercial use, and the underlying 1000 Genomes / IGSR data are also openly available.

The data source is based on GMOD's bam-js library.

Axis ticks

The "axisTicks" data source generates a set of ticks for the specified channel. GenomeSpy uses this data source internally to generate axis ticks, but you can also use it to create fully customized axes. The data source generates data objects with value and label fields.

Parameters

axis

Type: Axis

Optional axis properties.

channel Required

Type: "x" | "y"

Which channel's scale domain to listen to.

Example

The visualization below generates approximately three ticks for the x axis.

{
  "description": "Axis ticks data source example.",

  "data": {
    "lazy": {
      "type": "axisTicks",
      "channel": "x",
      "axis": { "tickCount": 3 }
    }
  },

  "mark": { "type": "text", "size": 20, "clip": false },

  "encoding": {
    "x": {
      "field": "value",
      "type": "quantitative",
      "scale": { "domain": [0, 10], "zoom": true }
    },
    "text": { "field": "label" }
  }
}
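The spec below sketches the custom-axis use case mentioned above by layering tick marks and labels, both driven by the same axisTicks data. It is an illustration built on assumptions. The light gray color and the five-tick target are arbitrary choices. The rule mark relies on having only an x encoding so that it spans the view's height, mirroring how the ClinVar example's baseline rule with only y spans its width. This is not a prescribed pattern.

```json
{
  "description": "Sketch of a custom axis: tick rules and labels generated by the axisTicks data source.",
  "data": {
    "lazy": {
      "type": "axisTicks",
      "channel": "x",
      "axis": { "tickCount": 5 }
    }
  },
  "encoding": {
    "x": {
      "field": "value",
      "type": "quantitative",
      "scale": { "domain": [0, 10], "zoom": true }
    }
  },
  "layer": [
    { "mark": { "type": "rule", "color": "lightgray" } },
    {
      "mark": { "type": "text", "size": 16, "clip": false },
      "encoding": { "text": { "field": "label" } }
    }
  ]
}
```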

Axis genome

The "axisGenome" data source does not actually update its data dynamically. However, it provides convenient access to the genome (chromosomes) of the given channel, allowing the creation of customized chromosome ticks or annotations. The data source generates data objects with the following fields: name, size (in bp), continuousStart (linearized coordinate), continuousEnd, odd (boolean), and number (1-based index).

Parameters

channel Required

Type: "x" | "y"

Which channel's scale domain to use.

Example

{
  "description": "Axis genome data source example.",

  "assembly": "hg38",

  "data": {
    "lazy": { "type": "axisGenome", "channel": "x" }
  },

  "encoding": {
    "x": { "field": "continuousStart", "type": "locus" },
    "x2": { "field": "continuousEnd" },
    "text": { "field": "name" }
  },

  "layer": [
    {
      "transform": [{ "type": "filter", "expr": "datum.odd" }],
      "mark": { "type": "rect", "fill": "#f0f0f0" }
    },
    {
      "mark": {
        "type": "text",
        "size": 16,
        "angle": -90,
        "align": "right",
        "baseline": "top",
        "paddingX": 3,
        "paddingY": 5,
        "y": 1
      }
    }
  ]
}