Visualizing Sample Collections

Developer Documentation

This page is intended for users who develop tailored visualizations using the GenomeSpy app.

Getting started

You can use the following HTML template to create a web page for your visualization. The template loads the app from a content delivery network and the visualization specification from a separate spec.json file placed in the same directory. See the getting started page for more information.

<!DOCTYPE html>
<html>
  <head>
    <title>GenomeSpy</title>
    <link
      rel="stylesheet"
      type="text/css"
      href="https://cdn.jsdelivr.net/npm/@genome-spy/app@0.51.x/dist/style.css"
    />
  </head>
  <body>
    <script
      type="text/javascript"
      src="https://cdn.jsdelivr.net/npm/@genome-spy/app@0.51.x"
    ></script>

    <script>
      genomeSpyApp.embed(document.body, "spec.json", {
        // Show the dataflow inspector button in the toolbar (default: true)
        showInspectorButton: true,
      });
    </script>
  </body>
</html>

For a complete example, check the website-examples repository on GitHub.

Specifying a Sample View

The GenomeSpy app extends the core library with a new view composition operator that allows visualization of multiple samples. In this context, a sample means a set of data objects representing an organism, a piece of tissue, a cell line, a single cell, etc. Each sample gets its own track in the visualization, and the behavior resembles the facet operator of Vega-Lite. However, there are subtle differences in the behavior.

A sample view is defined by the samples and spec properties. To assign a track for a data object, define a sample-identifier field using the sample channel. More complex visualizations can be created using the layer operator. Each composed view may have a different data source, enabling concurrent visualization of multiple data types. For instance, the bottom layer could display segmented copy-number data, while the top layer might show single-nucleotide variants.

{
  "samples": {
    // Optional sample identifiers and metadata
    ...
  },
  "spec": {
    // A single or layer specification
    ...,
    "encoding": {
      ...,
      // The sample channel identifies the track
      "sample": {
        "field": "sampleId"
      }
    }
  }
}

Y axis ticks

Y axis ticks are currently not available in sample views. This will be fixed at a later time. However, ticks would not be particularly practical with a large number of samples.

But we have Band scale?

Superficially similar results can be achieved by using the "band" scale on the y channel. However, you cannot adjust the intra-band y-position, as the y channel is already reserved for assigning a band to a datum. On the other hand, with the band scale, the graphical marks can span multiple bands. You could, for example, draw lines between the bands.

Implicit sample identifiers

By default, the identifiers of the samples are extracted from the data, and each sample gets its own track.
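
Conceptually, implicit identification amounts to collecting the distinct values of the field bound to the sample channel. The following JavaScript sketch illustrates the idea only. It is not GenomeSpy's implementation: the helper name groupBySample is hypothetical, and keeping the samples in first-appearance order is an assumption.

```javascript
// Illustrative only: group data objects into per-sample tracks using the
// field bound to the "sample" channel. Not part of the GenomeSpy API.
function groupBySample(data, sampleField) {
  const tracks = new Map();
  for (const datum of data) {
    const id = datum[sampleField];
    if (!tracks.has(id)) {
      tracks.set(id, []);
    }
    tracks.get(id).push(datum);
  }
  return tracks;
}

// Made-up data objects with a "sampleId" field
const segments = [
  { sampleId: "S1", start: 100 },
  { sampleId: "S2", start: 150 },
  { sampleId: "S1", start: 900 },
];

const tracks = groupBySample(segments, "sampleId");
console.log([...tracks.keys()]); // the distinct sample identifiers
```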

Defining samples (minimal example)

For most sample-collection visualizations, define:

  1. sample identity (identity)
  2. one or more metadata sources (metadataSources)

Minimal sample definition
{
  "samples": {
    "identity": {
      "data": { "url": "samples.tsv" },
      "idField": "sample",
      "displayNameField": "displayName"
    },
    "metadataSources": [
      {
        "id": "clinical",
        "name": "Clinical metadata",
        "initialLoad": "*",
        "excludeColumns": ["sample", "displayName"],
        "backend": {
          "backend": "data",
          "data": { "url": "samples.tsv" },
          "sampleIdField": "sample"
        }
      }
    ]
  },
  ...
}

This configuration reads sample ids and display names from samples.tsv, then loads metadata columns from the same file at startup. The sample and displayName columns are excluded from metadata import, so only actual metadata attributes are shown in the metadata panel.
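
To make the division of columns concrete, the following JavaScript sketch mimics the effect of the configuration on a tiny in-memory table. It is an illustration under stated assumptions, not how the app loads data: the helpers parseTsv and splitSampleTable are hypothetical, the age and stage columns are made up, and all values are kept as plain strings.

```javascript
// Parse a small tab-separated table into an array of row objects.
// All values stay strings; no type inference is attempted here.
function parseTsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const columns = header.split("\t");
  return lines.map((line) => {
    const cells = line.split("\t");
    return Object.fromEntries(columns.map((name, i) => [name, cells[i]]));
  });
}

// Split rows into identities and per-sample metadata, mirroring the
// idField, displayNameField, sampleIdField, and excludeColumns settings.
function splitSampleTable(rows, options) {
  const { idField, displayNameField, sampleIdField, excludeColumns } = options;
  const identities = rows.map((row) => ({
    id: row[idField],
    displayName: row[displayNameField],
  }));
  const metadata = new Map(
    rows.map((row) => [
      row[sampleIdField],
      Object.fromEntries(
        Object.entries(row).filter(([name]) => !excludeColumns.includes(name))
      ),
    ])
  );
  return { identities, metadata };
}

// Hypothetical file content; "age" and "stage" are made-up attributes.
const samplesTsv = [
  "sample\tdisplayName\tage\tstage",
  "S1\tPatient 1\t61\tIII",
  "S2\tPatient 2\t47\tII",
].join("\n");

const { identities, metadata } = splitSampleTable(parseTsv(samplesTsv), {
  idField: "sample",
  displayNameField: "displayName",
  sampleIdField: "sample",
  excludeColumns: ["sample", "displayName"],
});

console.log(identities);
console.log(metadata.get("S1")); // only the "age" and "stage" attributes remain
```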

For advanced metadata-source configuration (for example, lazy Zarr-backed imports), see Configuring Metadata Sources.

Adjusting font sizes, etc.

The samples object can also be used to adjust the font sizes, etc. of the metadata attributes. For example, to increase the font sizes of the sample and attribute labels, use the following configuration:

Adjusting font sizes
{
  "samples": {
    ...,
    "labelFontSize": 12,
    "attributeLabelFontSize": 10
  },
  ...
}

The following properties allow for fine-grained control of the font styles:

labelFont

Type: string

The font typeface. GenomeSpy uses SDF versions of Google Fonts. Check their availability at the A-Frame Fonts repository. System fonts are not supported.

Default value: "Lato"

labelFontSize

Type: number

The font size in pixels.

Default value: 11

labelFontWeight

Type: number | "thin" | "light" | "regular" | "normal" | "medium" | "bold" | "black"

The font weight. The following strings and numbers are valid values: "thin" (100), "light" (300), "regular" (400), "normal" (400), "medium" (500), "bold" (700), "black" (900)

Default value: "regular"

labelFontStyle

Type: "normal" | "italic"

The font style. Valid values: "normal" and "italic".

Default value: "normal"

labelAlign

Type: "left" | "center" | "right"

The horizontal alignment of the text. One of "left", "center", or "right".

Default value: "left"

attributeLabelFont

Type: string

The font typeface. GenomeSpy uses SDF versions of Google Fonts. Check their availability at the A-Frame Fonts repository. System fonts are not supported.

Default value: "Lato"

attributeLabelFontSize

Type: number

The font size in pixels.

Default value: 11

attributeLabelFontWeight

Type: number | "thin" | "light" | "regular" | "normal" | "medium" | "bold" | "black"

The font weight. The following strings and numbers are valid values: "thin" (100), "light" (300), "regular" (400), "normal" (400), "medium" (500), "bold" (700), "black" (900)

Default value: "regular"

attributeLabelFontStyle

Type: "normal" | "italic"

The font style. Valid values: "normal" and "italic".

Default value: "normal"
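
The named weights accepted by labelFontWeight and attributeLabelFontWeight correspond to the numeric values listed above. As a quick reference, the mapping can be sketched in JavaScript as follows. The resolveFontWeight helper is hypothetical and only restates the documented table; it is not a function exposed by GenomeSpy.

```javascript
// Named font weights and their numeric equivalents, as listed above.
const FONT_WEIGHTS = new Map([
  ["thin", 100],
  ["light", 300],
  ["regular", 400],
  ["normal", 400],
  ["medium", 500],
  ["bold", 700],
  ["black", 900],
]);

// Hypothetical helper: turn a weight given as a number or a name
// into its numeric value.
function resolveFontWeight(weight) {
  if (typeof weight === "number") {
    return weight;
  }
  if (!FONT_WEIGHTS.has(weight)) {
    throw new Error(`Unknown font weight: ${weight}`);
  }
  return FONT_WEIGHTS.get(weight);
}

console.log(resolveFontWeight("bold")); // 700
console.log(resolveFontWeight(300)); // 300
```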

In addition, the following properties are supported:

labelTitleText

Type: string

Text in the label title

Default: "Sample name"

labelLength

Type: number

How much space in pixels to reserve for the labels.

Default: 140

attributeSize

Type: number

Default size (width) of the metadata attribute columns. Can be configured per attribute using the attributes property.

Default value: 10

attributeLabelAngle

Type: number

Angle to be added to the default label angle (-90).

Default value: 0

attributeSpacing

Type: number

Spacing between attribute columns in pixels.

Default value: 1

Handling variable sample heights

The height of a single sample depends on the number of samples and the height of the sample view. Moreover, the end user can toggle between a bird's eye view and a closeup view, which makes the height very dynamic.

To adapt the maximum size of "point" marks to the height of the samples, you need to specify a dynamic scale range for the size channel. The following example demonstrates how to use expressions and the height parameter to adjust the point size:

Dynamic point sizes
"encoding": {
  "size": {
    "field": "VAF",
    "type": "quantitative",
    "scale": {
      "domain": [0, 1],
      "range": [
        { "expr": "0" },
        { "expr": "pow(clamp(height * 0.65, 2, 18), 2)" }
      ]
    }
  },
  ...
}

In this example, the height parameter, provided by the sample view, contains the height of a single sample. By multiplying it with 0.65, the points get some padding at the top and bottom. To prevent the points from becoming too small or excessively large, the clamp function is used to limit the point's diameter to a minimum of 2 and a maximum of 18 pixels. As the size channel encodes the area, not the diameter of the points, the pow function is used to square the value. The technique shown here is used in the PARPiCL example.
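
To check the arithmetic of the range expression for a few sample heights, it can be reproduced in plain JavaScript. This is only a worked example of the formula: the clamp and maxPointSize functions below are local stand-ins written for this sketch, not part of the GenomeSpy API.

```javascript
// Local stand-in for clamp(value, min, max) used in the expression above.
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Upper bound of the size range (an area) for a given sample height in pixels.
function maxPointSize(height) {
  const diameter = clamp(height * 0.65, 2, 18);
  return Math.pow(diameter, 2);
}

for (const height of [2, 10, 40]) {
  console.log(height, maxPointSize(height));
}
// height 2:  1.3 is clamped up to the 2 px minimum, so the size is 4
// height 10: the diameter is 6.5 px, so the size is 42.25
// height 40: 26 is clamped down to the 18 px maximum, so the size is 324
```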

Aggregation

TODO

Bookmarking

With the GenomeSpy app, users can save the current visualization state, including scale domains and view visibilities, as bookmarks. These bookmarks are stored in the IndexedDB of the user's web browser. Each database is unique to an origin, which typically refers to the hostname and domain of the web server hosting the visualization. Since the server may host multiple visualizations, each visualization must have a unique ID assigned to it. To enable bookmarking, simply add the specId property with an arbitrary but unique string value to the top-level view. Example:

{
  "specId": "My example visualization",

  "vconcat": { ... },
  ...
}

Pre-defined bookmarks and bookmark tour

You may want to provide users with a few pre-defined bookmarks that showcase interesting findings from the data. Since bookmarks support Markdown-formatted notes, you can also explain the implications of the findings and present essential background information.

The remote bookmarks feature allows for storing bookmarks in a JSON file on a web server and provides them to users through the bookmark menu. In addition, you can optionally enable the tour function, which automatically opens the first bookmark in the file and allows the user to navigate the tour using previous/next buttons.

Enabling remote bookmarks

View specification
{
  "bookmarks": {
    "remote": {
      "url": "tour.json",
      "tour": true
    }
  },

  "vconcat": { ... },
  ...
}

The remote object accepts the following properties:

url Required

Type: string

URL to the remote bookmark file.

initialBookmark

Type: string

Name of the bookmark that should be loaded as the initial state. The bookmark description dialog is shown only if the tour property is set to true.

tour

Type: boolean

Should the user be shown a tour of the remote bookmarks when the visualization is launched? If the initialBookmark property is not defined, the tour starts from the first bookmark.

Default value: false

afterTourBookmark

Type: string

Name of the bookmark that should be loaded when the user ends the tour. If null, the dialog will be closed and the current state is retained. If undefined, the default state without any performed actions will be loaded.
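
The three cases of afterTourBookmark can be summarized with a small JavaScript sketch. It only restates the behavior described above; the resolveAfterTour function and the action names are hypothetical. Note that JSON has no undefined value, so in a JSON specification the undefined case presumably corresponds to leaving the property out.

```javascript
// Hypothetical helper that restates the documented afterTourBookmark cases.
function resolveAfterTour(afterTourBookmark) {
  if (afterTourBookmark === null) {
    // null: the dialog is closed and the current state is retained
    return { action: "keepCurrentState" };
  }
  if (afterTourBookmark === undefined) {
    // undefined: the default state without any performed actions is loaded
    return { action: "restoreDefaultState" };
  }
  // a string: the named bookmark is loaded
  return { action: "loadBookmark", name: afterTourBookmark };
}

console.log(resolveAfterTour(null));
console.log(resolveAfterTour(undefined));
console.log(resolveAfterTour("First bookmark"));
```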

The bookmark file

The remote bookmark file consists of an array of bookmark objects. The easiest way to create such bookmark objects is to create a bookmark in the app and choose Share from the submenu of the bookmark item. The sharing dialog provides the bookmark in a URL-encoded format and as a JSON object. Just copy-paste the JSON object into the bookmark file to make it available to all users. A simplified example:

Bookmark file (tour.json)
[
  {
    "name": "First bookmark",
    "actions": [ ... ],
    ...
  },
  {
    "name": "Second bookmark",
    "actions": [ ... ],
    ...
  }
]

Providing the user with an initial state

If you want to provide the user with an initial state comprising specific actions performed on the samples, a particular visible genomic region, etc., you can create a bookmark with the desired settings and set the initialBookmark property to the bookmark's name. See the documentation above for details.
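
Because initialBookmark and afterTourBookmark refer to bookmarks by name, a misspelled name is easy to miss. The following JavaScript sketch shows one way to sanity-check a bookmark file against the remote configuration before publishing it. The checkRemoteBookmarks helper is hypothetical, and treating duplicate names as a problem is an assumption made here to keep name lookups unambiguous.

```javascript
// Hypothetical helper: returns a list of human-readable problems found in
// a parsed bookmark file and the "remote" configuration object.
function checkRemoteBookmarks(bookmarks, remote) {
  if (!Array.isArray(bookmarks)) {
    return ["The bookmark file must contain an array"];
  }
  const problems = [];
  const names = new Set();
  bookmarks.forEach((bookmark, index) => {
    if (typeof bookmark?.name !== "string") {
      problems.push(`Bookmark at index ${index} has no name`);
    } else if (names.has(bookmark.name)) {
      problems.push(`Duplicate bookmark name: ${bookmark.name}`);
    } else {
      names.add(bookmark.name);
    }
  });
  // Both properties refer to a bookmark by its name.
  for (const property of ["initialBookmark", "afterTourBookmark"]) {
    const name = remote[property];
    if (typeof name === "string" && !names.has(name)) {
      problems.push(`${property} refers to a missing bookmark: ${name}`);
    }
  }
  return problems;
}

const tour = [
  { name: "First bookmark", actions: [] },
  { name: "Second bookmark", actions: [] },
];

console.log(
  checkRemoteBookmarks(tour, {
    url: "tour.json",
    tour: true,
    initialBookmark: "Second bookmark",
  })
); // an empty array, as no problems were found
```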

Toggleable View Visibility

When working with a complex visualization that includes multiple tracks and extensive metadata, it may not always be necessary to display all views simultaneously. The GenomeSpy app offers users the ability to toggle the visibility of nodes within the view hierarchy. This visibility state is also included in shareable links and bookmarks, allowing users to easily access their preferred configurations.

Views have two properties for controlling the visibility:

visible

Type: boolean

The default visibility of the view. An invisible view is removed from the layout and not rendered. For context, see toggleable view visibility.

Default: true

configurableVisibility

Type: boolean | AppVisibilityGroupSpec

Whether the visibility is configurable from the view visibility menu of the GenomeSpy app.

Configurability requires an explicit view name that is unique in its import scope.

Set to an object with group to make configurable views mutually exclusive in the menu (radio buttons) within the same import scope.

Default value: false for children of layer, true for others

Use object-form configurableVisibility to make views mutually exclusive in the menu. Views that share the same group in the same import scope are shown as radio buttons:

{
  "name": "rawCoverage",
  "configurableVisibility": { "group": "coverageMode" },
  ...
}
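
To see which views end up in the same radio-button group, the grouping rule can be sketched in JavaScript. This is an illustration only: the collectVisibilityGroups helper is hypothetical, the views are given as a flat list that is assumed to share one import scope, and the smoothedCoverage and annotations names are made up.

```javascript
// Hypothetical helper: collect the names of views that share a visibility
// group. Views with a boolean configurableVisibility are not grouped.
function collectVisibilityGroups(views) {
  const groups = new Map();
  for (const view of views) {
    const visibility = view.configurableVisibility;
    if (
      visibility !== null &&
      typeof visibility === "object" &&
      typeof visibility.group === "string"
    ) {
      if (!groups.has(visibility.group)) {
        groups.set(visibility.group, []);
      }
      groups.get(visibility.group).push(view.name);
    }
  }
  return groups;
}

// Assumed to live in the same import scope
const views = [
  { name: "rawCoverage", configurableVisibility: { group: "coverageMode" } },
  { name: "smoothedCoverage", configurableVisibility: { group: "coverageMode" } },
  { name: "annotations", configurableVisibility: true },
];

console.log(collectVisibilityGroups(views).get("coverageMode"));
// the two coverage views, which would be shown as radio buttons
```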

The location/search field in the toolbar allows users to quickly navigate to features in the data. To make features searchable, use the search channel on marks that represent the searchable data objects.

search accepts either a single field definition or an array of field definitions. When multiple fields are provided, a datum matches if any of the fields matches the entered term (case-insensitive exact match).

Examples:

{
  ...,
  "mark": "rect",
  "encoding": {
    "search": {
      "field": "geneSymbol"
    },
    ...,
  },
  ...
}
{
  ...,
  "mark": "rect",
  "encoding": {
    "search": [
      { "field": "geneSymbol" },
      { "field": "geneId" },
      { "field": "alias" }
    ],
    ...,
  },
  ...
}
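
The matching rule described above (any field, case-insensitive, exact) can be sketched in JavaScript as follows. This illustrates the documented semantics only and is not the app's implementation: the matchesSearch helper and the example datum are hypothetical.

```javascript
// Hypothetical helper: does a datum match the search term under the
// "any field, case-insensitive exact match" rule?
function matchesSearch(datum, search, term) {
  // The search channel is either a single field definition or an array
  const fieldDefs = Array.isArray(search) ? search : [search];
  const needle = term.toLowerCase();
  return fieldDefs.some((def) => {
    const value = datum[def.field];
    return value != null && String(value).toLowerCase() === needle;
  });
}

const search = [{ field: "geneSymbol" }, { field: "geneId" }, { field: "alias" }];
const gene = { geneSymbol: "TP53", geneId: "ENSG00000141510", alias: "p53" };

console.log(matchesSearch(gene, search, "tp53")); // true: case is ignored
console.log(matchesSearch(gene, search, "P53")); // true: matches the alias
console.log(matchesSearch(gene, search, "TP5")); // false: partial matches do not count
```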

A practical example

Work in progress

This part of the documentation is still under construction. For a live example, check the PARPiCL visualization, which is also available for interactive exploration.