Configuring Metadata Sources
Developer Documentation
This page is intended for visualization authors configuring metadata
sources in samples.
Overview
Metadata sources make sample metadata configurable as explicit sources instead of one monolithic metadata table. This enables:
- eager loading for regular tabular metadata (backend: "data")
- lazy loading for large matrix-like metadata (backend: "zarr")
- per-source defaults for imported columns
Legacy compatibility
The legacy samples.data and samples.attributes configuration remains
supported for backward compatibility, but new configurations should use
samples.metadataSources.
Quick example
{
  "samples": {
    "identity": {
      "data": { "url": "samples.tsv" },
      "idField": "sample",
      "displayNameField": "displayName"
    },
    "metadataSources": [
      {
        "id": "clinical",
        "name": "Clinical",
        "initialLoad": "*",
        "excludeColumns": ["sample", "displayName"],
        "backend": {
          "backend": "data",
          "data": { "url": "samples.tsv" },
          "sampleIdField": "sample"
        }
      },
      {
        "id": "expression",
        "name": "Expression",
        "initialLoad": false,
        "groupPath": "Expression",
        "attributes": {
          "TP53": {
            "type": "quantitative",
            "scale": { "scheme": "redblue", "domainMid": 0 }
          }
        },
        "backend": {
          "backend": "zarr",
          "url": "data/expr.zarr"
        }
      }
    ]
  }
}
In this example, identity reads the canonical sample ids and display names
from samples.tsv. The first metadata source (clinical) uses the same TSV as
an eager table source and autoloads all non-excluded columns at startup. The
second source (expression) points to a Zarr matrix, is not loaded at startup
(initialLoad: false), and defines quantitative styling for the TP53 column
under the Expression group.
If you omit initialLoad, backend defaults apply. For backend: "data", that
means load all columns by default; use excludeColumns to keep helper fields
such as sample and displayName out of metadata attributes.
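To load only a few columns at startup instead, initialLoad also accepts a list of column names (the string[] form in the schema reference). The following is a minimal sketch, assuming samples.tsv contains columns named purity and ploidy; these names are illustrative. Only the listed columns are loaded at startup, and the remaining columns should still be available for on-demand import.

```json
{
  "id": "clinical",
  "name": "Clinical",
  "initialLoad": ["purity", "ploidy"],
  "excludeColumns": ["sample", "displayName"],
  "backend": {
    "backend": "data",
    "data": { "url": "samples.tsv" },
    "sampleIdField": "sample"
  }
}
```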
Splitting configuration into files
When source definitions become long, you can keep samples.metadataSources
compact by importing each source from a separate JSON file.
Example in the main spec:
{
  "samples": {
    "identity": {
      "data": { "url": "samples.tsv" },
      "idField": "sample",
      "displayNameField": "displayName"
    },
    "metadataSources": [
      { "import": { "url": "metadata-sources/clinical-source.json" } },
      { "import": { "url": "metadata-sources/expression-source.json" } }
    ]
  },
  ...
}
Example imported source file (metadata-sources/clinical-source.json):
{
  "id": "clinical",
  "name": "Clinical",
  "initialLoad": "*",
  "excludeColumns": ["sample", "displayName"],
  "backend": {
    "backend": "data",
    "data": { "url": "../samples.tsv" },
    "sampleIdField": "sample"
  }
}
Import behavior:
- each imported file must define exactly one metadata source object
- nested imports are not supported
- relative paths are resolved using GenomeSpy base-url rules
- backend URLs inside an imported source are resolved relative to that imported file
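As an illustration of the last rule, here is a sketch of what the companion file metadata-sources/expression-source.json might look like. It assumes the Zarr store lives at data/expr.zarr next to the main spec, as in the quick example; that directory layout is an assumption. Because backend URLs are resolved relative to the imported file, the path climbs one level, mirroring "../samples.tsv" in the clinical source file.

```json
{
  "id": "expression",
  "name": "Expression",
  "initialLoad": false,
  "groupPath": "Expression",
  "backend": {
    "backend": "zarr",
    "url": "../data/expr.zarr"
  }
}
```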
Configuring attribute types and scales
Attribute configuration can be attached directly to a source:
- attributes applies to specific columns
- attributes[""] sets a source-level default for all imported columns
Example:
{
  "id": "clinical",
  "name": "Clinical",
  "groupPath": "Clinical",
  "initialLoad": "*",
  "excludeColumns": ["sample", "displayName"],
  "attributes": {
    "purity": {
      "type": "quantitative",
      "scale": {
        "domain": [0, 1],
        "scheme": "yellowgreenblue"
      }
    },
    "ploidy": {
      "type": "quantitative",
      "scale": {
        "domain": [1.5, 6],
        "scheme": "blues"
      }
    },
    "treatment": {
      "title": "Treatment",
      "visible": true
    }
  },
  "backend": {
    "backend": "data",
    "data": { "url": "samples.tsv" },
    "sampleIdField": "sample"
  }
}
In this example, purity and ploidy are configured as quantitative with
custom scales, and treatment gets a custom title. Other imported columns
without explicit definitions still work: GenomeSpy infers their types from
the values.
When using grouped/hierarchical names (attributeGroupSeparator), attributes
can also target group nodes for shared defaults. See
Grouping and hierarchy.
If you need an explicit source-wide default (instead of inference), define
attributes[""] and then override selected columns with specific keys.
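A minimal sketch of that pattern, assuming for illustration that most columns in samples.tsv are categorical and only purity needs quantitative treatment. The "" key sets the source-wide default, and the purity key overrides it for that one column.

```json
{
  "id": "clinical",
  "name": "Clinical",
  "initialLoad": "*",
  "excludeColumns": ["sample", "displayName"],
  "attributes": {
    "": {
      "type": "nominal"
    },
    "purity": {
      "type": "quantitative",
      "scale": { "domain": [0, 1], "scheme": "yellowgreenblue" }
    }
  },
  "backend": {
    "backend": "data",
    "data": { "url": "samples.tsv" },
    "sampleIdField": "sample"
  }
}
```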
Grouping and hierarchy
Grouping helps when metadata has many attributes: users can collapse and expand
groups in the hierarchy, and authors can configure shared defaults once at
group level (for example type and scale) instead of repeating them for
every child column.
Metadata organization is controlled by two related properties:
- groupPath: where imported columns are placed
- attributeGroupSeparator: how path-like column names are split into groups
Placement with groupPath
Without groupPath, imported columns are added at the root. With groupPath,
imported columns are prefixed under that path.
Example:
{
  "id": "expression",
  "groupPath": "Expression",
  "backend": {
    "backend": "zarr",
    "url": "data/expr.zarr"
  }
}
Importing column TP53 from this source creates attribute path
Expression/TP53.
Hierarchy with attributeGroupSeparator
attributeGroupSeparator lets grouped column names define hierarchy levels. It
also enables group-level definitions in attributes.
Suppose you have columns such as:
- patientId
- clinical.PFI
- clinical.OS
- signature.HRD
- signature.APOBEC
With attributeGroupSeparator: ".", the clinical.* and signature.*
columns are grouped under clinical and signature.
Inheritance rules are straightforward: child columns inherit type and scale
from the nearest parent group unless overridden by a more specific key.
visible and title apply to the group node itself (for example clinical)
rather than to all child columns.
Example configuration:
{
  "id": "clinical",
  "name": "Clinical",
  "attributeGroupSeparator": ".",
  "attributes": {
    "patientId": {
      "type": "nominal"
    },
    "clinical": {
      "type": "quantitative",
      "scale": { "scheme": "blues" }
    },
    "clinical.OS": {
      "visible": false
    },
    "signature": {
      "type": "quantitative",
      "scale": { "scheme": "yelloworangered" },
      "visible": false
    }
  },
  "backend": {
    "backend": "data",
    "data": { "url": "samples.tsv" },
    "sampleIdField": "sample"
  }
}
In this configuration, clinical.PFI inherits quantitative/blues defaults from
clinical, while clinical.OS applies its own override (visible: false).
Without attributeGroupSeparator, no path splitting is applied: column names
and attributes keys are treated as flat ids.
Using both together
When both are set, groupPath places imported attributes under a destination
group and attributeGroupSeparator defines how grouped names are interpreted.
attributeGroupSeparator also affects how groupPath itself is parsed:
- with attributeGroupSeparator: ".", groupPath: "Expression.RNA" becomes Expression/RNA
- without attributeGroupSeparator, groupPath is not split. The whole value is treated as one group name (for example "Expression/RNA" stays one group id).
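A short sketch combining the two properties, reusing the Zarr store from the quick example. Going by the rules described here, importing column TP53 from this source should produce the attribute path Expression/RNA/TP53; treat that resulting path as an expectation derived from this page rather than a verified output.

```json
{
  "id": "expression",
  "name": "Expression",
  "initialLoad": false,
  "attributeGroupSeparator": ".",
  "groupPath": "Expression.RNA",
  "backend": {
    "backend": "zarr",
    "url": "data/expr.zarr"
  }
}
```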
Schema reference
samples entry points
identity
  Type: SampleIdentityDef
  Optional explicit sample identity definition.
metadataSources
  Type: array
  Metadata source definitions used for startup and on-demand imports.
  Source order is significant for startup loading: eager startup imports are applied in declaration order.
Metadata source definitions
Type: MetadataSourceDef | object
attributeGroupSeparator
  Type: string
  Separator used by source-side attribute names to express hierarchy.
  Example: if separator is ".", column clinical.OS is interpreted as group clinical and attribute OS.
attributes
  Type: object
  Attribute definitions keyed by attribute/column id (and optionally by group path).
  Special key "" defines source-level defaults for all imported columns. Path splitting is applied only when attributeGroupSeparator is defined.
backend (required)
  Type: DataBackendDef | ZarrBackendDef | ParquetBackendDef | ArrowBackendDef
  Backend-specific source configuration.
description
  Type: string
  Optional short description of what this source contains.
  Can be shown in UI and can help automated agents choose the correct source.
excludeColumns
  Type: array
  Column ids that must never be imported from this source.
  Useful for excluding identity/helper columns such as sample and displayName.
groupPath
  Type: string
  Default destination group path for imported attributes.
  Imported column names are placed under this path, which effectively creates (or reuses) a metadata hierarchy node.
  This value is parsed as a path using attributeGroupSeparator when that separator is defined for the source. Without an explicit separator, the whole value is treated as one group name (including any / characters).
  Users can override this per import in the dialog.
id
  Type: string
  Stable source identifier used in actions, provenance, and configuration.
  Should remain stable across spec revisions if bookmarks/provenance replay must keep working.
initialLoad
  Type: boolean | "*" | string[]
  Startup loading behavior.
  - false: do not load at startup
  - "*": load all columns allowed by this source
  - string[]: resolve and load only the listed columns
  Omitted value uses backend defaults.
name
  Type: string
  Optional user-facing label shown in menus and dialogs.
  If omitted, UI falls back to id.
Backends
data backend
data (required)
  Type: UrlData | InlineData
  Eager tabular metadata source using the standard data contract.
  Supports UrlData and InlineData.
sampleIdField
  Type: string
  Field name in the table that matches sample ids in the view.
  Default value: "sample"
zarr backend
Example with optional lookup helpers and matrix path overrides:
{
  "id": "expression",
  "name": "Expression (Zarr)",
  "description": "Normalized expression matrix with identifier lookup.",
  "initialLoad": false,
  "groupPath": "Expression",
  "attributes": {
    "": {
      "type": "quantitative",
      "scale": { "scheme": "redblue", "domainMid": 0 }
    }
  },
  "backend": {
    "backend": "zarr",
    "url": "data/expr.zarr",
    "matrix": {
      "valuesPath": "X",
      "rowIdsPath": "obs_names",
      "columnIdsPath": "var_names"
    },
    "identifiers": [
      {
        "name": "symbol",
        "path": "var/symbol",
        "primary": true,
        "caseInsensitive": true
      },
      {
        "name": "ensembl",
        "path": "var/ensembl_id",
        "stripVersionSuffix": true
      }
    ]
  }
}
If your store uses the default matrix paths (X, obs_names, var_names),
you can omit the entire matrix block. Identifier helpers are optional too:
if omitted, only primary column ids are used for lookup. For a minimal setup,
see the simpler Zarr example near the top of this page.
AnnData context and current scope
GenomeSpy currently supports an AnnData-compatible matrix subset for metadata import.
This is aimed at expression-style workflows where users import selected genes
as metadata attributes. It is not full AnnData object support. Zarr metadata
sources are primarily useful for large matrices with selective loading; for
small tabular metadata, backend: "data" is usually simpler.
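For comparison, a small tabular source can be sketched with the data backend and inline rows. This assumes InlineData takes its rows under a values key, as in GenomeSpy's general data contract; the source id, sample ids (S1, S2), and the batch column are hypothetical and used only for illustration.

```json
{
  "id": "batch",
  "name": "Batch",
  "initialLoad": "*",
  "excludeColumns": ["sample"],
  "backend": {
    "backend": "data",
    "data": {
      "values": [
        { "sample": "S1", "batch": "A" },
        { "sample": "S2", "batch": "B" }
      ]
    },
    "sampleIdField": "sample"
  }
}
```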
AnnData compatibility checklist
- matrix values array (valuesPath, default X) with shape (n_samples, n_features)
- sample id array (rowIdsPath, default obs_names) with length n_samples
- feature id array (columnIdsPath, default var_names) with length n_features
- optional lookup arrays via identifiers (for example var/symbol, var/ensembl_id) with length n_features
Current limitations:
- sparse matrix handling is not supported in this metadata-source path
- AnnData layers are not exposed as selectable alternatives to X
- obs, var, obsm, varm, obsp, varp, uns are not ingested directly
Implementation note: browser-side Zarr access is done with
zarrita, fetching only requested
arrays/chunks over HTTP.
Zarr backend schema
identifiers
  Type: array
  Optional identifier arrays used to resolve user queries to columns.
  If omitted, only primary column ids are used for lookup.
matrix
  Type: ZarrMatrixLayoutDef
  Optional path overrides for the expression-style matrix layout.
url (required)
  Type: string
  URL to the root of the Zarr store.
Zarr layout details
These definitions describe where matrix content lives inside the Zarr store. Use these path overrides for expression-style sample-by-feature arrays.
columnIdsPath
  Type: string
  Path to matrix column identifiers.
  Default value: "var_names"
rowIdsPath
  Type: string
  Path to matrix row identifiers (sample ids).
  Default value: "obs_names"
valuesPath
  Type: string
  Path to matrix values (sample rows x metadata columns).
  Default value: "X"
Zarr identifier helpers
These optional definitions improve column lookup from user-entered terms. Use
identifiers for aligned identifier arrays (for example symbol and Ensembl).
caseInsensitive
  Type: boolean
  Enables case-insensitive matching for this identifier field.
name (required)
  Type: string
  Logical identifier name shown in UI and diagnostics.
  Example values: "symbol", "ensembl", "entrez".
path (required)
  Type: string
  Backend path that provides identifier values aligned to matrix columns.
  The array length must equal the number of columns in the matrix.
primary
  Type: boolean
  Marks this identifier as the primary, canonical identifier.
stripVersionSuffix
  Type: boolean
  Remove version suffixes during matching (for example, ENSG... .12).
  Useful for identifiers such as Ensembl ids that may contain version suffixes in some datasets but not in user queries.
Background references
- AnnData docs: https://anndata.readthedocs.io/
- Zarr specs: https://zarr-specs.readthedocs.io/en/latest/
- zarrita: https://github.com/manzt/zarrita.js