Flatten Compressed Exons

The "flattenCompressedExons" transform expands a compact exon encoding into one datum per exon.

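It reads the transcript start coordinate from the start field and parses the exons field as a comma-separated list of alternating segment lengths in gap, exon, gap, exon, ... order. Each gap is measured from the end of the previous exon. The first gap is typically 0, so the first exon begins at start.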
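
The decoding described above can be sketched in a few lines of Python. This is an illustration only, not the transform's actual implementation: the function name `decode_exons` is hypothetical, and the sketch assumes the encoding is a comma-separated string of integers.

```python
def decode_exons(start: int, exons: str) -> list[tuple[int, int]]:
    """Decode alternating gap/exon lengths into (exon_start, exon_end) pairs.

    Hypothetical helper; assumes a comma-separated string of integers.
    """
    lengths = [int(x) for x in exons.split(",")]
    intervals = []
    pos = start  # running position: the end of the previous exon
    # Walk the list two items at a time: a gap, then an exon length.
    for gap, exon_len in zip(lengths[0::2], lengths[1::2]):
        exon_start = pos + gap
        exon_end = exon_start + exon_len
        intervals.append((exon_start, exon_end))
        pos = exon_end
    return intervals


print(decode_exons(100, "0,10,5,20"))  # [(100, 110), (115, 135)]
```

In this sketch an unpaired trailing value is silently dropped by `zip`. How the transform itself treats malformed input is not stated on this page.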

Each output row is a clone of the input datum with added exon interval fields. By default, these are exonStart and exonEnd.
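
The cloning step can be illustrated with the following Python sketch. It is not the transform's actual code: `expand_rows` and its `as_` argument are hypothetical stand-ins for the as parameter, and the exon intervals are assumed to be already decoded.

```python
def expand_rows(datum: dict, intervals, as_=("exonStart", "exonEnd")) -> list[dict]:
    """Return one copy of `datum` per exon interval, with two fields added.

    Hypothetical helper; `as_` mirrors the transform's `as` parameter.
    """
    start_field, end_field = as_
    rows = []
    for exon_start, exon_end in intervals:
        row = dict(datum)  # shallow copy; the input datum is left untouched
        row[start_field] = exon_start
        row[end_field] = exon_end
        rows.append(row)
    return rows


datum = {"transcript": "TX0", "start": 100, "exons": "0,10,5,20"}
rows = expand_rows(datum, [(100, 110), (115, 135)])
for row in rows:
    print(row)
```

Every output row keeps the original fields, including the exons string, so downstream steps can still group or label exons by transcript.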

This transform is mainly intended for use with an optimized gene annotation track. Read more in the Annotation Tracks notebook.

Parameters

as

Type: array

Names of the two output fields that receive the start and end coordinates of each flattened exon.

Default: ["exonStart", "exonEnd"]

exons

Type: string (field name)

The field containing the compressed exon encoding.

Default: "exons"

start

Type: string (field name)

The field containing the start coordinate of the gene body.

Default: "start"

Example

Given the following input data:

| transcript | start | exons |
| --- | --- | --- |
| TX1 | 1000 | 0,80,40,60,30,50 |

... and the transform:

{
  "type": "flattenCompressedExons"
}

The exons string is interpreted as:

| Segment | Meaning | Result |
| --- | --- | --- |
| 0 | Gap from start to exon 1 | exon 1 starts at 1000 |
| 80 | Length of exon 1 | exon 1 ends at 1080 |
| 40 | Gap to exon 2 | exon 2 starts at 1120 |
| 60 | Length of exon 2 | exon 2 ends at 1180 |
| 30 | Gap to exon 3 | exon 3 starts at 1210 |
| 50 | Length of exon 3 | exon 3 ends at 1260 |

The transform produces one row per exon:

| transcript | start | exons | exonStart | exonEnd |
| --- | --- | --- | --- | --- |
| TX1 | 1000 | 0,80,40,60,30,50 | 1000 | 1080 |
| TX1 | 1000 | 0,80,40,60,30,50 | 1120 | 1180 |
| TX1 | 1000 | 0,80,40,60,30,50 | 1210 | 1260 |