Visualizing a SEG file

These examples visualize segmented data with two different visual encodings.

The example data consists of segmentations for two samples. Each segment has a chromosome, intra-chromosomal start and end coordinates, and two quantitative values:

'ID chrom loc.start loc.end num.mark seg.mean
GenomeWideSNP_416532 1 51598 76187 14 -0.7116
GenomeWideSNP_416532 1 76204 16022502 8510 -0.029
GenomeWideSNP_416532 1 16026084 16026512 6 -2.0424
GenomeWideSNP_416532 1 16026788 17063449 424 -0.1024
... ... ... ... ... ...

Data source: https://software.broadinstitute.org/software/igv/SEG

A simple example

The following example uses a conventional heatmap (rect mark) to display the segments. The color scale has been configured to match the Integrative Genomics Viewer.

{
  "genome": { "name": "hg18" },

  "concat": [
    { "import": { "name": "cytobands" } },

    {
      "data": {
        "url": "example.seg",
        "format": { "type": "tsv" }
      },

      "mark": "rect",

      "encoding": {
        "x": { "chrom": "chrom", "pos": "loc\\.start", "type": "quantitative" },
        "x2": { "chrom": "chrom", "pos": "loc\\.end" },
        "y": { "field": "\\'ID", "type": "nominal" },
        "color": {
          "field": "seg\\.mean",
          "type": "quantitative",
          "scale": {
            "domain": [-1.5, 1.5],
            "range": ["blue", "white", "red"]
          }
        }
      }
    },

    { "import": { "name": "genomeAxis" } }
  ]
}

An advanced example: emphasizing focal segments

The data contains focal segments that are short and barely visible. Although zooming reveals them, finding them all requires a lot of effort. The following example uses an alternative visual encoding for the data, emphasizing the focal segments.

The quantitative value is encoded as position (height) instead of color. Focal segments are extracted from the data using the filter transform and displayed using point mark.

{
  "genome": {
    "name": "hg18"
  },

  "concat": [
    {
      "import": { "name": "cytobands" }
    },

    {
      "name": "layers",
      "data": {
        "url": "example.seg",
        "format": { "type": "tsv" }
      },
      "encoding": {
        "sample": { "field": "\\'ID", "type": "nominal" },
        "color": {
          "field": "seg\\.mean",
          "type": "quantitative",
          "scale": {
            "type": "threshold",
            "domain": [0],
            "range": ["#2277ff", "#dd4422"]
          }
        },
        "y": {
          "field": "seg\\.mean",
          "type": "quantitative"
        }
      },
      "layer": [
        {
          "mark": {
            "type": "rect",
            "minWidth": 1,
            "minOpacity": 0.2
          },
          "encoding": {
            "x": {
              "chrom": "chrom",
              "pos": "loc\\.start",
              "type": "quantitative"
            },
            "x2": { "chrom": "chrom", "pos": "loc\\.end" }
          }
        },
        {
          "transform": [
            {
              "type": "filter",
              "expr": "datum['loc.end'] - datum['loc.start'] < 8000"
            },
            {
              "type": "formula",
              "expr": "(datum['loc.start'] + datum['loc.end']) / 2",
              "as": "centre"
            }
          ],
          "mark": "point",
          "encoding": {
            "x": {
              "chrom": "chrom",
              "pos": "centre",
              "type": "quantitative"
            },
            "y": {
              "field": "seg\\.mean",
              "type": "quantitative"
            },
            "size": {
              "value": 40
            }
          }
        }
      ]
    },
    {
      "import": { "name": "genomeAxis" }
    }
  ]
}