ASCAT Algorithm in GenomeSpy

This page visualizes the core ASCAT fit for sample S96 using the same simulated example data as the companion ASCAT Copy-Number Segmentation page and the method from Allele-specific copy number analysis of tumors. It takes ASCAT's segmented logRMean and bafMean values and applies the purity/ploidy equations directly in the spec. The sliders expose the aberrant cell fraction (rho) and average ploidy (psi), the top panel shows a segment-length-weighted goodness-of-fit proxy, and the middle panel compares the raw major/minor copy-number estimates with the rounded integers. The lower panels show the LogR and B-allele frequency tracks that drive the fit.

Because the input is already segmented, this example does not rerun ASCAT's ASPCF preprocessing or search over candidate parameter grids. Instead, it shows how the current rho and psi values propagate through the equations, how close the raw estimates are to integer copy numbers, and how the fit score changes when the parameters move.

{
  "description": [
    "ASCAT algorithm in GenomeSpy",
    "Van Loo P, Nordgard SH, Lingjærde OC, et al.",
    "Allele-specific copy number analysis of tumors",
    "Proc Natl Acad Sci. 2010;107(39):16910-16915. doi:10.1073/pnas.1009843107"
  ],

  "assembly": "hg18",

  "data": {
    "url": "https://data.genomespy.app/sample-data/ASCAT/segments_S96.tsv"
  },

  "params": [{ "name": "minLength", "value": 1 }],

  "resolve": { "axis": { "x": "shared" } },

  "encoding": {
    "x": {
      "chrom": "chr",
      "pos": "startpos",
      "type": "locus",
      "scale": { "type": "locus" },
      "axis": { "title": null }
    },
    "x2": {
      "chrom": "chr",
      "pos": "endpos",
      "offset": 1
    }
  },

  "vconcat": [
    {
      "params": [
        {
          "name": "rho",
          "value": 0.56,
          "bind": {
            "input": "range",
            "min": 0.01,
            "max": 1.0,
            "step": 0.01,
            "description": "Aberrant cell fraction (Tumor purity)"
          }
        },
        {
          "name": "psi",
          "value": 2.75,
          "bind": {
            "input": "range",
            "min": 0.5,
            "max": 7.0,
            "step": 0.05,
            "description": "Average ploidy"
          }
        }
      ],

      "transform": [
        {
          "type": "collect"
        },
        {
          "type": "formula",
          "expr": "(rho - 1 + pow(2, datum.logRMean) * (1 - datum.bafMean) * (2 * (1 - rho) + rho * psi)) / rho",
          "as": "aRaw"
        },
        {
          "type": "formula",
          "expr": "(rho - 1 + pow(2, datum.logRMean) * datum.bafMean * (2 * (1 - rho) + rho * psi)) / rho",
          "as": "bRaw"
        },
        {
          "type": "formula",
          "expr": "max(0, round(datum.aRaw))",
          "as": "nMajor"
        },
        {
          "type": "formula",
          "expr": "max(0, round(datum.bRaw))",
          "as": "nMinor"
        },
        {
          "type": "formula",
          "expr": "abs(datum.aRaw - datum.nMajor)",
          "as": "aError"
        },
        {
          "type": "formula",
          "expr": "abs(datum.bRaw - datum.nMinor)",
          "as": "bError"
        },
        {
          "type": "formula",
          "expr": "log((2 * (1 - rho) + rho * (datum.nMajor + datum.nMinor)) / (2 * (1 - rho) + rho * psi)) / LN2",
          "as": "logRMean_ASCAT"
        },
        {
          "type": "formula",
          "expr": "(1 - rho + rho * datum.nMinor) / (2 - 2 * rho + rho * (datum.nMajor + datum.nMinor))",
          "as": "bafMean_ASCAT"
        },
        {
          "type": "identifier"
        }
      ],

      "resolve": { "axis": { "x": "shared" } },

      "vconcat": [
        {
          "name": "goodnessOfFit",

          "title": {
            "text": "Goodness of fit",
            "orient": "none"
          },

          "height": 24,

          "resolve": {
            "scale": { "x": "excluded" },
            "axis": { "x": "excluded" }
          },

          "transform": [
            {
              "type": "project",
              "fields": ["startpos", "endpos", "aError", "bError"]
            },
            {
              "type": "formula",
              "expr": "datum.endpos - datum.startpos",
              "as": "segmentLength"
            },
            {
              "type": "formula",
              "expr": "min(datum.aError * datum.aError, 0.25) * datum.segmentLength",
              "as": "aErrorSquaredWeighted"
            },
            {
              "type": "formula",
              "expr": "min(datum.bError * datum.bError, 0.25) * datum.segmentLength",
              "as": "bErrorSquaredWeighted"
            },
            {
              "type": "aggregate",
              "fields": [
                "segmentLength",
                "aErrorSquaredWeighted",
                "bErrorSquaredWeighted"
              ],
              "ops": ["sum", "sum", "sum"],
              "as": ["totalLength", "aSum", "bSum"]
            },
            {
              "type": "formula",
              "expr": "100 - (datum.aSum + datum.bSum) / (datum.totalLength * 0.5) * 100",
              "as": "goodnessOfFit"
            }
          ],

          "layer": [
            {
              "name": "bar",

              "mark": {
                "type": "rect",
                "color": "#a0e7e5",
                "tooltip": null
              },

              "encoding": {
                "x": {
                  "field": "goodnessOfFit",
                  "type": "quantitative",
                  "scale": {
                    "domain": [0, 100],
                    "clamp": true
                  }
                },

                "x2": { "datum": 0 }
              }
            },
            {
              "name": "text",

              "mark": {
                "type": "text",
                "size": 20,
                "tooltip": null
              },

              "encoding": {
                "x": { "value": 0.5 },
                "x2": null,
                "text": {
                  "field": "goodnessOfFit",
                  "format": ".2f"
                }
              }
            }
          ]
        },
        {
          "name": "roundedAndDifference",
          "title": {
            "text": "Copy numbers rounded to integers + difference to raw values",
            "style": "overlay"
          },

          "height": { "grow": 2 },

          "layer": [
            {
              "title": "mismatchMinor",
              "mark": {
                "type": "rect",
                "minWidth": { "expr": "minLength" },
                "fillOpacity": 0.15,
                "strokeOpacity": 1,
                "strokeWidth": 1,
                "fill": "gray",
                "stroke": "#88d27a"
              },
              "encoding": {
                "y": {
                  "field": "bRaw",
                  "type": "quantitative"
                },
                "y2": {
                  "field": "nMinor"
                },
                "strokeOpacity": {
                  "field": "bError",
                  "type": "quantitative",
                  "scale": {
                    "type": "pow",
                    "base": 2,
                    "domain": [0, 0.5],
                    "range": [0, 0.8]
                  }
                }
              }
            },
            {
              "name": "mismatchMajor",
              "title": "mismatchMajor",
              "mark": {
                "type": "rect",
                "minWidth": { "expr": "minLength" },
                "fillOpacity": 0.15,
                "strokeOpacity": 1,
                "strokeWidth": 1,
                "fill": "gray",
                "stroke": "#f06850"
              },
              "encoding": {
                "y": {
                  "field": "aRaw",
                  "type": "quantitative"
                },
                "y2": {
                  "field": "nMajor"
                },
                "strokeOpacity": {
                  "field": "aError",
                  "type": "quantitative",
                  "scale": {
                    "type": "pow",
                    "base": 2,
                    "domain": [0, 0.5],
                    "range": [0, 0.8]
                  }
                }
              }
            },
            {
              "title": "nMinor",
              "mark": {
                "type": "rule",
                "minLength": { "expr": "minLength" },
                "yOffset": -3.0
              },
              "encoding": {
                "y": {
                  "field": "nMinor",
                  "type": "quantitative",
                  "scale": {
                    "domain": [0, 6],
                    "padding": 0.04,
                    "clamp": true
                  },
                  "axis": {
                    "tickMinStep": 1.0
                  }
                },
                "size": { "value": 5 },
                "color": { "value": "#88d27a" }
              }
            },
            {
              "title": "nMajor",
              "mark": {
                "type": "rule",
                "minLength": { "expr": "minLength" },
                "yOffset": 3.0
              },
              "encoding": {
                "y": {
                  "field": "nMajor",
                  "type": "quantitative",
                  "scale": {
                    "domain": [0, 6]
                  }
                },
                "size": { "value": 5 },
                "color": {
                  "field": "nMajor",
                  "type": "quantitative",
                  "scale": {
                    "domain": [0, 6, 16],
                    "range": ["#f06850", "#f06850", "#5F0F0F"]
                  }
                }
              }
            }
          ]
        }
      ]
    },

    {
      "name": "logRTrack",

      "layer": [
        {
          "data": {
            "url": "https://data.genomespy.app/sample-data/ASCAT/raw_S96.tsv"
          },

          "title": "Single probe",

          "mark": {
            "type": "point",
            "size": { "expr": "min(10 * pow(zoomLevel, 1.5), 200)" }
          },

          "encoding": {
            "x": {
              "chrom": "chr",
              "pos": "pos",
              "type": "locus"
            },
            "y": { "field": "logR", "type": "quantitative", "title": null },
            "color": { "value": "#7090c0" },
            "opacity": { "value": 0.25 },
            "strokeWidth": { "value": 0 }
          }
        },
        {
          "title": "Mean LogR",
          "mark": {
            "type": "rule",
            "minLength": { "expr": "minLength" }
          },
          "encoding": {
            "y": {
              "field": "logRMean",
              "type": "quantitative",
              "title": "LogR"
            },
            "size": { "value": 3 },
            "color": { "value": "black" }
          }
        }
      ]
    },

    {
      "name": "bafTrack",

      "layer": [
        {
          "data": {
            "url": "https://data.genomespy.app/sample-data/ASCAT/raw_S96.tsv"
          },

          "transform": [{ "type": "filter", "expr": "datum.baf !== null" }],

          "title": "Single probe",

          "mark": {
            "type": "point",
            "size": { "expr": "min(10 * pow(zoomLevel, 1.5), 200)" }
          },

          "encoding": {
            "x": {
              "chrom": "chr",
              "pos": "pos",
              "type": "locus"
            },
            "y": { "field": "baf", "type": "quantitative", "title": null },
            "color": { "value": "#7090c0" },
            "opacity": { "value": 0.3 },
            "strokeWidth": { "value": 0 }
          }
        },
        {
          "title": "Mean BAF",
          "mark": {
            "type": "rule",
            "minLength": { "expr": "minLength" }
          },
          "encoding": {
            "y": {
              "field": "bafMean",
              "type": "quantitative",
              "scale": { "domain": [0, 1] },
              "title": "B-allele frequency"
            },
            "size": { "value": 3 },
            "color": { "value": "black" }
          }
        },
        {
          "title": "Mean BAF",
          "mark": {
            "type": "rule",
            "minLength": 3.0
          },
          "encoding": {
            "y": {
              "expr": "1 - datum.bafMean",
              "type": "quantitative",
              "title": null
            },
            "size": { "value": 3 },
            "color": { "value": "black" }
          }
        }
      ]
    }
  ],

  "background": "#fafafa",

  "config": {
    "axisX": {
      "grid": false,
      "chromGrid": true,
      "orient": "bottom"
    },
    "axisY": {
      "grid": true,
      "gridColor": "#f8f8f8"
    },
    "view": {
      "fill": "white",
      "stroke": "#c8c8c8",
      "shadowBlur": 8,
      "shadowColor": "black",
      "shadowOpacity": 0.1
    }
  }
}

The visualization follows the ASCAT method described in Allele-specific copy number analysis of tumors by Van Loo et al. It is a GenomeSpy visualization of the core purity/ploidy fit, shown here for the simulated S96 example data.

Equations

The spec implements the key ASCAT copy-number equations from the supplement. The raw major and minor copy-number estimates are given by the supplementary Eqs. S7 and S8, derived from logRMean, bafMean, rho, and psi:

aRaw = (rho - 1 + 2^logRMean * (1 - bafMean) * (2 * (1 - rho) + rho * psi)) / rho
bRaw = (rho - 1 + 2^logRMean * bafMean * (2 * (1 - rho) + rho * psi)) / rho
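As a quick sanity check on these two equations, the following Python sketch runs them against the forward model that the spec encodes in its logRMean_ASCAT and bafMean_ASCAT formulas. The function names and the 2+1 test segment are illustrative assumptions, not part of the spec or of ASCAT itself; rho and psi are set to the slider defaults.

```python
import math


def predict_segment(n_major, n_minor, rho, psi):
    """Forward model: expected (logR, BAF) for integer copy numbers."""
    total = 2 * (1 - rho) + rho * (n_major + n_minor)
    log_r = math.log2(total / (2 * (1 - rho) + rho * psi))
    baf = (1 - rho + rho * n_minor) / total
    return log_r, baf


def raw_copy_numbers(log_r, baf, rho, psi):
    """Inverse: raw (non-integer) estimates, mirroring aRaw and bRaw."""
    scale = 2 ** log_r * (2 * (1 - rho) + rho * psi)
    a_raw = (rho - 1 + scale * (1 - baf)) / rho
    b_raw = (rho - 1 + scale * baf) / rho
    return a_raw, b_raw


rho, psi = 0.56, 2.75  # slider defaults in the spec

# Hypothetical segment with 2 major and 1 minor copies.
log_r, baf = predict_segment(2, 1, rho, psi)

# Inverting with the same parameters recovers the integers.
a_raw, b_raw = raw_copy_numbers(log_r, baf, rho, psi)
print(f"matching rho: aRaw={a_raw:.3f}, bRaw={b_raw:.3f}")

# Inverting with a different purity leaves non-integer estimates.
a_off, b_off = raw_copy_numbers(log_r, baf, 0.8, psi)
print(f"rho=0.80:     aRaw={a_off:.3f}, bRaw={b_off:.3f}")
```

With the matching rho, the raw estimates come back as approximately 2 and 1. With rho = 0.8 they drift away from integers, and that distance is what the fit score measures.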

The spec then rounds those values to nonnegative integers with round() and max(0, ...) and uses the integer calls to predict the corresponding logRMean_ASCAT and bafMean_ASCAT values. That forward calculation makes it possible to compare the rounded copy-number state with the observed segment means. The goodness-of-fit proxy itself is computed from the rounding errors aError and bError, following the same distance-to-integer idea as the paper's Eq. 3, but weighted by segment length because the example works on segmented input. Each squared error is capped at 0.25 (a rounding error of 0.5), so the score stays between 0 and 100 even when a negative raw estimate is clamped to zero.
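The aggregation can also be sketched outside GenomeSpy. The Python below mirrors the goodnessOfFit transform chain under a few assumptions: the two segments are made up for illustration rather than taken from the S96 file, js_round is a hypothetical helper that rounds half up (assumed here to match the expression language's round()), and the dictionary keys simply follow the field names in the spec.

```python
import math


def js_round(x):
    """Round half up, as JavaScript's Math.round does (assumption)."""
    return math.floor(x + 0.5)


def goodness_of_fit(segments, rho, psi):
    """Segment-length-weighted fit score, following the spec's formulas."""
    total_length = 0
    weighted_error = 0.0
    for seg in segments:
        scale = 2 ** seg["logRMean"] * (2 * (1 - rho) + rho * psi)
        a_raw = (rho - 1 + scale * (1 - seg["bafMean"])) / rho
        b_raw = (rho - 1 + scale * seg["bafMean"]) / rho
        # Distance from each raw estimate to its nonnegative integer call.
        a_err = abs(a_raw - max(0, js_round(a_raw)))
        b_err = abs(b_raw - max(0, js_round(b_raw)))
        length = seg["endpos"] - seg["startpos"]
        # Cap each squared error at 0.25 and weight by segment length.
        weighted_error += (min(a_err**2, 0.25) + min(b_err**2, 0.25)) * length
        total_length += length
    # The worst case is 0.5 per base pair (two alleles at 0.25 each).
    return 100 - weighted_error / (total_length * 0.5) * 100


# Illustrative segments: a long balanced one and a shorter imbalanced one.
segments = [
    {"startpos": 0, "endpos": 3_000_000, "logRMean": 0.0, "bafMean": 0.5},
    {"startpos": 3_000_000, "endpos": 4_000_000, "logRMean": 0.0, "bafMean": 0.25},
]

score_fit = goodness_of_fit(segments, rho=0.5, psi=2.0)
score_off = goodness_of_fit(segments, rho=0.8, psi=2.0)
print(f"rho=0.5: {score_fit:.2f}")
print(f"rho=0.8: {score_off:.2f}")
```

At rho = 0.5 both segments resolve to exact integers (1+1 and 2+0), so the score is 100. At rho = 0.8 the balanced segment still fits, but the shorter one lands at 1.625 and 0.375, which lowers the score in proportion to that segment's share of the total length.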

GenomeSpy Features

This example combines several GenomeSpy capabilities in one spec:

Parameters bind rho and psi to sliders, so the data update interactively.
formula transforms implement the ASCAT equations for raw and rounded copy numbers, error terms, and the predicted logRMean_ASCAT and bafMean_ASCAT values.
aggregate computes the goodness-of-fit score from the segment-wise rounding error.
vconcat stacks the summary, copy-number, LogR, and BAF views; the genomic tracks share the same x-axis, while the summary bar uses its own 0 to 100 scale.
layer overlays the raw values, rounded values, and mismatch bands in the same panel.
The locus scale keeps the genomic coordinate system aligned across tracks.

What to notice

The copy-number panel shows where the raw major and minor estimates land close to integer values and where they do not. Each mismatch band spans the gap between a raw estimate and its rounded value, and its colored outline becomes more opaque as the rounding error increases. The fit summary above the panel is weighted by segment length, so long segments contribute more to the score. Adjusting rho and psi changes both the rounded copy numbers and the goodness-of-fit score.
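To get a feel for what moving the rho slider does to the score, the sketch below evaluates the same score formula at a handful of rho values. It is a rough Python illustration, not the spec itself: the segment lengths and copy-number states are hypothetical, the segments are simulated without noise from the forward model at the slider defaults, and the helper names are invented for this example.

```python
import math

TRUE_RHO, PSI = 0.56, 2.75  # slider defaults in the spec


def predict(n_major, n_minor, rho, psi):
    """Forward model: expected (logR, BAF) for integer copy numbers."""
    total = 2 * (1 - rho) + rho * (n_major + n_minor)
    log_r = math.log2(total / (2 * (1 - rho) + rho * psi))
    return log_r, (1 - rho + rho * n_minor) / total


def score(segments, rho, psi):
    """Length-weighted fit score with squared errors capped at 0.25."""
    weighted, total_length = 0.0, 0
    for length, log_r, baf in segments:
        scale = 2 ** log_r * (2 * (1 - rho) + rho * psi)
        for share in (1 - baf, baf):  # major allele, then minor allele
            raw = (rho - 1 + scale * share) / rho
            err = abs(raw - max(0, math.floor(raw + 0.5)))
            weighted += min(err * err, 0.25) * length
        total_length += length
    return 100 - weighted / (total_length * 0.5) * 100


# Hypothetical states: (length in bp, nMajor, nMinor).
states = [(40_000_000, 1, 1), (25_000_000, 2, 1),
          (10_000_000, 2, 0), (5_000_000, 3, 1)]
segments = [(length, *predict(a, b, TRUE_RHO, PSI)) for length, a, b in states]

# Evaluate the score at a few slider positions, keeping psi fixed.
candidates = [0.30, 0.40, 0.50, 0.56, 0.60, 0.70, 0.80]
scores = {rho: score(segments, rho, PSI) for rho in candidates}
best_rho = max(scores, key=scores.get)

for rho, value in scores.items():
    print(f"rho={rho:.2f}  score={value:.2f}")
print("best rho among candidates:", best_rho)
```

On this noise-free toy input the score peaks at the rho used to simulate the segments and falls off on either side. Real segment means are noisy, so the peak is less sharp in practice.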