ASCAT Algorithm in GenomeSpy
This page visualizes the core ASCAT fit for sample S96. It uses the same
simulated example data as the companion ASCAT Copy-Number Segmentation page
and follows the method from the paper "Allele-specific copy number analysis
of tumors" (Van Loo et al., 2010). It takes
ASCAT's segmented logRMean and bafMean values and applies the
purity/ploidy equations directly in the spec. The sliders expose the
aberrant cell fraction (rho) and average ploidy (psi), the top panel
shows a segment-length-weighted goodness-of-fit proxy, and the middle panel
compares the raw major/minor copy-number estimates with the rounded integers.
The lower panels show the LogR and B-allele frequency tracks that drive the
fit.
Because the input is already segmented, this example does not rerun ASCAT's
ASPCF preprocessing or search over candidate parameter grids. Instead, it
shows how the current rho and psi values propagate through the equations,
how close the raw estimates are to integer copy numbers, and how the fit score
changes when the parameters move.
{
"description": [
"ASCAT algorithm in GenomeSpy",
"Loo P Van, Nordgard SH, Lingjærde OC, et al.",
"Allele-specific copy number analysis of tumors",
"Proc Natl Acad Sci. 2010;107(39):16910-16915. doi:10.1073/pnas.1009843107"
],
"assembly": "hg18",
"data": {
"url": "https://data.genomespy.app/sample-data/ASCAT/segments_S96.tsv"
},
"params": [{ "name": "minLength", "value": 1 }],
"resolve": { "axis": { "x": "shared" } },
"encoding": {
"x": {
"chrom": "chr",
"pos": "startpos",
"type": "locus",
"scale": { "type": "locus" },
"axis": { "title": null }
},
"x2": {
"chrom": "chr",
"pos": "endpos",
"offset": 1
}
},
"vconcat": [
{
"params": [
{
"name": "rho",
"value": 0.56,
"bind": {
"input": "range",
"min": 0.01,
"max": 1.0,
"step": 0.01,
"description": "Aberrant cell fraction (Tumor purity)"
}
},
{
"name": "psi",
"value": 2.75,
"bind": {
"input": "range",
"min": 0.5,
"max": 7.0,
"step": 0.05,
"description": "Average ploidy"
}
}
],
"transform": [
{
"type": "collect"
},
{
"type": "formula",
"expr": "(rho - 1 + pow(2, datum.logRMean) * (1 - datum.bafMean) * (2 * (1 - rho) + rho * psi)) / rho",
"as": "aRaw"
},
{
"type": "formula",
"expr": "(rho - 1 + pow(2, datum.logRMean) * datum.bafMean * (2 * (1 - rho) + rho * psi)) / rho",
"as": "bRaw"
},
{
"type": "formula",
"expr": "max(0, round(datum.aRaw))",
"as": "nMajor"
},
{
"type": "formula",
"expr": "max(0, round(datum.bRaw))",
"as": "nMinor"
},
{
"type": "formula",
"expr": "abs(datum.aRaw - datum.nMajor)",
"as": "aError"
},
{
"type": "formula",
"expr": "abs(datum.bRaw - datum.nMinor)",
"as": "bError"
},
{
"type": "formula",
"expr": "log((2 * (1 - rho) + rho * (datum.nMajor + datum.nMinor)) / (2 * (1 - rho) + rho * psi)) / LN2",
"as": "logRMean_ASCAT"
},
{
"type": "formula",
"expr": "(1 - rho + rho * datum.nMinor) / (2 - 2 * rho + rho * (datum.nMajor + datum.nMinor))",
"as": "bafMean_ASCAT"
},
{
"type": "identifier"
}
],
"resolve": { "axis": { "x": "shared" } },
"vconcat": [
{
"name": "goodnessOfFit",
"title": {
"text": "Goodness of fit",
"orient": "none"
},
"height": 24,
"resolve": {
"scale": { "x": "excluded" },
"axis": { "x": "excluded" }
},
"transform": [
{
"type": "project",
"fields": ["startpos", "endpos", "aError", "bError"]
},
{
"type": "formula",
"expr": "datum.endpos - datum.startpos",
"as": "segmentLength"
},
{
"type": "formula",
"expr": "min(datum.aError * datum.aError, 0.25) * datum.segmentLength",
"as": "aErrorSquaredWeighted"
},
{
"type": "formula",
"expr": "min(datum.bError * datum.bError, 0.25) * datum.segmentLength",
"as": "bErrorSquaredWeighted"
},
{
"type": "aggregate",
"fields": [
"segmentLength",
"aErrorSquaredWeighted",
"bErrorSquaredWeighted"
],
"ops": ["sum", "sum", "sum"],
"as": ["totalLength", "aSum", "bSum"]
},
{
"type": "formula",
"expr": "100 - (datum.aSum + datum.bSum) / (datum.totalLength * 0.5) * 100",
"as": "goodnessOfFit"
}
],
"layer": [
{
"name": "bar",
"mark": {
"type": "rect",
"color": "#a0e7e5",
"tooltip": null
},
"encoding": {
"x": {
"field": "goodnessOfFit",
"type": "quantitative",
"scale": {
"domain": [0, 100],
"clamp": true
}
},
"x2": { "datum": 0 }
}
},
{
"name": "text",
"mark": {
"type": "text",
"size": 20,
"tooltip": null
},
"encoding": {
"x": { "value": 0.5 },
"x2": null,
"text": {
"field": "goodnessOfFit",
"format": ".2f"
}
}
}
]
},
{
"name": "roundedAndDifference",
"title": {
"text": "Copy numbers rounded to integers + difference to raw values",
"style": "overlay"
},
"height": { "grow": 2 },
"layer": [
{
"title": "mismatchMinor",
"mark": {
"type": "rect",
"minWidth": { "expr": "minLength" },
"fillOpacity": 0.15,
"strokeOpacity": 1,
"strokeWidth": 1,
"fill": "gray",
"stroke": "#88d27a"
},
"encoding": {
"y": {
"field": "bRaw",
"type": "quantitative"
},
"y2": {
"field": "nMinor"
},
"strokeOpacity": {
"field": "bError",
"type": "quantitative",
"scale": {
"type": "pow",
"base": 2,
"domain": [0, 0.5],
"range": [0, 0.8]
}
}
}
},
{
"name": "mismatchMajor",
"title": "mismatchMajor",
"mark": {
"type": "rect",
"minWidth": { "expr": "minLength" },
"fillOpacity": 0.15,
"strokeOpacity": 1,
"strokeWidth": 1,
"fill": "gray",
"stroke": "#f06850"
},
"encoding": {
"y": {
"field": "aRaw",
"type": "quantitative"
},
"y2": {
"field": "nMajor"
},
"strokeOpacity": {
"field": "aError",
"type": "quantitative",
"scale": {
"type": "pow",
"base": 2,
"domain": [0, 0.5],
"range": [0, 0.8]
}
}
}
},
{
"title": "nMinor",
"mark": {
"type": "rule",
"minLength": { "expr": "minLength" },
"yOffset": -3.0
},
"encoding": {
"y": {
"field": "nMinor",
"type": "quantitative",
"scale": {
"domain": [0, 6],
"padding": 0.04,
"clamp": true
},
"axis": {
"tickMinStep": 1.0
}
},
"size": { "value": 5 },
"color": { "value": "#88d27a" }
}
},
{
"title": "nMajor",
"mark": {
"type": "rule",
"minLength": { "expr": "minLength" },
"yOffset": 3.0
},
"encoding": {
"y": {
"field": "nMajor",
"type": "quantitative",
"scale": {
"domain": [0, 6]
}
},
"size": { "value": 5 },
"color": {
"field": "nMajor",
"type": "quantitative",
"scale": {
"domain": [0, 6, 16],
"range": ["#f06850", "#f06850", "#5F0F0F"]
}
}
}
}
]
}
]
},
{
"name": "logRTrack",
"layer": [
{
"data": {
"url": "https://data.genomespy.app/sample-data/ASCAT/raw_S96.tsv"
},
"title": "Single probe",
"mark": {
"type": "point",
"size": { "expr": "min(10 * pow(zoomLevel, 1.5), 200)" }
},
"encoding": {
"x": {
"chrom": "chr",
"pos": "pos",
"type": "locus"
},
"y": { "field": "logR", "type": "quantitative", "title": null },
"color": { "value": "#7090c0" },
"opacity": { "value": 0.25 },
"strokeWidth": { "value": 0 }
}
},
{
"title": "Mean LogR",
"mark": {
"type": "rule",
"minLength": { "expr": "minLength" }
},
"encoding": {
"y": {
"field": "logRMean",
"type": "quantitative",
"title": "LogR"
},
"size": { "value": 3 },
"color": { "value": "black" }
}
}
]
},
{
"name": "bafTrack",
"layer": [
{
"data": {
"url": "https://data.genomespy.app/sample-data/ASCAT/raw_S96.tsv"
},
"transform": [{ "type": "filter", "expr": "datum.baf !== null" }],
"title": "Single probe",
"mark": {
"type": "point",
"size": { "expr": "min(10 * pow(zoomLevel, 1.5), 200)" }
},
"encoding": {
"x": {
"chrom": "chr",
"pos": "pos",
"type": "locus"
},
"y": { "field": "baf", "type": "quantitative", "title": null },
"color": { "value": "#7090c0" },
"opacity": { "value": 0.3 },
"strokeWidth": { "value": 0 }
}
},
{
"title": "Mean BAF",
"mark": {
"type": "rule",
"minLength": { "expr": "minLength" }
},
"encoding": {
"y": {
"field": "bafMean",
"type": "quantitative",
"scale": { "domain": [0, 1] },
"title": "B-allele frequency"
},
"size": { "value": 3 },
"color": { "value": "black" }
}
},
{
"title": "Mean BAF",
"mark": {
"type": "rule",
"minLength": 3.0
},
"encoding": {
"y": {
"expr": "1 - datum.bafMean",
"type": "quantitative",
"title": null
},
"size": { "value": 3 },
"color": { "value": "black" }
}
}
]
}
],
"background": "#fafafa",
"config": {
"axisX": {
"grid": false,
"chromGrid": true,
"orient": "bottom"
},
"axisY": {
"grid": true,
"gridColor": "#f8f8f8"
},
"view": {
"fill": "white",
"stroke": "#c8c8c8",
"shadowBlur": 8,
"shadowColor": "black",
"shadowOpacity": 0.1
}
}
}
The visualization follows the ASCAT method described in
Allele-specific copy number analysis of tumors
by Van Loo et al. It is a GenomeSpy visualization of the core purity/ploidy
fit, shown here for the simulated S96 example data.
Equations
The spec implements the key ASCAT copy-number equations from the supplement.
The raw major and minor copy-number estimates are given by the supplementary
Eqs. S7 and S8, derived from logRMean, bafMean, rho, and psi:
aRaw = (rho - 1 + 2^logRMean * (1 - bafMean) * (2 * (1 - rho) + rho * psi)) / rho
bRaw = (rho - 1 + 2^logRMean * bafMean * (2 * (1 - rho) + rho * psi)) / rho
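As a quick sanity check, the two raw estimates can be evaluated outside GenomeSpy. The Python sketch below mirrors the formula expressions in the spec; the function name raw_copy_numbers and the example segment values are hypothetical and not taken from the S96 data.

```python
def raw_copy_numbers(log_r_mean, baf_mean, rho, psi):
    """Return (aRaw, bRaw) for one segment, mirroring the spec's formula transforms."""
    # Shared factor: 2^logRMean scaled by the average signal of the tumor/normal mixture.
    scale = 2 ** log_r_mean * (2 * (1 - rho) + rho * psi)
    a_raw = (rho - 1 + scale * (1 - baf_mean)) / rho
    b_raw = (rho - 1 + scale * baf_mean) / rho
    return a_raw, b_raw


# Hypothetical balanced segment: logRMean = 0 and bafMean = 0.5 with psi = 2.
a_raw, b_raw = raw_copy_numbers(0.0, 0.5, rho=0.56, psi=2.0)
print(a_raw, b_raw)
```

For this made-up segment both estimates come out close to 1, which is what a balanced two-copy state should give when the average ploidy is 2.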
The spec then rounds those values to nonnegative integers with round() and
max(0, ...), and uses the integer calls to predict the corresponding
logRMean_ASCAT and bafMean_ASCAT values, so the rounded copy-number state can
be compared back to the observed segment means. The goodness-of-fit proxy is
computed from the rounding errors aError and bError. It follows the same
distance-to-integer idea as the paper's Eq. 3, but weights each segment by its
length because the example works on segmented input. Each squared error is
capped at 0.25 and the weighted sum is rescaled so that the score runs from 0
to 100.
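The rounding, forward prediction, and scoring steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the ASCAT implementation: the function name fit_segments and the two segments are hypothetical, and floor(x + 0.5) is assumed to match how the spec's round() treats halves.

```python
import math


def fit_segments(segments, rho, psi):
    """Return (score, calls) for segments given as (startpos, endpos, logRMean, bafMean)."""
    mix_avg = 2 * (1 - rho) + rho * psi  # denominator shared by all segments
    total_length = 0.0
    weighted_error = 0.0
    calls = []
    for startpos, endpos, log_r_mean, baf_mean in segments:
        scale = 2 ** log_r_mean * mix_avg
        a_raw = (rho - 1 + scale * (1 - baf_mean)) / rho
        b_raw = (rho - 1 + scale * baf_mean) / rho

        # floor(x + 0.5) rounds halves upward, assumed to match the spec's round().
        n_major = max(0, math.floor(a_raw + 0.5))
        n_minor = max(0, math.floor(b_raw + 0.5))
        a_error = abs(a_raw - n_major)
        b_error = abs(b_raw - n_minor)

        # Forward prediction from the integer calls (logRMean_ASCAT, bafMean_ASCAT).
        mix_seg = 2 * (1 - rho) + rho * (n_major + n_minor)
        if mix_seg > 0:
            log_r_pred = math.log2(mix_seg / mix_avg)
            baf_pred = (1 - rho + rho * n_minor) / mix_seg
        else:  # only reachable when rho == 1 and both calls are 0
            log_r_pred, baf_pred = float("-inf"), float("nan")

        length = endpos - startpos
        total_length += length
        weighted_error += (min(a_error ** 2, 0.25) + min(b_error ** 2, 0.25)) * length
        calls.append((n_major, n_minor, log_r_pred, baf_pred))

    score = 100 - weighted_error / (total_length * 0.5) * 100
    return score, calls


# Hypothetical segments: a balanced 1+1 state and a longer 2+1 gain,
# constructed so that rho = 0.56 and psi = 2 fit them exactly.
segments = [
    (0, 1000, 0.0, 0.5),
    (1000, 3000, math.log2(1.28), 0.390625),
]
score_good, calls_good = fit_segments(segments, rho=0.56, psi=2.0)
score_off, calls_off = fit_segments(segments, rho=0.80, psi=2.0)
print(round(score_good, 2), round(score_off, 2))
```

With these made-up inputs the first call scores close to 100, while the second drops to about 88 because the longer segment's raw major estimate (about 1.7) no longer lands on an integer.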
GenomeSpy Features
This example combines several GenomeSpy capabilities in one spec:
- Parameters bind rho and psi to sliders, so the data update interactively.
- formula transforms implement the ASCAT equations for raw and rounded copy numbers, error terms, and the derived logRMean and bafMean values.
- aggregate computes the goodness-of-fit score from the segment-wise rounding error.
- vconcat stacks the summary, copy-number, LogR, and BAF views while sharing the same genomic x-axis.
- layer overlays the raw values, rounded values, and mismatch bands in the same panel.
- The locus scale keeps the genomic coordinate system aligned across tracks.
What to notice
The copy-number panel shows where the raw major and minor estimates land close
to integer values and where they do not. The gray mismatch bands span the gap
between each raw estimate and its rounded value, and their outlines become
more opaque as the rounding error increases. The fit summary above the panel
is weighted by segment length, so long segments contribute more to the score.
Adjusting rho and psi changes both the rounded copy numbers and the
goodness-of-fit score.