Aggregate
The "aggregate" transform summarizes data fields using aggregate functions
such as "sum", "median", "q1", or "max". The data can be grouped by
one or more fields, which results in a list of objects with the grouped fields
and the aggregate values.
Parameters

as
Type: array
The names for the output fields corresponding to each aggregated field. If not provided, names will be automatically created using the operation and field names (e.g., sum_field, average_field).

fields
Type: array
The data fields to apply aggregate functions to. This array should correspond with the ops and as arrays. If no fields or operations are specified, a count aggregation will be applied by default.

groupby
Type: array
The fields by which to group the data. If these are not defined, all data objects will be grouped into a single category.

ops
Type: array
The aggregation operations to be performed on the fields, such as "sum", "q1", "median", "q3", or "count".
Available aggregate functions
Aggregate functions are applied to the data fields in each group.
- "count": Count the number of records in each group.
- "valid": Count the number of non-null and non-NaN values.
- "sum": Sum the values.
- "min": Find the minimum value.
- "max": Find the maximum value.
- "mean": Calculate the mean value.
- "q1": Calculate the first quartile.
- "median": Calculate the median value.
- "q3": Calculate the third quartile.
- "variance": Calculate the variance.
Example
Given the following data:
| x | y |
|---|---|
| first | 123 |
| first | 456 |
| second | 789 |
... and configuration:
```json
{
  "type": "aggregate",
  "groupby": ["x"]
}
```
A new list of data objects is created:
| x | count |
|---|---|
| first | 2 |
| second | 1 |
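As a further sketch using the same three-row data, the configuration below sums "y" within each "x" group. The output name "total" is chosen here purely for illustration and is not part of the original example.

```json
{
  "type": "aggregate",
  "groupby": ["x"],
  "fields": ["y"],
  "ops": ["sum"],
  "as": ["total"]
}
```

Based on the parameter descriptions, each group should then carry the sum of its "y" values: 579 for "first" (123 + 456) and 789 for "second".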
Calculating min and max
```json
{
  "description": "Aggregate transform example calculating min and max.",
  "data": {
    "values": [
      { "Category": "A", "Value": 5 },
      { "Category": "A", "Value": 9 },
      { "Category": "A", "Value": 9.5 },
      { "Category": "B", "Value": 3 },
      { "Category": "B", "Value": 5 },
      { "Category": "B", "Value": 7.5 },
      { "Category": "B", "Value": 8 }
    ]
  },
  "encoding": {
    "y": { "field": "Category", "type": "nominal" }
  },
  "layer": [
    {
      "encoding": {
        "x": { "field": "Value", "type": "quantitative" }
      },
      "mark": "point"
    },
    {
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["Category"],
          "fields": ["Value", "Value"],
          "ops": ["min", "max"],
          "as": ["minValue", "maxValue"]
        }
      ],
      "encoding": {
        "x": { "field": "minValue", "type": "quantitative" },
        "x2": { "field": "maxValue" }
      },
      "mark": "rule"
    }
  ]
}
```
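The same positional pairing extends to several source fields. The following is a minimal sketch, assuming hypothetical fields "Sample", "Coverage", and "Quality" that do not appear in the example above; each entry of "fields" is paired with the entry at the same index in "ops" and "as".

```json
{
  "type": "aggregate",
  "groupby": ["Sample"],
  "fields": ["Coverage", "Coverage", "Quality"],
  "ops": ["mean", "max", "median"],
  "as": ["meanCoverage", "maxCoverage", "medianQuality"]
}
```

Here "Coverage" is listed twice because two operations are applied to it, just as "Value" is repeated for "min" and "max" above.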
Building boxplot statistics
The following example uses "aggregate" to compute grouped "min", "q1",
"median", "q3", and "max" values from the Palmer Penguins dataset and then
layers them into a boxplot-like view.
```json
{
  "description": [
    "Aggregate-driven boxplot of penguin body mass by species.",
    "Quartiles come from the aggregate transform. Whiskers use grouped min/max because Tukey-style outlier filtering would require joinaggregate-style transforms."
  ],
  "width": 300,
  "height": 200,
  "data": {
    "url": "vega-datasets/penguins.json"
  },
  "transform": [{ "type": "filter", "expr": "isFinite(datum['Body Mass (g)'])" }],
  "encoding": {
    "x": {
      "field": "Species",
      "type": "nominal",
      "title": null,
      "scale": { "padding": 0.4 },
      "axis": { "labelAngle": 0 }
    }
  },
  "layer": [
    {
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["Species"],
          "fields": [
            "Body Mass (g)",
            "Body Mass (g)",
            "Body Mass (g)",
            "Body Mass (g)",
            "Body Mass (g)"
          ],
          "ops": ["min", "q1", "median", "q3", "max"],
          "as": [
            "minBodyMass",
            "q1BodyMass",
            "medianBodyMass",
            "q3BodyMass",
            "maxBodyMass"
          ]
        }
      ],
      "layer": [
        {
          "mark": {
            "type": "rule",
            "color": "#7b8794",
            "size": 2,
            "tooltip": null
          },
          "encoding": {
            "y": {
              "field": "minBodyMass",
              "type": "quantitative",
              "scale": { "zero": false },
              "axis": { "title": "Body mass (g)" }
            },
            "y2": { "field": "maxBodyMass" }
          }
        },
        {
          "mark": {
            "type": "rect",
            "stroke": "#243b53",
            "strokeWidth": 1,
            "tooltip": null
          },
          "encoding": {
            "y": { "field": "q3BodyMass", "type": "quantitative" },
            "y2": { "field": "q1BodyMass" },
            "color": {
              "field": "Species",
              "type": "nominal",
              "scale": {
                "domain": ["Chinstrap", "Adelie", "Gentoo"],
                "range": ["#BF5CCA", "#FF6C02", "#0F7574"]
              }
            }
          }
        },
        {
          "mark": {
            "type": "tick",
            "color": "#102a43",
            "thickness": 2,
            "tooltip": null
          },
          "encoding": {
            "y": { "field": "medianBodyMass", "type": "quantitative" }
          }
        },
        {
          "mark": {
            "type": "tick",
            "color": "#243b53",
            "thickness": 2,
            "tooltip": null
          },
          "encoding": {
            "x": {
              "field": "Species",
              "type": "nominal",
              "band": 0.3
            },
            "y": { "field": "minBodyMass", "type": "quantitative" }
          }
        },
        {
          "mark": {
            "type": "tick",
            "color": "#243b53",
            "thickness": 2,
            "tooltip": null
          },
          "encoding": {
            "x": {
              "field": "Species",
              "type": "nominal",
              "band": 0.3
            },
            "y": { "field": "maxBodyMass", "type": "quantitative" }
          }
        }
      ]
    }
  ]
}
```