Aggregate

The "aggregate" transform summarizes data fields using aggregate functions such as "sum", "median", "q1", or "max". The data can optionally be grouped by one or more fields. The result is a list of objects, one per group, each containing the group-by fields and the aggregated values.

Parameters

as

Type: array

The names of the output fields, one for each aggregated field. If not provided, names are generated automatically from the operation and field names (e.g., sum_field, mean_field).

fields

Type: array

The data fields to apply aggregate functions to. Each entry is paired by position with the corresponding entries in the ops and as arrays. If neither fields nor ops is specified, a count aggregation is applied by default.

groupby

Type: array

The fields by which to group the data. If omitted, all data objects are placed in a single group.
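
As a minimal sketch of the ungrouped case: a transform with no groupby, fields, or ops should collapse the whole dataset into a single object holding the record count. The output field name count is an assumption taken from the example output further down this page.

```json
{
  "type": "aggregate"
}
```

For a dataset of three records, the expected result is one object whose count is 3.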

ops

Type: array

The aggregation operations to be performed on the fields, such as "sum", "q1", "median", "q3", or "count".
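
The positional pairing of fields, ops, and as can be sketched as follows. The field names region, price, and quantity are hypothetical and serve only to illustrate the layout; the output names in as are likewise arbitrary.

```json
{
  "type": "aggregate",
  "groupby": ["region"],
  "fields": ["price", "price", "quantity"],
  "ops": ["mean", "max", "sum"],
  "as": ["meanPrice", "maxPrice", "totalQuantity"]
}
```

Here the first entries of each array belong together (the mean of price is written to meanPrice), as do the second and third. A field is repeated in fields once for every operation applied to it.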

Available aggregate functions

Aggregate functions are applied to the data fields in each group.

  • "count": Count the number of records in each group.
  • "valid": Count the number of non-null and non-NaN values.
  • "sum": Sum the values.
  • "min": Find the minimum value.
  • "max": Find the maximum value.
  • "mean": Calculate the mean value.
  • "q1": Calculate the first quartile.
  • "median": Calculate the median value.
  • "q3": Calculate the third quartile.
  • "variance": Calculate the variance.
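
Several of these functions can be combined on the same field to describe its distribution within each group. The sketch below assumes hypothetical fields named sample and measurement.

```json
{
  "type": "aggregate",
  "groupby": ["sample"],
  "fields": ["measurement", "measurement", "measurement"],
  "ops": ["valid", "mean", "variance"],
  "as": ["n", "meanMeasurement", "varMeasurement"]
}
```

Using "valid" rather than "count" for n reports how many values are actually usable, since "count" includes records whose value is null or NaN. This page does not state whether "variance" is the sample or the population variance, so verify that before deriving a standard deviation from it.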

Example

Given the following data:

x       y
first   123
first   456
second  789

... and configuration:

{
  "type": "aggregate",
  "groupby": ["x"]
}

A new list of data objects is created:

x       count
first   2
second  1
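
As a variation on the same data, the sketch below sums y within each group instead of counting records. The output name total is an arbitrary choice made for this illustration.

```json
{
  "type": "aggregate",
  "groupby": ["x"],
  "fields": ["y"],
  "ops": ["sum"],
  "as": ["total"]
}
```

Given the semantics described above, this should produce:

x       total
first   579
second  789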

Calculating min and max

The following example draws the raw values as points and overlays a rule spanning each category's minimum and maximum.

{
  "description": "Aggregate transform example calculating min and max.",

  "data": {
    "values": [
      { "Category": "A", "Value": 5 },
      { "Category": "A", "Value": 9 },
      { "Category": "A", "Value": 9.5 },
      { "Category": "B", "Value": 3 },
      { "Category": "B", "Value": 5 },
      { "Category": "B", "Value": 7.5 },
      { "Category": "B", "Value": 8 }
    ]
  },

  "encoding": {
    "y": { "field": "Category", "type": "nominal" }
  },

  "layer": [
    {
      "encoding": {
        "x": { "field": "Value", "type": "quantitative" }
      },
      "mark": "point"
    },
    {
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["Category"],
          "fields": ["Value", "Value"],
          "ops": ["min", "max"],
          "as": ["minValue", "maxValue"]
        }
      ],
      "encoding": {
        "x": { "field": "minValue", "type": "quantitative" },
        "x2": { "field": "maxValue" }
      },
      "mark": "rule"
    }
  ]
}

Building boxplot statistics

The following example uses "aggregate" to compute grouped "min", "q1", "median", "q3", and "max" values from the Palmer Penguins dataset and then layers them into a boxplot-like view.

{
  "description": [
    "Aggregate-driven boxplot of penguin body mass by species.",
    "Quartiles come from the aggregate transform. Whiskers use grouped min/max because Tukey-style outlier filtering would require joinaggregate-style transforms."
  ],

  "width": 300,
  "height": 200,

  "data": {
    "url": "vega-datasets/penguins.json"
  },

  "transform": [{ "type": "filter", "expr": "isFinite(datum['Body Mass (g)'])" }],

  "encoding": {
    "x": {
      "field": "Species",
      "type": "nominal",
      "title": null,
      "scale": { "padding": 0.4 },
      "axis": { "labelAngle": 0 }
    }
  },

  "layer": [
    {
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["Species"],
          "fields": [
            "Body Mass (g)",
            "Body Mass (g)",
            "Body Mass (g)",
            "Body Mass (g)",
            "Body Mass (g)"
          ],
          "ops": ["min", "q1", "median", "q3", "max"],
          "as": [
            "minBodyMass",
            "q1BodyMass",
            "medianBodyMass",
            "q3BodyMass",
            "maxBodyMass"
          ]
        }
      ],
      "layer": [
        {
          "mark": {
            "type": "rule",
            "color": "#7b8794",
            "size": 2,
            "tooltip": null
          },
          "encoding": {
            "y": {
              "field": "minBodyMass",
              "type": "quantitative",
              "scale": { "zero": false },
              "axis": { "title": "Body mass (g)" }
            },
            "y2": { "field": "maxBodyMass" }
          }
        },

        {
          "mark": {
            "type": "rect",
            "stroke": "#243b53",
            "strokeWidth": 1,
            "tooltip": null
          },
          "encoding": {
            "y": { "field": "q3BodyMass", "type": "quantitative" },
            "y2": { "field": "q1BodyMass" },
            "color": {
              "field": "Species",
              "type": "nominal",
              "scale": {
                "domain": ["Chinstrap", "Adelie", "Gentoo"],
                "range": ["#BF5CCA", "#FF6C02", "#0F7574"]
              }
            }
          }
        },

        {
          "mark": {
            "type": "tick",
            "color": "#102a43",
            "thickness": 2,
            "tooltip": null
          },
          "encoding": {
            "y": { "field": "medianBodyMass", "type": "quantitative" }
          }
        },

        {
          "mark": {
            "type": "tick",
            "color": "#243b53",
            "thickness": 2,
            "tooltip": null
          },
          "encoding": {
            "x": {
              "field": "Species",
              "type": "nominal",
              "band": 0.3
            },
            "y": { "field": "minBodyMass", "type": "quantitative" }
          }
        },

        {
          "mark": {
            "type": "tick",
            "color": "#243b53",
            "thickness": 2,
            "tooltip": null
          },
          "encoding": {
            "x": {
              "field": "Species",
              "type": "nominal",
              "band": 0.3
            },
            "y": { "field": "maxBodyMass", "type": "quantitative" }
          }
        }
      ]
    }
  ]
}