Skip to main content

Aggregation Functions

Aggregation collapses a group of input rows into a single value per group. For clause-level semantics (implicit GROUP BY, HAVING-style filtering via WITH, where aggregates are legal) see the Aggregation query page.

All aggregates skip null inputs except count(*) (counts rows) and collect(expr) (keeps nulls). Empty-input semantics vary per function — see the summary table.

Summary table

FunctionDISTINCTEmpty inputNull inputReturns
count(*)0counted as 1 per rowInt
count(expr)yes0skippedInt
collect(expr)yes[]included as nullList
sum(expr)yesnullskippedInt if all-int, else Float
avg(expr)yesnullskippedFloat
min(expr) / max(expr)yesnullskippedsame type as element
stdev(expr)0.0skippedFloat, sample (n − 1)
stdevp(expr)0.0skippedFloat, population (n)
percentileCont(expr, p)nullskippedFloat, linear interpolation
percentileDisc(expr, p)nullskippedFloat, nearest rank

count

Row count

MATCH (n:User)
RETURN count(*) AS users

Non-null count

UNWIND [1, 2, null, 4] AS x
RETURN count(*), count(x)
// 4, 3

count(*) counts every input row, including rows where bound variables are null; count(expr) skips null expr.

Distinct count

UNWIND ['a', 'a', 'b', 'c'] AS x
RETURN count(x), count(DISTINCT x)
// 4, 3

count with OPTIONAL MATCH

This is the subtlety that trips up new Cypher users:

MATCH (u:User)
OPTIONAL MATCH (u)-[:WROTE]->(p:Post)
RETURN u.name,
     count(*) AS rows,   // 1 per user, even if no posts
     count(p) AS posts   // 0 if no posts

Always prefer count(expr) when you want zeros for optional matches.

Empty graph

MATCH (:NoSuchLabel)
RETURN count(*)       // 0

One row out, with value 0count(*) never returns null.

collect

Basic collect

MATCH (p:Person)-[:KNOWS]->(f:Person)
RETURN p.name, collect(f.name) AS friends

Distinct values

UNWIND [1, 2, null, 2, 3] AS x
RETURN collect(x),           // [1, 2, null, 2, 3]
     collect(DISTINCT x)   // [1, 2, null, 3]

Collect keeps nulls

collect keeps nulls that survive to the aggregate. Filter before the aggregate if you don't want them:

UNWIND [1, 2, null, 3] AS x
WITH x WHERE x IS NOT NULL
RETURN collect(x)             // [1, 2, 3]

Collect + slice for top-N

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, p ORDER BY p.published_at DESC
WITH u, collect(p.title)[..5] AS last_five
RETURN u.name, last_five

Use any list operation on the resulting list.

Collect of maps

MATCH (p:Project)-[:HAS_TASK]->(t:Task)
RETURN p.name,
     collect({id: t.id, name: t.name, done: t.done}) AS tasks

One row per project, with an array of task summaries.

sum

UNWIND [1, 2, null, 4] AS x
RETURN sum(x);                 // 7

UNWIND [1.0, 2.5, 3.5] AS x
RETURN sum(x);                 // 7.0

MATCH (:Never)
RETURN sum(1)                 // null   (empty input)

Return type: Int when every contributing element is an Int; Float if any contributor is a Float.

Distinct sum

UNWIND [1, 1, 2, 2, 3] AS x
RETURN sum(x), sum(DISTINCT x)   // 9, 6

avg

UNWIND [1, 2, 3, 4] AS x
RETURN avg(x);                 // 2.5

UNWIND [1, null, 3] AS x
RETURN avg(x);                 // 2.0

MATCH (:Never)
RETURN avg(1)                 // null

Always returns Float (or null on empty input).

min / max

Works on numbers, strings, and temporal values under their natural total order.

UNWIND ['banana', 'apple', 'cherry'] AS s
RETURN min(s), max(s);         // 'apple', 'cherry'

UNWIND ['2024-01-01'::DATE, '2024-06-30'::DATE, '2024-12-15'::DATE] AS d
RETURN min(d), max(d)
;// 2024-01-01, 2024-12-15

MATCH (:Never)
RETURN min(1)                 // null

Min/max with tiebreaker

min() / max() only return the value, not the node owning it. For "the node with the maximum", sort and take one:

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, count(p) AS posts
ORDER BY posts DESC
LIMIT 1
RETURN u.name, posts

stdev / stdevp

Sample vs population standard deviation.

UNWIND [2, 4, 4, 4, 5, 5, 7, 9] AS x
RETURN stdev(x),              // 2.1380…  (n − 1)
     stdevp(x)              // 2.0      (n)
  • stdev returns 0.0 when fewer than two non-null values are aggregated.
  • stdevp returns 0.0 on an empty input.
MATCH (r:Review)
WITH r.product AS product, avg(r.stars) AS mean, stdev(r.stars) AS sd
WHERE sd > 1.0
RETURN product, mean, sd

percentileCont / percentileDisc

Both take the column and a percentile p ∈ [0, 1].

UNWIND [1, 2, 3, 4, 5] AS x
RETURN percentileCont(x, 0.5),  // 3.0    (exact median for odd count)
     percentileDisc(x, 0.5);   // 3

UNWIND [1, 2, 3, 4] AS x
RETURN percentileCont(x, 0.5),  // 2.5    (linear interpolation)
     percentileDisc(x, 0.5);   // 2      (nearest rank)

UNWIND [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] AS x
RETURN percentileCont(x, 0.9),  // 91.0
     percentileDisc(x, 0.9)   // 90
  • percentileCont interpolates between values.
  • percentileDisc picks an actual input value.

Use percentileCont when the "true" percentile matters (latency histograms); percentileDisc when you need a real observation (the actual data point at P50).

Typical patterns

Group + aggregate

MATCH (o:Order)-[:CONTAINS]->(i:Item)
RETURN o.region AS region,
     count(i)     AS items,
     sum(i.price) AS revenue
ORDER BY revenue DESC

Aggregate + filter (HAVING)

MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WITH c.name AS company, count(p) AS employees
WHERE employees > 5
RETURN company, employees

Multiple aggregates in one RETURN

MATCH (r:Review)
RETURN count(*) AS n,
     avg(r.stars) AS mean,
     stdev(r.stars) AS sd,
     percentileCont(r.stars, 0.5)  AS median,
     percentileCont(r.stars, 0.95) AS p95

Re-aggregate after first aggregate

MATCH (o:Order)
WITH o.region AS region, sum(o.amount) AS revenue
RETURN count(region) AS regions,
     avg(revenue)  AS mean_regional_revenue

Rolling count by date bucket

MATCH (e:Event)
RETURN temporal.truncate('month', e.at) AS month,
     count(*) AS events
ORDER BY month

Uses temporal.truncate.

Percentile per group

MATCH (r:Review)
RETURN r.product AS product,
     percentileCont(r.stars, 0.5)  AS p50,
     percentileCont(r.stars, 0.95) AS p95
ORDER BY p95 DESC

Count-if via CASE

count(expr) skips null. A CASE with an omitted ELSE returns null, so the pattern cleanly expresses "count rows where condition holds":

MATCH (o:Order)
RETURN o.region,
     count(CASE WHEN o.status = 'paid'      THEN 1 END) AS paid,
     count(CASE WHEN o.status = 'cancelled' THEN 1 END) AS cancelled

See CASE.

Sum with a filter on the aggregated value

WHERE runs before aggregation. To filter aggregated values, pipe through WITH:

MATCH (o:Order)
WITH o.customer AS customer, sum(o.amount) AS lifetime
WHERE lifetime > 1000
RETURN customer, lifetime
ORDER BY lifetime DESC

Aggregation with ORDER BY inside collect

collect preserves input order. Sort the rows before the aggregate to produce an ordered list:

MATCH (u:User)-[:WROTE]->(p:Post)
WITH u, p ORDER BY p.published_at DESC
RETURN u.handle, collect(p.title)[..3] AS latest_three

Limitations

  • Aggregates are rejected inside WHERE. Use WITH … WHERE instead.
  • stdev, stdevp, percentileCont, percentileDisc don't support DISTINCT. For a percentile of distinct values, first collect(DISTINCT x), then UNWIND and aggregate.
  • count(*) counts rows, not non-null values. Use count(expr) when you want nulls skipped.
  • No GROUP BY keyword — non-aggregated columns in the same projection stage form the group key implicitly.

See Limitations for the full list.

See also