Fan-out & Symmetric Aggregates
Why fan-out joins double count, what Sidemantic does, how to validate correctness, and when it can be expensive.
Fan-out happens when a join turns one row into many rows (e.g. orders → order_items). If you sum a fact column after a fan-out join, you will double count.
Runnable Example (Executed in CI)
Doctest fixture (models/semantic_layer.yml)
models:
- name: orders
sql: |
select * from (values
(1, 120.00),
(2, 80.00),
(3, 50.00),
(4, 200.00)
) as t(id, amount)
primary_key: id
dimensions:
- name: id
type: numeric
sql: id
metrics:
- name: revenue
agg: sum
sql: amount
relationships:
- name: order_items
type: one_to_many
foreign_key: order_id
- name: order_items
sql: |
select * from (values
(1, 1, 'SKU-1'),
(2, 1, 'SKU-2'),
(3, 1, 'SKU-2'),
(4, 2, 'SKU-2'),
(5, 4, 'SKU-3'),
(6, 4, 'SKU-4')
) as t(id, order_id, sku)
primary_key: id
dimensions:
- name: sku
type: categorical
sql: sku
relationships:
- name: orders
type: many_to_one
foreign_key: order_id
This query filters through a one-to-many relationship but groups at the order grain:
uvx sidemantic query "SELECT orders.id, orders.revenue FROM orders WHERE order_items.sku = 'SKU-2' ORDER BY orders.id" --models ./models
id,revenue
1,120.00
2,80.00
Inspect compiled SQL and verify the symmetric aggregate shape:
uvx sidemantic query "SELECT orders.id, orders.revenue FROM orders WHERE order_items.sku = 'SKU-2' ORDER BY orders.id" --models ./models --dry-run
HASH(orders_cte.id)
What Breaks (Double Counting)
Naive SQL:
select
sum(o.amount) as revenue
from orders o
left join order_items oi on oi.order_id = o.id
If an order has 3 items, that order’s amount appears 3 times in the join result and the sum is inflated.
What Sidemantic Does
When a query plan includes fan-out joins and you select a metric from the “one” side, Sidemantic uses symmetric aggregates so each entity (by primary key) is counted once.
How to Validate Correctness
- Baseline: run the metric query without fan-out dimensions.
- Fan-out present: run the same metric while joining through one-to-many relationships.
- Compare: totals should match when the query grain is still the entity grain.
Practical workflow:
- Use
--dry-run/ compiled SQL inspection to verify symmetric aggregates are in use. - Sanity check totals at the entity grain (e.g. per
orders.id) before adding lower-grain dimensions.
Performance Considerations
- Symmetric aggregates can be more expensive than naive sums.
- Prefer modeling where queries group at a stable grain (dimensions from the same grain as the metric) when possible.
Engine Compatibility
Symmetric aggregate strategies are implemented per engine/dialect. If you see unexpected SQL, capture the compiled SQL and file an issue with:
- the query
- model definitions
- the compiled SQL
Debugging Checklist
- All models in the join path have
primary_keyset. - Relationship types match reality (many-to-one vs one-to-many).
- The compiled SQL contains the symmetric aggregate form when fan-out is present.