Query
Query the semantic layer using SQL syntax or the Python API
Query the semantic layer using SQL syntax or the Python API.
Query Methods
Sidemantic provides two ways to query your semantic layer:
- SQL Interface - Familiar SQL syntax with automatic join handling
- Python API - Programmatic query building with type safety
SQL Interface
Basic SQL Queries
# Simple metric query
result = layer.sql("SELECT revenue FROM orders")
# With dimensions and filters
result = layer.sql("""
SELECT revenue, status
FROM orders
WHERE status = 'completed'
""")
# Get DataFrame
df = result.fetchdf()
Supported SQL Features
| Feature | Supported | Notes |
|---|---|---|
| SELECT | ✅ Yes | Metrics and dimensions |
| **SELECT *** | ✅ Yes | Expands to all model fields |
| WHERE | ✅ Yes | Standard SQL conditions |
| ORDER BY | ✅ Yes | Sort by any field |
| LIMIT / OFFSET | ✅ Yes | Pagination support |
| Parameters | ✅ Yes | {{ param }} syntax |
| Cross-Model Queries | ✅ Yes | Auto-joins via relationships |
| Subqueries | ✅ Yes | Query semantic layer in subqueries |
| CTEs / WITH | ✅ Yes | Use CTEs with semantic queries |
| JOIN | ❌ No | Joins are automatic |
| GROUP BY | ❌ No | Grouping is automatic |
| Aggregate Functions | ❌ No | Use pre-defined metrics |
| HAVING | ❌ No | Use WHERE on metrics instead |
| Window Functions | ❌ No | Use cumulative metrics |
| DISTINCT | ❌ No | Dimensions are auto-distinct |
SELECT
Select metrics and dimensions:
layer.sql("SELECT revenue, order_count, status FROM orders")
Table prefixes optional:
# Both work
layer.sql("SELECT orders.revenue FROM orders")
layer.sql("SELECT revenue FROM orders")
FROM Clause Uses Model Names
The FROM clause references semantic model names, not underlying table names:
models:
- name: orders
table: raw_orders_staging
layer.sql("SELECT revenue FROM orders") # ✅ Use model name
WHERE
Filter with standard SQL conditions:
layer.sql("""
SELECT revenue
FROM orders
WHERE status = 'completed'
AND order_date >= '2024-01-01'
AND amount > 100
""")
ORDER BY and LIMIT
layer.sql("""
SELECT revenue, status
FROM orders
ORDER BY revenue DESC
LIMIT 10
""")
Cross-Model Queries
Reference multiple models - joins happen automatically:
layer.sql("""
SELECT
orders.revenue,
customers.region,
products.category
FROM orders
""")
Parameters
Use {{ param }} syntax:
layer.sql(
"""
SELECT revenue
FROM orders
WHERE order_date >= {{ start_date }}
AND region = {{ region }}
""",
parameters={
"start_date": "2024-01-01",
"region": "US"
}
)
Don't Quote Parameters
Parameters are automatically quoted based on type:
- ❌
WHERE date >= '{{ start_date }}'(creates''2024-01-01'') - ✅
WHERE date >= {{ start_date }}
Subqueries
Query semantic layer in subqueries:
layer.sql("""
SELECT *
FROM (
SELECT revenue, status FROM orders
) AS orders_agg
WHERE revenue > 1000
""")
Join semantic query results with regular tables:
layer.sql("""
SELECT
orders_agg.revenue,
r.continent
FROM (
SELECT orders.revenue, customers.region
FROM orders
) AS orders_agg
JOIN regions r ON orders_agg.region = r.region
""")
CTEs
Use CTEs with semantic layer queries:
layer.sql("""
WITH orders_by_region AS (
SELECT revenue, status, customers.region
FROM orders
)
SELECT * FROM orders_by_region
WHERE revenue > 500
ORDER BY revenue DESC
""")
Mix semantic and regular CTEs:
layer.sql("""
WITH
orders_agg AS (
SELECT revenue, status FROM orders
),
status_labels AS (
SELECT 'completed' as code, 'Complete' as label
UNION ALL SELECT 'pending', 'Pending'
)
SELECT o.revenue, s.label
FROM orders_agg o
JOIN status_labels s ON o.status = s.code
""")
Python API
Basic Query
from sidemantic import SemanticLayer
layer = SemanticLayer.from_yaml("models.yml")
# Query metrics and dimensions
result = layer.query(
metrics=["orders.revenue", "orders.order_count"],
dimensions=["orders.status"]
)
# Get results as DataFrame
df = result.fetchdf()
print(df)
Query Parameters
metrics (list[str] | None)
- List of metric references to aggregate
- Format:
"model.metric"or"metric"(for graph-level metrics) - Example:
["orders.revenue", "total_revenue"]
dimensions (list[str] | None)
- List of dimensions to group by
- Format:
"model.dimension" - Example:
["orders.status", "customers.region"]
filters (list[str] | None)
- SQL filter expressions
- Example:
["orders.status = 'completed'", "orders.amount > 100"]
segments (list[str] | None)
- Named segment references
- Example:
["orders.high_value", "customers.active"]
order_by (list[str] | None)
- Fields to order by with optional ASC/DESC
- Example:
["orders.revenue DESC", "orders.status"]
limit (int | None)
- Maximum rows to return
- Example:
100
ungrouped (bool)
- Return raw rows without aggregation (no GROUP BY)
- Default:
False
parameters (dict[str, any] | None)
- Jinja2 template parameters
- Example:
{"include_tax": True, "region": "US"}
use_preaggregations (bool | None)
- Override layer-level pre-aggregation setting
- Default:
None(uses layer setting)
Examples
Simple metric query:
result = layer.query(metrics=["orders.revenue"])
print(result.fetchone()) # (12345.67,)
With dimensions:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"]
)
df = result.fetchdf()
# status revenue
# 0 completed 10000.00
# 1 pending 2345.67
Cross-model query:
# Automatically joins orders -> customers
result = layer.query(
metrics=["orders.revenue"],
dimensions=["customers.region"]
)
With filters:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"],
filters=["orders.created_at >= '2024-01-01'"]
)
With segments:
result = layer.query(
metrics=["orders.revenue"],
segments=["orders.completed"]
)
Sorting and pagination:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"],
order_by=["orders.revenue DESC"],
limit=10
)
Ungrouped (raw rows):
# Get individual order details without aggregation
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.order_id", "orders.customer_id"],
ungrouped=True,
limit=100
)
With parameters:
result = layer.query(
metrics=["orders.taxed_revenue"],
parameters={"include_tax": True, "tax_rate": 0.08}
)
Result Formats
fetchone() - Single row as tuple:
result = layer.query(metrics=["orders.revenue"])
row = result.fetchone()
print(row) # (12345.67,)
fetchall() - All rows as list of tuples:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"]
)
rows = result.fetchall()
# [('completed', 10000.0), ('pending', 2345.67)]
fetchdf() - Pandas DataFrame:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"]
)
df = result.fetchdf()
description - Column metadata:
result = layer.query(metrics=["orders.revenue"])
columns = [desc[0] for desc in result.description]
print(columns) # ['revenue']
Compile Without Execution
Generate SQL without executing:
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.status"],
filters=["orders.status = 'completed'"]
)
print(sql)
# WITH orders_cte AS (
# SELECT status, SUM(amount) as revenue
# FROM orders
# WHERE status = 'completed'
# GROUP BY status
# )
# SELECT * FROM orders_cte
Additional compile() parameters:
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.status"],
filters=["orders.status = 'completed'"],
order_by=["orders.revenue DESC"],
limit=10,
offset=5, # Skip first 5 rows
dialect="postgres" # Override SQL dialect
)
CLI Query
Query from command line:
# Query to stdout
sidemantic query models.yml --sql "SELECT revenue FROM orders"
# Save to file
sidemantic query models.yml \
--sql "SELECT revenue, status FROM orders" \
--output results.csv
# With connection override
sidemantic query models.yml \
--connection "postgres://localhost:5432/analytics" \
--sql "SELECT revenue FROM orders WHERE status = 'completed'"
See CLI for complete CLI reference.
Best Practices
Define Metrics in YAML
Define metrics once, query anywhere:
metrics:
- name: revenue
agg: sum
sql: amount
- name: high_value_revenue
agg: sum
sql: "CASE WHEN amount > 1000 THEN amount ELSE 0 END"
# Simple queries
layer.query(metrics=["orders.revenue"])
layer.sql("SELECT revenue FROM orders")
Use Python API for Dynamic Queries
Build queries programmatically:
metrics = ["orders.revenue"]
dimensions = []
filters = []
# Add dimensions based on condition
if group_by_status:
dimensions.append("orders.status")
# Add filters based on user input
if start_date:
filters.append(f"orders.created_at >= '{start_date}'")
result = layer.query(
metrics=metrics,
dimensions=dimensions,
filters=filters
)
Use SQL for Static Queries
SQL is clearer for fixed queries:
# Dashboard widget
revenue_by_status = layer.sql("""
SELECT revenue, status
FROM orders
WHERE created_at >= '2024-01-01'
ORDER BY revenue DESC
""")
Compile for Inspection
Inspect generated SQL:
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["customers.region"]
)
print(sql) # See what query will be executed
Process Results Efficiently
Small results (< 1000 rows):
result = layer.query(metrics=["orders.revenue"])
df = result.fetchdf()
Large results:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.order_id"],
ungrouped=True
)
# Iterate rows
for row in result.fetchall():
process(row)
Streaming:
# Use Arrow for large datasets
result = layer.adapter.execute(sql)
batches = result.fetch_arrow_reader()
for batch in batches:
process(batch)
Comparison: Python API vs SQL
Python API:
- ✅ Type-safe query building
- ✅ Dynamic query construction
- ✅ Better for programmatic use
- ✅ Explicit parameters
SQL Interface:
- ✅ Familiar syntax
- ✅ Readable for static queries
- ✅ Copy-paste friendly
- ✅ Template parameters
Use Python API when:
- Building queries programmatically
- Constructing dynamic filters
- Need type safety
- Building applications/APIs
Use SQL Interface when:
- Writing dashboard queries
- Ad-hoc exploration
- Static report queries
- Prefer SQL syntax
Related Documentation
- Configuration - YAML configuration reference
- Models - Model definition
- Metrics - Metric types
- CLI - CLI query command
- Python API - Complete API reference