Migrator
Migrate existing SQL queries to semantic layer by analyzing queries and generating model definitions.
Migrates existing SQL queries to semantic layer by analyzing queries and generating model definitions.
Key features:
- Generate model definitions from existing queries
- Rewrite queries to use semantic layer syntax
- Measure what's missing from your semantic layer
Quick Start
Generate models from your existing queries:
sidemantic migrator analyze queries/ --output-dir models/
This analyzes SQL files in queries/ and generates model YAML files in models/.
Example output models/orders.yml:
model:
name: orders
table: orders
dimensions:
- name: status
sql: status
- name: region
sql: region
metrics:
- name: sum_amount
agg: sum
sql: amount
- name: count
agg: count
sql: "*"
Add --rewrite to also generate rewritten queries:
sidemantic migrator analyze queries/ --output-dir models/ --rewrite
Original query:
SELECT status, region, SUM(amount), COUNT(*)
FROM orders
WHERE status = 'completed'
GROUP BY status, region
Rewritten query:
SELECT orders.status, orders.region, orders.sum_amount, orders.count
FROM orders
WHERE status = 'completed'
Supported Patterns
Basic Aggregations
Extracts dimensions from GROUP BY and metrics from aggregations:
SELECT
status,
SUM(amount) as revenue,
COUNT(DISTINCT customer_id) as customers
FROM orders
GROUP BY status
Complex Expressions
Handles CASE, COALESCE, CAST, math, string functions:
SELECT
UPPER(status) as status,
SUM(CASE WHEN priority = 'high' THEN amount ELSE 0 END) as high_priority_revenue,
SUM(quantity * price) as total_revenue
FROM orders
GROUP BY UPPER(status)
Extracts underlying columns (status, amount, quantity, price).
Time Dimensions
SELECT
DATE_TRUNC('month', order_date) as month,
EXTRACT(YEAR FROM order_date) as year,
SUM(amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date), EXTRACT(YEAR FROM order_date)
Extracts order_date as time dimension. Rewrites to orders.order_date__month, orders.order_date__year.
Derived Metrics
SELECT
status,
SUM(revenue) / COUNT(*) as avg_revenue_per_order,
SUM(amount - discount) as net_amount
FROM orders
GROUP BY status
Extracts base metrics and derived metrics separately:
metrics:
- name: sum_revenue
agg: sum
sql: revenue
- name: count
agg: count
sql: "*"
- name: avg_revenue_per_order
type: derived
sql: "SUM(revenue) / COUNT(*)"
Joins and Relationships
SELECT
c.region,
COUNT(o.order_id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.region
Extracts relationship: orders.customer_id → customers.id (many_to_one).
# orders.yml
relationships:
- name: customers
type: many_to_one
foreign_key: customer_id
Subqueries
Resolves subquery aliases to underlying tables:
SELECT
sub.status,
COUNT(*) as order_count
FROM (
SELECT status, amount
FROM orders
WHERE amount > 100
) sub
GROUP BY sub.status
Extracts orders table and resolves sub.status → orders.status.
Window Functions and Cumulative Metrics
Detects window functions and generates cumulative metric definitions.
Running totals:
SELECT
order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders
metrics:
- name: running_total
type: cumulative
sql: "orders.sum_amount"
Rolling windows:
SELECT
order_date,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7day
FROM orders
metrics:
- name: rolling_7day
type: cumulative
sql: "orders.sum_amount"
window: "6 days"
Period-to-date:
SELECT
order_date,
SUM(amount) OVER (
PARTITION BY DATE_TRUNC('month', order_date)
ORDER BY order_date
) as mtd_revenue
FROM orders
metrics:
- name: mtd_revenue
type: cumulative
sql: "orders.sum_amount"
grain_to_date: "month"
What's extracted:
- Aggregation window functions:
SUM/AVG/COUNT/MIN/MAXwithOVER() - Rolling windows with
ROWS BETWEEN - Period-to-date with
PARTITION BY DATE_TRUNC()
What's ignored:
- Window-only functions:
ROW_NUMBER(),RANK(),LAG(),LEAD() - Complex window calculations like
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()
Base aggregations inside complex calculations are still extracted as regular metrics.
Other Patterns
CTEs: Extracts from underlying tables, preserves structure in rewrites
GROUP BY ordinals: GROUP BY 1, 2 resolved to columns
Implicit joins: FROM a, b WHERE a.id = b.fk extracts relationships
Self-joins: Generates single model
CLI Usage
Generate models from queries:
sidemantic migrator analyze queries/ --output-dir models/
Generate models and rewritten queries:
sidemantic migrator analyze queries/ --output-dir models/ --rewrite
This analyzes all SQL files in queries/ and generates:
- Model YAML files in
models/ - Rewritten queries in
queries_rewritten/(if--rewriteflag used)
Other commands:
# Analyze specific query
sidemantic migrator analyze-query \
"SELECT status, COUNT(*) FROM orders GROUP BY status"
# Generate coverage report
sidemantic migrator report queries/ > coverage-report.txt
Options:
--pattern "*.sql"- File pattern to match--output-dir- Where to write generated models--rewrite- Also generate rewritten queries--verbose- Show detailed analysis
Migration Workflow
1. Analyze queries and generate models
sidemantic migrator analyze queries/ --output-dir models/generated/ --verbose
Review generated models in models/generated/ directory.
2. Generate rewritten queries
sidemantic migrator analyze queries/ --output-dir models/generated/ --rewrite
This creates:
- Model definitions in
models/generated/ - Rewritten queries in
queries_rewritten/
3. Test rewritten queries
Compare results between original and rewritten queries to ensure correctness.
4. Deploy models
cp models/generated/*.yml models/
Load into your semantic layer and start using.
Troubleshooting
Low Coverage
Cause: Missing models, dimensions, or metrics.
Fix: Generate models from your queries first:
sidemantic migrator analyze queries/ --output-dir models/
Then load those models into your semantic layer before re-analyzing.
Dimensions Not Extracted
Cause: No GROUP BY clause.
-- ❌ No dimensions (no GROUP BY)
SELECT COUNT(*) FROM orders
-- ✅ Extracts status
SELECT status, COUNT(*) FROM orders GROUP BY status
Relationships Not Detected
Causes:
- Non-equi joins
- Complex conditions
- Multiple join keys
Fix: Manually add relationships to generated models or use simple equality joins in queries.
Metric Names Don't Match
Analyzer generates names from agg_column:
SUM(amount)→sum_amountCOUNT(*)→count
Use aliases to control names:
SELECT SUM(amount) as total_revenue FROM orders -- Generates "total_revenue"
Python API
Get coverage metrics from report:
report = analyzer.analyze_queries(queries)
print(f"Coverage: {report.coverage_percentage:.1f}%")
print(f"Missing models: {report.missing_models}")
print(f"Missing dimensions: {report.missing_dimensions}")
Provide database connection for better inference:
import duckdb
conn = duckdb.connect(":memory:")
analyzer = Migrator(layer, connection=conn)
Benefits: Resolves ambiguous columns, infers relationships from foreign keys, detects primary keys.