Sidequery

Migrator

Migrate existing SQL queries to semantic layer by analyzing queries and generating model definitions.

Migrates existing SQL queries to semantic layer by analyzing queries and generating model definitions.

Key features:

  1. Generate model definitions from existing queries
  2. Rewrite queries to use semantic layer syntax
  3. Measure what's missing from your semantic layer

Quick Start

Generate models from your existing queries:

sidemantic migrator analyze queries/ --output-dir models/

This analyzes SQL files in queries/ and generates model YAML files in models/.

Example output models/orders.yml:

model:
  name: orders
  table: orders
dimensions:
  - name: status
    sql: status
  - name: region
    sql: region
metrics:
  - name: sum_amount
    agg: sum
    sql: amount
  - name: count
    agg: count
    sql: "*"

Add --rewrite to also generate rewritten queries:

sidemantic migrator analyze queries/ --output-dir models/ --rewrite

Original query:

SELECT status, region, SUM(amount), COUNT(*)
FROM orders
WHERE status = 'completed'
GROUP BY status, region

Rewritten query:

SELECT orders.status, orders.region, orders.sum_amount, orders.count
FROM orders
WHERE status = 'completed'

Supported Patterns

Basic Aggregations

Extracts dimensions from GROUP BY and metrics from aggregations:

SELECT
    status,
    SUM(amount) as revenue,
    COUNT(DISTINCT customer_id) as customers
FROM orders
GROUP BY status

Complex Expressions

Handles CASE, COALESCE, CAST, math, string functions:

SELECT
    UPPER(status) as status,
    SUM(CASE WHEN priority = 'high' THEN amount ELSE 0 END) as high_priority_revenue,
    SUM(quantity * price) as total_revenue
FROM orders
GROUP BY UPPER(status)

Extracts underlying columns (status, amount, quantity, price).

Time Dimensions

SELECT
    DATE_TRUNC('month', order_date) as month,
    EXTRACT(YEAR FROM order_date) as year,
    SUM(amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date), EXTRACT(YEAR FROM order_date)

Extracts order_date as time dimension. Rewrites to orders.order_date__month, orders.order_date__year.

Derived Metrics

SELECT
    status,
    SUM(revenue) / COUNT(*) as avg_revenue_per_order,
    SUM(amount - discount) as net_amount
FROM orders
GROUP BY status

Extracts base metrics and derived metrics separately:

metrics:
  - name: sum_revenue
    agg: sum
    sql: revenue
  - name: count
    agg: count
    sql: "*"
  - name: avg_revenue_per_order
    type: derived
    sql: "SUM(revenue) / COUNT(*)"

Joins and Relationships

SELECT
    c.region,
    COUNT(o.order_id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.region

Extracts relationship: orders.customer_idcustomers.id (many_to_one).

# orders.yml
relationships:
  - name: customers
    type: many_to_one
    foreign_key: customer_id

Subqueries

Resolves subquery aliases to underlying tables:

SELECT
    sub.status,
    COUNT(*) as order_count
FROM (
    SELECT status, amount
    FROM orders
    WHERE amount > 100
) sub
GROUP BY sub.status

Extracts orders table and resolves sub.statusorders.status.

Window Functions and Cumulative Metrics

Detects window functions and generates cumulative metric definitions.

Running totals:

SELECT
    order_date,
    SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders
metrics:
  - name: running_total
    type: cumulative
    sql: "orders.sum_amount"

Rolling windows:

SELECT
    order_date,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as rolling_7day
FROM orders
metrics:
  - name: rolling_7day
    type: cumulative
    sql: "orders.sum_amount"
    window: "6 days"

Period-to-date:

SELECT
    order_date,
    SUM(amount) OVER (
        PARTITION BY DATE_TRUNC('month', order_date)
        ORDER BY order_date
    ) as mtd_revenue
FROM orders
metrics:
  - name: mtd_revenue
    type: cumulative
    sql: "orders.sum_amount"
    grain_to_date: "month"

What's extracted:

  • Aggregation window functions: SUM/AVG/COUNT/MIN/MAX with OVER()
  • Rolling windows with ROWS BETWEEN
  • Period-to-date with PARTITION BY DATE_TRUNC()

What's ignored:

  • Window-only functions: ROW_NUMBER(), RANK(), LAG(), LEAD()
  • Complex window calculations like COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()

Base aggregations inside complex calculations are still extracted as regular metrics.

Other Patterns

CTEs: Extracts from underlying tables, preserves structure in rewrites

GROUP BY ordinals: GROUP BY 1, 2 resolved to columns

Implicit joins: FROM a, b WHERE a.id = b.fk extracts relationships

Self-joins: Generates single model

CLI Usage

Generate models from queries:

sidemantic migrator analyze queries/ --output-dir models/

Generate models and rewritten queries:

sidemantic migrator analyze queries/ --output-dir models/ --rewrite

This analyzes all SQL files in queries/ and generates:

  • Model YAML files in models/
  • Rewritten queries in queries_rewritten/ (if --rewrite flag used)

Other commands:

# Analyze specific query
sidemantic migrator analyze-query \
    "SELECT status, COUNT(*) FROM orders GROUP BY status"

# Generate coverage report
sidemantic migrator report queries/ > coverage-report.txt

Options:

  • --pattern "*.sql" - File pattern to match
  • --output-dir - Where to write generated models
  • --rewrite - Also generate rewritten queries
  • --verbose - Show detailed analysis

Migration Workflow

1. Analyze queries and generate models

sidemantic migrator analyze queries/ --output-dir models/generated/ --verbose

Review generated models in models/generated/ directory.

2. Generate rewritten queries

sidemantic migrator analyze queries/ --output-dir models/generated/ --rewrite

This creates:

  • Model definitions in models/generated/
  • Rewritten queries in queries_rewritten/

3. Test rewritten queries

Compare results between original and rewritten queries to ensure correctness.

4. Deploy models

cp models/generated/*.yml models/

Load into your semantic layer and start using.

Troubleshooting

Low Coverage

Cause: Missing models, dimensions, or metrics.

Fix: Generate models from your queries first:

sidemantic migrator analyze queries/ --output-dir models/

Then load those models into your semantic layer before re-analyzing.

Dimensions Not Extracted

Cause: No GROUP BY clause.

-- ❌ No dimensions (no GROUP BY)
SELECT COUNT(*) FROM orders

-- ✅ Extracts status
SELECT status, COUNT(*) FROM orders GROUP BY status

Relationships Not Detected

Causes:

  • Non-equi joins
  • Complex conditions
  • Multiple join keys

Fix: Manually add relationships to generated models or use simple equality joins in queries.

Metric Names Don't Match

Analyzer generates names from agg_column:

  • SUM(amount)sum_amount
  • COUNT(*)count

Use aliases to control names:

SELECT SUM(amount) as total_revenue FROM orders  -- Generates "total_revenue"

Python API

Get coverage metrics from report:

report = analyzer.analyze_queries(queries)

print(f"Coverage: {report.coverage_percentage:.1f}%")
print(f"Missing models: {report.missing_models}")
print(f"Missing dimensions: {report.missing_dimensions}")

Provide database connection for better inference:

import duckdb

conn = duckdb.connect(":memory:")
analyzer = Migrator(layer, connection=conn)

Benefits: Resolves ambiguous columns, infers relationships from foreign keys, detects primary keys.