dlt-iceberg

A dlt destination for loading data into Apache Iceberg tables.

Load data from any dlt source into Apache Iceberg tables. Supports multiple catalog backends (Nessie, Polaris, AWS Glue, Unity Catalog) and handles schema evolution automatically.

Installation

uv add dlt-iceberg

Supported Catalogs

  • Nessie: Open-source Git-like version control for data lakes
  • Polaris: Snowflake's open catalog for Apache Iceberg
  • AWS Glue: AWS-managed metadata catalog
  • Databricks Unity Catalog: Unified governance for Databricks

Quick Start with Nessie

import dlt
from dlt_iceberg import iceberg

# Configure the Iceberg destination
destination = iceberg(
    catalog_uri="http://localhost:19120/api/v1",
    catalog_name="nessie",
    warehouse="s3://my-bucket/warehouse"
)

# Create a pipeline
pipeline = dlt.pipeline(
    pipeline_name="github_events",
    destination=destination,
    dataset_name="analytics"
)

# Load data from any dlt source; a list of dicts works as a simple source
my_data = [
    {"id": 1, "event": "push"},
    {"id": 2, "event": "fork"},
]
pipeline.run(my_data, table_name="events")

AWS Glue Configuration

destination = iceberg(
    catalog_type="glue",
    warehouse="s3://my-bucket/warehouse",
    aws_region="us-east-1"
)

Unity Catalog Configuration

destination = iceberg(
    catalog_type="unity",
    catalog_name="main",
    warehouse="abfss://[email protected]/warehouse",
    databricks_host="https://your-workspace.cloud.databricks.com",
    databricks_token="your-token"
)

Features

  • Schema evolution: Automatically handles new columns and type changes
  • Partitioning: Configure time-based or value-based partitions
  • Merge writes: Upsert data with primary keys
  • S3, GCS, Azure: Works with all major object storage backends
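
The merge-write feature can be sketched with standard dlt resource hints. This is a minimal sketch, assuming the `iceberg` destination configured as in the Quick Start and a running Nessie catalog; `primary_key` and `write_disposition="merge"` are standard dlt resource options, and the table/field names are illustrative.

```python
import dlt
from dlt_iceberg import iceberg

# A resource with a primary key and "merge" disposition: re-running the
# pipeline upserts rows by "id" instead of appending duplicates.
@dlt.resource(primary_key="id", write_disposition="merge")
def users():
    yield [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Grace"},
    ]

pipeline = dlt.pipeline(
    pipeline_name="users_merge",
    destination=iceberg(
        catalog_uri="http://localhost:19120/api/v1",
        catalog_name="nessie",
        warehouse="s3://my-bucket/warehouse",
    ),
    dataset_name="analytics",
)

pipeline.run(users())
```

Running the pipeline a second time with changed rows updates the existing Iceberg table rows in place rather than duplicating them.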