Logo
OFFLINEPIXEL
Snowflake / BigQuery → DuckDB + S3

Cloud Warehouse to DuckDB Cost Optimization

A guide to migrating analytical workloads from cloud warehouses (Snowflake, BigQuery) to DuckDB for 90% cost reduction.

Snowflake / BigQuery → DuckDB + S3 Big Bang MEDIUM Difficulty

Cloud Warehouse to DuckDB Cost Optimization

A guide to migrating analytical workloads from cloud warehouses (Snowflake, BigQuery) to DuckDB for 90% cost reduction.

Estimated Timeline2-4 months
Primary Roleduckdb-engineer

Executive Summary

A startup's Snowflake bill reached $200k/year for 50TB of data. Over 3 months, they migrated to DuckDB + Parquet on S3, reducing costs to $20k/year (90% savings) while maintaining query performance for 80% of workloads. This guide covers data export, Parquet optimization, and query migration strategies.

Parquet + DuckDB for fixed-cost analytics (no per-query charges)
80/20 rule: 80% of queries (recurring dashboards) can move to DuckDB
Hybrid architecture: keep cloud warehouse for 20% ad-hoc queries
Data partitioning critical for query performance

Why Migrate from Cloud Warehouse

Snowflake costs grew linearly with data and queries. At 50TB and 100K queries/month, the bill reached $200k/year. Most queries were recurring dashboards, not ad-hoc.

  • $200k/year Snowflake bill (growing 20% quarterly)
  • Unpredictable costs (surprise $30k months)
  • 80% of queries were recurring dashboards
  • Cold data still costing $$ (no tiered storage)

DuckDB Cost Optimization Readiness

The team spent 1 month planning: identifying candidate tables (80% of data), setting up Parquet export, and proving performance.

  • Query audit (20% ad-hoc, 80% recurring)
  • Parquet export pipeline (Snowflake → S3)
  • DuckDB on EC2 (cost $200/month per instance)
  • BI tool integration (Tableau, Superset)
  • Cost baseline (current Snowflake bill)

Snowflake Assessment

Snowflake stored 50TB of data, with 200 tables and 500 recurring queries. The biggest cost drivers were full table scans and large warehouses (XL).

Technical Debt

  • • No query cost awareness (engineers run expensive queries)
  • • Recurring queries not cached or optimized
  • • Cold data in same tier as hot data
  • • Compute cost 80% of bill

Risks

  • • DuckDB concurrency limits (single-node vs Snowflake
  • • Data freshness (daily Parquet export = 24-hour lag)
  • • Query performance for complex joins (must partition properly)
  • • Team resistance (familiar with Snowflake SQL)

Target DuckDB + S3 Architecture

The target was Parquet files on S3 queried by DuckDB, refreshed daily.

Parquet on S3 (50TB → 10TB Parquet, 80% compression)DuckDB on EC2 (c5.4xlarge, $200/month)Daily sync from Snowflake to S3 (Airflow)BI tools (Tableau via DuckDB connector)Snowflake retained for 20% ad-hoc queries

3-Month Cost Optimization Migration

  1. Step 1: Phase 1: Parquet Export (Month 1)

    Exported 40TB of data (80%) to Parquet on S3 (daily sync).

  2. Step 2: Phase 2: Query Migration (Month 2)

    Migrated 400 recurring dashboards to DuckDB—tested performance.

  3. Step 3: Phase 3: Cutover (Month 3)

    Switched BI tools to DuckDB for 80% of queries, Snowflake for 20%.

Snowflake to Parquet Export

Daily export of 40TB of cold and warm data to Parquet using Snowflake's COPY INTO.

  • COPY INTO command (Snowflake → S3 Parquet)
  • Partitioning by date (year/month/day)
  • Compression (10x size reduction)
  • Incremental exports (only new/changed data)

Common Cloud Warehouse Migration Mistakes

Moving real-time dashboards (need sub-minute data)

Impact: 24-hour lag unacceptable (daily Parquet export)

Prevention: Keep real-time dashboards on Snowflake

No Parquet partitioning

Impact: DuckDB scans entire 10TB dataset (slow)

Prevention: Partition by date (WHERE date='2024-01-01')

Single EC2 instance bottleneck

Impact: 20 concurrent queries slow (queueing)

Prevention: Multiple DuckDB replicas, load balancer

Migrating all queries (100%)

Impact: 2-month delay, team frustrated

Prevention: 80/20 rule: 80% to DuckDB, 20% stay

Migration Success Metrics

Annual cost: $200k → $20k (90% reduction)
Query performance: 2-3s (same as Snowflake)
Data freshness: 24 hours (acceptable)
User adoption: 80% of dashboards moved

Who Should Lead Cost Optimization Migration

Recommended Roles

Data Engineer (5+ years)Cloud Architect (Snowflake/BigQuery)DuckDB Specialist

Required Experience

  • Snowflake or BigQuery administration
  • Parquet and columnar storage
  • ETL pipelines (Airflow)
  • Cost optimization experience

Related Roles

Frequently Asked Questions

Can DuckDB replace Snowflake completely?
No—keep Snowflake for ad-hoc, real-time, and high-concurrency. DuckDB for recurring dashboards and batch analytics.
What about data governance (RBAC)?
DuckDB has no built-in RBAC. Implement at BI tool level or use Snowflake for sensitive data.
How many concurrent queries can DuckDB handle?
20-50 on single node. For more, use read replicas or keep Snowflake for high-concurrency.