Cloud Warehouse to DuckDB Cost Optimization
A guide to migrating analytical workloads from cloud warehouses (Snowflake, BigQuery) to DuckDB for 90% cost reduction.
Executive Summary
A startup's Snowflake bill reached $200k/year for 50TB of data. Over 3 months, they migrated to DuckDB + Parquet on S3, reducing costs to $20k/year (90% savings) while maintaining query performance for 80% of workloads. This guide covers data export, Parquet optimization, and query migration strategies.
Why Migrate from Cloud Warehouse
Snowflake costs grew linearly with data and queries. At 50TB and 100K queries/month, the bill reached $200k/year. Most queries were recurring dashboards, not ad-hoc.
- → $200k/year Snowflake bill (growing 20% quarterly)
- → Unpredictable costs (surprise $30k months)
- → 80% of queries were recurring dashboards
- → Cold data still costing $$ (no tiered storage)
DuckDB Cost Optimization Readiness
The team spent 1 month planning: identifying candidate tables (80% of data), setting up Parquet export, and proving performance.
- • Query audit (20% ad-hoc, 80% recurring)
- • Parquet export pipeline (Snowflake → S3)
- • DuckDB on EC2 (cost $200/month per instance)
- • BI tool integration (Tableau, Superset)
- • Cost baseline (current Snowflake bill)
Snowflake Assessment
Snowflake stored 50TB of data, with 200 tables and 500 recurring queries. The biggest cost drivers were full table scans and large warehouses (XL).
Technical Debt
- • No query cost awareness (engineers run expensive queries)
- • Recurring queries not cached or optimized
- • Cold data in same tier as hot data
- • Compute cost 80% of bill
Risks
- • DuckDB concurrency limits (single-node vs Snowflake
- • Data freshness (daily Parquet export = 24-hour lag)
- • Query performance for complex joins (must partition properly)
- • Team resistance (familiar with Snowflake SQL)
Target DuckDB + S3 Architecture
The target was Parquet files on S3 queried by DuckDB, refreshed daily.
3-Month Cost Optimization Migration
Step 1: Phase 1: Parquet Export (Month 1)
Exported 40TB of data (80%) to Parquet on S3 (daily sync).
Step 2: Phase 2: Query Migration (Month 2)
Migrated 400 recurring dashboards to DuckDB—tested performance.
Step 3: Phase 3: Cutover (Month 3)
Switched BI tools to DuckDB for 80% of queries, Snowflake for 20%.
Snowflake to Parquet Export
Daily export of 40TB of cold and warm data to Parquet using Snowflake's COPY INTO.
- • COPY INTO command (Snowflake → S3 Parquet)
- • Partitioning by date (year/month/day)
- • Compression (10x size reduction)
- • Incremental exports (only new/changed data)
Common Cloud Warehouse Migration Mistakes
Moving real-time dashboards (need sub-minute data)
Impact: 24-hour lag unacceptable (daily Parquet export)
Prevention: Keep real-time dashboards on Snowflake
No Parquet partitioning
Impact: DuckDB scans entire 10TB dataset (slow)
Prevention: Partition by date (WHERE date='2024-01-01')
Single EC2 instance bottleneck
Impact: 20 concurrent queries slow (queueing)
Prevention: Multiple DuckDB replicas, load balancer
Migrating all queries (100%)
Impact: 2-month delay, team frustrated
Prevention: 80/20 rule: 80% to DuckDB, 20% stay
Migration Success Metrics
Who Should Lead Cost Optimization Migration
Recommended Roles
Required Experience
- • Snowflake or BigQuery administration
- • Parquet and columnar storage
- • ETL pipelines (Airflow)
- • Cost optimization experience
Related Roles
Frequently Asked Questions
- Can DuckDB replace Snowflake completely?
- No—keep Snowflake for ad-hoc, real-time, and high-concurrency. DuckDB for recurring dashboards and batch analytics.
- What about data governance (RBAC)?
- DuckDB has no built-in RBAC. Implement at BI tool level or use Snowflake for sensitive data.
- How many concurrent queries can DuckDB handle?
- 20-50 on single node. For more, use read replicas or keep Snowflake for high-concurrency.