OFFLINEPIXEL
Asset Management

Building Scalable Backtesting Infrastructure

A quantitative asset manager reduced backtest runtime from 18 hours to 45 minutes using distributed computing and vectorized execution.

Executive Summary

A quant asset manager's strategy research was bottlenecked by 18-hour backtest runs. Rebuilding their backtester in Rust with Ray distributed computing and vectorized execution reduced runtime to 45 minutes, enabling 20x more strategy iterations daily.

Key Outcomes

  • 18 hours → 45 minutes backtest runtime
  • 20x increase in daily strategy iterations
  • $50M additional PnL from faster research cycles

Client Situation

The research team had 100+ parameter combinations to test each week, but 18-hour runs allowed only one iteration per day, slowing alpha discovery.

Key Challenges

  • Sequential Python backtester unable to parallelize
  • Single-node memory limit (256 GB) constraining instrument count
  • Data loading from CSV taking 4+ hours

Existing Architecture

Python-based vectorized backtester using pandas. Data stored as CSV files on NAS. Single-threaded execution.

  • 18-hour runtime for 5 years of 1-min data on 500 instruments
  • Pandas memory overhead 5x raw data size
  • No checkpointing or partial result persistence

Solution Design

Distributed backtester in Rust with Arrow for zero-copy data, Ray for orchestration, and Parquet for storage.

Key Decisions

  • Use Rust's rayon for intra-node parallelism, Ray for multi-node
  • Arrow columnar format for 4x memory reduction vs pandas
  • Parquet partitioning by instrument for faster filtering

Technologies: Rust, Arrow, Ray, Parquet, S3, Kubernetes

Implementation

Phased replacement starting with data layer, then execution engine, finally parameter sweeps.

  1. Phase 1: Data Pipeline

    Converted 10TB of CSV to Parquet, implemented Arrow-based loading with predicate pushdown.

  2. Phase 2: Backtest Engine

    Rust implementation matching Python results within 0.001% tolerance.

  3. Phase 3: Distributed Execution

    Ray cluster with 50 workers for parallel parameter sweeps.
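
The Phase 3 fan-out can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the client's implementation: a standard-library thread pool stands in for the Ray cluster, and the price series, the parameter names (`lookback`, `threshold`), and the `run_backtest` function are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical price series; a real run would load Parquet data instead.
PRICES = [100.0, 101.0, 102.5, 101.5, 103.0, 104.0, 103.5, 105.0, 106.5, 106.0]


def run_backtest(params):
    """Toy momentum rule: hold for one bar when the lookback return exceeds threshold."""
    lookback, threshold = params
    pnl = 0.0
    for t in range(lookback, len(PRICES) - 1):
        signal = PRICES[t] / PRICES[t - lookback] - 1.0
        if signal > threshold:
            pnl += PRICES[t + 1] - PRICES[t]
    return {"lookback": lookback, "threshold": threshold, "pnl": pnl}


def sweep(lookbacks, thresholds, max_workers=4):
    """Evaluate every parameter combination concurrently; results keep grid order."""
    grid = list(product(lookbacks, thresholds))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, whatever the completion order.
        return list(pool.map(run_backtest, grid))


results = sweep(lookbacks=[1, 2, 3], thresholds=[0.0, 0.01])
best = max(results, key=lambda r: r["pnl"])
```

Each combination is an independent task, which is what makes a sweep easy to spread across workers. Threads are used here only to keep the sketch self-contained; they do not give CPU parallelism for pure-Python work, which is one reason a real sweep would hand the tasks to separate processes or cluster nodes.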

Technical Challenges

Numerical stability across distributed runs

Impact: Floating point differences causing strategy divergence

Resolution: Deterministic ordering and reproducible random seeds across all workers
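
One way to get this kind of reproducibility is sketched below. It is an assumption-laden illustration rather than the team's code: the task keys, the `simulate` workload, and the base seed are hypothetical. The idea is to derive each task's seed from the task's identity instead of from the worker that happens to run it, and to combine partial results in a fixed order.

```python
import hashlib
import math
import random

# Floating-point addition is not associative, so reduction order can change results.
order_matters = (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)


def task_seed(base_seed, task_key):
    """Derive a stable seed from the task identity, not from worker or arrival order."""
    # hashlib is used because Python's built-in hash() of strings varies per process.
    digest = hashlib.sha256(f"{base_seed}:{task_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def simulate(task_key, base_seed=42, n=1000):
    """Hypothetical task: sum of random draws for one parameter set."""
    rng = random.Random(task_seed(base_seed, task_key))
    return sum(rng.gauss(0.0, 1.0) for _ in range(n))


def combine(results):
    """Reduce partial results in sorted key order so completion order is irrelevant."""
    return math.fsum(results[key] for key in sorted(results))


keys = ["p0", "p1", "p2", "p3"]
forward = {k: simulate(k) for k in keys}
backward = {k: simulate(k) for k in reversed(keys)}
```

Running the tasks in either order gives identical per-task values and an identical combined total, because nothing depends on scheduling.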

Slow instrument filtering

Impact: Filtering 500 instruments took 30 seconds

Resolution: Partitioned the data by instrument and date, so a filter resolves directly to the matching partitions instead of scanning every file
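
The effect of partitioning by instrument and date can be illustrated with a small sketch. The layout below assumes hive-style `key=value` directory names, which is a common Parquet convention but is not confirmed by the source; the dataset root `bars` and the tickers are hypothetical, and only path handling is shown (no files are read).

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("bars")  # hypothetical dataset root


def partition_path(instrument, day):
    """Directory for one instrument and one trading day, computed without any scan."""
    return ROOT / f"instrument={instrument}" / f"date={day.isoformat()}"


def prune(paths, instruments, start, end):
    """Keep partitions whose directory names satisfy the filter; no file is opened."""
    wanted = set(instruments)
    kept = []
    for path in paths:
        keys = dict(part.split("=", 1) for part in path.parts if "=" in part)
        day = date.fromisoformat(keys["date"])
        if keys["instrument"] in wanted and start <= day <= end:
            kept.append(path)
    return kept


all_parts = [
    partition_path(symbol, date(2024, 1, d))
    for symbol in ("AAPL", "MSFT", "GOOG")
    for d in (2, 3, 4)
]
hits = prune(all_parts, ["MSFT"], date(2024, 1, 3), date(2024, 1, 4))
```

A single instrument-day resolves to one directory by construction, and a range filter only has to inspect directory names rather than row data.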

Results

Single backtest runtime

  • Before: 18 hours
  • After: 45 minutes
  • Improvement: 96% reduction

Parameter combinations per day

  • Before: 1
  • After: 20
  • Improvement: 20x increase

Memory usage per instrument

  • Before: 5x raw data (pandas)
  • After: 1.2x raw data (Arrow)
  • Improvement: 76% reduction
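
The improvement figures follow from the before and after values. The short calculation below reproduces them; the helper names are illustrative only.

```python
def pct_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


def speedup(before, after):
    """How many times faster the new runtime is."""
    return before / after


runtime_before_min = 18 * 60  # 18 hours expressed in minutes
runtime_after_min = 45

runtime_reduction = pct_reduction(runtime_before_min, runtime_after_min)  # about 95.8, reported as 96%
runtime_speedup = speedup(runtime_before_min, runtime_after_min)          # 1080 / 45 = 24
memory_reduction = pct_reduction(5.0, 1.2)                                # about 76
```

Note that the runtime speedup (24x) and the increase in daily parameter combinations (20x) are separate measurements.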

Lessons Learned

  • 📘 Arrow's zero-copy sharing between Rust and Python reduced serialization overhead
  • 📘 Partitioning Parquet by instrument made instrument selection 10x faster than filtering unpartitioned data
  • 📘 Ray's autoscaling handled 1000-node bursts for large parameter sweeps

What We Would Do Differently

  • 💡 Adopt DataFusion as the in-process query engine earlier
  • 💡 Implement incremental backtesting for faster iteration

Role Relevance

Quant developers with Rust and distributed systems expertise built a backtester that runs 24x faster than the Python original (18 hours down to 45 minutes), transforming research velocity.

Critical Skills Demonstrated

  • Rust systems programming
  • Arrow/DataFusion
  • Ray distributed computing
  • Parquet optimization

Frequently Asked Questions

How accurate is the Rust backtester compared to Python?
Mean absolute error < 0.001% across 10,000 test runs using deterministic ordering.
What hardware was used for the distributed cluster?
50 c5.4xlarge EC2 instances (200 vCPUs) costing $0.80 per backtest run.
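
On the accuracy question above, a parity check between two backtester implementations might look like the sketch below. It assumes the quoted error is measured relative to the reference (Python) result, which the source does not spell out, and the PnL values are made up for illustration.

```python
TOLERANCE_PCT = 0.001  # tolerance quoted in the text


def relative_error_pct(reference, candidate):
    """Absolute difference as a percentage of the reference magnitude."""
    if reference == 0.0:
        return 0.0 if candidate == 0.0 else float("inf")
    return abs(candidate - reference) / abs(reference) * 100.0


def mean_abs_error_pct(reference_runs, candidate_runs):
    """Mean of per-run relative errors, in percent."""
    if not reference_runs or len(reference_runs) != len(candidate_runs):
        raise ValueError("run lists must be non-empty and of equal length")
    errors = [relative_error_pct(r, c) for r, c in zip(reference_runs, candidate_runs)]
    return sum(errors) / len(errors)


python_pnl = [1250.00, -310.50, 98.75]        # hypothetical reference results
rust_pnl = [1250.000004, -310.500001, 98.75]  # hypothetical candidate results
within_tolerance = mean_abs_error_pct(python_pnl, rust_pnl) < TOLERANCE_PCT
```

A check like this is only meaningful when both implementations process events in the same order, which is why it pairs naturally with deterministic ordering.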