OFFLINEPIXEL
Asset Management / Hedge Fund

Scaling Portfolio Analytics in Real-Time

A multi-strategy fund moved its risk analytics from a daily batch to sub-second latency, processing 10M position updates per second with a streaming architecture.

Executive Summary

A multi-strategy fund with $50B AUM calculated portfolio analytics once a day, so risk decisions were made on stale data. A real-time streaming architecture built on Rust and ClickHouse cut analytics latency from 24 hours to 500 ms, enabling intraday risk management.

Key Outcomes

  • 24 hours → 500ms analytics latency
  • 10M position updates/sec processing
  • 3 intraday risk events prevented ($20M saved)

Client Situation

The fund's risk team received P&L and exposure reports 12 hours after market close, too late for intraday position adjustments.

Key Challenges

  • Batch processing took 4+ hours for Greeks and VAR
  • Unable to monitor real-time exposure across 50k positions
  • Risk breaches detected only after market close

Existing Architecture

An end-of-day Python batch job read from SQL Server and calculated Greeks and VAR with NumPy.

  • Batch window impossible to reduce below 4 hours
  • No support for intraday position changes
  • Python single-threaded CPU bottleneck

Solution Design

A streaming architecture: Kafka for position updates, Rust for risk calculation, and ClickHouse for real-time storage.

Key Decisions

  • Use Rust for parallel risk calculation across 50k positions
  • ClickHouse with materialized views for pre-aggregated analytics
  • WebSocket push to risk dashboard

Technologies: Rust, Kafka, ClickHouse, Arrow, WebSocket
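
The first key decision, parallel risk calculation across the book, can be illustrated with a minimal sketch. It assumes a hypothetical `Position` record and a simple delta-adjusted exposure as the per-position metric; neither is taken from the fund's engine. It uses only the standard library's scoped threads, where a production system might well use a dedicated thread pool.

```rust
use std::thread;

/// Hypothetical position record; the field names are illustrative only.
#[derive(Clone, Debug)]
struct Position {
    quantity: f64,
    price: f64,
    delta: f64,
}

/// Delta-adjusted exposure of a single position (an assumed metric).
fn exposure(p: &Position) -> f64 {
    p.quantity * p.price * p.delta
}

/// Splits the book into roughly equal chunks, sums each chunk on its own
/// scoped thread, then adds the partial sums together.
fn total_exposure(positions: &[Position], workers: usize) -> f64 {
    if positions.is_empty() {
        return 0.0;
    }
    let workers = workers.max(1);
    // Ceiling division so that every position lands in some chunk.
    let chunk_len = (positions.len() + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = positions
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(exposure).sum::<f64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}
```

Calling `total_exposure(&book, 8)` on a slice of positions returns the aggregate exposure. Because the threads only borrow the slice immutably, the compiler rejects any attempt to mutate shared state without synchronization.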

Implementation

The migration was phased: P&L first, then Greeks, and finally VAR. The new system ran in shadow mode for 1 month.
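
A shadow-mode run amounts to comparing streaming output against the batch output position by position. The sketch below shows one way such a check could look, using the 0.01% tolerance quoted for the P&L phase. The function names and the zero-value fallback are assumptions for illustration, not the fund's reconciliation logic.

```rust
/// Relative difference between a streaming value and its batch counterpart,
/// expressed as a fraction of the batch value.
fn relative_diff(streaming: f64, batch: f64) -> f64 {
    if batch == 0.0 {
        // Assumed fallback: use the absolute difference when batch is zero.
        return (streaming - batch).abs();
    }
    ((streaming - batch) / batch).abs()
}

/// Returns the indices of positions whose streaming P&L deviates from the
/// batch P&L by more than `tolerance` (1e-4 corresponds to 0.01%).
fn mismatches(streaming: &[f64], batch: &[f64], tolerance: f64) -> Vec<usize> {
    streaming
        .iter()
        .zip(batch)
        .enumerate()
        .filter(|(_, (s, b))| relative_diff(**s, **b) > tolerance)
        .map(|(i, _)| i)
        .collect()
}
```

For example, `mismatches(&stream_pnl, &batch_pnl, 1e-4)` returns the positions that would need investigation before cutover.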

  1. Phase 1: Real-time P&L

    Built streaming P&L calculator matching batch results within 0.01%.

  2. Phase 2: Greeks Engine

    Implemented delta/gamma/vega calculations using Rust's ndarray.

  3. Phase 3: VAR Dashboard

    Built risk dashboard with historical simulation VAR at 5-minute intervals.
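
Historical-simulation VAR, the method named in Phase 3, ranks the portfolio's P&L under past scenarios and reads off a loss quantile. The sketch below is a minimal illustration under an assumed quantile convention (the smallest loss that at least a `confidence` fraction of scenarios do not exceed). The source does not state which convention or confidence level the fund uses.

```rust
/// Historical-simulation VaR: the loss level that a `confidence` fraction of
/// scenarios do not exceed. `scenario_pnl` holds one portfolio P&L per
/// historical scenario (negative values are losses).
fn historical_var(scenario_pnl: &[f64], confidence: f64) -> Option<f64> {
    if scenario_pnl.is_empty() || !(0.0..1.0).contains(&confidence) {
        return None;
    }
    // Convert P&L to losses and sort ascending.
    let mut losses: Vec<f64> = scenario_pnl.iter().map(|p| -p).collect();
    losses.sort_by(|a, b| a.total_cmp(b));
    // Assumed convention: take the ceil(confidence * n)-th smallest loss.
    let rank = (confidence * losses.len() as f64).ceil() as usize;
    let idx = rank.saturating_sub(1).min(losses.len() - 1);
    Some(losses[idx])
}
```

With ten scenarios and `confidence = 0.9`, this returns the second-largest loss. Recomputing at 5-minute intervals then means re-running the function on refreshed scenario P&L.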

Technical Challenges

Memory pressure for 50k positions

Impact: Full covariance matrix too large for single server

Resolution: Implemented factor model reducing dimension from 50k to 200
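
For scale: a dense 50k × 50k covariance matrix of 8-byte floats takes about 20 GB per copy, while a 50k × 200 factor-loading matrix takes about 80 MB. The sketch below shows how portfolio variance can be computed from a factor model without ever forming the full matrix, as x'Fx + Σ wᵢ²dᵢ with x = B'w. The parameter names and the dense `Vec<Vec<f64>>` layout are assumptions for illustration; the source does not describe the fund's factor model in this detail.

```rust
/// Portfolio variance under a linear factor model.
///
/// * `weights`      - position weights w (length n)
/// * `loadings`     - factor exposures B, one row of k loadings per position
/// * `factor_cov`   - k x k factor covariance matrix F
/// * `specific_var` - idiosyncratic variance d of each position (length n)
fn factor_variance(
    weights: &[f64],
    loadings: &[Vec<f64>],
    factor_cov: &[Vec<f64>],
    specific_var: &[f64],
) -> f64 {
    let k = factor_cov.len();
    // Project the portfolio onto the factors: x = B' w (length k).
    let mut x = vec![0.0; k];
    for (w, row) in weights.iter().zip(loadings) {
        for (xj, b) in x.iter_mut().zip(row) {
            *xj += w * b;
        }
    }
    // Systematic part: x' F x, which only touches the k x k matrix.
    let systematic: f64 = factor_cov
        .iter()
        .zip(&x)
        .map(|(row, xi)| xi * row.iter().zip(&x).map(|(f, xj)| f * xj).sum::<f64>())
        .sum();
    // Specific part: sum of w_i^2 * d_i.
    let specific: f64 = weights
        .iter()
        .zip(specific_var)
        .map(|(w, d)| w * w * d)
        .sum();
    systematic + specific
}
```

The cost is O(n·k + k²) instead of O(n²), which is what makes a 50k-position book tractable with 200 factors.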

Backpressure during volatility

Impact: Kafka lag reaching hours during market stress

Resolution: Added priority queuing for high-touch positions
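
The source does not say how priority queuing was implemented. One possible in-process form, sketched here as an assumption, is a binary heap that serves higher-priority updates first and preserves arrival order within a priority level. The `Update` type and its fields are hypothetical.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical position update awaiting risk recalculation.
#[derive(Debug, PartialEq, Eq)]
struct Update {
    priority: u8,     // higher value = more urgent (assumed scale)
    sequence: u64,    // arrival order, used as a tie-breaker
    position_id: u32,
}

impl Ord for Update {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority first; within a priority, earlier arrivals first.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.sequence.cmp(&self.sequence))
            .then_with(|| self.position_id.cmp(&other.position_id))
    }
}

impl PartialOrd for Update {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Drains a backlog and returns position ids in processing order.
fn drain_in_priority_order(backlog: Vec<Update>) -> Vec<u32> {
    let mut heap = BinaryHeap::from(backlog);
    let mut order = Vec::with_capacity(heap.len());
    while let Some(update) = heap.pop() {
        order.push(update.position_id);
    }
    order
}
```

Under load, low-priority updates wait longer while high-touch positions keep fresh risk numbers. A real deployment would also need a policy for coalescing or dropping stale low-priority updates.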

Results

Metric                         | Before      | After                  | Improvement
-------------------------------|-------------|------------------------|------------------------
Risk analytics latency         | 24 hours    | 500 ms                 | 99.9994% reduction
Position updates processed/sec | 500 (batch) | 10M                    | 20,000x increase
Risk breaches caught intraday  | 0           | 3 (in first 6 months)  | Prevented $20M losses

Lessons Learned

  • 📘 Rust's memory safety caught 12 concurrency bugs that would have corrupted risk state
  • 📘 ClickHouse materialized views reduced query latency from 5s to 50ms
  • 📘 Factor model essential for memory scalability

What We Would Do Differently

  • 💡 Implement checkpoints for state recovery earlier
  • 💡 Use DataFusion for in-process query engine

Role Relevance

Quant engineers built the high-performance risk engine, balancing numerical accuracy with real-time constraints at 10M updates/sec.

Critical Skills Demonstrated

  • Risk analytics (Greeks, VAR)
  • High-performance computing
  • Streaming architectures
  • Factor models

Frequently Asked Questions

How accurate is real-time VAR compared to end-of-day?
99.9% correlation with daily VAR using identical parameters and 10-minute windows.

What's the hardware footprint?
6 servers with 256GB RAM each, down from 20 servers in batch architecture.
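
A correlation figure like the one quoted for VAR accuracy can be reproduced from two aligned series, one of real-time VAR values and one of end-of-day VAR values. The sketch below computes the Pearson correlation coefficient. That the fund used Pearson correlation rather than another measure is an assumption, and the function name is illustrative.

```rust
/// Pearson correlation between two equally long series.
/// Returns None if the lengths differ, there are fewer than two points,
/// or either series is constant.
fn pearson(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.len() < 2 {
        return None;
    }
    let n = a.len() as f64;
    let mean_a = a.iter().sum::<f64>() / n;
    let mean_b = b.iter().sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (x, y) in a.iter().zip(b) {
        let (dx, dy) = (x - mean_a, y - mean_b);
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }
    if var_a == 0.0 || var_b == 0.0 {
        return None;
    }
    Some(cov / (var_a * var_b).sqrt())
}
```

A result near 1.0 indicates that the two series move together. It does not by itself show that their levels agree, so a tolerance check on absolute differences is a useful companion.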