Executive Summary
A high-frequency trading firm was losing millions in latency arbitrage opportunities because of slow order routing. By rebuilding its smart order router with FPGA acceleration and predictive fill models, the firm cut P99 routing decision latency from 85μs to 9.5μs and raised its first-venue fill rate by 34%.
Key Outcomes
- ▹ 89% reduction in routing latency (85μs → 9.5μs)
- ▹ 34% relative improvement in first-venue fill rate (47% → 63%)
- ▹ $15M additional annual PnL
Client Situation
The firm traded 2M orders daily across 15 exchanges. Their software-based router consistently lost to competitors with FPGA-accelerated infrastructure.
Key Challenges
- ⚠ Software routing latency of 85μs causing missed fills
- ⚠ Static venue weights ignoring real-time queue positions
- ⚠ Inability to participate in colocation-sensitive events
Existing Architecture
C++ software router with TCP connections to venues. Round-robin distribution with simple venue ranking.
- 85μs decision latency (85,000ns) too slow for HFT
- Context switches and syscalls dominating latency budget
- No visibility into venue-specific queue depths
Solution Design
FPGA-based router with hardware-accelerated decision logic, using real-time queue monitoring and predictive fill models.
Key Decisions
- ✓ Implement routing logic in Verilog on Xilinx Alveo cards
- ✓ Use hardware timestamping for latency measurement
- ✓ Predictive model using logistic regression on live queue data
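To make the third decision concrete, here is a minimal sketch of how a logistic-regression fill model could rank venues from live queue data. The feature names (`queue_ahead`, `recent_fill_rate`), the weights, and the venue names are hypothetical illustrations, not the firm's actual model.

```python
import math
from dataclasses import dataclass


@dataclass
class VenueState:
    name: str
    queue_ahead: float       # shares resting ahead of our order (hypothetical feature)
    recent_fill_rate: float  # fraction of recent orders filled here (hypothetical feature)


# Hypothetical weights, chosen only to make the example readable.
WEIGHTS = {"bias": 0.5, "queue_ahead": -0.002, "recent_fill_rate": 2.0}


def fill_probability(v: VenueState) -> float:
    """Logistic regression: sigmoid of a weighted sum of the venue features."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["queue_ahead"] * v.queue_ahead
         + WEIGHTS["recent_fill_rate"] * v.recent_fill_rate)
    return 1.0 / (1.0 + math.exp(-z))


def best_venue(venues):
    """Pick the venue with the highest predicted fill probability."""
    return max(venues, key=fill_probability)


venues = [
    VenueState("VENUE_A", queue_ahead=1200, recent_fill_rate=0.55),
    VenueState("VENUE_B", queue_ahead=300, recent_fill_rate=0.45),
]
print(best_venue(venues).name)
```

With these made-up numbers the shorter queue at VENUE_B outweighs VENUE_A's better recent fill rate, which is the kind of trade-off static venue weights cannot capture.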
Implementation
Shadow deployment for 2 months comparing FPGA decisions against software baseline before live rollout.
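A shadow comparison of this kind could be summarized with a small offline report. The sketch below assumes a hypothetical log of `(order_id, software_venue, fpga_venue)` tuples; the real log format is not described in the source.

```python
from collections import Counter


def shadow_report(decisions):
    """Compare shadow FPGA decisions against the software baseline.

    decisions: iterable of (order_id, software_venue, fpga_venue) tuples
    (a hypothetical record layout). Returns the agreement rate and a
    count of each (software, fpga) disagreement pair.
    """
    total = 0
    mismatches = Counter()
    for _order_id, software_venue, fpga_venue in decisions:
        total += 1
        if software_venue != fpga_venue:
            mismatches[(software_venue, fpga_venue)] += 1
    agreed = total - sum(mismatches.values())
    rate = agreed / total if total else 0.0
    return rate, mismatches


sample = [(1, "A", "A"), (2, "A", "B"), (3, "C", "C"), (4, "A", "B")]
rate, mismatches = shadow_report(sample)
print(rate, mismatches.most_common(1))
```

Disagreements are not necessarily errors: once the FPGA uses a different model than the baseline, the mismatch pairs are mainly useful for deciding which decisions to inspect by hand.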
Phase 1: FPGA Prototype
Implemented basic routing logic achieving 12μs latency, validated against software router.
Phase 2: Model Integration
Added fill probability model using logistic regression weights precomputed in software.
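One way precomputed weights can be used in hardware is integer-only scoring. Because the sigmoid is monotonic, the venue with the highest linear score also has the highest fill probability, so ranking does not need the sigmoid at all. The sketch below assumes a hypothetical 16-bit fixed-point format with 12 fractional bits and integer-valued features; neither detail comes from the source.

```python
FRAC_BITS = 12  # hypothetical fixed-point format: 12 fractional bits


def quantize(values, frac_bits=FRAC_BITS):
    """Convert float weights to signed 16-bit fixed-point integers."""
    scale = 1 << frac_bits
    return [max(-32768, min(32767, round(v * scale))) for v in values]


def integer_score(q_weights, q_features):
    """Integer dot product, the multiply-accumulate an FPGA performs cheaply."""
    return sum(w * x for w, x in zip(q_weights, q_features))


def rank_best(q_weights, venue_features):
    """Return the index of the venue with the highest integer score.

    The sigmoid is monotonic, so this is also the venue with the
    highest predicted fill probability.
    """
    scores = [integer_score(q_weights, f) for f in venue_features]
    return max(range(len(scores)), key=scores.__getitem__)


q_weights = quantize([-0.5, 1.25])           # hypothetical float weights
venue_features = [[200, 40], [50, 30]]       # hypothetical integer features per venue
print(q_weights, rank_best(q_weights, venue_features))
```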
Phase 3: Production Deployment
Phased rollout starting with 1 venue, expanding to all 15 over 3 months.
Technical Challenges
- FPGA memory for queue depth tracking
Impact: Could only track 5 venues before running out of BRAM
Resolution: Compressed queue state using 8-bit relative sizes instead of 32-bit absolute
- Model update latency
Impact: A 6-hour retraining cycle left the router running on stale weights
Resolution: Online learning with 1-minute model updates via PCIe DMA
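The queue-state compression from the first challenge above could look roughly like this. The source does not say what the 8-bit sizes are relative to; this sketch assumes each depth is scaled against the largest depth in its group, with that one reference value stored at full width.

```python
def compress_depths(depths):
    """Encode non-negative queue depths as 8-bit codes relative to the largest depth.

    Returns (ref, codes): one full-width reference value plus one byte per depth.
    The choice of reference is an assumption made for this sketch.
    """
    ref = max(depths) if depths else 0
    if ref == 0:
        return 0, bytes(len(depths))
    # Round to the nearest of 256 levels; the largest depth maps to 255.
    codes = bytes((d * 255 + ref // 2) // ref for d in depths)
    return ref, codes


def decompress_depths(ref, codes):
    """Recover approximate absolute depths from the 8-bit codes."""
    return [c * ref // 255 for c in codes]


depths = [4000, 1000, 250, 0]
ref, codes = compress_depths(depths)
approx = decompress_depths(ref, codes)
print(ref, list(codes), approx)
```

Each entry shrinks from 4 bytes to 1, at the cost of a quantization error of about `ref / 255` per depth. If queue state was the dominant BRAM consumer, a 4x smaller encoding would raise the 5-venue ceiling to roughly 20 venues, enough for all 15, though the source does not give the exact memory budget.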
Results
- Routing decision latency (P99)
Before: 85μs
After: 9.5μs
Improvement: 89% reduction
- Fill rate (first venue)
Before: 47%
After: 63%
Improvement: 34% relative increase (16 percentage points)
- Orders routed per second
Before: 50,000
After: 500,000
Improvement: 10x increase
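The improvement figures can be reproduced from the before and after values. The short check below shows that the fill-rate gain is a relative change, not a percentage-point change.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


latency_change = pct_change(85.0, 9.5)   # about -88.8, reported as an 89% reduction
fill_change = pct_change(47.0, 63.0)     # about +34.0 relative
fill_points = 63.0 - 47.0                # 16 percentage points absolute
throughput_ratio = 500_000 / 50_000      # 10x

print(round(latency_change, 1), round(fill_change, 1), fill_points, throughput_ratio)
```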
Lessons Learned
- 📘 FPGA decision latency below 10μs is table stakes for modern HFT
- 📘 Online model updates critical for adapting to liquidity shifts
- 📘 Hardware-software co-design required for maintainability
What We Would Do Differently
- 💡 Use HLS instead of Verilog for faster iteration
- 💡 Implement reinforcement learning directly on FPGA
Role Relevance
Quant developers with FPGA experience and market microstructure knowledge were essential for sub-10μs smart order routing that competitors couldn't match.
Frequently Asked Questions
- What FPGA card and what was the cost?
- Xilinx Alveo U250 cards deployed on 8 servers. Total hardware cost was $40k, against $15M in additional annual PnL.
- How do you update routing models?
- Software retrains the logistic regression model every minute and writes the updated weights to the FPGA via PCIe DMA.
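The software side of such an update loop might resemble the following sketch: a single stochastic gradient step of logistic regression per observed order outcome. The learning rate and feature layout are hypothetical, and the PCIe DMA write itself is out of scope here because the source does not describe that interface.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_update(weights, features, filled, lr=0.05):
    """One online logistic-regression step on a single order outcome.

    weights, features: equal-length lists of floats (hypothetical layout).
    filled: True if the order filled at the chosen venue.
    Returns a new weight list; the gradient of the log loss is
    (prediction - label) * feature.
    """
    z = sum(w * x for w, x in zip(weights, features))
    error = sigmoid(z) - (1.0 if filled else 0.0)
    return [w - lr * error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0]
features = [1.0, 2.0]
new_weights = sgd_update(weights, features, filled=True)
print(new_weights)
```

After observing a fill, the predicted fill probability for the same features rises above its previous value. In a one-minute cycle, steps like this would accumulate between weight pushes.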