What FPGA card and what was the cost?

Xilinx Alveo U250 cards deployed across 8 servers. Total hardware cost was $40k, against $15M in additional annual PnL.

How do you update routing models?

Software retrains the logistic regression models every minute and writes the new weights to the FPGA via PCIe DMA.

Optimizing Order Routing for High-Frequency Trading

Executive Summary

A high-frequency trading firm was losing millions in latency arbitrage opportunities due to slow order routing. By rebuilding their smart order router with FPGA acceleration and predictive fill models, they achieved 9.5μs routing decisions and captured 34% more liquidity.

Key Outcomes

▹ 89% reduction in routing latency (85μs → 9.5μs)
▹ 34% improvement in fill rates
▹ $15M additional annual PnL

Client Situation

The firm traded 2M orders daily across 15 exchanges. Their software-based router consistently lost to competitors with FPGA-accelerated infrastructure.

Key Challenges

⚠ Software routing latency of 85μs causing missed fills
⚠ Static venue weights ignoring real-time queue positions
⚠ Inability to participate in colocation-sensitive events

Existing Architecture

C++ software router with TCP connections to venues. Round-robin distribution with simple venue ranking.

85μs decision latency (85,000ns) too slow for HFT
Context switches and syscalls dominating latency budget
No visibility into venue-specific queue depths

Solution Design

FPGA-based router with hardware-accelerated decision logic, using real-time queue monitoring and predictive fill models.

Key Decisions

✓ Implement routing logic in Verilog on Xilinx Alveo cards
✓ Use hardware timestamping for latency measurement
✓ Predictive model using logistic regression on live queue data
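
To make the third decision concrete, here is a minimal software sketch of routing by predicted fill probability. It is not the firm's Verilog implementation: the feature names (`queue_ahead`, `spread_ticks`), the venue names, and the weights are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VenueState:
    name: str
    queue_ahead: float   # shares resting ahead of the order (hypothetical feature)
    spread_ticks: float  # current spread in ticks (hypothetical feature)


def fill_probability(weights, bias, features):
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pick_venue(venues, weights, bias):
    """Return (probability, name) for the venue with the highest predicted fill probability."""
    scored = [
        (fill_probability(weights, bias, (v.queue_ahead, v.spread_ticks)), v.name)
        for v in venues
    ]
    return max(scored)


# Illustrative weights only: deeper queues and wider spreads lower the fill odds.
WEIGHTS = (-0.002, -0.5)
BIAS = 1.0

venues = [
    VenueState("VENUE_A", queue_ahead=1200, spread_ticks=1),
    VenueState("VENUE_B", queue_ahead=300, spread_ticks=1),
    VenueState("VENUE_C", queue_ahead=300, spread_ticks=2),
]
best_prob, best_name = pick_venue(venues, WEIGHTS, BIAS)
print(best_name, round(best_prob, 3))
```

Unlike a static venue ranking, the choice here changes whenever the live queue data changes, which is the behavior the static weights in the old router could not provide.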

FPGA, Verilog, C++, Rust, PCAP, UDP

Implementation

Shadow deployment for 2 months comparing FPGA decisions against software baseline before live rollout.
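
A shadow comparison of this kind can be sketched as a simple tally over paired decisions. The record layout `(order_id, software_venue, fpga_venue)` and the venue names are assumptions for illustration; the source does not describe the firm's actual comparison tooling.

```python
from collections import Counter


def shadow_report(decisions):
    """Summarize agreement between the software router and the shadow FPGA router.

    `decisions` is an iterable of (order_id, software_venue, fpga_venue) tuples.
    Returns the agreement rate and a count of each (software, fpga) mismatch pair.
    """
    total = 0
    mismatches = Counter()
    for _order_id, sw_venue, fpga_venue in decisions:
        total += 1
        if sw_venue != fpga_venue:
            mismatches[(sw_venue, fpga_venue)] += 1
    agreed = total - sum(mismatches.values())
    rate = agreed / total if total else 0.0
    return rate, mismatches


# Made-up sample: the two routers agree on half of the orders.
sample = [
    (1, "VENUE_A", "VENUE_A"),
    (2, "VENUE_B", "VENUE_C"),
    (3, "VENUE_A", "VENUE_A"),
    (4, "VENUE_B", "VENUE_C"),
]
agreement_rate, mismatch_counts = shadow_report(sample)
print(agreement_rate, mismatch_counts.most_common(1))
```

A mismatch is not necessarily a defect. Once the predictive model is enabled the FPGA is expected to diverge from the baseline, so grouping mismatches by venue pair mainly shows where the divergence should be reviewed.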

Phase 1: FPGA Prototype
Implemented basic routing logic achieving 12μs latency, validated against software router.
Phase 2: Model Integration
Added fill probability model using logistic regression weights precomputed in software.
Phase 3: Production Deployment
Phased rollout starting with 1 venue, expanding to all 15 over 3 months.

Technical Challenges

FPGA memory for queue depth tracking

Impact: Only 5 of the 15 venues could be tracked before BRAM ran out

Resolution: Compressed queue state by storing 8-bit relative sizes instead of 32-bit absolute sizes, a 4x reduction per entry
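
The source does not say what the sizes are relative to. The sketch below assumes each size is stored as an 8-bit fraction of a per-venue reference size (`scale`), which is one plausible reading; the sample sizes are made up. If queue state dominated BRAM usage and scaled linearly with venue count, a 4x smaller entry would raise the 5-venue limit to about 20, enough for all 15 venues.

```python
def compress_queue(sizes, scale):
    """Encode non-negative absolute sizes as 8-bit fractions of `scale` (0..255)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return bytes(min(255, round(255 * s / scale)) for s in sizes)


def decompress_queue(encoded, scale):
    """Recover approximate absolute sizes from the 8-bit encoding."""
    return [b * scale / 255 for b in encoded]


absolute_sizes = [12_000, 4_500, 800, 50_000]  # shares per price level (made-up values)
scale = max(absolute_sizes)                     # per-venue reference size (assumption)

encoded = compress_queue(absolute_sizes, scale)
restored = decompress_queue(encoded, scale)

bytes_before = 4 * len(absolute_sizes)          # 32-bit absolute sizes
bytes_after = len(encoded)                      # 8-bit relative sizes
max_error = max(abs(a - r) for a, r in zip(absolute_sizes, restored))
print(bytes_before, bytes_after, max_error)
```

The trade-off is precision: each restored size can be off by up to half a quantization step, `scale / 510`, which matters little when the router only needs to compare queue depths across venues.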

Model update latency

Impact: A 6-hour retraining window left the router running on stale weights

Resolution: Online learning with 1-minute model updates via PCIe DMA
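
The host-side half of such an update loop might look like the sketch below: an online gradient step per observed order, then quantization of the weights into a fixed-point buffer ready for transfer. The learning rate, the 16-bit format with 12 fractional bits, and the feature layout are assumptions. The PCIe DMA write itself depends on the driver stack and is not shown.

```python
import math
import struct

LEARNING_RATE = 0.05  # illustrative value
FRACTION_BITS = 12    # signed 16-bit fixed point with 12 fractional bits (assumption)


def sgd_step(weights, features, filled):
    """One online logistic-regression update for a single observed order.

    `filled` is 1 if the order filled and 0 otherwise.
    """
    z = sum(w * x for w, x in zip(weights, features))
    predicted = 1.0 / (1.0 + math.exp(-z))
    error = predicted - filled
    return [w - LEARNING_RATE * error * x for w, x in zip(weights, features)]


def pack_weights(weights):
    """Quantize weights to int16 fixed point and pack them little-endian."""
    scale = 1 << FRACTION_BITS
    quantized = [max(-32768, min(32767, round(w * scale))) for w in weights]
    return struct.pack(f"<{len(quantized)}h", *quantized)


weights = [0.0, 0.0, 0.0]  # bias weight first, then two feature weights
observations = [
    ([1.0, 0.2, 0.9], 1),  # ([bias input, feature 1, feature 2], filled?)
    ([1.0, 0.8, 0.1], 0),
    ([1.0, 0.3, 0.7], 1),
]
for features, filled in observations:
    weights = sgd_step(weights, features, filled)

payload = pack_weights(weights)  # bytes a driver would copy to the card
print(weights, payload.hex())
```

Because each update touches only the weight vector, the transfer is a few bytes per model, which is what makes a 1-minute refresh cheap compared with a 6-hour batch retrain.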

Results

Routing decision latency (P99): 85μs before, 9.5μs after (89% reduction)
Fill rate (first venue): 47% before, 63% after (34% increase)
Orders routed per second: 50,000 before, 500,000 after (10x increase)
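
The improvement figures can be checked directly from the before and after values. Note that the fill-rate gain is a relative increase: 47% to 63% is 16 percentage points, or about 34% in relative terms.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


latency_change = percent_change(85.0, 9.5)     # P99 latency in microseconds
fill_rate_change = percent_change(47.0, 63.0)  # first-venue fill rate in percent
throughput_ratio = 500_000 / 50_000            # orders routed per second

print(f"latency: {latency_change:.1f}%")
print(f"fill rate: {fill_rate_change:+.1f}% relative, {63 - 47} percentage points")
print(f"throughput: {throughput_ratio:.0f}x")
```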

Lessons Learned

📘 FPGA decision latency below 10μs is table stakes for modern HFT
📘 Online model updates critical for adapting to liquidity shifts
📘 Hardware-software co-design required for maintainability

What We Would Do Differently

💡 Use HLS instead of Verilog for faster iteration
💡 Implement reinforcement learning directly on FPGA

Role Relevance

Quant developers with FPGA experience and market microstructure knowledge were essential for sub-10μs smart order routing that competitors couldn't match.

Critical Skills Demonstrated

FPGA development (Verilog/HLS)
Market microstructure
Hardware timestamping
Latency measurement and profiling
