Executive Summary
A cryptocurrency exchange had experienced three critical outages in 18 months, all caused by memory corruption bugs in its C++ order matching engine. After rewriting the engine in Rust, the exchange eliminated entire classes of vulnerabilities and raised system uptime to 99.999%.
Key Outcomes
- ▹ Zero memory safety incidents in 12 months
- ▹ 99.999% uptime achieved after migration
- ▹ Audit costs reduced by 60%
Client Situation
The exchange processed $2B in daily trading volume. Over 6 years, its C++ codebase had accumulated technical debt, including buffer overflows and use-after-free bugs discovered during security audits.
Key Challenges
- ⚠ Three critical outages caused by memory corruption
- ⚠ Annual security audit identifying 15+ memory safety issues
- ⚠ Difficulty recruiting C++ engineers with security focus
Existing Architecture
A monolithic C++ application handled order matching, risk management, and trade settlement. The codebase relied on raw pointers and custom memory pools, and recurring defect classes included:
- Buffer overflows in network parsing logic
- Use-after-free in order book maintenance
- Race conditions in concurrent order processing
Solution Design
We rewrote the order matching engine in Rust, preserving C++ for non-critical peripheral systems during the transition.
Key Decisions
- ✓ Model order book state using Rust's ownership system to prevent double booking
- ✓ Use Rust's type system to encode trading rules at compile time
- ✓ Use WebAssembly to sandbox customer trading strategies
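To make the first two decisions concrete, here is a minimal sketch, assuming orders carry integer prices and lot quantities. All names (`OpenOrder`, `FilledOrder`, `Price`, `Quantity`, `fill`) are hypothetical and not the exchange's actual types. Newtypes keep prices and quantities from being mixed up and reject zero-size orders at construction, while `fill` takes the order by value so a given open order cannot be filled twice.

```rust
/// Quantity in lots (hypothetical unit). The newtype stops a price from
/// being passed where a quantity is expected.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Quantity(u64);

impl Quantity {
    /// Rejects zero-size quantities at construction time.
    pub fn new(lots: u64) -> Option<Quantity> {
        if lots == 0 { None } else { Some(Quantity(lots)) }
    }
    pub fn lots(self) -> u64 { self.0 }
}

/// Price in integer ticks (hypothetical); never a floating-point value.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Price(u64);

impl Price {
    pub fn from_ticks(ticks: u64) -> Price { Price(ticks) }
    pub fn ticks(self) -> u64 { self.0 }
}

/// An order that can still trade. Deliberately neither Clone nor Copy,
/// so each open order has exactly one owner.
#[derive(Debug)]
pub struct OpenOrder {
    pub id: u64,
    pub price: Price,
    pub remaining: Quantity,
}

/// Receipt for a fully filled order; it has no `fill` method at all.
#[derive(Debug, PartialEq)]
pub struct FilledOrder {
    pub id: u64,
}

#[derive(Debug)]
pub enum FillOutcome {
    /// Partially filled: ownership of the smaller open order is handed back.
    Partial(OpenOrder),
    /// Fully filled: only the immutable receipt remains.
    Done(FilledOrder),
}

impl OpenOrder {
    /// Consumes `self`: after this call the caller's binding has been moved,
    /// so the same order value cannot be filled a second time.
    pub fn fill(self, qty: Quantity) -> Result<FillOutcome, OpenOrder> {
        if qty > self.remaining {
            return Err(self); // overfill rejected; order returned unchanged
        }
        let left = self.remaining.lots() - qty.lots();
        match Quantity::new(left) {
            Some(remaining) => Ok(FillOutcome::Partial(OpenOrder { remaining, ..self })),
            None => Ok(FillOutcome::Done(FilledOrder { id: self.id })),
        }
    }
}
```

Writing `let _ = order.fill(q); order.fill(q);` on the same binding is rejected at compile time with a "use of moved value" error (E0382), which is how ownership turns "an order cannot be double-filled" into a compiler-checked rule in this sketch.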
Implementation
A complete rewrite over 8 months, with 3 months of parallel testing against the production C++ engine before cutover.
Phase 1: Order Book Core
Implemented a price-time priority order book using Rust's BTreeMap for O(log n) operations.
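A minimal sketch of a price-time priority book, assuming integer tick prices and one FIFO queue per price level. The layout and names (`OrderBook`, `RestingOrder`, `Side`) are hypothetical. `BTreeMap` keeps levels sorted, so the best bid is the last key and the best ask is the first; `VecDeque` preserves arrival order within a level.

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side { Buy, Sell }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RestingOrder {
    pub id: u64,
    pub qty: u64,
}

/// Hypothetical book layout: price (ticks) -> FIFO queue of resting orders.
#[derive(Default)]
pub struct OrderBook {
    bids: BTreeMap<u64, VecDeque<RestingOrder>>,
    asks: BTreeMap<u64, VecDeque<RestingOrder>>,
}

impl OrderBook {
    fn side_mut(&mut self, side: Side) -> &mut BTreeMap<u64, VecDeque<RestingOrder>> {
        match side { Side::Buy => &mut self.bids, Side::Sell => &mut self.asks }
    }

    /// O(log n) level lookup in the number of price levels, then an
    /// append at the back of the level's queue (time priority).
    pub fn add(&mut self, side: Side, price: u64, order: RestingOrder) {
        self.side_mut(side).entry(price).or_default().push_back(order);
    }

    /// Highest bid price, if any.
    pub fn best_bid(&self) -> Option<u64> { self.bids.keys().next_back().copied() }

    /// Lowest ask price, if any.
    pub fn best_ask(&self) -> Option<u64> { self.asks.keys().next().copied() }

    /// The order with time priority at the best level on one side.
    pub fn front(&self, side: Side) -> Option<&RestingOrder> {
        match side {
            Side::Buy => self.bids.values().next_back()?.front(),
            Side::Sell => self.asks.values().next()?.front(),
        }
    }

    /// Cancels an order at a known price; drops the level once it is empty.
    /// The search within a level is linear in this sketch.
    pub fn cancel(&mut self, side: Side, price: u64, id: u64) -> Option<RestingOrder> {
        let book = self.side_mut(side);
        let level = book.get_mut(&price)?;
        let pos = level.iter().position(|o| o.id == id)?;
        let removed = level.remove(pos);
        if level.is_empty() {
            book.remove(&price);
        }
        removed
    }
}
```

Removing empty levels keeps `best_bid`/`best_ask` correct without extra checks. A production book would typically also keep an order-id index so cancels do not scan a level; that is omitted here.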
Phase 2: Matching Engine
Added fill logic and trade generation, verified with property-based testing.
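The fill and trade-generation step can be sketched as follows, assuming an incoming buy limit order is matched against asks stored as a `BTreeMap` of FIFO levels and that trades execute at the resting (maker) price. `match_buy`, `Resting`, and `Trade` are hypothetical names; the sell side would mirror this by walking bids from the highest price down.

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Resting { pub id: u64, pub qty: u64 }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Trade { pub maker_id: u64, pub taker_id: u64, pub price: u64, pub qty: u64 }

/// Matches an incoming buy against the asks: lowest price first, FIFO within
/// a level, never above `limit`. Returns the trades and the unfilled quantity.
pub fn match_buy(
    asks: &mut BTreeMap<u64, VecDeque<Resting>>,
    taker_id: u64,
    limit: u64,
    mut qty: u64,
) -> (Vec<Trade>, u64) {
    let mut trades = Vec::new();
    while qty > 0 {
        // Best (lowest) ask; stop if the book is empty or above the limit.
        let price = match asks.keys().next() {
            Some(&p) if p <= limit => p,
            _ => break,
        };
        let level = asks.get_mut(&price).expect("best level exists");
        while qty > 0 {
            let maker = match level.front_mut() {
                Some(m) => m,
                None => break, // level exhausted
            };
            let fill = qty.min(maker.qty);
            maker.qty -= fill;
            qty -= fill;
            trades.push(Trade { maker_id: maker.id, taker_id, price, qty: fill });
            let exhausted = maker.qty == 0;
            if exhausted {
                level.pop_front(); // fully filled maker leaves the book
            }
        }
        if level.is_empty() {
            asks.remove(&price); // keep only non-empty levels
        }
    }
    (trades, qty)
}
```

Any unfilled remainder would then rest on the bid side or be cancelled, depending on the order type; that policy is outside this sketch.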
Phase 3: Integration
Integrated with existing risk and settlement systems via gRPC.
Technical Challenges
- Maintaining C++ ABI compatibility
Impact: Risk during phase-out of C++ components
Resolution: Used CXX for safe bidirectional bindings with C++
- Auditing Rust unsafe code blocks
Impact: A small amount of unsafe code was still needed for FFI and performance
Resolution: Formalized a review process for every unsafe block, with a safety comment documenting each one
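The safety-comment convention might look like the following sketch, assuming a hypothetical hot-path helper (`sum_unchecked`) that skips bounds checks. The `# Safety` doc section states the caller's obligations, each `// SAFETY:` comment explains why the obligations hold at that site, and a safe wrapper (`sum_range`) checks the precondition once. Clippy's `missing_safety_doc` and `undocumented_unsafe_blocks` lints can flag code that omits these.

```rust
/// Sums quantities over `levels[start..end]` without per-element bounds checks.
/// Hypothetical helper used only to illustrate the review convention.
///
/// # Safety
/// Callers must guarantee `start <= end` and `end <= levels.len()`.
pub unsafe fn sum_unchecked(levels: &[u64], start: usize, end: usize) -> u64 {
    let mut total = 0u64;
    for i in start..end {
        // SAFETY: the caller guarantees `end <= levels.len()`, and `i < end`,
        // so `i` is always a valid index.
        total += unsafe { *levels.get_unchecked(i) };
    }
    total
}

/// Safe wrapper: validates the precondition, then calls the unchecked path.
pub fn sum_range(levels: &[u64], start: usize, end: usize) -> Option<u64> {
    if start > end || end > levels.len() {
        return None;
    }
    // SAFETY: the bounds were checked immediately above.
    Some(unsafe { sum_unchecked(levels, start, end) })
}
```

In practice `levels[start..end].iter().sum()` usually lets the compiler drop the checks on its own, so code like this belongs only on paths where profiling shows a measurable gain.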
Results
- Memory safety vulnerabilities: 12-15 per audit before, 0 after (100% elimination)
- System uptime: 99.95% before, 99.999% after (50x reduction in downtime)
- Order matching latency (P99): 850 microseconds before, 320 microseconds after (62% reduction)
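The uptime and latency ratios follow from simple arithmetic, assuming a 365-day year (525,600 minutes) and treating the uptime figures as annual averages:

```latex
\begin{aligned}
\text{downtime}_{\text{before}} &= (1 - 0.9995) \times 525{,}600\ \text{min} = 262.8\ \text{min/year} \approx 4.4\ \text{h/year} \\
\text{downtime}_{\text{after}} &= (1 - 0.99999) \times 525{,}600\ \text{min} = 5.256\ \text{min/year} \\
\frac{\text{downtime}_{\text{before}}}{\text{downtime}_{\text{after}}} &= \frac{262.8}{5.256} = 50 \\
\text{latency reduction} &= \frac{850 - 320}{850} = \frac{530}{850} \approx 62\%
\end{aligned}
```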
Lessons Learned
- 📘 Rust's ownership model directly maps to financial invariants like 'an order cannot be double-filled'
- 📘 Property-based testing with proptest caught edge cases missed by unit tests
- 📘 The borrow checker prevented real bugs that had existed in the C++ version
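The kind of invariant a property-based test checks can be sketched without external crates. proptest itself is a separate crate (properties there are written inside its `proptest!` macro), so this sketch substitutes a hand-rolled xorshift generator and a hypothetical single-level fill function (`fill_level`). It asserts quantity conservation for the taker and the makers, and that no exhausted maker is left in the queue.

```rust
use std::collections::VecDeque;

/// Fills `qty` against one FIFO price level of resting quantities.
/// Returns the per-maker fills in order and the taker's unfilled remainder.
pub fn fill_level(level: &mut VecDeque<u64>, mut qty: u64) -> (Vec<u64>, u64) {
    let mut fills = Vec::new();
    while qty > 0 {
        let maker = match level.front_mut() {
            Some(m) => m,
            None => break, // level exhausted
        };
        let fill = qty.min(*maker);
        *maker -= fill;
        qty -= fill;
        fills.push(fill);
        let exhausted = *maker == 0;
        if exhausted {
            level.pop_front();
        }
    }
    (fills, qty)
}

/// Minimal xorshift64 generator standing in for proptest's strategies.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `cases` random scenarios and reports the first violated property.
pub fn check_fill_properties(seed: u64, cases: usize) -> Result<(), String> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    for case in 0..cases {
        let n = (rng.next_u64() % 6) as usize;
        let original: VecDeque<u64> = (0..n).map(|_| 1 + rng.next_u64() % 50).collect();
        let incoming = rng.next_u64() % 200;
        let mut level = original.clone();
        let (fills, left) = fill_level(&mut level, incoming);
        let filled: u64 = fills.iter().sum();
        let before: u64 = original.iter().sum();
        let after: u64 = level.iter().sum();
        if filled + left != incoming {
            return Err(format!("case {case}: taker quantity not conserved"));
        }
        if filled + after != before {
            return Err(format!("case {case}: maker quantity not conserved"));
        }
        if filled != incoming.min(before) {
            return Err(format!("case {case}: fill is not min(incoming, liquidity)"));
        }
        if level.iter().any(|&q| q == 0) {
            return Err(format!("case {case}: exhausted maker left in queue"));
        }
    }
    Ok(())
}
```

Unlike proptest, this sketch does not shrink failing inputs to a minimal counterexample, which is one of the main reasons to use the real crate.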
What We Would Do Differently
- 💡 Integrate differential testing earlier to find behavioral mismatches
- 💡 Run criterion benchmarks from day 1 to catch performance regressions early
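Differential testing feeds identical inputs to a trusted reference and to the new implementation, then reports any disagreement. The sketch below assumes a hypothetical query, "total ask quantity at or below a limit price": a linear scan over unsorted asks serves as the reference, and a `BTreeMap` range query is the optimized version. In the case study the reference would have been the production C++ engine, which this self-contained example cannot reproduce.

```rust
use std::collections::BTreeMap;

/// Reference model (hypothetical): unsorted (price, qty) asks, linear scan.
pub fn liquidity_reference(asks: &[(u64, u64)], limit: u64) -> u64 {
    asks.iter().filter(|(p, _)| *p <= limit).map(|(_, q)| q).sum()
}

/// Optimized model (hypothetical): quantities aggregated per price level,
/// answered with an ordered range query.
pub fn liquidity_btree(levels: &BTreeMap<u64, u64>, limit: u64) -> u64 {
    levels.range(..=limit).map(|(_, q)| q).sum()
}

/// Runs both models on the same random inputs and returns the first
/// mismatching input as a counterexample, or None if all cases agree.
pub fn differential_check(seed: u64, cases: usize) -> Option<(Vec<(u64, u64)>, u64)> {
    // xorshift64 state; must be non-zero.
    let mut state = seed.max(1);
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    for _ in 0..cases {
        let n = (next() % 8) as usize;
        let asks: Vec<(u64, u64)> = (0..n).map(|_| (95 + next() % 10, 1 + next() % 20)).collect();
        let limit = 95 + next() % 12;
        // Build the optimized representation from the same input.
        let mut levels = BTreeMap::new();
        for &(price, qty) in &asks {
            *levels.entry(price).or_insert(0) += qty;
        }
        if liquidity_reference(&asks, limit) != liquidity_btree(&levels, limit) {
            return Some((asks, limit));
        }
    }
    None
}
```

Returning the offending input rather than a bare failure flag makes each mismatch directly reproducible as a regression test.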
Role Relevance
Rust engineers who understood both low-level memory safety and domain-specific trading invariants were essential to rewriting a mission-critical financial system safely.
Frequently Asked Questions
- Why not just fix the C++ bugs instead of rewriting?
- The root cause was C++'s inherent memory unsafety. Fixing individual bugs didn't prevent new ones, and audits kept finding more each year.
- Was rewriting worth the 8-month investment?
- Yes, the exchange saved 3x the rewrite cost in prevented outages and reduced audit fees within the first year.