Executive Summary
An AdTech company processing 2 million events per second faced escalating cloud costs due to Go's memory overhead. Migrating their bid-stream processor to Rust eliminated GC pauses and reduced memory usage by 82% while handling 3x the throughput.
Key Outcomes
- ▹ 82% reduction in memory usage per pod
- ▹ 3x increase in throughput on same hardware
- ▹ Zero GC-related latency spikes
Client Situation
The company operated a real-time bidding system processing billions of daily ad impressions. Their Go microservices were hitting memory limits and experiencing GC pauses during traffic spikes.
Key Challenges
- ⚠ Go GC causing 5-20ms pauses during peak traffic
- ⚠ Memory usage per pod exceeding 8GB limits
- ⚠ Cloud costs doubling year over year
Existing Architecture
Multiple Go services consuming from Kafka, processing bid requests, and writing to ClickHouse. Each service used Go's standard net/http and Kafka client libraries.
- Go GC not tunable enough for millisecond latency requirements
- High memory overhead for map structures storing bid data
- CPU contention between GC and business logic
Solution Design
We targeted the bid-stream enrichment service first—the most memory-intensive component. The Rust implementation used zero-copy deserialization and arena allocation.
Key Decisions
- ✓ Use rdkafka bindings for Kafka consumer/producer
- ✓ Implement custom memory pool for bid request objects
- ✓ Use gRPC with tonic framework for service boundaries
Implementation
We implemented a strangler pattern, routing 10% of traffic to Rust service and gradually increasing over 3 months.
Phase 1: Phase 1: Core Processor Migration
Rewrote bid enrichment logic in Rust with parallel processing using Rayon.
Phase 2: Phase 2: Kafka Integration
Replaced Go consumer with rdkafka, achieved lower latency at higher throughput.
Phase 3: Phase 3: Full Rollout
100% traffic on Rust service after 6 weeks of production validation.
Technical Challenges
- Kafka rebalancing behavior differences
Impact: Consumer group instability during initial rollout
Resolution: Tuned max.poll.interval.ms and used cooperative rebalancing
- ClickHouse insert batching
Impact: Throughput bottleneck in database writes
Resolution: Implemented adaptive batching based on buffer size and time windows
Results
- Memory per pod
- Before7.2 GBAfter1.3 GBImprovement82% reduction
- P99 latency (end-to-end)
- Before45 msAfter18 msImprovement60% reduction
- Events per second per pod
- Before85kAfter270kImprovement3.1x increase
Lessons Learned
- 📘 Go's simplicity is valuable, but Rust's memory control is unbeatable for predictable low latency
- 📘 Zero-copy deserialization using serde significantly reduced allocation pressure
- 📘 Port incrementally, not all at once
What We Would Do Differently
- 💡 Build better observability into Rust service from day 1 for GC comparison
- 💡 Use loom for concurrency testing earlier
Role Relevance
Rust engineers brought deep understanding of memory allocation patterns, zero-copy deserialization, and async I/O—essential for achieving 3x throughput improvement.
Critical Skills Demonstrated
Related Roles
Frequently Asked Questions
- Isn't Go fast enough for most use cases?
- Yes, but Go's GC pauses become problematic at very high throughput (1M+ events/sec) and when tail latency matters (P99 < 10ms).
- How did the team learn Rust?
- The team spent 4 weeks in intensive training before starting the migration, with pair programming and code reviews.