Migrating High-Throughput Services from Go to Rust

Executive Summary

An AdTech company processing 2 million events per second faced escalating cloud costs due to Go's memory overhead. Migrating their bid-stream processor to Rust eliminated GC pauses and reduced memory usage by 82% while handling 3x the throughput.

Key Outcomes

▹ 82% reduction in memory usage per pod
▹ 3x increase in throughput on same hardware
▹ Zero GC-related latency spikes

Client Situation

The company operated a real-time bidding system processing billions of daily ad impressions. Their Go microservices were hitting memory limits and experiencing GC pauses during traffic spikes.

Key Challenges

⚠ Go GC causing 5-20ms pauses during peak traffic
⚠ Memory usage per pod exceeding 8GB limits
⚠ Cloud costs doubling year over year

Existing Architecture

Multiple Go services consumed from Kafka, processed bid requests, and wrote to ClickHouse. Each service used Go's standard net/http package and a Go Kafka client library.

Go GC not tunable enough for millisecond latency requirements
High memory overhead for map structures storing bid data
CPU contention between GC and business logic

Solution Design

We targeted the bid-stream enrichment service first—the most memory-intensive component. The Rust implementation used zero-copy deserialization and arena allocation.
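
As a rough illustration of the zero-copy idea, the sketch below parses a bid record into a struct whose string fields borrow from the input buffer instead of allocating new strings. The `BidRequest` fields and the `key=value;` wire format are assumptions made for this example; the case study does not describe the real message format, and the production service relied on serde rather than a hand-written parser.

```rust
/// Hypothetical bid record; `id` and `site` borrow from the input buffer.
#[derive(Debug, PartialEq)]
pub struct BidRequest<'a> {
    pub id: &'a str,
    pub site: &'a str,
    pub floor_micros: u64,
}

/// Parses a record such as "id=abc;site=example.com;floor=1500".
/// Returns `None` if a pair is malformed or a required field is missing.
/// No string data is copied: the returned slices point into `input`.
pub fn parse_bid(input: &str) -> Option<BidRequest<'_>> {
    let (mut id, mut site, mut floor) = (None, None, None);
    for pair in input.split(';') {
        let (key, value) = pair.split_once('=')?;
        match key {
            "id" => id = Some(value),
            "site" => site = Some(value),
            "floor" => floor = value.parse::<u64>().ok(),
            _ => {} // unknown keys are ignored in this sketch
        }
    }
    Some(BidRequest {
        id: id?,
        site: site?,
        floor_micros: floor?,
    })
}
```

Because the struct carries the lifetime of the buffer, the compiler rejects any use of a `BidRequest` after its buffer is dropped or reused, which is what makes borrowing safe on a hot path.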

Key Decisions

✓ Use rdkafka bindings for Kafka consumer/producer
✓ Implement custom memory pool for bid request objects
✓ Use gRPC with tonic framework for service boundaries
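
The memory pool decision above can be pictured with a minimal sketch like the following. It is not the team's implementation: `BufferPool` and its methods are hypothetical names, and a production pool would likely be thread-safe and bounded in size. The point is only that released buffers keep their capacity and are handed out again instead of being reallocated.

```rust
/// Hypothetical pool of reusable byte buffers for bid request payloads.
pub struct BufferPool {
    free: Vec<Vec<u8>>,
    buf_capacity: usize,
}

impl BufferPool {
    pub fn new(buf_capacity: usize) -> Self {
        Self { free: Vec::new(), buf_capacity }
    }

    /// Hands out a recycled buffer if one is idle, otherwise allocates a new one.
    pub fn acquire(&mut self) -> Vec<u8> {
        self.free
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(self.buf_capacity))
    }

    /// Clears the buffer (its capacity is kept) and returns it to the pool.
    pub fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.free.push(buf);
    }

    /// Number of buffers currently waiting to be reused.
    pub fn idle(&self) -> usize {
        self.free.len()
    }
}
```

After a warm-up period, a service using a pool like this performs few fresh heap allocations per request, which is one plausible way to get the steadier memory profile described in the results.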

Rust, rdkafka, Tokio, tonic-gRPC, ClickHouse

Implementation

We implemented a strangler pattern, routing 10% of traffic to the Rust service at first and gradually increasing the share over 3 months.
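
One way to split traffic by percentage is sketched below. The case study does not say how requests were assigned to each service, so the key-based hashing, the `Backend` enum, and the `route` function are all assumptions. Hashing a request key with a stable function keeps each key on the same backend as the percentage is raised.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Backend {
    Go,
    Rust,
}

/// FNV-1a hash. It is deterministic across runs and processes,
/// so a given key always falls into the same bucket.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Sends a request to the Rust service when its bucket (0..100)
/// is below `rust_percent`; everything else stays on Go.
pub fn route(request_key: &str, rust_percent: u8) -> Backend {
    let bucket = fnv1a(request_key.as_bytes()) % 100;
    if bucket < u64::from(rust_percent) {
        Backend::Rust
    } else {
        Backend::Go
    }
}
```

With this scheme, any key routed to Rust at 10% is still routed to Rust at 50%, so raising the percentage only ever moves keys in one direction.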

Phase 1: Core Processor Migration
Rewrote bid enrichment logic in Rust with parallel processing using Rayon.
Phase 2: Kafka Integration
Replaced Go consumer with rdkafka, achieved lower latency at higher throughput.
Phase 3: Full Rollout
100% traffic on Rust service after 6 weeks of production validation.
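
The data-parallel enrichment from Phase 1 can be approximated with the standard library alone. The sketch below uses `std::thread::scope` as a stand-in for Rayon's parallel iterators, which is a substitution made so the example has no external dependencies. The `enrich` function is a placeholder, since the real enrichment logic is not described.

```rust
use std::thread;

/// Placeholder enrichment step: doubles a bid price (hypothetical logic).
fn enrich(price_micros: u64) -> u64 {
    price_micros * 2
}

/// Splits `bids` into up to `workers` chunks and enriches each chunk on
/// its own scoped thread. Output order matches input order.
pub fn enrich_parallel(bids: &[u64], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((bids.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = bids
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&b| enrich(b)).collect::<Vec<u64>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Rayon adds work stealing and a shared thread pool on top of this basic pattern, so it avoids spawning threads per call and balances uneven chunks.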

Technical Challenges

Kafka rebalancing behavior differences

Impact: Consumer group instability during initial rollout

Resolution: Tuned max.poll.interval.ms and used cooperative rebalancing
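
For orientation, the consumer settings involved might look like the sketch below. The property keys are real librdkafka settings, but every value is an assumption: the case study does not state the values used, and the broker address and group name are placeholders. With the rdkafka crate, each pair would typically be passed to `ClientConfig::set`.

```rust
/// Consumer properties as (key, value) pairs.
/// Keys are librdkafka settings; values are illustrative assumptions.
pub fn consumer_properties() -> Vec<(&'static str, &'static str)> {
    vec![
        // Placeholder broker address.
        ("bootstrap.servers", "localhost:9092"),
        // Hypothetical consumer group name.
        ("group.id", "bid-enrichment"),
        // Maximum gap between polls before the consumer is considered failed.
        // Assumed value; the librdkafka default is 300000 (5 minutes).
        ("max.poll.interval.ms", "600000"),
        // Incremental (cooperative) rebalancing instead of stop-the-world.
        ("partition.assignment.strategy", "cooperative-sticky"),
    ]
}
```

Cooperative rebalancing lets consumers keep the partitions they already own while only the moved partitions are reassigned, which tends to reduce the group-wide pauses seen during rollouts.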

ClickHouse insert batching

Impact: Throughput bottleneck in database writes

Resolution: Implemented adaptive batching based on buffer size and time windows
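
A minimal sketch of batching on size and time follows. It is a simplification: thresholds here are fixed, whereas the resolution above describes adaptive behavior whose details are not given. `Batcher`, its fields, and the use of `String` rows are hypothetical. The current time is passed in explicitly so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical row batcher: emits a batch when the buffer is full
/// or when the oldest buffered row is older than `max_age`.
pub struct Batcher {
    rows: Vec<String>,
    max_rows: usize,
    max_age: Duration,
    opened_at: Instant,
}

impl Batcher {
    pub fn new(max_rows: usize, max_age: Duration, now: Instant) -> Self {
        Self {
            rows: Vec::with_capacity(max_rows),
            max_rows,
            max_age,
            opened_at: now,
        }
    }

    /// Buffers a row and returns the batch if either threshold was reached.
    pub fn push(&mut self, row: String, now: Instant) -> Option<Vec<String>> {
        if self.rows.is_empty() {
            self.opened_at = now; // the time window starts with the first row
        }
        self.rows.push(row);
        let full = self.rows.len() >= self.max_rows;
        let stale = now.duration_since(self.opened_at) >= self.max_age;
        if full || stale {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Takes the buffered rows, leaving an empty buffer behind.
    pub fn flush(&mut self) -> Vec<String> {
        std::mem::replace(&mut self.rows, Vec::with_capacity(self.max_rows))
    }
}
```

In a real service a periodic timer would also call `flush`, since this sketch only checks the time window when a new row arrives.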

Results

Memory per pod: 7.2 GB before, 1.3 GB after (82% reduction)
P99 latency (end-to-end): 45 ms before, 18 ms after (60% reduction)
Events per second per pod: 85k before, 270k after (3.1x increase)

Lessons Learned

📘 Go's simplicity is valuable, but Rust's memory control is unbeatable for predictable low latency
📘 Zero-copy deserialization using serde significantly reduced allocation pressure
📘 Port incrementally, not all at once

What We Would Do Differently

💡 Build better observability into the Rust service from day 1, so its latency and memory behavior can be compared against the Go service's GC pauses
💡 Use loom for concurrency testing earlier

Role Relevance

Rust engineers brought deep understanding of memory allocation patterns, zero-copy deserialization, and async I/O—essential for achieving 3x throughput improvement.

Critical Skills Demonstrated

Memory management, Zero-copy deserialization (serde), Async programming (Tokio), Kafka internals, Performance benchmarking

Related Roles

Rust Engineer, Systems Engineer

Frequently Asked Questions

Q: Isn't Go fast enough for most use cases?

Yes, but Go's GC pauses become problematic at very high throughput (1M+ events/sec) and when tail latency matters (P99 < 10ms).

Q: How did the team learn Rust?

The team spent 4 weeks in intensive training before starting the migration, with pair programming and code reviews.