Executive Summary
An adtech company needed a real-time bidding API that could process 100K requests/second within 10ms, requirements its existing Django-based stack could not meet. Python engineers built an asyncio-based service with gRPC, Protocol Buffers, and zero-copy deserialization, reaching 120K RPS at 8ms P99 latency.
Key Outcomes
- ▹ 120K requests/second achieved against a 100K target
- ▹ 50ms → 8ms P99 latency
- ▹ Infrastructure cost reduced 40%
Client Situation
The company's real-time bidding API had P99 latency of up to 50ms, so it was losing auctions to competitors that responded in under 10ms.
Key Challenges
- ⚠ 50ms P99 latency causing 30% bid loss
- ⚠ JSON serialization/deserialization overhead
- ⚠ Gunicorn workers maxed at 5K RPS per instance
Existing Architecture
Django REST framework with JSON over HTTP, PostgreSQL for ad targeting, deployed on 50 EC2 instances.
- JSON parsing taking 3ms per request
- HTTP/1.1 overhead and connection limits
- Django sync request-response model
Solution Design
An asyncio-based gRPC service with Protocol Buffers, connection pooling, and an in-memory targeting engine.
Key Decisions
- ✓ gRPC over HTTP/2 for multiplexing and binary protocol
- ✓ Protocol Buffers for zero-copy deserialization
- ✓ In-memory targeting data with lock-free reads
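One common way to get lock-free reads over in-memory data is to keep an immutable snapshot and replace it with a single reference swap. The sketch below illustrates that idea only; `TargetingStore`, `segments_for`, and `publish` are hypothetical names, and the case study does not say which data structures the team actually used. It also assumes CPython, where rebinding an attribute is a single atomic operation.

```python
from types import MappingProxyType
from typing import Mapping


class TargetingStore:
    """Hypothetical store: readers see an immutable snapshot and never lock."""

    def __init__(self, initial: Mapping[str, frozenset]) -> None:
        self._snapshot = MappingProxyType(dict(initial))

    def segments_for(self, campaign_id: str) -> frozenset:
        # Grab the current reference once; the snapshot behind it never mutates.
        snapshot = self._snapshot
        return snapshot.get(campaign_id, frozenset())

    def publish(self, new_data: Mapping[str, frozenset]) -> None:
        # Build the replacement off to the side, then swap the reference in one
        # assignment. Readers already holding the old snapshot are unaffected.
        self._snapshot = MappingProxyType(dict(new_data))


store = TargetingStore({"c1": frozenset({"sports", "us"})})
before = store.segments_for("c1")
store.publish({"c1": frozenset({"news"}), "c2": frozenset({"eu"})})
after = store.segments_for("c1")
```

The trade-off is that every update copies the whole mapping, which suits data that is read far more often than it is refreshed.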
Implementation
Shadow traffic testing for 6 weeks, comparing gRPC bidding decisions against Django baseline.
Phase 1: gRPC Prototype
Built the bidding service with asyncio and Protocol Buffers, reaching 20K RPS at 15ms.
Phase 2: Optimization
Added connection pooling and lock-free data structures, reaching 60K RPS at 10ms.
Phase 3: Production Scaling
Scaled horizontally with client-side load balancing, reaching 120K RPS at 8ms.
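Client-side load balancing means each caller holds the list of backends and chooses one per request, rather than routing through a central proxy. A minimal round-robin sketch follows; the `RoundRobinBalancer` class and the `bidder-N:50051` addresses are illustrative assumptions, and the case study does not state which balancing policy was used in production.

```python
import itertools
from typing import Iterator, Sequence


class RoundRobinBalancer:
    """Hypothetical client-side balancer that rotates through known backends."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = tuple(backends)
        self._cycle: Iterator[str] = itertools.cycle(self._backends)

    def pick(self) -> str:
        # Each call returns the next backend address in rotation.
        return next(self._cycle)


# Made-up addresses for illustration only.
balancer = RoundRobinBalancer(["bidder-1:50051", "bidder-2:50051", "bidder-3:50051"])
picks = [balancer.pick() for _ in range(6)]
```

In a real gRPC deployment this selection would more likely be delegated to the client library's own balancing policy than written by hand.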
Technical Challenges
- gRPC Python performance
Impact: Initial implementation maxed at 15K RPS
Resolution: Switched to asyncio + uvloop + concurrent.futures for CPU-bound tasks
- Memory fragmentation at high throughput
Impact: Service OOM after 1 hour at 100K RPS
Resolution: Pre-allocated message pools + object reuse
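The first resolution above pairs an asyncio event loop with `concurrent.futures` so that CPU-bound work does not block request handling. The sketch below shows the general pattern with `loop.run_in_executor`. The `score_bid` function is a hypothetical stand-in for bid scoring, and uvloop (a third-party package) is left out so the example stays standard-library only. A thread pool is used to keep the sketch simple; for pure-Python CPU-bound work, a `ProcessPoolExecutor` is usually the better fit because of the GIL.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor


def score_bid(payload: bytes) -> int:
    # Hypothetical CPU-bound step standing in for real bid scoring.
    return int.from_bytes(hashlib.sha256(payload).digest()[:2], "big")


async def handle_request(executor: ThreadPoolExecutor, payload: bytes) -> int:
    # Hand the blocking call to the executor so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, score_bid, payload)


async def main() -> list:
    with ThreadPoolExecutor(max_workers=4) as executor:
        payloads = [f"req-{i}".encode() for i in range(8)]
        # gather() preserves the order of the submitted requests.
        return await asyncio.gather(*(handle_request(executor, p) for p in payloads))


scores = asyncio.run(main())
```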
Results
- Max throughput per instance
Before: 5,000 RPS
After: 20,000 RPS
Improvement: 4x increase
- P99 latency
Before: 50ms
After: 8ms
Improvement: 84% reduction
- Server instances needed
Before: 50
After: 30
Improvement: 40% reduction
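The improvement figures follow directly from the before and after values. As a quick check, the short calculation below recomputes them; the helper names are introduced here for illustration.

```python
def pct_reduction(before: float, after: float) -> float:
    # Percentage drop from `before` to `after`.
    return (before - after) * 100 / before


def multiple(before: float, after: float) -> float:
    # How many times larger `after` is than `before`.
    return after / before


throughput_gain = multiple(5_000, 20_000)    # per-instance RPS
latency_reduction = pct_reduction(50, 8)     # P99 latency in ms
instance_reduction = pct_reduction(50, 30)   # server instances
```

These evaluate to 4.0, 84.0, and 40.0 respectively, matching the reported 4x, 84%, and 40%.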
Lessons Learned
- 📘 gRPC + Protocol Buffers provided 5x throughput vs JSON over HTTP
- 📘 asyncio with uvloop achieved Go-like performance in Python
- 📘 Connection pooling and object reuse critical for high throughput
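Object reuse, as mentioned in the last lesson, usually means pre-allocating a fixed set of objects and recycling them instead of creating a new one per request. The following is a minimal sketch of such a pool. `BidResponse`, `ResponsePool`, and their fields are hypothetical, and the team's actual message pools may have worked differently.

```python
from collections import deque


class BidResponse:
    """Hypothetical response object; __slots__ keeps each instance compact."""

    __slots__ = ("bid_id", "price_micros")

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.bid_id = ""
        self.price_micros = 0


class ResponsePool:
    def __init__(self, size: int) -> None:
        self._size = size
        # Allocate everything up front so steady-state traffic allocates nothing.
        self._free = deque(BidResponse() for _ in range(size))

    def acquire(self) -> BidResponse:
        # Fall back to a fresh allocation if the pool is temporarily exhausted.
        return self._free.popleft() if self._free else BidResponse()

    def release(self, obj: BidResponse) -> None:
        # Clear state before reuse; drop extras so the pool never exceeds its size.
        obj.reset()
        if len(self._free) < self._size:
            self._free.append(obj)


pool = ResponsePool(size=2)
first = pool.acquire()
first.bid_id, first.price_micros = "abc", 1500
pool.release(first)
second = pool.acquire()
third = pool.acquire()
reused = third is first
```

Here `third` is the same object as `first`, handed back with its fields cleared, so no new allocation was needed for it.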
What We Would Do Differently
- 💡 Use Cython for performance-critical sections
- 💡 Implement adaptive load shedding earlier
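For context on the second item, adaptive load shedding generally means rejecting requests early once the service is saturated, with the admission limit adjusting to observed latency. The sketch below shows one simple variant: an in-flight limit that grows by one after a fast response and halves after a slow one. It is a hypothetical illustration rather than something the team built; the `AdaptiveShedder` class and its parameters are assumptions, and it is meant for use within a single event loop (it is not thread-safe).

```python
class AdaptiveShedder:
    """Hypothetical in-flight limiter: additive increase, multiplicative decrease."""

    def __init__(self, target_ms: float, initial_limit: int,
                 min_limit: int = 1, max_limit: int = 1000) -> None:
        self.target_ms = target_ms
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.in_flight = 0

    def try_admit(self) -> bool:
        # Shed the request if the service is already at its current limit.
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def record(self, latency_ms: float) -> None:
        # Call once per completed request with its observed latency.
        self.in_flight -= 1
        if latency_ms > self.target_ms:
            self.limit = max(self.min_limit, self.limit // 2)
        else:
            self.limit = min(self.max_limit, self.limit + 1)


shedder = AdaptiveShedder(target_ms=10, initial_limit=4)
admitted = [shedder.try_admit() for _ in range(5)]  # fifth request is shed
shedder.record(8)    # fast response: limit grows to 5
shedder.record(25)   # slow response: limit halves to 2
admit_after_slow = shedder.try_admit()
```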
Role Relevance
Python engineers built a gRPC-based API that reached 120K RPS at 8ms P99 latency, a level of performance often assumed to be out of reach for Python.
Frequently Asked Questions
- Why not use Go or Rust for this use case?
- The team's existing Python and gRPC expertise was enough to reach the required performance, so a rewrite in another language was not needed.
- What was the biggest performance bottleneck?
- JSON serialization (3ms). Protocol Buffers reduced it to 0.3ms.
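To give a rough sense of why a binary format is cheaper to handle than JSON, the sketch below encodes the same record both ways. It uses the standard-library `struct` module as a stand-in for a binary wire format; it is not Protocol Buffers, and the field names and layout are made up for illustration.

```python
import json
import struct

# Hypothetical fixed layout: request id (u64), floor price in micros (u32), ad slot (u16).
BID_FORMAT = struct.Struct("<QIH")


def encode_json(request_id: int, floor_micros: int, slot: int) -> bytes:
    # Text encoding: field names and digits are spelled out in every message.
    body = {"request_id": request_id, "floor_micros": floor_micros, "slot": slot}
    return json.dumps(body).encode("utf-8")


def encode_binary(request_id: int, floor_micros: int, slot: int) -> bytes:
    # Binary encoding: fixed-width fields, no names, no string parsing on decode.
    return BID_FORMAT.pack(request_id, floor_micros, slot)


json_bytes = encode_json(123456789, 250000, 3)
binary_bytes = encode_binary(123456789, 250000, 3)
decoded = BID_FORMAT.unpack(binary_bytes)
```

The binary form here is 14 bytes and decodes without scanning text, while the JSON form repeats every field name in each message. Actual savings depend on the schema and library, so the 3ms and 0.3ms figures above should be read as the team's own measurements.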