Building High-Throughput API Platforms

Executive Summary

An adtech company needed a real-time bidding API that could process 100K requests/second within 10ms, requirements its existing Django-based stack could not meet. Python engineers built an asyncio-based service with gRPC, Protocol Buffers, and zero-copy deserialization, achieving 120K RPS at 8ms P99 latency.

Key Outcomes

▹ 120K requests/second capacity against a 100K target
▹ 50ms → 8ms P99 latency
▹ Infrastructure cost reduced 40%

Client Situation

The company's real-time bidding API had a P99 latency of up to 50ms and was losing auctions to competitors with sub-10ms responses.

Key Challenges

⚠ 50ms P99 latency causing 30% bid loss
⚠ JSON serialization/deserialization overhead
⚠ Gunicorn workers maxed at 5K RPS per instance

Existing Architecture

Django REST framework serving JSON over HTTP, with PostgreSQL for ad targeting, deployed on 50 EC2 instances.

⚠ JSON parsing took 3ms per request
⚠ HTTP/1.1 overhead and connection limits
⚠ Django's synchronous request-response model
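
A per-request JSON cost like the one cited above can be measured in isolation before committing to a protocol change. The following is a minimal sketch using only the standard library; the `bid_request` payload and its field names are hypothetical and not taken from the client's system, so the absolute timing it prints will differ from the 3ms reported here.

```python
import json
import timeit

# Hypothetical bid request; field names and sizes are illustrative only.
bid_request = {
    "id": "req-0001",
    "imp": [{"id": str(i), "w": 300, "h": 250, "bidfloor": 0.5} for i in range(20)],
    "device": {"ua": "Mozilla/5.0", "ip": "203.0.113.7", "geo": {"country": "US"}},
    "user": {"id": "user-42", "segments": list(range(200))},
}


def json_round_trip(payload: dict) -> dict:
    """Serialize and parse once, as a JSON API does for every request."""
    return json.loads(json.dumps(payload))


def mean_round_trip_ms(payload: dict, iterations: int = 2000) -> float:
    """Average wall-clock cost of one round trip, in milliseconds."""
    total_seconds = timeit.timeit(lambda: json_round_trip(payload), number=iterations)
    return total_seconds / iterations * 1000.0


if __name__ == "__main__":
    print(f"JSON round trip: {mean_round_trip_ms(bid_request):.3f} ms per request")
```

Running the same harness against a real production payload, and against the candidate binary encoding, gives a like-for-like comparison.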

Solution Design

An asyncio-based gRPC service with Protocol Buffers, connection pooling, and an in-memory targeting engine.

Key Decisions

✓ gRPC over HTTP/2 for multiplexing and binary protocol
✓ Protocol Buffers for zero-copy deserialization
✓ In-memory targeting data with lock-free reads
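
One common way to get lock-free reads over in-memory data is to publish immutable snapshots and swap the reference on update. The sketch below illustrates that pattern; it is an assumption about how such an engine could be built, not the team's actual implementation. `TargetingStore`, `lookup`, and `publish` are hypothetical names, and the approach relies on attribute rebinding being atomic in CPython.

```python
import threading
from types import MappingProxyType


class TargetingStore:
    """Readers see an immutable snapshot; writers replace it wholesale."""

    def __init__(self, initial: dict):
        self._snapshot = MappingProxyType(dict(initial))
        self._write_lock = threading.Lock()  # serializes writers only

    def lookup(self, segment: str) -> tuple:
        # A single attribute read with no lock on the hot path.
        return self._snapshot.get(segment, ())

    def publish(self, updates: dict) -> None:
        # Build a new snapshot off to the side, then swap the reference.
        with self._write_lock:
            merged = dict(self._snapshot)
            merged.update(updates)
            self._snapshot = MappingProxyType(merged)


if __name__ == "__main__":
    store = TargetingStore({"sports": ("campaign-1", "campaign-7")})
    store.publish({"news": ("campaign-3",)})
    print(store.lookup("sports"), store.lookup("news"), store.lookup("unknown"))
```

Readers that grabbed the old snapshot keep a consistent view until they finish, at the cost of copying the mapping on each publish.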

Python, gRPC, Protocol Buffers, asyncio, Redis, Kafka

Implementation

Shadow traffic testing ran for 6 weeks, comparing the gRPC service's bidding decisions against the Django baseline.

Phase 1: gRPC Prototype
Built the bidding service with asyncio and Protocol Buffers; achieved 20K RPS at 15ms.
Phase 2: Optimization
Added connection pooling and lock-free data structures; achieved 60K RPS at 10ms.
Phase 3: Production Scaling
Scaled horizontally with client-side load balancing; achieved 120K RPS at 8ms.
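
Client-side load balancing means each caller chooses a backend itself rather than going through a central proxy. The source does not say which policy was used, so the sketch below assumes plain round-robin; `RoundRobinBalancer` and the endpoint addresses are hypothetical.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hands out backend addresses in rotation, one per call."""

    def __init__(self, endpoints):
        endpoints = tuple(endpoints)
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()  # protects the shared iterator

    def next_endpoint(self) -> str:
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"])
    for _ in range(5):
        print(balancer.next_endpoint())
```

A production picker would also need health checks and endpoint discovery, which are omitted here.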

Technical Challenges

gRPC Python performance

Impact: Initial implementation maxed at 15K RPS

Resolution: Switched to asyncio + uvloop + concurrent.futures for CPU-bound tasks
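
The combination named in this resolution can be sketched as follows: the event loop handles I/O, while CPU-bound work is handed to a `concurrent.futures` executor through `loop.run_in_executor`. This is an illustration under assumptions. The source does not say whether a thread or process pool was used (a process pool is assumed here), `score_bid` is a hypothetical stand-in for real bid evaluation, and uvloop is treated as an optional third-party dependency.

```python
import asyncio
import hashlib
from concurrent.futures import Executor, ProcessPoolExecutor

try:
    import uvloop  # third-party; the sketch still runs without it
except ImportError:
    uvloop = None


def score_bid(payload: bytes) -> int:
    """Hypothetical CPU-bound step standing in for bid evaluation."""
    return int.from_bytes(hashlib.sha256(payload).digest()[:4], "big")


async def handle_request(pool: Executor, payload: bytes) -> int:
    # Offload CPU work so the event loop stays free for other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, score_bid, payload)


async def main(pool_factory=ProcessPoolExecutor) -> list:
    with pool_factory(max_workers=2) as pool:
        tasks = [handle_request(pool, f"req-{i}".encode()) for i in range(4)]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    if uvloop is not None:
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    print(asyncio.run(main()))
```

A process pool sidesteps the GIL for pure-Python CPU work but adds pickling overhead per call, so the right executor depends on how heavy each task is.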

Memory fragmentation at high throughput

Impact: Service OOM after 1 hour at 100K RPS

Resolution: Pre-allocated message pools + object reuse
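
Pre-allocation and reuse can be illustrated with a small object pool: instances are created once at startup, reset on release, and handed out again instead of being allocated per request. The sketch below is a generic version of the idea; `BidResponse`, its fields, and `ResponsePool` are hypothetical and do not reflect the service's real message types.

```python
from collections import deque


class BidResponse:
    """Hypothetical response object; __slots__ keeps each instance small."""

    __slots__ = ("request_id", "price_micros", "creative_id")

    def __init__(self):
        self.reset()

    def reset(self) -> None:
        self.request_id = ""
        self.price_micros = 0
        self.creative_id = ""


class ResponsePool:
    """Fixed-capacity pool that recycles BidResponse instances."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._free = deque(BidResponse() for _ in range(capacity))

    def acquire(self) -> BidResponse:
        # Fall back to a fresh allocation if the pool is exhausted.
        return self._free.pop() if self._free else BidResponse()

    def release(self, response: BidResponse) -> None:
        response.reset()
        if len(self._free) < self._capacity:
            self._free.append(response)


if __name__ == "__main__":
    pool = ResponsePool(capacity=1024)
    response = pool.acquire()
    response.request_id, response.price_micros = "req-1", 1_250_000
    pool.release(response)
```

Capping the free list at its initial capacity keeps memory bounded even if a burst forces extra allocations.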

Results

Max throughput per instance: 5,000 RPS before, 20,000 RPS after (4x increase)
P99 latency: 50ms before, 8ms after (84% reduction)
Server instances needed: 50 before, 30 after (40% reduction)
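
The improvement figures follow directly from the before and after values, as this short check shows. The helper names are illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from before to after, as a percentage."""
    return (before - after) * 100.0 / before


def throughput_multiple(before: float, after: float) -> float:
    """How many times larger the after value is than the before value."""
    return after / before


if __name__ == "__main__":
    print(throughput_multiple(5_000, 20_000))  # per-instance throughput
    print(percent_reduction(50, 8))            # P99 latency in ms
    print(percent_reduction(50, 30))           # server instances
```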

Lessons Learned

📘 gRPC + Protocol Buffers provided 5x throughput vs JSON over HTTP
📘 asyncio with uvloop achieved Go-like performance in Python
📘 Connection pooling and object reuse critical for high throughput
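
Connection pooling in an asyncio service can be as simple as a queue of already-open connections that tasks borrow and return. The sketch below shows the pattern under assumptions: `ConnectionPool` and `open_connection` are hypothetical, and a placeholder object stands in for a real Redis or gRPC connection.

```python
import asyncio
from contextlib import asynccontextmanager


class ConnectionPool:
    """Keeps a fixed set of open connections and lends them to tasks."""

    def __init__(self, factory, size: int):
        self._factory = factory  # async callable that opens one connection
        self._size = size
        self._idle = None

    async def start(self) -> None:
        # Create the queue inside the running loop, then fill it up front.
        self._idle = asyncio.Queue()
        for _ in range(self._size):
            self._idle.put_nowait(await self._factory())

    @asynccontextmanager
    async def connection(self):
        conn = await self._idle.get()  # waits if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # always hand the connection back


async def open_connection():
    """Placeholder for a real connect call."""
    await asyncio.sleep(0)
    return object()


async def demo() -> int:
    pool = ConnectionPool(open_connection, size=2)
    await pool.start()
    async with pool.connection() as conn:
        assert conn is not None
    return pool._idle.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the pool size is fixed, it also acts as a natural concurrency limit on the downstream dependency.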

What We Would Do Differently

💡 Use Cython for performance-critical sections
💡 Implement adaptive load shedding earlier
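
Adaptive load shedding rejects a growing share of requests as observed latency drifts past a target, so the service degrades gradually instead of collapsing. The sketch below is one possible design, not something the project built: `LoadShedder`, the smoothing factor, and the linear drop curve are all assumptions chosen for illustration.

```python
import random


class LoadShedder:
    """Drops requests with a probability that rises as latency exceeds a target."""

    def __init__(self, target_ms: float, alpha: float = 0.2, rng=random.random):
        self.target_ms = target_ms
        self.alpha = alpha      # weight given to the newest sample
        self.ewma_ms = None     # smoothed latency; unset until first sample
        self._rng = rng

    def record(self, latency_ms: float) -> None:
        if self.ewma_ms is None:
            self.ewma_ms = latency_ms
        else:
            self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms

    def drop_probability(self) -> float:
        if self.ewma_ms is None or self.ewma_ms <= self.target_ms:
            return 0.0
        # Linear ramp: 0 at the target, 1 at twice the target.
        return min(1.0, (self.ewma_ms - self.target_ms) / self.target_ms)

    def should_accept(self) -> bool:
        return self._rng() >= self.drop_probability()


if __name__ == "__main__":
    shedder = LoadShedder(target_ms=10.0)
    for latency in (6.0, 9.0, 14.0, 22.0, 30.0):
        shedder.record(latency)
        print(f"{latency:5.1f} ms -> drop probability {shedder.drop_probability():.2f}")
```

In a bidding context a shed request would typically be answered with a fast no-bid rather than an error.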

Role Relevance

Python engineers built a gRPC-based API that sustained 120K RPS at 8ms P99 latency, a level of performance often assumed to be out of reach for Python.

Critical Skills Demonstrated

asyncio/uvloop optimization, gRPC and Protocol Buffers, high-throughput patterns, memory optimization

Frequently Asked Questions

Why not use Go or Rust for this use case?

The team's existing expertise in Python and gRPC achieved the required performance, so no rewrite was needed.

What was the biggest performance bottleneck?

JSON serialization, at 3ms per request. Protocol Buffers reduced it to 0.3ms.