OFFLINEPIXEL
AdTech

Building High-Throughput API Platforms

Executive Summary

An adtech company needed a real-time bidding API that could process 100K requests/second at 10ms latency, requirements its existing framework could not meet. Python engineers built an asyncio-based service with gRPC, Protocol Buffers, and zero-copy deserialization, reaching 120K RPS at 8ms P99 latency.

Key Outcomes

  • 120K requests/second capacity against a 100K target
  • 50ms → 8ms P99 latency
  • Infrastructure cost reduced 40%

Client Situation

The company's real-time bidding API had latency up to 50ms, losing auctions to competitors with sub-10ms responses.

Key Challenges

  • 50ms P99 latency causing 30% bid loss
  • JSON serialization/deserialization overhead
  • Gunicorn workers maxed at 5K RPS per instance

Existing Architecture

Django REST framework with JSON over HTTP, PostgreSQL for ad targeting, deployed on 50 EC2 instances.

  • JSON parsing taking 3ms per request
  • HTTP/1.1 overhead and connection limits
  • Django sync request-response model
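
To make the serialization overhead concrete, the following is a minimal sketch comparing text-based JSON decoding with a fixed-layout binary decode. It uses the standard library's `struct` module as a stand-in for a binary wire format; it is not Protocol Buffers, and the field names and layout (`BID_LAYOUT`, `floor_micros`, `slot`) are hypothetical. Timings will vary by machine and payload, so none are quoted here.

```python
import json
import struct
import timeit

# Hypothetical fixed-layout bid request:
# request_id (u64), user_id (u64), floor_micros (u32), slot (u16)
BID_LAYOUT = struct.Struct("<QQIH")


def encode_json(request_id, user_id, floor_micros, slot):
    return json.dumps({
        "request_id": request_id,
        "user_id": user_id,
        "floor_micros": floor_micros,
        "slot": slot,
    }).encode("utf-8")


def decode_json(payload):
    d = json.loads(payload)
    return d["request_id"], d["user_id"], d["floor_micros"], d["slot"]


def encode_binary(request_id, user_id, floor_micros, slot):
    return BID_LAYOUT.pack(request_id, user_id, floor_micros, slot)


def decode_binary(payload):
    return BID_LAYOUT.unpack(payload)


def compare(n=10_000):
    """Return (json_bytes, binary_bytes, json_seconds, binary_seconds)."""
    fields = (123456789, 987654321, 250_000, 3)
    as_json = encode_json(*fields)
    as_binary = encode_binary(*fields)
    t_json = timeit.timeit(lambda: decode_json(as_json), number=n)
    t_binary = timeit.timeit(lambda: decode_binary(as_binary), number=n)
    return len(as_json), len(as_binary), t_json, t_binary
```

Calling `compare()` on a given machine gives a rough sense of how payload size and decode time differ between the two encodings for a small message.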

Solution Design

An asyncio-based gRPC service with Protocol Buffers, connection pooling, and an in-memory targeting engine.

Key Decisions

  • gRPC over HTTP/2 for multiplexing and binary protocol
  • Protocol Buffers for zero-copy deserialization
  • In-memory targeting data with lock-free reads

Python · gRPC · Protocol Buffers · asyncio · Redis · Kafka
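
One common way to get lock-free reads over in-memory data is to publish an immutable snapshot and have writers swap in a new one. The sketch below illustrates that pattern; it is an assumption about how such a targeting store could look, not the team's actual implementation, and `TargetingStore` and the segment names are hypothetical. It relies on attribute rebinding being atomic under CPython's GIL.

```python
import threading
from types import MappingProxyType


class TargetingStore:
    """Readers see an immutable snapshot; writers replace it wholesale."""

    def __init__(self, initial=None):
        self._snapshot = MappingProxyType(dict(initial or {}))
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self, segment, default=()):
        # Read path: one attribute load plus a dict lookup, no lock taken.
        return self._snapshot.get(segment, default)

    def update(self, changes):
        # Write path: copy, modify the copy, then rebind the reference.
        with self._write_lock:
            merged = dict(self._snapshot)
            merged.update(changes)
            self._snapshot = MappingProxyType(merged)
```

A reader that grabbed the old snapshot keeps a consistent view until it finishes, at the cost of copying the mapping on every write, which suits data that is read far more often than it changes.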

Implementation

Shadow traffic testing ran for 6 weeks, comparing the gRPC service's bidding decisions against the Django baseline.
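
A shadow comparison of this kind can be sketched as pairing decisions by request ID and reporting an agreement rate. The `BidDecision` fields and the price tolerance parameter below are hypothetical; the source does not describe how decisions were actually compared.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BidDecision:
    request_id: str
    ad_id: Optional[str]  # None means "no bid"
    price_micros: int


def compare_decisions(baseline, candidate, price_tolerance_micros=0):
    """Return (agreement_rate, mismatched_request_ids) for paired decisions."""
    candidate_by_id = {d.request_id: d for d in candidate}
    mismatches = []
    compared = 0
    for base in baseline:
        cand = candidate_by_id.get(base.request_id)
        if cand is None:
            continue  # the shadow service did not see this request
        compared += 1
        same_ad = base.ad_id == cand.ad_id
        same_price = (
            abs(base.price_micros - cand.price_micros) <= price_tolerance_micros
        )
        if not (same_ad and same_price):
            mismatches.append(base.request_id)
    rate = 1.0 if compared == 0 else (compared - len(mismatches)) / compared
    return rate, mismatches
```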

  1. Phase 1: gRPC Prototype

    Built the bidding service with asyncio and Protocol Buffers; achieved 20K RPS at 15ms.

  2. Phase 2: Optimization

    Added connection pooling and lock-free data structures; achieved 60K RPS at 10ms.

  3. Phase 3: Production Scaling

    Scaled horizontally with client-side load balancing; achieved 120K RPS at 8ms.
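
Client-side load balancing, as used in Phase 3, means the caller chooses which backend receives each request. The sketch below shows a simple round-robin picker that skips endpoints marked unhealthy. It is an illustration only: the class name and endpoint strings are hypothetical, and a real gRPC deployment would more likely rely on the client library's built-in balancing policies.

```python
import itertools


class RoundRobinBalancer:
    """Rotate through endpoints, skipping any marked unhealthy."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        if not self._endpoints:
            raise ValueError("at least one endpoint is required")
        self._unhealthy = set()
        self._cycle = itertools.cycle(self._endpoints)

    def mark_unhealthy(self, endpoint):
        self._unhealthy.add(endpoint)

    def mark_healthy(self, endpoint):
        self._unhealthy.discard(endpoint)

    def pick(self):
        # Try each endpoint at most once per call.
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            if endpoint not in self._unhealthy:
                return endpoint
        raise RuntimeError("no healthy endpoints available")
```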

Technical Challenges

gRPC Python performance

Impact: Initial implementation maxed at 15K RPS

Resolution: Switched to asyncio + uvloop + concurrent.futures for CPU-bound tasks
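
The general shape of this resolution is to keep the event loop free for I/O and hand CPU-bound work to an executor. The sketch below shows that pattern with the standard library only; `score_bid` is a hypothetical stand-in for the real scoring work, and uvloop (a third-party package) is left out so the example stays self-contained. A `ThreadPoolExecutor` is used for portability; for genuinely CPU-bound Python code, a `ProcessPoolExecutor` can be passed in instead to sidestep the GIL.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def score_bid(features):
    """Hypothetical stand-in for CPU-bound scoring work."""
    return sum(f * f for f in features)


async def handle_requests(batches, executor):
    # Offload each scoring call so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, score_bid, b) for b in batches]
    return await asyncio.gather(*futures)


def main():
    batches = [[1, 2, 3], [4, 5], [6]]
    with ThreadPoolExecutor(max_workers=4) as executor:
        return asyncio.run(handle_requests(batches, executor))
```

`asyncio.gather` returns results in the order the work was submitted, so callers can match scores back to requests by position.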

Memory fragmentation at high throughput

Impact: Service OOM after 1 hour at 100K RPS

Resolution: Pre-allocated message pools + object reuse
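
Pre-allocation and object reuse can be sketched as a small pool that hands out reset message objects instead of creating new ones per request. The `BidResponse` fields and the `MessagePool` class are hypothetical; the source does not describe the actual pool design.

```python
class BidResponse:
    """Reusable message object; __slots__ avoids a per-instance dict."""

    __slots__ = ("request_id", "ad_id", "price_micros")

    def __init__(self):
        self.reset()

    def reset(self):
        self.request_id = None
        self.ad_id = None
        self.price_micros = 0


class MessagePool:
    def __init__(self, factory, size):
        self._factory = factory
        self._size = size
        self._free = [factory() for _ in range(size)]  # allocate up front

    def acquire(self):
        # Fall back to a fresh object if the pool is temporarily exhausted.
        return self._free.pop() if self._free else self._factory()

    def release(self, message):
        message.reset()  # clear state so stale data never leaks to the next user
        if len(self._free) < self._size:
            self._free.append(message)
```

Capping the free list at its original size keeps a traffic burst from permanently growing the pool.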

Results

Metric                       Before      After       Improvement
Max throughput per instance  5,000 RPS   20,000 RPS  4x increase
P99 latency                  50ms        8ms         84% reduction
Server instances needed      50          30          40% reduction
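
The improvement figures in the results above follow directly from the before and after values. A short check of the arithmetic, using only the numbers reported in this case study:

```python
def percent_reduction(before, after):
    """Percentage drop from before to after."""
    return (before - after) * 100 / before


def fold_increase(before, after):
    """How many times larger after is than before."""
    return after / before


throughput_gain = fold_increase(5_000, 20_000)  # per-instance RPS
latency_drop = percent_reduction(50, 8)         # P99 latency in ms
instance_drop = percent_reduction(50, 30)       # server instances
```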

Lessons Learned

  • 📘 gRPC + Protocol Buffers provided 5x throughput vs JSON over HTTP
  • 📘 asyncio with uvloop achieved Go-like performance in Python
  • 📘 Connection pooling and object reuse critical for high throughput
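
Connection pooling in an asyncio service can be sketched with an `asyncio.Queue` of idle connections that callers borrow and return. This is a minimal illustration under stated assumptions: `ConnectionPool`, the `connect` factory, and the fake connection strings in `demo` are all hypothetical, and a production pool would also need health checks and timeouts.

```python
import asyncio
from contextlib import asynccontextmanager


class ConnectionPool:
    def __init__(self, connect, size):
        self._connect = connect  # hypothetical async factory returning a connection
        self._size = size
        self._idle = None
        self.created = 0

    async def start(self):
        # Open every connection up front so requests never pay setup cost.
        self._idle = asyncio.Queue()
        for _ in range(self._size):
            self._idle.put_nowait(await self._connect())
            self.created += 1

    @asynccontextmanager
    async def connection(self):
        conn = await self._idle.get()  # waits if all connections are in use
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)  # always return it to the pool


async def demo():
    counter = 0

    async def fake_connect():
        nonlocal counter
        counter += 1
        return f"conn-{counter}"

    pool = ConnectionPool(fake_connect, size=2)
    await pool.start()
    used = []

    async def work():
        async with pool.connection() as conn:
            used.append(conn)
            await asyncio.sleep(0)

    await asyncio.gather(*(work() for _ in range(5)))
    return pool.created, sorted(set(used))
```

In `demo`, five concurrent tasks share two connections, so no new connection is opened after startup.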

What We Would Do Differently

  • 💡 Use Cython for performance-critical sections
  • 💡 Implement adaptive load shedding earlier
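
Adaptive load shedding can take many forms; one simple variant is an additive-increase, multiplicative-decrease (AIMD) concurrency limit driven by observed latency. The sketch below is an illustration of that idea only, not something the team built: the class name, the 0.9 backoff factor, and the default limits are all assumptions, and the class is meant for use from a single event loop rather than across threads.

```python
class AdaptiveLimiter:
    """AIMD concurrency limit: grow slowly while fast, shrink quickly when slow."""

    def __init__(self, target_latency_ms, initial_limit=100,
                 min_limit=1, max_limit=10_000):
        self.target_latency_ms = target_latency_ms
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.in_flight = 0

    def try_acquire(self):
        if self.in_flight >= self.limit:
            return False  # shed this request instead of queueing it
        self.in_flight += 1
        return True

    def release(self, latency_ms):
        self.in_flight -= 1
        if latency_ms > self.target_latency_ms:
            # Over target: back off multiplicatively.
            self.limit = max(self.min_limit, int(self.limit * 0.9))
        else:
            # Under target: probe for more capacity one slot at a time.
            self.limit = min(self.max_limit, self.limit + 1)
```

A request handler would call `try_acquire()` on entry, return a no-bid response immediately when it yields `False`, and call `release()` with the measured latency on completion.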

Role Relevance

Python engineers built a gRPC-based API that sustained 120K RPS at 8ms P99 latency, a level of performance often assumed to be out of reach for Python.

Critical Skills Demonstrated

  • asyncio/uvloop optimization
  • gRPC and Protocol Buffers
  • High-throughput patterns
  • Memory optimization

Frequently Asked Questions

Why not use Go or Rust for this use case?
The team's existing Python expertise, combined with gRPC, met the required performance, so no rewrite in another language was needed.

What was the biggest performance bottleneck?
JSON serialization, at about 3ms per request. Protocol Buffers reduced it to 0.3ms.