Executive Summary
A fintech startup's Python backend hit capacity at 100 requests per second as its user base grew 50x. By migrating from Flask to FastAPI, adopting async database drivers, adding Redis caching, and sharding horizontally, the team scaled to 10,000 requests per second while reducing P99 latency by 80%.
Key Outcomes
- ▹ 100 → 10,000 requests per second (100x scale)
- ▹ P99 latency reduced 400ms → 80ms (80% reduction)
- ▹ Infrastructure cost reduced 60% per request
Client Situation
The startup's payment processing API was timing out during peak hours as transaction volume grew 20% month-over-month. Customers experienced failed payments and checkout abandonment.
Key Challenges
- ⚠ Flask synchronous workers blocked on database I/O
- ⚠ Single PostgreSQL instance maxed at 2,000 connections
- ⚠ No caching layer, so repeated queries for the same data always hit the database
Existing Architecture
Flask with Gunicorn workers (sync), SQLAlchemy ORM, PostgreSQL, deployed on 10 EC2 instances.
- Synchronous I/O blocked request handling
- ORM overhead adding 50ms per query
- No read replicas or caching strategy
Solution Design
Rebuilt the API layer with FastAPI, asyncpg for database access, Redis caching, and PgBouncer connection pooling.
Key Decisions
- ✓ FastAPI for async request handling (10x concurrency)
- ✓ asyncpg for non-blocking database access
- ✓ Redis caching for idempotency keys (95% cache hit rate)
- ✓ PgBouncer connection pooler (2,000 → 200 DB connections)
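The concurrency gain from async request handling can be illustrated with a minimal, standard-library-only sketch. It is not the team's code: `asyncio.sleep` and `time.sleep` stand in for a database round trip, and the function names and the 10 ms latency used below are assumptions chosen for illustration.

```python
import asyncio
import time


async def handle_request(db_latency_s: float) -> str:
    # asyncio.sleep stands in for a non-blocking database call (assumption);
    # while it waits, the event loop is free to serve other requests.
    await asyncio.sleep(db_latency_s)
    return "ok"


async def serve_concurrently(n_requests: int, db_latency_s: float) -> float:
    """Serve n_requests on one event loop and return the elapsed seconds."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(db_latency_s) for _ in range(n_requests))
    )
    assert all(result == "ok" for result in results)
    return time.perf_counter() - start


def serve_sequentially(n_requests: int, db_latency_s: float) -> float:
    """Serve n_requests one at a time, as a single sync worker would."""
    start = time.perf_counter()
    for _ in range(n_requests):
        time.sleep(db_latency_s)  # blocking wait: nothing else can run
    return time.perf_counter() - start
```

Running `serve_sequentially(20, 0.01)` and `asyncio.run(serve_concurrently(20, 0.01))` side by side shows the waits overlapping in the async version, while the sync version pays for each wait in turn.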
Implementation
Shadow traffic testing ran for 4 weeks, comparing the new FastAPI endpoints against the Flask baseline before cutover.
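A shadow comparison of this kind can be sketched as a function that checks a baseline response against a candidate response and reports where they first diverge. The `ShadowResult` type and `compare_responses` function are hypothetical names introduced here; the actual comparison tooling is not described in the case study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShadowResult:
    request_id: str
    matched: bool
    first_diff_offset: Optional[int]  # None when the bodies are identical


def compare_responses(request_id: str, baseline: bytes, candidate: bytes) -> ShadowResult:
    """Compare two response bodies byte for byte."""
    if baseline == candidate:
        return ShadowResult(request_id, True, None)
    # Find the first differing byte; if one body is a prefix of the other,
    # the difference starts where the shorter body ends.
    limit = min(len(baseline), len(candidate))
    offset = next(
        (i for i in range(limit) if baseline[i] != candidate[i]),
        limit,
    )
    return ShadowResult(request_id, False, offset)
```

In practice, fields that legitimately differ between the two stacks (timestamps, request IDs) would likely need to be normalized before a byte-level comparison is meaningful.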
Phase 1: Read Endpoints
Migrated GET endpoints first, for an immediate 5x concurrency improvement.
Phase 2: Write Endpoints
Added idempotency keys with Redis deduplication.
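One common way to deduplicate on an idempotency key is an atomic "set if not exists" with an expiry, which in Redis is `SET key value NX EX ttl`. The sketch below is an illustration, not the team's implementation: `InMemoryStore` is a hypothetical stand-in that mimics only the `nx` and `ex` arguments of the redis-py `set` call, and the `idem:` prefix and 24-hour TTL are assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for Redis; mimics only SET with NX and EX."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] <= now:
            del self._data[key]  # expired entries behave as if absent
            entry = None
        if nx and entry is not None:
            return None  # like redis-py, None means NX blocked the write
        expires_at = now + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)
        return True


def process_payment(store: InMemoryStore, idempotency_key: str,
                    charge: Callable[[], None],
                    ttl_seconds: int = 86_400) -> str:
    """Charge at most once per idempotency key within the TTL (assumed 24 h)."""
    claimed = store.set(f"idem:{idempotency_key}", "in-progress",
                        nx=True, ex=ttl_seconds)
    if not claimed:
        return "duplicate"  # another request already claimed this key
    charge()
    return "charged"
```

A production version would typically also store the original response under the key so that a retried request receives the same result instead of a bare "duplicate" marker.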
Phase 3: Database Scaling
Implemented read replicas to offload reads and connection pooling to relieve connection pressure on the primary database.
Technical Challenges
- Async database transaction handling
Impact: Race conditions in payment idempotency caused double charges
Resolution: Database-level advisory locks + idempotency key TTL
- Connection pool exhaustion during spikes
Impact: 200 DB connections still insufficient at 10k RPS
Resolution: PgBouncer transaction pooling (200 → 1,000 effective connections)
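PostgreSQL advisory locks take a 64-bit integer key, so a string idempotency key has to be mapped to one first. The following is a minimal sketch of that mapping, assuming a SHA-256 based derivation; the helper name and the hashing choice are assumptions, not details from the project. `pg_advisory_xact_lock` is the transaction-scoped lock function in PostgreSQL, and `$1` is the positional placeholder style asyncpg uses.

```python
import hashlib

# Transaction-scoped advisory lock: released automatically at commit or rollback.
ADVISORY_LOCK_SQL = "SELECT pg_advisory_xact_lock($1)"


def advisory_lock_key(idempotency_key: str) -> int:
    """Map an idempotency key to a signed 64-bit integer (PostgreSQL bigint)."""
    digest = hashlib.sha256(idempotency_key.encode("utf-8")).digest()
    # The first 8 bytes, read as a signed integer, always fit in a bigint.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

Inside a transaction, the lock statement would run with `advisory_lock_key(key)` as its argument before the payment row is checked or inserted, so concurrent requests carrying the same key serialize on the lock. The transaction-scoped variant is the likelier fit here, since session-level advisory locks are not reliable behind PgBouncer's transaction pooling.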
Results
- Max requests per second
- Before: 100 | After: 10,000 | Improvement: 100x increase
- P99 latency
- Before: 400ms | After: 80ms | Improvement: 80% reduction
- Cost per million requests
- Before: $45 | After: $18 | Improvement: 60% reduction
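The improvement figures follow directly from the before and after values. A short sketch of the arithmetic, with helper names chosen here for illustration:

```python
def scale_factor(before: float, after: float) -> float:
    """How many times larger the 'after' value is than the 'before' value."""
    return after / before


def percent_reduction(before: float, after: float) -> float:
    """Reduction from 'before' to 'after' as a percentage of 'before'."""
    return 100 * (before - after) / before


throughput_gain = scale_factor(100, 10_000)   # requests per second
latency_cut = percent_reduction(400, 80)      # P99 latency in ms
cost_cut = percent_reduction(45, 18)          # dollars per million requests
```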
Lessons Learned
- 📘 Async I/O alone provided 10x concurrency improvement without code changes
- 📘 Connection pooling was critical: 2,000 direct DB connections were unsustainable, while 200 through PgBouncer worked
- 📘 Idempotency with Redis prevented double charges at scale
What We Would Do Differently
- 💡 Implement request collapsing for duplicate concurrent calls
- 💡 Use FastAPI's background tasks for non-critical operations
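Request collapsing means that when several identical calls arrive at the same time, only one performs the underlying work and the rest share its result. A minimal asyncio sketch of the idea follows; the `RequestCollapser` class and its key scheme are hypothetical, since the case study names the technique without describing an implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class RequestCollapser:
    """Share one in-flight call among concurrent callers with the same key."""

    def __init__(self) -> None:
        self._in_flight: Dict[str, "asyncio.Task[Any]"] = {}

    async def run(self, key: str, fetch: Callable[[], Awaitable[Any]]) -> Any:
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the work.
            task = asyncio.create_task(fetch())
            self._in_flight[key] = task
            # Forget the task once it finishes so later calls fetch fresh data.
            task.add_done_callback(lambda _done: self._in_flight.pop(key, None))
        # shield() keeps one caller's cancellation from cancelling the shared task.
        return await asyncio.shield(task)
```

For example, five concurrent `collapser.run("acct:1", fetch)` calls would invoke `fetch` once and return the same value to all five callers. This pattern suits idempotent reads; writes such as charges would still go through the idempotency-key path.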
Role Relevance
Python engineers transformed a synchronous bottleneck into an async powerhouse, scaling 100x while reducing latency and cost.
Frequently Asked Questions
- Why FastAPI over other async frameworks?
- FastAPI's performance matched Node.js benchmarks, and its automatic OpenAPI docs were critical for payment API partners.
- How did you validate no data loss during migration?
- Shadow traffic ran for 4 weeks comparing responses byte-for-byte before cutover.