Executive Summary
A fintech payment platform's Flask API collapsed at 5,000 concurrent users. Migrating to FastAPI with async I/O and connection pooling scaled the platform to 100,000 concurrent users and cut P99 latency from 800ms to 150ms.
Key Outcomes
- ▹ 5,000 → 100,000 concurrent users (20x scale)
- ▹ P99 latency: 800ms → 150ms
- ▹ Server count reduced 50%
Client Situation
The platform's user base grew 10x in 6 months, but the API couldn't keep up—users experienced timeouts during peak hours.
Key Challenges
- ⚠ Flask synchronous workers blocked on I/O
- ⚠ Database connection pool exhausted at 5k users
- ⚠ CPU utilization low but response times high
Existing Architecture
Flask with Gunicorn workers (sync), SQLAlchemy ORM, PostgreSQL. Deployed on 20 EC2 instances.
- Each sync worker handled one request at a time and sat idle while blocked on database calls
- Connection pool (100) insufficient for scale
- Horizontal scaling inefficient (20 instances at 5k users)
Solution Design
FastAPI with async endpoints, asyncpg for database access, Redis for caching, and optimized connection pooling.
Key Decisions
- ✓ Make all I/O-bound operations async (database, Redis, external APIs)
- ✓ Connection pool size 500 with asyncpg
- ✓ Redis caching for frequently accessed data
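The first two decisions can be illustrated with a minimal sketch that uses only the standard library. It is not the platform's code: `asyncio.sleep` stands in for an awaited asyncpg query, an `asyncio.Semaphore` stands in for the connection pool, and `fetch_balance`, `serve`, `POOL_SIZE`, and `QUERY_SECONDS` are hypothetical names and values chosen for illustration.

```python
import asyncio
import time

POOL_SIZE = 5          # hypothetical; the case study used a pool of 500 with asyncpg
QUERY_SECONDS = 0.05   # simulated database round trip


async def fetch_balance(pool: asyncio.Semaphore, account_id: int) -> dict:
    """Handle one request: borrow a connection, await the query, release it."""
    async with pool:                         # waits only when every connection is busy
        await asyncio.sleep(QUERY_SECONDS)   # stand-in for an awaited database call
    return {"account_id": account_id, "balance_cents": account_id * 100}


async def serve(n_requests: int) -> tuple[list[dict], float]:
    """Run n_requests concurrently on one event loop and time the batch."""
    pool = asyncio.Semaphore(POOL_SIZE)
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fetch_balance(pool, i) for i in range(n_requests))
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve(50))
    print(f"{len(results)} requests in {elapsed:.2f}s on a single event loop")
```

While one request waits on I/O, the event loop runs the others, so the batch finishes in far less time than handling the same requests one after another. A synchronous worker would instead hold its process for the full duration of every query.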
Implementation
Endpoints migrated one by one, with A/B testing for each. Rolled out over 4 months.
Phase 1: Read Endpoints
Migrated GET endpoints first—lower risk, immediate benefit.
Phase 2: Write Endpoints
Migrated POST/PUT with transaction handling and idempotency keys.
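As a rough illustration of what an idempotency key buys on a write endpoint, the sketch below replays the stored response when the same key arrives twice, so a retried request does not charge twice. It is an assumption-laden stand-in: `PaymentService`, `create_payment`, and the in-memory dictionaries are hypothetical, and a production service would persist keys in the database inside the same transaction as the write.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """In-memory stand-in (hypothetical); real keys would live in the database."""

    _results: dict[str, dict] = field(default_factory=dict)
    _locks: dict[str, asyncio.Lock] = field(default_factory=dict)
    charges_executed: int = 0

    async def create_payment(self, idempotency_key: str, amount_cents: int) -> dict:
        # One lock per key so concurrent duplicates cannot both execute the charge.
        lock = self._locks.setdefault(idempotency_key, asyncio.Lock())
        async with lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]   # replay the stored response
            await asyncio.sleep(0.01)                    # simulated ledger write
            self.charges_executed += 1
            result = {"payment_id": self.charges_executed, "amount_cents": amount_cents}
            self._results[idempotency_key] = result
            return result


async def demo() -> None:
    service = PaymentService()
    first, retry = await asyncio.gather(
        service.create_payment("key-1", 500),
        service.create_payment("key-1", 500),   # client retry with the same key
    )
    print(first == retry, service.charges_executed)


if __name__ == "__main__":
    asyncio.run(demo())
```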
Phase 3: Optimization
Added Redis caching, response compression, and HTTP/2.
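The caching step most likely follows the common cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with a time-to-live. The sketch below shows that pattern under stated assumptions. `TTLCache` is a tiny in-process stand-in for Redis, and `cached` and its parameters are hypothetical names, not the platform's API.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class TTLCache:
    """In-process stand-in for a Redis-style get / set-with-expiry store."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._data.pop(key, None)   # drop missing or expired entries
            return None
        return entry[1]

    def set(self, key: str, value: object, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)


async def cached(
    cache: TTLCache,
    key: str,
    ttl_seconds: float,
    loader: Callable[[], Awaitable[object]],
) -> object:
    """Cache-aside read: return the cached value, else load and store it."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = await loader()              # e.g. an awaited database query
    cache.set(key, value, ttl_seconds)
    return value
```

A short TTL keeps frequently read but slowly changing data (the "frequently accessed data" above) off the database without serving it stale for long. The right TTL depends on the data and is not stated in the case study.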
Technical Challenges
- Async transaction management
Impact: Transactions spread across multiple async calls caused race conditions
Resolution: Used database savepoints and retry logic with exponential backoff
- Connection pool exhaustion under load
Impact: 500 connections were still insufficient at 100k users
Resolution: Added PgBouncer as a connection pooler, multiplexing up to 5,000 client connections onto a pool of 200 PostgreSQL connections
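The retry half of the first resolution can be sketched as follows. This is a minimal illustration, not the team's implementation: `TransientDBError` and `retry_with_backoff` are hypothetical names, the delay values are placeholders, and the random jitter is an addition the case study does not mention. In practice each attempt would run inside its own transaction or savepoint so a failed attempt rolls back cleanly before the next one.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientDBError(Exception):
    """Hypothetical marker for retryable failures such as serialization conflicts."""


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
) -> T:
    """Retry an async operation, doubling the delay cap after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except TransientDBError:
            if attempt == max_attempts:
                raise                                     # out of attempts
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, cap))   # jitter spreads out retries
    raise AssertionError("unreachable")
```

Only errors known to be transient are retried. Retrying every exception would mask real bugs and, on write paths, is safe only when combined with idempotency keys.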
Results
- Concurrent users supported: 5,000 before, 100,000 after (20x increase)
- P99 latency: 800ms before, 150ms after (81% reduction)
- Server instances: 20 before, 10 after (50% reduction)
Lessons Learned
- 📘 Async I/O alone provided a 10x concurrency improvement
- 📘 Pydantic v2 validation was 5x faster than v1
- 📘 HTTP/2 multiplexing reduced head-of-line blocking
What We Would Do Differently
- 💡 Add OpenTelemetry tracing from day one
- 💡 Implement request collapsing for duplicate queries
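Request collapsing, the second item above, means that concurrent requests for the same key share one in-flight query instead of each hitting the database. A minimal asyncio sketch of the idea follows. `RequestCollapser` and `fetch` are hypothetical names, and the sketch skips cancellation handling that a production version would need.

```python
import asyncio
from typing import Awaitable, Callable


class RequestCollapser:
    """Share one in-flight load among concurrent callers asking for the same key."""

    def __init__(self) -> None:
        self._in_flight: dict[str, asyncio.Future] = {}

    async def fetch(self, key: str, loader: Callable[[], Awaitable[object]]) -> object:
        existing = self._in_flight.get(key)
        if existing is not None:
            return await existing               # join the query already running
        task = asyncio.ensure_future(loader())  # first caller starts the query
        self._in_flight[key] = task
        try:
            return await task
        finally:
            self._in_flight.pop(key, None)      # later requests trigger a fresh load


async def demo() -> None:
    queries = []

    async def load_user() -> dict:
        queries.append(1)
        await asyncio.sleep(0.01)               # simulated database query
        return {"user_id": 7}

    collapser = RequestCollapser()
    await asyncio.gather(*(collapser.fetch("user:7", load_user) for _ in range(10)))
    print(f"10 requests, {len(queries)} database query")


if __name__ == "__main__":
    asyncio.run(demo())
```

Unlike a cache, this holds nothing after the query completes, so it adds no staleness. It only removes duplicate work during bursts.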
Role Relevance
FastAPI experts applied async patterns, connection pooling, and database optimization to scale the platform 20x without rewriting business logic.
Frequently Asked Questions
- Why FastAPI over other async frameworks?
- Its OpenAPI generation, Pydantic validation, and async/await support are best-in-class.
- What were the cost savings?
- $100k/month in reduced server costs, from cutting instances from 20 to 10.