OFFLINEPIXEL
Fintech

Scaling FastAPI for High-Concurrency APIs

A fintech platform scaled their API from 1,000 to 100,000 concurrent users using FastAPI, async patterns, and database connection pooling.

Executive Summary

A fintech payment platform's Flask API collapsed at 5,000 concurrent users. Migrating to FastAPI with async patterns and connection pooling scaled to 100,000 concurrent users while reducing P99 latency from 800ms to 150ms.

Key Outcomes

  • 1,000 → 100,000 concurrent users (100x scale)
  • P99 latency: 800ms → 150ms
  • Server count reduced by 50% (20 → 10 instances)

Client Situation

The platform's user base grew 10x in 6 months, but the API couldn't keep up—users experienced timeouts during peak hours.

Key Challenges

  • Flask synchronous workers blocked on I/O
  • Database connection pool exhausted at 5k users
  • CPU utilization low but response times high

Existing Architecture

Flask with Gunicorn workers (sync), SQLAlchemy ORM, PostgreSQL. Deployed on 20 EC2 instances.

  • Each sync worker handled one request at a time and blocked on database calls
  • Connection pool of 100 insufficient at scale
  • Horizontal scaling inefficient (20 instances at 5k users)
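The gap between low CPU utilization and high response times comes from workers waiting on I/O. The sketch below illustrates it with the standard library alone. `time.sleep` and `asyncio.sleep` stand in for a database round trip (an assumption: real drivers behave differently), and the 50 ms latency and 20 requests are arbitrary illustrative values, not measurements from this platform.

```python
import asyncio
import time

IO_LATENCY_S = 0.05  # hypothetical database round-trip time
REQUESTS = 20        # hypothetical number of concurrent requests


def handle_sync() -> None:
    # A sync worker sits idle here; it cannot serve anything else meanwhile.
    time.sleep(IO_LATENCY_S)


async def handle_async() -> None:
    # Awaiting yields to the event loop, so other requests make progress.
    await asyncio.sleep(IO_LATENCY_S)


def serve_sync(n: int) -> float:
    """One sync worker handling n requests back to back; returns elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        handle_sync()
    return time.perf_counter() - start


async def serve_async(n: int) -> float:
    """One event loop handling n requests concurrently; returns elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_async() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    sync_s = serve_sync(REQUESTS)
    async_s = asyncio.run(serve_async(REQUESTS))
    print(f"one sync worker: {sync_s:.2f}s, one event loop: {async_s:.2f}s")
```

The sync worker takes roughly `REQUESTS * IO_LATENCY_S`, while the event loop finishes in roughly one latency period, which is why adding sync instances scaled poorly while CPUs stayed mostly idle.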

Solution Design

FastAPI with async endpoints, asyncpg for database, Redis for caching, and optimized connection pooling.

Key Decisions

  • Async all I/O-bound operations (database, Redis, external APIs)
  • Connection pool size 500 with asyncpg
  • Redis caching for frequently accessed data
Technologies: FastAPI, asyncpg, Redis, Kubernetes, Prometheus
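Caching frequently accessed data usually follows the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache with an expiry. The sketch below is a minimal illustration, assuming an in-memory `TTLCache` as a stand-in for Redis; `TTLCache`, `get_cached`, the key `"fx:EUR-USD"`, and the 30-second TTL are all hypothetical. With redis-py's asyncio client, the two cache calls would roughly become `await redis.get(key)` and `await redis.set(key, value, ex=ttl)`, plus serialization of the value.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis GET / SET with expiry (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock  # injectable so tests can control time

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._store[key] = (self._clock() + ttl_s, value)


async def get_cached(
    cache: TTLCache,
    key: str,
    ttl_s: float,
    load_from_db: Callable[[], Awaitable[Any]],
) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    value = cache.get(key)
    if value is None:  # miss (note: a cached None is also treated as a miss)
        value = await load_from_db()
        cache.set(key, value, ttl_s)
    return value


if __name__ == "__main__":
    async def load_fx_rate() -> float:  # hypothetical database query
        return 1.08

    cache = TTLCache()
    print(asyncio.run(get_cached(cache, "fx:EUR-USD", 30.0, load_fx_rate)))
```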

Implementation

Endpoints migrated one by one, with A/B testing for each. Rolled out over 4 months.

  1. Phase 1: Read Endpoints

    Migrated GET endpoints first—lower risk, immediate benefit.

  2. Phase 2: Write Endpoints

    Migrated POST/PUT endpoints with transaction handling and idempotency keys.

  3. Phase 3: Optimization

    Added Redis caching, response compression, and HTTP/2.
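Idempotency keys let a client safely retry a payment write: the server runs the operation once per key and replays the stored result for repeats. The sketch below is a minimal in-memory illustration, not the platform's implementation; `IdempotencyStore`, `run_once`, `charge_card`, and the key `"req-123"` are hypothetical, and a production service would persist keys and results (for example in PostgreSQL or Redis) with an expiry.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class IdempotencyStore:
    """Runs each write at most once per client-supplied idempotency key.

    Hypothetical in-memory stand-in for a persistent key/result table.
    """

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    async def run_once(
        self, key: str, operation: Callable[[], Awaitable[Any]]
    ) -> Any:
        # One lock per key, so concurrent retries of the same request serialize.
        if key not in self._locks:
            self._locks[key] = asyncio.Lock()
        async with self._locks[key]:
            if key in self._results:
                return self._results[key]  # replayed request: return stored result
            # If operation() raises, nothing is stored and a retry runs it again.
            result = await operation()
            self._results[key] = result
            return result


if __name__ == "__main__":
    async def main() -> None:
        store = IdempotencyStore()
        ledger = []

        async def charge_card() -> dict:  # hypothetical payment write
            await asyncio.sleep(0.01)
            ledger.append(100)
            return {"charge_id": len(ledger), "amount": 100}

        # Five retries of the same POST, all carrying the key "req-123".
        results = await asyncio.gather(
            *(store.run_once("req-123", charge_card) for _ in range(5))
        )
        print(ledger, results[0])

    asyncio.run(main())
```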

Technical Challenges

Async transaction management

Impact: Distributed transactions spanning multiple async calls caused race conditions

Resolution: Used database savepoints and retry logic with exponential backoff
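The retry half of this resolution can be sketched as a small wrapper with capped, jittered exponential backoff. The sketch assumes a hypothetical `TransientError` marking retryable failures (such as serialization conflicts); the attempt count and delays are illustrative defaults, not the platform's settings. For the savepoint half, the asyncpg documentation describes nested `async with conn.transaction():` blocks as savepoints, so a retried step could be wrapped in one; that pairing is an assumption, not something the case study details.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as serialization conflicts."""


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay_s: float = 0.05,
    max_delay_s: float = 2.0,
) -> T:
    """Retry operation() on TransientError with capped, jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: sleep a random time in [0, min(cap, base * 2**attempt)].
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Jitter spreads retries from many concurrent requests over time, so contended rows are not hit by synchronized retry waves.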

Connection pool exhaustion under load

Impact: 500 connections still insufficient at 100k users

Resolution: Added the PgBouncer connection pooler, multiplexing 5,000 client connections onto a pool of 200 PostgreSQL connections
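A pooler can serve 5,000 clients with 200 server connections because, in transaction pooling mode, a server connection is held only for the duration of a transaction. The toy model below illustrates that effect in asyncio; it is not PgBouncer itself, and the case study does not state which pool mode was used. `ServerPool`, `simulate`, and the 1 ms transaction time are hypothetical. In PgBouncer the corresponding settings would live in `pgbouncer.ini` (`pool_mode`, `max_client_conn`, `default_pool_size`).

```python
import asyncio


class ServerPool:
    """Toy model of a transaction-mode pooler: many clients, few server connections."""

    def __init__(self, size: int) -> None:
        self._idle: asyncio.Queue = asyncio.Queue()
        for conn_id in range(size):
            self._idle.put_nowait(conn_id)
        self.in_use = 0
        self.peak_in_use = 0

    async def run_transaction(self, work_s: float) -> int:
        conn_id = await self._idle.get()  # wait for a free server connection
        self.in_use += 1
        self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            await asyncio.sleep(work_s)  # stand-in for the transaction's SQL
            return conn_id
        finally:
            # Released at transaction end, not when the client disconnects.
            self.in_use -= 1
            self._idle.put_nowait(conn_id)


async def simulate(clients: int, pool_size: int, work_s: float = 0.001) -> ServerPool:
    """Run one short transaction per client through a shared pool."""
    pool = ServerPool(pool_size)
    await asyncio.gather(*(pool.run_transaction(work_s) for _ in range(clients)))
    return pool


if __name__ == "__main__":
    pool = asyncio.run(simulate(clients=5_000, pool_size=200))
    print("peak server connections in use:", pool.peak_in_use)
```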

Results

Metric                       Before   After     Improvement
Concurrent users supported   5,000    100,000   20x increase
P99 latency                  800ms    150ms     81% reduction
Server instances             20       10        50% reduction
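The case study does not say how P99 latency was computed. One common definition is the nearest-rank percentile, sketched below together with the reduction arithmetic behind the table; `percentile` and `reduction_pct` are hypothetical helper names.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction from before to after."""
    return (before - after) / before * 100


if __name__ == "__main__":
    print(reduction_pct(800, 150))  # P99 latency row
    print(reduction_pct(20, 10))    # server instances row
```

With the table's figures, `reduction_pct(800, 150)` gives 81.25, matching the reported 81%, and 100,000 / 5,000 gives the 20x row.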

Lessons Learned

  • 📘 Async I/O alone provided 10x concurrency improvement
  • 📘 Pydantic v2 validation was 5x faster than v1
  • 📘 HTTP/2 multiplexing reduced head-of-line blocking

What We Would Do Differently

  • 💡 Add OpenTelemetry tracing from day one
  • 💡 Implement request collapsing for duplicate queries
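Request collapsing (often called "singleflight") lets concurrent callers asking for the same data share one in-flight query instead of each hitting the database. Since this is listed as something the team would do differently, the sketch below is not their code; it is a minimal asyncio illustration in which `RequestCollapser`, `do`, and the key `"balance:acct-1"` are hypothetical.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Dict


class RequestCollapser:
    """Coalesces concurrent identical queries into one in-flight call (singleflight)."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[Any]"] = {}

    def _forget(self, key: str, task: "asyncio.Future[Any]") -> None:
        # Done callback: drop the key so the next caller issues a fresh query.
        if self._inflight.get(key) is task:
            del self._inflight[key]

    async def do(self, key: str, query: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.ensure_future(query())
            self._inflight[key] = task
            task.add_done_callback(functools.partial(self._forget, key))
        # shield() keeps one caller's cancellation from cancelling the shared query.
        return await asyncio.shield(task)


if __name__ == "__main__":
    async def main() -> None:
        collapser = RequestCollapser()

        async def query_balance() -> int:  # hypothetical database query
            await asyncio.sleep(0.01)
            return 42

        # Ten concurrent lookups of the same key share a single query.
        results = await asyncio.gather(
            *(collapser.do("balance:acct-1", query_balance) for _ in range(10))
        )
        print(results)

    asyncio.run(main())
```

Unlike the cache, nothing outlives the query: once it finishes, the next caller triggers a fresh read, so collapsing never serves stale data.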

Role Relevance

FastAPI experts applied their understanding of async patterns, connection pooling, and database optimization to scale the API 100x without rewriting business logic.

Critical Skills Demonstrated

  • Async Python (FastAPI)
  • Database connection pooling
  • High-concurrency patterns
  • Performance optimization


Frequently Asked Questions

Why FastAPI over other async frameworks?
FastAPI offers best-in-class OpenAPI generation, Pydantic validation, and async/await support.
What was the cost savings?
Reducing the fleet from 20 to 10 instances cut server costs by $100k/month.