OFFLINEPIXEL
Fintech / Payments

Scaling Data-Intensive Python Backends

A fintech startup scaled its Python backend from 100 to 10,000 requests per second using async patterns, connection pooling, and horizontal sharding.

Executive Summary

A fintech startup's Python backend hit capacity at 100 requests per second as its user base grew 50x. By migrating from Flask to FastAPI, adopting async database drivers, adding Redis caching, and sharding horizontally, the team scaled to 10,000 requests per second while cutting P99 latency by 80%.

Key Outcomes

  • Throughput: 100 → 10,000 requests per second (100x scale)
  • P99 latency: 400ms → 80ms (80% reduction)
  • Cost per million requests: $45 → $18 (60% reduction)

Client Situation

The startup's payment processing API was timing out during peak hours as transaction volume grew 20% month-over-month. Customers experienced failed payments and checkout abandonment.

Key Challenges

  • Flask synchronous workers blocked on database I/O
  • A single PostgreSQL instance maxed out at 2,000 connections
  • No caching layer, so identical queries hit the database repeatedly

Existing Architecture

Flask with Gunicorn workers (sync), SQLAlchemy ORM, PostgreSQL, deployed on 10 EC2 instances.

  • Synchronous I/O blocked request handling
  • ORM overhead added 50ms per query
  • No read replicas or caching strategy
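
To illustrate why synchronous workers stall on database I/O, here is a minimal sketch that replaces the database call with a sleep and records how many requests overlap. The `InFlight` tracker, the handler names, and the 10 ms delay are illustrative assumptions, not code or measurements from the project.

```python
import asyncio
import time

QUERY_SECONDS = 0.01  # assumed stand-in for one database round trip


class InFlight:
    """Tracks how many simulated requests are being handled at once."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        self.current += 1
        self.peak = max(self.peak, self.current)

    def leave(self) -> None:
        self.current -= 1


def handle_sync(tracker: InFlight) -> None:
    # A sync worker blocks for the whole wait and can serve nothing else.
    tracker.enter()
    time.sleep(QUERY_SECONDS)
    tracker.leave()


async def handle_async(tracker: InFlight) -> None:
    # Awaiting hands control back to the event loop during the wait.
    tracker.enter()
    await asyncio.sleep(QUERY_SECONDS)
    tracker.leave()


def run_sync(n: int) -> int:
    """Handle n requests on one blocking worker; return peak overlap."""
    tracker = InFlight()
    for _ in range(n):
        handle_sync(tracker)
    return tracker.peak


async def run_async(n: int) -> int:
    """Handle n requests on one event loop; return peak overlap."""
    tracker = InFlight()
    await asyncio.gather(*(handle_async(tracker) for _ in range(n)))
    return tracker.peak
```

In this sketch a single blocking worker never has more than one request in progress, while a single event loop keeps every waiting request open at once. Real gains depend on how much of each request is spent waiting on I/O.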

Solution Design

The team rebuilt the API layer with FastAPI, asyncpg for database access, Redis for caching, and PgBouncer for connection pooling.

Key Decisions

  • FastAPI for async request handling (10x concurrency)
  • asyncpg for non-blocking database access
  • Redis caching for idempotency keys (95% cache hit rate)
  • PgBouncer connection pooler (2,000 → 200 DB connections)

Technologies: Python, FastAPI, asyncpg, Redis, PgBouncer, Kafka
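
The caching decision can be sketched as a cache-aside lookup: read from the cache first and fall back to the database only on a miss. The `TTLCache` class below is a hypothetical in-memory stand-in for Redis so the example runs without a server; the loader function and TTL are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry (assumption)."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        """Return the cached value, calling loader only on a miss or expiry."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)  # the expensive database query in a real service
        self._data[key] = (now + self._ttl, value)
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A caller would pass the database query as `loader`, for example `cache.get_or_load("merchant:42", load_merchant)` with a hypothetical `load_merchant` function. The hit rate then depends on how often the same keys are requested within the TTL.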

Implementation

Shadow traffic testing ran for 4 weeks, comparing the new FastAPI endpoints against the Flask baseline before cutover.

  1. Phase 1: Read Endpoints

    Migrated GET endpoints first, which gave an immediate 5x concurrency improvement.

  2. Phase 2: Write Endpoints

    Added idempotency keys with Redis deduplication.

  3. Phase 3: Database Scaling

    Added read replicas to offload read traffic and connection pooling to free capacity on the primary for writes.
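
The Phase 2 deduplication step can be sketched as an atomic "claim the key if absent" operation, which in Redis is `SET` with the `NX` and `EX` options. The `InMemoryStore` class below is a hypothetical stand-in that mimics only that call so the example runs without a Redis server. The `idem:` key prefix, the 24-hour TTL, and `charge_once` are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryStore:
    """Stand-in for Redis that supports only set(name, value, ex=..., nx=...)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def set(self, name: str, value: str, ex: int, nx: bool = False) -> Optional[bool]:
        now = self._clock()
        entry = self._data.get(name)
        still_live = entry is not None and entry[0] > now
        if nx and still_live:
            return None  # key already claimed and not yet expired
        self._data[name] = (now + ex, value)
        return True


def charge_once(
    store: InMemoryStore,
    idempotency_key: str,
    charge: Callable[[], None],
    ttl_seconds: int = 86_400,  # assumed 24-hour retention
) -> str:
    """Run charge() only for the first request carrying this key."""
    claimed = store.set(
        f"idem:{idempotency_key}", "in-progress", ex=ttl_seconds, nx=True
    )
    if not claimed:
        return "duplicate"
    charge()
    return "charged"
```

This sketch leaves the key claimed even if `charge()` raises. A production version would likely need to store the final result under the key or release it on failure so that a legitimate retry is not rejected.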

Technical Challenges

Async database transaction handling

Impact: Race conditions in payment idempotency caused double charges

Resolution: Database-level advisory locks + idempotency key TTL
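
One way the advisory-lock resolution might look: PostgreSQL's `pg_advisory_xact_lock` takes a 64-bit integer, so a string idempotency key has to be mapped to one first. The sketch below hashes the key and takes the lock inside a transaction. The helper names are hypothetical, `conn` is assumed to be an asyncpg connection, and the original implementation may differ.

```python
import hashlib
import struct
from typing import Any, Awaitable, Callable

# Transaction-level advisory lock: released automatically at commit or rollback.
LOCK_SQL = "SELECT pg_advisory_xact_lock($1)"


def advisory_lock_key(idempotency_key: str) -> int:
    """Map a string key to a signed 64-bit integer for pg_advisory_xact_lock."""
    digest = hashlib.sha256(idempotency_key.encode("utf-8")).digest()
    (key,) = struct.unpack(">q", digest[:8])  # first 8 bytes as signed bigint
    return key


async def charge_with_lock(
    conn: Any,  # assumed: an asyncpg connection
    idempotency_key: str,
    do_charge: Callable[[Any], Awaitable[str]],
) -> str:
    """Serialize concurrent requests that share the same idempotency key."""
    async with conn.transaction():
        await conn.execute(LOCK_SQL, advisory_lock_key(idempotency_key))
        # Only one transaction per key reaches this point at a time, so
        # do_charge can safely check for an existing charge before writing.
        return await do_charge(conn)
```

A transaction-level lock is used here because session-level advisory locks do not behave reliably behind PgBouncer in transaction pooling mode, where consecutive transactions from one client may run on different server connections.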

Connection pool exhaustion during spikes

Impact: 200 DB connections were still insufficient at 10,000 requests per second

Resolution: PgBouncer transaction pooling (200 server connections serving 1,000 effective client connections)
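
The idea behind transaction pooling can be sketched as follows: a client borrows a server connection only for the duration of one transaction, so many clients can share a small pool. The simulation below models the pool with a semaphore. The `TransactionPool` class and the 1 ms transaction time are assumptions for illustration, not PgBouncer's implementation.

```python
import asyncio


class TransactionPool:
    """Toy model of a pooler that lends a connection per transaction."""

    def __init__(self, server_connections: int) -> None:
        self._slots = asyncio.Semaphore(server_connections)
        self.in_use = 0
        self.peak_in_use = 0

    async def run_transaction(self, seconds: float) -> None:
        # Wait for a free server connection, hold it only while the
        # transaction runs, then hand it back for the next client.
        async with self._slots:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
            try:
                await asyncio.sleep(seconds)  # stand-in for the SQL work
            finally:
                self.in_use -= 1


async def simulate(clients: int, server_connections: int) -> int:
    """Run one transaction per client; return peak server connections used."""
    pool = TransactionPool(server_connections)
    await asyncio.gather(
        *(pool.run_transaction(0.001) for _ in range(clients))
    )
    return pool.peak_in_use
```

Calling `asyncio.run(simulate(1000, 200))` runs 1,000 client transactions while never using more than 200 server connections at once. Clients beyond the limit queue briefly instead of opening new connections to PostgreSQL.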

Results

Metric                      Before   After    Improvement
Max requests per second     100      10,000   100x increase
P99 latency                 400ms    80ms     80% reduction
Cost per million requests   $45      $18      60% reduction
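
As a quick check, the improvement figures follow directly from the before and after values reported above. The helper names in this short sketch are illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after."""
    return (before - after) * 100 / before


def scale_factor(before: float, after: float) -> float:
    """How many times larger after is than before."""
    return after / before


throughput_gain = scale_factor(100, 10_000)   # requests per second
latency_cut = percent_reduction(400, 80)      # P99 latency in ms
cost_cut = percent_reduction(45, 18)          # dollars per million requests
```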

Lessons Learned

  • 📘 Async I/O alone provided a 10x concurrency improvement without code changes
  • 📘 Connection pooling was critical: 2,000 direct DB connections were unsustainable, while 200 behind PgBouncer worked
  • 📘 Idempotency with Redis prevented double charges at scale

What We Would Do Differently

  • 💡 Implement request collapsing for duplicate concurrent calls
  • 💡 Use FastAPI's background tasks for non-critical operations
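
Request collapsing, the first item above, could be sketched roughly as follows: concurrent callers asking for the same key share one in-flight call instead of each hitting the backend. The `SingleFlight` class and its `do` method are hypothetical names, and this is one possible approach, not something the team built.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Collapse concurrent calls for the same key into one backend call."""

    def __init__(self) -> None:
        self._in_flight: Dict[str, "asyncio.Future[str]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[str]]) -> str:
        existing = self._in_flight.get(key)
        if existing is not None:
            # Another caller already started this work; wait for its result.
            return await existing
        task = asyncio.ensure_future(fn())
        self._in_flight[key] = task
        try:
            return await task
        finally:
            # Later calls start fresh work once this one has finished.
            self._in_flight.pop(key, None)
```

With this sketch, ten simultaneous `do("acct:1", fetch)` calls run `fetch` once and all receive the same result. If the first caller is cancelled the shared task is cancelled too, so a hardened version would need to account for that.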

Role Relevance

Python engineers transformed a synchronous bottleneck into an async powerhouse, scaling 100x while reducing latency and cost.

Critical Skills Demonstrated

  • Async Python (FastAPI)
  • Database connection pooling
  • Caching strategies
  • Horizontal scaling patterns

Frequently Asked Questions

Why FastAPI over other async frameworks?

FastAPI's performance matched Node.js benchmarks, and it generated OpenAPI docs automatically, which was critical for payment API partners.

How did you validate no data loss during migration?

Shadow traffic ran for 4 weeks, comparing responses byte-for-byte before cutover.
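
A byte-for-byte comparison of this kind might be recorded with something like the sketch below. The `ShadowReport` class and its fields are hypothetical; how the actual shadow harness captured and replayed traffic is not described in the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShadowReport:
    """Accumulates comparisons between baseline and candidate responses."""

    compared: int = 0
    mismatches: List[str] = field(default_factory=list)

    def record(self, request_id: str, baseline: bytes, candidate: bytes) -> bool:
        """Compare two response bodies exactly; remember any mismatch."""
        self.compared += 1
        same = baseline == candidate
        if not same:
            self.mismatches.append(request_id)
        return same

    @property
    def match_rate(self) -> float:
        if self.compared == 0:
            return 0.0
        return (self.compared - len(self.mismatches)) / self.compared
```

Exact byte comparison is strict: fields such as timestamps or generated IDs would need to be normalized before comparing, or they would show up as false mismatches.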