OFFLINEPIXEL
E-commerce / Retail

Accelerating Product Recommendations with FAISS

An e-commerce marketplace reduced recommendation latency from 800ms to 45ms while improving click-through rates by 28% using FAISS-based similarity search.

Executive Summary

A marketplace with 10M products and 50M users was losing 15% of its users to slow recommendations. Replacing its Postgres vector extension (pgvector) with FAISS-based similarity search cut latency by 94% and raised click-through rates by 28%.

Key Outcomes

  • 94% reduction in recommendation latency
  • 28% improvement in click-through rates
  • 5M additional daily product views

Client Situation

The recommendation engine used pgvector with 768-dim BERT embeddings. At 10M products, a similarity search took 800ms, which led to abandoned shopping carts.

Key Challenges

  • Linear scan performance unacceptable at scale
  • Postgres connection pooling maxed at 5k QPS
  • Cold start recommendations for new users failing

Existing Architecture

The pgvector extension ran in PostgreSQL with an HNSW index. All embeddings were stored alongside product metadata in the same database.

  • HNSW memory overhead 8x original vector size
  • No GPU acceleration support
  • Index rebuilds taking 4+ hours

Solution Design

A standalone FAISS service with an IVF index, kept separate from the OLTP database. Embedding updates arrive in real time via streaming.

Key Decisions

  • Use IVF4096 with nprobe=20 to meet the 100ms latency target
  • Separate index shards by product category for better recall
  • Warm cache for top 10K most-searched products

FAISS, Redis, Kafka, gRPC, Kubernetes

Implementation

Shadow mode testing for 2 weeks before full traffic migration, comparing against pgvector results.
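
One way to compare shadow results against the existing system is a top-k overlap score per query. The sketch below is an illustration under assumptions: the function names and the `shadow_log` structure (pairs of baseline IDs and candidate IDs for the same query) are hypothetical, not taken from the project.

```python
def overlap_at_k(baseline_ids, candidate_ids, k):
    """Fraction of the baseline's top-k that the candidate also returned in its top-k."""
    baseline_top = set(baseline_ids[:k])
    if not baseline_top:
        return 0.0
    candidate_top = set(candidate_ids[:k])
    return len(baseline_top & candidate_top) / len(baseline_top)


def mean_overlap(pairs, k):
    """Average overlap@k across (baseline_ids, candidate_ids) pairs."""
    scores = [overlap_at_k(baseline, candidate, k) for baseline, candidate in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical shadow log: (pgvector result IDs, FAISS result IDs) for the same query.
shadow_log = [
    ([101, 102, 103, 104], [101, 103, 102, 999]),  # 3 of 4 shared -> 0.75
    ([201, 202, 203, 204], [201, 202, 203, 204]),  # identical -> 1.0
]
print(mean_overlap(shadow_log, k=4))  # 0.875
```

Because an HNSW baseline is itself approximate, this score measures agreement between the two systems rather than true recall; measuring recall would need exact (brute-force) neighbors as the reference.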

  1. Phase 1: Index Build

    Trained IVF index on 10M product embeddings, validated recall >98%.

  2. Phase 2: Service Deployment

    Deployed FAISS as sidecar container next to recommendation service.

  3. Phase 3: A/B Testing

    Gradual rollout from 1% to 100% traffic over 10 days.
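
A gradual rollout like the one in Phase 3 is often implemented with deterministic hash bucketing, so a user who sees the new path at 1% keeps seeing it as the percentage grows. The sketch below shows that general technique; the function name, salt, and bucket count are assumptions, and the source does not say how traffic was actually split.

```python
import hashlib


def in_rollout(user_id: str, percent: float, salt: str = "faiss-rollout") -> bool:
    """Deterministically decide whether a user is served by the new path.

    percent is the rollout percentage in [0, 100]. Buckets are stable per user,
    so raising percent only ever adds users to the treatment group.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=1 -> buckets 0..99


# Roughly 10% of a synthetic user population should land in the rollout.
users = [f"user-{i}" for i in range(10_000)]
share = sum(in_rollout(u, 10) for u in users) / len(users)
print(f"share in rollout at 10%: {share:.3f}")
```

Changing the salt reshuffles the buckets, which is useful when a later experiment should not reuse the same early-exposure group.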

Technical Challenges

Real-time embedding updates

Impact: New products invisible for up to 24 hours

Resolution: Implemented streaming pipeline updating index incrementally every 5 minutes
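
The five-minute incremental update can be pictured as a small buffer that coalesces incoming embedding changes and applies them in batches. This is a sketch under assumptions: `UpdateBuffer`, `apply_batch`, and the injected clock are hypothetical names, and the real pipeline's consumer and index-update code are not described in the source.

```python
import time
from typing import Callable, Dict, List


class UpdateBuffer:
    """Collects embedding upserts and hands them to the index in batches."""

    def __init__(self,
                 apply_batch: Callable[[Dict[int, List[float]]], None],
                 interval_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._apply_batch = apply_batch
        self._interval = interval_seconds
        self._clock = clock
        self._pending: Dict[int, List[float]] = {}
        self._last_flush = clock()

    def upsert(self, product_id: int, embedding: List[float]) -> None:
        # Latest embedding wins if a product changes twice within one interval.
        self._pending[product_id] = embedding

    def maybe_flush(self) -> int:
        """Apply pending updates if the interval has elapsed; return how many."""
        if not self._pending or self._clock() - self._last_flush < self._interval:
            return 0
        batch, self._pending = self._pending, {}
        self._apply_batch(batch)
        self._last_flush = self._clock()
        return len(batch)


# Usage with a fake clock so the example does not wait five minutes.
applied: List[Dict[int, List[float]]] = []
now = [0.0]
buffer = UpdateBuffer(applied.append, interval_seconds=300.0, clock=lambda: now[0])
buffer.upsert(42, [0.1, 0.2])
buffer.upsert(42, [0.3, 0.4])  # supersedes the earlier embedding for product 42
buffer.upsert(7, [0.5, 0.6])
now[0] = 300.0                 # five minutes later
flushed = buffer.maybe_flush()
print(flushed, applied)
```

With a FAISS IVF index, `apply_batch` could plausibly call `remove_ids` for the changed products followed by `add_with_ids`; that pairing is an assumption about how such a callback might look, not a description of the deployed service.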

Cold start for new users

Impact: Zero recommendations for first session

Resolution: Fallback to category-based popularity until embeddings generated
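
The fallback logic amounts to a single branch: use similarity search when a user embedding exists, otherwise serve the most popular items for the category being browsed. The sketch below assumes hypothetical names (`recommend`, `similarity_search`, `popular_by_category`) and a precomputed popularity table; none of these come from the source.

```python
from typing import Callable, Dict, List, Optional, Sequence


def recommend(user_embedding: Optional[Sequence[float]],
              category: str,
              similarity_search: Callable[[Sequence[float], int], List[int]],
              popular_by_category: Dict[str, List[int]],
              k: int = 10) -> List[int]:
    """Return up to k product IDs, falling back to popularity for cold-start users."""
    if user_embedding is not None:
        return similarity_search(user_embedding, k)
    # Cold start: no embedding yet, so rank by category popularity instead.
    return popular_by_category.get(category, [])[:k]


# Stub search and popularity table, for illustration only.
def fake_search(embedding: Sequence[float], k: int) -> List[int]:
    return [900, 901, 902][:k]


popularity = {"shoes": [11, 12, 13, 14], "books": [21, 22]}

print(recommend(None, "shoes", fake_search, popularity, k=3))        # [11, 12, 13]
print(recommend([0.1, 0.2], "shoes", fake_search, popularity, k=2))  # [900, 901]
```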

Results

Metric                          Before    After     Improvement
Recommendation latency (P99)    800ms     45ms      94% reduction
Click-through rate              3.2%      4.1%      28% increase
Max QPS supported               5,000     25,000    5x increase
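
The improvement figures follow directly from the before and after values. The short check below recomputes them; the helper name `percent_change` is illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


latency_change = percent_change(800, 45)   # P99 latency in ms
ctr_change = percent_change(3.2, 4.1)      # click-through rate in percent
qps_factor = 25_000 / 5_000                # max QPS supported

print(round(latency_change, 1))  # -94.4, reported as a 94% reduction
print(round(ctr_change, 1))      # 28.1, reported as a 28% increase
print(qps_factor)                # 5.0, reported as a 5x increase
```

Note that the click-through figure is a relative increase: the absolute change is 0.9 percentage points.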

Lessons Learned

  • 📘 FAISS outperformed pgvector on this read-heavy recommendation workload
  • 📘 Category-based sharding improved recall by 15% without latency penalty
  • 📘 Real-time metrics on search quality are essential for iteration

What We Would Do Differently

  • 💡 Implement automated index rebalancing for skewed query distribution
  • 💡 Use ONNX quantization for a further 40% latency reduction

Role Relevance

FAISS experts understood index parameter tuning, memory-accuracy trade-offs, and production deployment patterns critical for e-commerce scale.

Critical Skills Demonstrated

Index parameter tuning, Hybrid search design, Real-time index updates, Shadow mode validation

Frequently Asked Questions

How did you handle product catalog updates?
A Kafka stream-processing job updates the FAISS index incrementally, with a full nightly rebuild for consistency.
What embedding model was used?
BERT-based model fine-tuned on user click-through data, producing 384-dim embeddings.