Executive Summary
A marketplace with 10M products and 50M users was seeing 15% user drop-off caused by recommendation latency. Replacing their Postgres vector extension (pgvector) with FAISS-based similarity search reduced P99 recommendation latency by 94% and increased click-through rate by 28%.
Key Outcomes
- ▹ 94% reduction in recommendation latency
- ▹ 28% improvement in click-through rates
- ▹ 5M additional daily product views
Client Situation
Recommendation engine used pgvector with 768-dim BERT embeddings. At 10M products, similarity search took 800ms, causing abandoned shopping carts.
Key Challenges
- ⚠ Linear scan performance unacceptable at scale
- ⚠ Postgres connection pooling maxed at 5k QPS
- ⚠ Cold start recommendations for new users failing
Existing Architecture
pgvector extension in PostgreSQL with an HNSW index. All embeddings stored alongside product metadata in the same database.
- HNSW memory overhead 8x original vector size
- No GPU acceleration support
- Index rebuilds taking 4+ hours
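For a sense of scale, here is a back-of-the-envelope estimate of the HNSW memory footprint. It assumes the 768-dimensional embeddings were stored as 4-byte floats (pgvector's `vector` type uses single precision). It also reads the 8x figure as relative to the raw vector bytes:

$$
10^{7} \times 768 \times 4\ \text{B} = 3.072 \times 10^{10}\ \text{B} \approx 30.7\ \text{GB},
\qquad
8 \times 30.7\ \text{GB} \approx 246\ \text{GB}
$$

Under these assumptions the index alone would need on the order of a quarter terabyte of memory.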
Solution Design
Standalone FAISS service with an IVF index, separate from the OLTP database. Real-time embedding updates via streaming.
Key Decisions
- ✓ Use IVF4096 with nprobe=20 to meet a 100ms latency target
- ✓ Separate index shards by product category for better recall
- ✓ Warm cache for top 10K most-searched products
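The IVF4096/nprobe=20 trade-off is easier to see in a small example. Below is a pure-Python toy of an inverted-file index, scaled down to a few hundred vectors. It is not FAISS code. The class and function names are hypothetical, the coarse quantizer is plain k-means, and distances are squared L2 (the case study does not state the metric). Each vector is filed under its nearest centroid, and a query scans only the `nprobe` closest lists. Raising `nprobe` therefore improves recall at the cost of latency.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; the centroids act as the IVF coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: squared_l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVF:
    """Inverted-file index: every vector lives in the list of its nearest centroid."""

    def __init__(self, nlist, nprobe):
        self.nlist = nlist
        self.nprobe = nprobe
        self.centroids = []
        self.lists = []

    def _closest_lists(self, vector, n):
        order = sorted(range(self.nlist), key=lambda c: squared_l2(vector, self.centroids[c]))
        return order[:n]

    def train(self, vectors):
        self.centroids = kmeans(vectors, self.nlist)
        self.lists = [[] for _ in range(self.nlist)]

    def add(self, item_id, vector):
        self.lists[self._closest_lists(vector, 1)[0]].append((item_id, vector))

    def search(self, query, k):
        # Scan only the nprobe nearest inverted lists instead of the whole corpus.
        candidates = [
            (squared_l2(query, vec), item_id)
            for c in self._closest_lists(query, self.nprobe)
            for item_id, vec in self.lists[c]
        ]
        candidates.sort()
        return [item_id for _, item_id in candidates[:k]]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    index = ToyIVF(nlist=16, nprobe=4)  # scaled-down stand-ins for 4096 and 20
    index.train(data)
    for i, vec in enumerate(data):
        index.add(i, vec)
    print(index.search(data[0], k=5))
```

With `nprobe` equal to `nlist`, the search degenerates to an exact scan. In FAISS the same two knobs appear in two places. The list count is the `nlist` passed to an IVF index such as `faiss.IndexIVFFlat` (or the `IVF4096` part of an `index_factory` string). The probe count is the `nprobe` attribute on the trained index.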
Implementation
Shadow mode testing for 2 weeks before full traffic migration, comparing against pgvector results.
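A shadow-mode comparison like this one can be sketched as a wrapper. The wrapper always returns the incumbent (pgvector) results and only records how closely the candidate (FAISS) agrees. The class name, the top-k overlap metric, and the synchronous shadow call are illustrative assumptions. A production version would issue the shadow query asynchronously so it cannot add latency.

```python
from statistics import mean


def overlap_at_k(served, candidate, k):
    """Fraction of the served top-k that also appears in the candidate top-k."""
    top = served[:k]
    if not top:
        return 1.0
    return len(set(top) & set(candidate[:k])) / len(top)


class ShadowComparator:
    """Serves the primary backend and only measures the shadow backend."""

    def __init__(self, primary_search, shadow_search, k=10):
        self.primary_search = primary_search
        self.shadow_search = shadow_search
        self.k = k
        self.overlaps = []

    def search(self, query):
        served = self.primary_search(query, self.k)
        try:
            # Synchronous here for brevity; production code would run this off the request path.
            candidate = self.shadow_search(query, self.k)
            self.overlaps.append(overlap_at_k(served, candidate, self.k))
        except Exception:
            # A failing shadow backend must never affect the user-facing response.
            pass
        return served

    def mean_overlap(self):
        return mean(self.overlaps) if self.overlaps else None
```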
Phase 1: Index Build
Trained IVF index on 10M product embeddings, validated recall >98%.
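The case study reports recall above 98% without saying how it was measured. This sketch assumes one common reading: recall@k against exact brute-force search over a sample of queries. The helper names and the gate threshold constant are hypothetical.

```python
import heapq


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors nearest to query (squared L2)."""
    scored = (
        (sum((q - x) ** 2 for q, x in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    )
    return [i for _, i in heapq.nsmallest(k, scored)]


def recall_at_k(approx_results, exact_results, k):
    """Mean fraction of each query's exact top-k that the approximate index returned."""
    if len(approx_results) != len(exact_results) or not exact_results:
        raise ValueError("need one approximate and one exact result list per query")
    total = 0.0
    for approx, exact in zip(approx_results, exact_results):
        truth = set(exact[:k])
        total += len(truth & set(approx[:k])) / len(truth)
    return total / len(exact_results)


# Hypothetical release gate mirroring the >98% validation target.
RECALL_TARGET = 0.98


def passes_recall_gate(approx_results, exact_results, k=10):
    return recall_at_k(approx_results, exact_results, k) > RECALL_TARGET
```

At the case study's scale, the exact lists would more plausibly come from a flat (exhaustive) index such as FAISS's `IndexFlatL2` run over a held-out query sample, not a Python loop.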
Phase 2: Service Deployment
Deployed FAISS as sidecar container next to recommendation service.
Phase 3: A/B Testing
Gradual rollout from 1% to 100% traffic over 10 days.
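A 1% to 100% rollout is often driven by deterministic user bucketing. That way a user gets a consistent experience across requests, and everyone admitted at 1% stays admitted as the percentage grows. The sketch below assumes the user id is hashed with SHA-256 into 10,000 buckets. The salt and function names are hypothetical, and the case study does not describe its actual routing mechanism.

```python
import hashlib

BUCKETS = 10_000  # 0.01% granularity


def rollout_bucket(user_id: str, salt: str = "faiss-rollout") -> int:
    """Stable bucket in [0, BUCKETS) derived from a salted SHA-256 of the user id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def use_faiss(user_id: str, rollout_percent: float) -> bool:
    """True if the user falls inside the current rollout percentage (0 to 100)."""
    return rollout_bucket(user_id) < rollout_percent * BUCKETS / 100
```

Because the threshold only grows, raising `rollout_percent` from one day to the next adds users without reshuffling those already on FAISS.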
Technical Challenges
- Real-time embedding updates
  Impact: New products invisible for up to 24 hours
  Resolution: Implemented streaming pipeline updating index incrementally every 5 minutes
- Cold start for new users
  Impact: Zero recommendations for first session
  Resolution: Fallback to category-based popularity until embeddings generated
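Both resolutions above can be sketched in a few lines. The buffer collects product upserts (as they might arrive from a stream) and publishes them in one batch once the 5-minute interval has elapsed. The recommend function falls back to category popularity when a user has no embedding yet. All names are hypothetical, and a plain dict stands in for the vector index. A FAISS IVF index would instead receive each batch through methods such as `add_with_ids` and `remove_ids`. Deletions are omitted.

```python
import time
from collections import Counter


class BufferedIndexUpdater:
    """Collects product upserts and publishes them to search in periodic batches."""

    def __init__(self, flush_interval_s=300.0, clock=time.monotonic):
        self.flush_interval_s = flush_interval_s  # 5 minutes, per the case study
        self.clock = clock
        self.live = {}     # product_id -> embedding currently visible to search
        self.pending = {}  # product_id -> embedding waiting for the next flush
        self.last_flush = clock()

    def upsert(self, product_id, embedding):
        # A later event for the same product overwrites the earlier one.
        self.pending[product_id] = embedding
        if self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        # A timer could also call this so a quiet stream still gets published.
        self.live.update(self.pending)
        self.pending.clear()
        self.last_flush = self.clock()


def recommend(user_embedding, category, ann_search, popular_by_category, k=10):
    """ANN results when a user embedding exists, otherwise category popularity."""
    if user_embedding is None:
        # Cold start: no embedding yet, so rank the category's products by popularity.
        return [pid for pid, _ in popular_by_category[category].most_common(k)]
    return ann_search(user_embedding, k)
```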
Results
- Recommendation latency (P99)
  Before: 800ms | After: 45ms | Improvement: 94% reduction
- Click-through rate
  Before: 3.2% | After: 4.1% | Improvement: 28% increase
- Max QPS supported
  Before: 5,000 | After: 25,000 | Improvement: 5x increase
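The improvement figures follow from the before and after values, and the short calculation below reproduces them. The 28% click-through gain is relative: in absolute terms the rate rose by 0.9 percentage points. Likewise, a 5x QPS increase corresponds to a 400% change.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return 100 * (after - before) / before


RESULTS = {
    "P99 latency (ms)": (800, 45),
    "Click-through rate (%)": (3.2, 4.1),
    "Max QPS": (5_000, 25_000),
}

for metric, (before, after) in RESULTS.items():
    print(f"{metric}: {percent_change(before, after):+.1f}% ({after / before:.2f}x)")
# P99 latency (ms): -94.4% (0.06x)
# Click-through rate (%): +28.1% (1.28x)
# Max QPS: +400.0% (5.00x)
```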
Lessons Learned
- 📘 FAISS outperformed pgvector on this read-heavy recommendation workload
- 📘 Category-based sharding improved recall by 15% without a latency penalty
- 📘 Real-time metrics on search quality are essential for iteration
What We Would Do Differently
- 💡 Implement automated index rebalancing for skewed query distributions
- 💡 Use ONNX quantization for a further 40% latency reduction
Role Relevance
FAISS experts understood index parameter tuning, memory-accuracy trade-offs, and production deployment patterns critical for e-commerce scale.
Frequently Asked Questions
- How did you handle product catalog updates?
- Kafka stream processing updates the FAISS index incrementally, with a full nightly rebuild for consistency.
- What embedding model was used?
- BERT-based model fine-tuned on user click-through data, producing 384-dim embeddings.