Accelerating Product Recommendations with FAISS

Executive Summary

A marketplace with 10M products and 50M users had recommendation latency causing 15% user drop-off. FAISS-based similarity search replaced their Postgres vector extension, reducing latency by 94% and increasing click-through rates by 28%.

Key Outcomes

▹ 94% reduction in recommendation latency
▹ 28% improvement in click-through rates
▹ 5M additional daily product views

Client Situation

Recommendation engine used pgvector with 768-dim BERT embeddings. At 10M products, similarity search took 800ms, causing abandoned shopping carts.

Key Challenges

⚠ Linear scan performance unacceptable at scale
⚠ Postgres connection pooling maxed at 5k QPS
⚠ Cold start recommendations for new users failing

Existing Architecture

pgvector extension in PostgreSQL with HNSW index. All embeddings stored with product metadata in same database.

HNSW memory overhead 8x original vector size
No GPU acceleration support
Index rebuilds taking 4+ hours
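
As a rough back-of-envelope check on the memory limitation above, the sketch below estimates the raw embedding footprint. It assumes 4-byte float32 components (the case study does not state the storage type) and reads "8x" as a multiplier on raw vector size; both are assumptions made for illustration.

```python
# Back-of-envelope memory estimate. float32 storage is an assumption.
BYTES_PER_FLOAT32 = 4


def raw_vector_gb(num_vectors: int, dim: int) -> float:
    """Size of the raw embeddings alone, in decimal gigabytes."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1e9


raw_gb = raw_vector_gb(10_000_000, 768)  # product count and dimension from the case study
hnsw_gb = raw_gb * 8                     # "8x" read as a multiplier on raw size
print(f"raw vectors: {raw_gb:.2f} GB, with 8x overhead: {hnsw_gb:.2f} GB")
```

Under these assumptions, the same function gives half the raw footprint for 384-dim embeddings, which is one reason a smaller embedding dimension matters at this catalog size.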

Solution Design

Standalone FAISS service with IVF index, separate from OLTP database. Real-time embedding updates via streaming.

Key Decisions

✓ Use IVF4096 with nprobe=20 for 100ms target latency
✓ Separate index shards by product category for better recall
✓ Warm cache for top 10K most-searched products
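
To make the nprobe trade-off concrete, here is a toy inverted-file (IVF) index in pure Python. It is not the FAISS API and not the client's code: `TinyIVF`, the hand-picked centroids, and the sample points are all hypothetical, and a real IVF index learns its centroids with k-means. The sketch only shows why probing more cells raises recall at the cost of scanning more vectors.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVF:
    """Toy IVF index: each vector is stored in the list of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids       # fixed here; FAISS would train these
        self.lists = defaultdict(list)   # cell id -> [(item_id, vector)]

    def add(self, item_id, vec):
        cell = min(range(len(self.centroids)),
                   key=lambda c: sq_dist(vec, self.centroids[c]))
        self.lists[cell].append((item_id, vec))

    def search(self, query, k, nprobe):
        # Scan only the nprobe cells whose centroids are closest to the query.
        cells = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(query, self.centroids[c]))[:nprobe]
        candidates = [(sq_dist(query, v), i)
                      for c in cells for i, v in self.lists[c]]
        return [i for _, i in sorted(candidates)[:k]]


index = TinyIVF(centroids=[(0, 0), (10, 0), (0, 10)])
for item_id, vec in [("a", (1, 1)), ("b", (9, 1)), ("c", (1, 9)), ("d", (4.9, 0))]:
    index.add(item_id, vec)

query = (5.2, 0)
print(index.search(query, k=1, nprobe=1))  # probes one cell, misses the true neighbor
print(index.search(query, k=1, nprobe=2))  # probes two cells, finds it
```

The query sits just across a cell boundary from its true nearest neighbor, so a single probe returns a worse match. Tuning nprobe against a latency target is the same trade-off at much larger scale.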

FAISS, Redis, Kafka, gRPC, Kubernetes

Implementation

Shadow mode testing for 2 weeks before full traffic migration, comparing against pgvector results.
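
One way to score a shadow-mode comparison like this is recall@k of the new system against the baseline results for the same query. The sketch below is an assumption about how such a check could look; `recall_at_k`, `mean_recall`, and the sample IDs are hypothetical and do not come from the project.

```python
def recall_at_k(baseline_ids, candidate_ids, k):
    """Fraction of the baseline's top-k that the candidate also returned in its top-k."""
    base = set(baseline_ids[:k])
    if not base:
        return 1.0  # nothing to recall, so treat as a perfect match
    return len(base & set(candidate_ids[:k])) / len(base)


def mean_recall(pairs, k):
    """Average recall@k over (baseline_ids, candidate_ids) pairs from shadow traffic."""
    scores = [recall_at_k(b, c, k) for b, c in pairs]
    return sum(scores) / len(scores)


# Hypothetical shadow log: (baseline results, new-service results) per query.
shadow_log = [
    ([1, 2, 3, 4], [1, 2, 4, 9]),
    ([5, 6, 7, 8], [8, 7, 6, 5]),
]
print(mean_recall(shadow_log, k=4))
```

Order within the top-k is ignored here, which suits a check that only asks whether the same products surface; a rank-aware metric would be a different design choice.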

Phase 1: Index Build
Trained IVF index on 10M product embeddings, validated recall >98%.
Phase 2: Service Deployment
Deployed FAISS as sidecar container next to recommendation service.
Phase 3: A/B Testing
Gradual rollout from 1% to 100% traffic over 10 days.
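
A gradual rollout like the one in Phase 3 is often implemented with deterministic hash bucketing, so a given user stays on the same side as the percentage grows. The sketch below assumes that approach; the function name, the salt, and the user ID format are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, percent: float, salt: str = "faiss-rollout") -> bool:
    """Return True if this user falls inside the first `percent` of traffic.

    The bucket depends only on the salt and user ID, so raising `percent`
    adds users without removing anyone already included.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 10, 50, 100):
    share = sum(in_rollout(u, percent) for u in users) / len(users)
    print(f"target {percent}% -> observed {share:.1%}")
```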

Technical Challenges

Real-time embedding updates

Impact: New products invisible for up to 24 hours

Resolution: Implemented streaming pipeline updating index incrementally every 5 minutes
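
The incremental update described above can be pictured as a micro-batcher that collects embedding changes and applies them once per interval. This is a minimal sketch under assumptions: `MicroBatcher` and the `apply_batch` callback are hypothetical, the callback stands in for whatever writes to the index, and a production version would also flush on a timer rather than only when a new update arrives.

```python
import time


class MicroBatcher:
    """Buffers embedding upserts and hands them to `apply_batch` once per interval."""

    def __init__(self, apply_batch, interval_s=300.0, clock=time.monotonic):
        self._apply = apply_batch      # hypothetical callback that writes to the index
        self._interval = interval_s    # 300 s matches the 5-minute cadence in the text
        self._clock = clock            # injectable so the logic can be tested
        self._pending = {}             # product_id -> latest embedding
        self._last_flush = clock()

    def submit(self, product_id, embedding):
        # Last write wins: only the newest embedding per product is kept.
        self._pending[product_id] = embedding
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        if self._pending:
            self._apply(dict(self._pending))
            self._pending.clear()
        self._last_flush = self._clock()


applied = []
batcher = MicroBatcher(applied.append, interval_s=300.0)
batcher.submit("sku-1", [0.1, 0.2])
batcher.flush()  # force a flush instead of waiting five minutes
print(applied)
```

Collapsing repeated updates to the same product before writing keeps the batch small when a listing is edited several times within one interval.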

Cold start for new users

Impact: Zero recommendations for first session

Resolution: Fallback to category-based popularity until embeddings generated
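
The fallback can be sketched as a single branch in front of the vector search. The names here (`recommend`, `search`, `popular_by_category`) are hypothetical placeholders, and the stub search only stands in for the real similarity lookup.

```python
def recommend(user_embedding, category, search, popular_by_category, k=10):
    """Return k product IDs, falling back to category popularity for new users.

    `search` is any callable (embedding, k) -> list of product IDs.
    `popular_by_category` maps a category to IDs sorted by popularity.
    """
    if user_embedding is None:
        # Cold start: no embedding yet, so serve popular items in the category.
        return popular_by_category.get(category, [])[:k]
    return search(user_embedding, k)


def stub_search(embedding, k):
    """Placeholder for the real similarity search."""
    return ["p1", "p2", "p3"][:k]


popular = {"shoes": ["s1", "s2", "s3"]}
print(recommend(None, "shoes", stub_search, popular, k=2))        # new user
print(recommend([0.1, 0.2], "shoes", stub_search, popular, k=2))  # returning user
```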

Results

Recommendation latency (P99): 800ms before, 45ms after (94% reduction)
Click-through rate: 3.2% before, 4.1% after (28% increase)
Max QPS supported: 5,000 before, 25,000 after (5x increase)
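
The improvement figures follow directly from the before and after values. The short check below recomputes them; `pct_change` is a helper introduced only for this illustration.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


latency_change = pct_change(800, 45)   # P99 latency in ms
ctr_change = pct_change(3.2, 4.1)      # click-through rate in percent
qps_factor = 25_000 / 5_000            # max QPS supported

print(f"latency: {latency_change:.1f}%")
print(f"click-through: {ctr_change:+.1f}%")
print(f"QPS: {qps_factor:.0f}x")
```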

Lessons Learned

📘 FAISS consistently outperforms pgvector for read-heavy recommendation workloads
📘 Category-based sharding improved recall by 15% without latency penalty
📘 Real-time metrics on search quality are essential for iteration
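
Category-based sharding, as in the lesson above, amounts to routing each query to the index for its category so that only that category's vectors are searched. The sketch below uses a brute-force list per shard to stay self-contained; `ShardedIndex` and the sample data are hypothetical, and each shard would be its own FAISS index in the setup described here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ShardedIndex:
    """One small index per product category; queries touch a single shard."""

    def __init__(self):
        self.shards = {}  # category -> [(product_id, vector)]

    def add(self, category, product_id, vec):
        self.shards.setdefault(category, []).append((product_id, vec))

    def search(self, category, query, k):
        shard = self.shards.get(category, [])  # unknown category -> empty result
        scored = sorted((sq_dist(query, v), pid) for pid, v in shard)
        return [pid for _, pid in scored[:k]]


index = ShardedIndex()
index.add("shoes", "s1", (0.0, 0.0))
index.add("shoes", "s2", (5.0, 5.0))
index.add("books", "b1", (0.1, 0.1))

# The closest vector overall is b1, but a shoes query only sees shoes.
print(index.search("shoes", (0.1, 0.1), k=1))
```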

What We Would Do Differently

💡 Implement automated index rebalancing for skewed query distribution
💡 Use ONNX quantization for further 40% latency reduction

Role Relevance

FAISS experts understood index parameter tuning, memory-accuracy trade-offs, and production deployment patterns critical for e-commerce scale.

Critical Skills Demonstrated

Index parameter tuning, Hybrid search design, Real-time index updates, Shadow mode validation

Frequently Asked Questions

How did you handle product catalog updates?

Kafka stream processing updates the FAISS index incrementally, with a full rebuild nightly for consistency.

What embedding model was used?

A BERT-based model fine-tuned on user click-through data, producing 384-dim embeddings.