Executive Summary
A stock photo agency with 500M images found that slow reverse image search was hurting user retention. Moving to a FAISS IVFOPQ index with GPU-accelerated search and embedding inference cut P99 search latency by 94%, enabling real-time similarity search for copyright infringement detection.
Key Outcomes
- ▹ 94% reduction in search latency (2s → 120ms)
- ▹ 98.5% recall at top-100 results
- ▹ Searched the 500M-image index in under 100ms
Client Situation
Photographers needed to check whether their images were being used without permission. The existing search took 2+ seconds, which frustrated users and reduced feature adoption.
Key Challenges
- ⚠ 500M images with 2,048-dim ResNet embeddings
- ⚠ Linear scan impractical: 1TB+ of raw embeddings
- ⚠ Users abandoning search after 5-second timeout
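The scale behind these challenges can be checked with a quick calculation. The sketch below assumes the embeddings are stored as float32 (4 bytes per dimension), which the case study does not state explicitly; with float16 the totals would halve.

```python
# Back-of-the-envelope size of the raw embeddings.
# Assumption: float32 storage (4 bytes per dimension).
NUM_IMAGES = 500_000_000
DIM = 2_048
BYTES_PER_FLOAT32 = 4

bytes_per_vector = DIM * BYTES_PER_FLOAT32      # one embedding
total_bytes = NUM_IMAGES * bytes_per_vector     # all embeddings
total_tb = total_bytes / 1e12                   # decimal terabytes

print(f"{bytes_per_vector} bytes per vector")
print(f"{total_tb:.3f} TB for all embeddings")
```

Under that assumption each vector takes 8 KB and the full set is roughly 4 TB, which is why a brute-force scan per query was not an option.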
Existing Architecture
Elasticsearch with dense vector plugin using HNSW. Index size 800GB, query latency 2s at P99.
- HNSW memory overhead unacceptable for 500M vectors
- Index rebuilds taking 3+ days
- No GPU support for embedding extraction
Solution Design
FAISS IVFOPQ index with 4-bit PQ compression, GPU-accelerated search, and separate embedding extraction pipeline.
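To illustrate the inverted-file (IVF) idea behind this design, here is a toy NumPy sketch, not the FAISS implementation. It partitions vectors into `nlist` cells and searches only the `nprobe` cells closest to the query. As a simplification, centroids are sampled from the data instead of being trained with k-means, and no OPQ/PQ compression is applied; all function names are hypothetical.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared L2 distances between rows of a (n, d) and b (m, d)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def build_ivf(vectors, nlist, rng):
    """Assign every vector to its nearest centroid (one inverted list per centroid)."""
    # Simplification: sample centroids from the data rather than running k-means.
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)]
    assignment = sq_dists(vectors, centroids).argmin(axis=1)
    lists = [np.flatnonzero(assignment == c) for c in range(nlist)]
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Search only the nprobe inverted lists whose centroids are closest to the query."""
    centroid_d = sq_dists(query[None, :], centroids)[0]
    probed = np.argsort(centroid_d)[:nprobe]
    candidates = np.concatenate([lists[c] for c in probed])
    d = sq_dists(query[None, :], vectors[candidates])[0]
    order = np.argsort(d)[:k]
    return candidates[order], d[order]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((2000, 8)).astype(np.float32)
centroids, lists = build_ivf(vectors, nlist=16, rng=rng)

# Query with a vector that is in the index: it should be its own nearest neighbour.
ids, dists = ivf_search(vectors[123], vectors, centroids, lists, nprobe=4, k=5)
print(ids, dists)
```

Raising `nprobe` scans more cells, which improves recall at the cost of latency. That is the same trade-off the production index tunes.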
Key Decisions
- ✓ Use IVF65536 with nprobe=64 for recall target
- ✓ PQ64x4 compression reducing vector size from 8KB to 256 bytes
- ✓ Separate embedding service using TensorRT for inference
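The numbers in these decisions can be sanity-checked with simple arithmetic. The sketch below assumes float32 source vectors and counts only the PQ codes themselves (no vector IDs or inverted-list overhead). Note that 64 sub-quantizers at 4 bits each come to 256 bits, or 32 bytes, per code, while the 128GB index size reported in the results works out to about 256 bytes per vector. The case study does not explain the difference, so the remainder may be other per-vector storage or a different PQ setting.

```python
def pq_code_bytes(num_subquantizers: int, bits_per_code: int) -> int:
    """Size of one PQ code in bytes (codes only, no IDs or list overhead)."""
    total_bits = num_subquantizers * bits_per_code
    return (total_bits + 7) // 8  # round up to whole bytes

raw_bytes = 2048 * 4                         # assumption: float32 embeddings
code_bytes = pq_code_bytes(64, 4)            # PQ64x4
compression_ratio = raw_bytes / code_bytes

# Fraction of inverted lists visited per query with IVF65536 and nprobe=64.
probed_fraction = 64 / 65536

# Per-vector footprint implied by the reported 128GB index over 500M vectors.
implied_bytes_per_vector = 128e9 / 500e6

print(code_bytes, compression_ratio, probed_fraction, implied_bytes_per_vector)
```

With these settings each query visits roughly 0.1% of the inverted lists, which is where most of the latency saving over a full scan comes from.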
Implementation
Implemented incremental index building with daily updates, replacing Elasticsearch over 3 months.
Phase 1: Embedding Pipeline
Built TensorRT pipeline for GPU-accelerated ResNet inference at 5ms per image.
Phase 2: FAISS Index
Trained IVFOPQ index on 500M embeddings over 48 hours on 8x A100 GPUs.
Phase 3: Search Service
Deployed FAISS as gRPC service with 99.9% availability SLA.
Technical Challenges
- Index training time at 500M vectors
Impact: 1-12 hour training window blocking updates
Resolution: Used incremental training with k-means on 1% sample
- Recall drop at high compression
Impact: 4-bit PQ reduced recall to 91%
Resolution: Increased nprobe to 96 and added re-ranking with exact distances
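The re-ranking step in the second resolution can be sketched as follows. This is a minimal NumPy illustration, not the team's implementation. It assumes the compressed index has already returned a shortlist of candidate IDs and that full-precision vectors for those candidates can be fetched, for example from a memory-mapped file. The function name `rerank_exact` is hypothetical.

```python
import numpy as np

def rerank_exact(query, candidate_ids, full_vectors, k):
    """Re-order approximate-search candidates by exact squared L2 distance."""
    candidates = full_vectors[candidate_ids]        # (n_candidates, dim)
    diffs = candidates - query
    dists = np.einsum("ij,ij->i", diffs, diffs)     # row-wise squared norms
    order = np.argsort(dists)[:k]
    return candidate_ids[order], dists[order]

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 16)).astype(np.float32)

# The query is a slightly perturbed copy of vector 42.
query = vectors[42] + 0.01
# Stand-in for the shortlist an approximate index would return.
shortlist = np.array([7, 42, 300, 999])

ids, dists = rerank_exact(query, shortlist, vectors, k=2)
print(ids, dists)
```

Because exact distances are computed only for the shortlist, the cost stays small compared with the approximate search, while ordering errors introduced by 4-bit quantization are corrected for the candidates that made the shortlist.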
Results
- Search latency (P99)
Before: 2,000ms
After: 120ms
Improvement: 94% reduction
- Index size (500M vectors)
Before: 800GB
After: 128GB
Improvement: 84% reduction
- Daily infringement detections
Before: 500
After: 5,000
Improvement: 10x increase
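The improvement figures follow directly from the before and after values, as this short check shows. The helper name `pct_reduction` is illustrative.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from before to after."""
    return 100 * (before - after) / before

latency_reduction = pct_reduction(2000, 120)   # P99 latency in ms
size_reduction = pct_reduction(800, 128)       # index size in GB
detection_multiplier = 5000 / 500              # daily detections

print(latency_reduction, size_reduction, detection_multiplier)
```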
Lessons Learned
- 📘 4-bit PQ preserves accuracy for visual similarity despite aggressive compression
- 📘 Embedding extraction was the biggest latency bottleneck until it moved to GPU inference
- 📘 Recall improved 3% by using query expansion with flipped images
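Query expansion with flipped images can be sketched as below. This is an illustration under stated assumptions: the flip is horizontal, images are arrays shaped (height, width, channels), and `embed` and `search` are hypothetical callables standing in for the embedding service and the index. Results from both variants are merged by keeping the best distance per image ID.

```python
import numpy as np

def search_with_flip(image, embed, search, k):
    """Search with the image and its horizontal mirror, then merge the results."""
    variants = [image, image[:, ::-1]]          # original and mirrored (H, W, C)
    best = {}
    for variant in variants:
        ids, dists = search(embed(variant), k)
        for i, d in zip(ids, dists):
            i, d = int(i), float(d)
            if i not in best or d < best[i]:
                best[i] = d                     # keep the smallest distance per ID
    return sorted(best.items(), key=lambda item: item[1])[:k]

# Toy stand-ins for the real services (assumptions for the demo only).
def embed(img):
    return img.astype(np.float32).ravel()

original = np.arange(12, dtype=np.float32).reshape(2, 3, 2)
mirrored = original[:, ::-1]
# ID 0 is a mirrored copy of the query image; ID 1 is an unrelated image.
database = np.stack([embed(mirrored), embed(original) + 100.0])

def search(query_vec, k):
    d = ((database - query_vec) ** 2).sum(axis=1)
    order = np.argsort(d)[:k]
    return order, d[order]

results = search_with_flip(original, embed, search, k=2)
print(results)
```

In the toy data the mirrored copy is only an exact match for the flipped query, which is the kind of hit that the expansion recovers.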
What We Would Do Differently
- 💡 Implement multi-stage search earlier for better accuracy/latency trade-off
- 💡 Use RAFT for faster IVF training on GPU
Role Relevance
FAISS experts were essential for designing compressed indexes that fit in GPU memory while maintaining recall for visual similarity.
Frequently Asked Questions
- How often do you rebuild the index?
- Full rebuild weekly, incremental updates daily for new images.
- What recall threshold was acceptable for copyright detection?
- 98.5% recall at top-100, because false negatives cost more than false positives.