OFFLINEPIXEL
Digital Media / Stock Photography

Reducing Search Latency in Image Retrieval Systems

A stock photo agency reduced reverse image search latency from 2 seconds to 120ms while scaling to 500M images using FAISS and GPU optimization.

Executive Summary

A stock photo agency with 500M images suffered from slow reverse image search, which hurt user retention. A FAISS IVFPQ index with GPU inference reduced latency by 94%, enabling real-time similarity search for copyright infringement detection.

Key Outcomes

  • 94% reduction in search latency (2s → 120ms)
  • 98.5% recall at top-100 results
  • Searched an index of 500M images in under 100ms

Client Situation

Photographers needed to check whether their images were being used without permission. The existing search took 2+ seconds, frustrating users and reducing feature adoption.

Key Challenges

  • 500M images with 2,048-dim ResNet embeddings
  • Linear scan impractical: roughly 4TB of raw float32 embeddings (500M × 8KB)
  • Users abandoning search after 5-second timeout
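
The scale of the raw embedding set follows from simple arithmetic. The sketch below assumes float32 components (4 bytes each), which matches the 8KB-per-vector figure quoted under Key Decisions; the constant names are illustrative only.

```python
# Back-of-envelope size of the raw embedding set.
# Assumption: float32 components (4 bytes each).
NUM_IMAGES = 500_000_000
DIMENSIONS = 2_048
BYTES_PER_FLOAT32 = 4

bytes_per_vector = DIMENSIONS * BYTES_PER_FLOAT32
total_bytes = NUM_IMAGES * bytes_per_vector

print(f"{bytes_per_vector} bytes per vector")            # 8192
print(f"{total_bytes / 1e12:.3f} TB of raw embeddings")  # 4.096
```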

Existing Architecture

The existing system ran Elasticsearch with a dense vector plugin using HNSW. The index was 800GB and query latency was 2s at P99.

  • HNSW memory overhead unacceptable for 500M vectors
  • Index rebuilds taking 3+ days
  • No GPU support for embedding extraction

Solution Design

The new design uses a FAISS IVFOPQ index with 4-bit PQ compression, GPU-accelerated search, and a separate embedding extraction pipeline.

Key Decisions

  • Use IVF65536 with nprobe=64 for recall target
  • PQ64x4 compression reducing vector size from 8KB to 256 bytes
  • Separate embedding service using TensorRT for inference

Technologies: FAISS, ResNet50, TensorRT, CUDA, S3, Redis
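
With IVF65536 and nprobe=64, each query visits 64 of the 65,536 inverted lists, roughly 0.1% of them. The pure-Python toy below illustrates that idea only; it is not FAISS code. The function names, the 32-list index, and the randomly sampled centroids (a stand-in for trained k-means centroids) are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: sq_dist(query, vectors[vid]))[:k]

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 32)  # stand-in for k-means centroids
lists = build_ivf(vectors, centroids)
query = [rng.random() for _ in range(8)]

exact = ivf_search(query, vectors, centroids, lists, nprobe=32, k=10)  # probes every list
approx = ivf_search(query, vectors, centroids, lists, nprobe=4, k=10)
print("overlap with exact top-10:", len(set(exact) & set(approx)))
```

Raising nprobe widens the candidate set, trading latency for recall, which is the lever behind the recall target in the first decision above.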

Implementation

Implemented incremental index building with daily updates, replacing Elasticsearch over 3 months.

  1. Phase 1: Embedding Pipeline

    Built TensorRT pipeline for GPU-accelerated ResNet inference at 5ms per image.

  2. Phase 2: FAISS Index

    Trained IVFOPQ index on 500M embeddings over 48 hours on 8x A100 GPUs.

  3. Phase 3: Search Service

    Deployed FAISS as gRPC service with 99.9% availability SLA.
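
The 5ms-per-image figure from Phase 1 gives a sense of how long embedding the full catalogue takes. This is a rough sketch that assumes sequential inference within a worker and linear scaling across workers; the worker counts are hypothetical and do not describe the actual deployment.

```python
SECONDS_PER_IMAGE = 0.005   # 5ms per image, from Phase 1
NUM_IMAGES = 500_000_000
SECONDS_PER_DAY = 86_400

def backfill_days(num_workers):
    """Days to embed the full catalogue with num_workers parallel inference workers."""
    total_seconds = NUM_IMAGES * SECONDS_PER_IMAGE / num_workers
    return total_seconds / SECONDS_PER_DAY

for workers in (1, 8, 32):  # hypothetical worker counts
    print(f"{workers:>2} workers: {backfill_days(workers):.1f} days")
```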

Technical Challenges

Index training time at 500M vectors

Impact: 1-12 hour training window blocking updates

Resolution: Trained the k-means coarse quantizer on a 1% sample, then added vectors incrementally

Recall drop at high compression

Impact: 4-bit PQ reduced recall to 91%

Resolution: Increased nprobe to 96 and added re-ranking with exact distances
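
Re-ranking with exact distances is a two-stage pattern: rank everything cheaply on compressed codes, then re-score a short list against the original vectors. The sketch below uses a crude 4-bit scalar quantizer as a stand-in for product quantization, so it shows the pattern rather than the team's actual index; all names and sizes are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize_4bit(vec):
    """Round each component in [0, 1) to one of 16 levels (a crude stand-in for PQ)."""
    return [min(int(x * 16), 15) for x in vec]

def dequantize_4bit(codes):
    """Reconstruct an approximate vector from the bucket midpoints."""
    return [(c + 0.5) / 16 for c in codes]

def search_with_rerank(query, codes, originals, shortlist_size, k):
    # Stage 1: rank all items by distance to their lossy reconstruction.
    approx_rank = sorted(
        range(len(codes)),
        key=lambda i: sq_dist(query, dequantize_4bit(codes[i])),
    )
    shortlist = approx_rank[:shortlist_size]
    # Stage 2: re-rank only the shortlist using exact distances on the originals.
    return sorted(shortlist, key=lambda i: sq_dist(query, originals[i]))[:k]

rng = random.Random(1)
originals = [[rng.random() for _ in range(16)] for _ in range(1000)]
codes = [quantize_4bit(v) for v in originals]
query = [rng.random() for _ in range(16)]

top10 = search_with_rerank(query, codes, originals, shortlist_size=100, k=10)
print(top10)
```

A larger shortlist recovers more of the recall lost to compression, at the cost of fetching more full-precision vectors per query.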

Results

Search latency (P99)
  • Before: 2,000ms
  • After: 120ms
  • Improvement: 94% reduction

Index size (500M vectors)
  • Before: 800GB
  • After: 128GB
  • Improvement: 84% reduction

Daily infringement detections
  • Before: 500
  • After: 5,000
  • Improvement: 10x increase
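
The improvement figures can be checked directly from the before and after values. The per-vector figure in this sketch assumes decimal gigabytes and an index spread evenly across all 500M vectors; the helper name is illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100

latency = percent_reduction(2_000, 120)   # ms, P99
index_size = percent_reduction(800, 128)  # GB
detections = 5_000 / 500                  # daily detections, after / before
bytes_per_vector = 128e9 / 500e6          # assumes decimal gigabytes

# For comparison: 64 codes of 4 bits each pack into 32 bytes. The source
# does not spell out how that relates to the 256-byte per-vector figure.
pq64x4_code_bytes = 64 * 4 // 8

print(f"latency: {latency:.0f}% lower")                   # 94
print(f"index size: {index_size:.0f}% smaller")           # 84
print(f"detections: {detections:.0f}x")                   # 10
print(f"index bytes per vector: {bytes_per_vector:.0f}")  # 256
```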

Lessons Learned

  • 4-bit PQ preserves accuracy for visual similarity despite aggressive compression
  • Embedding extraction was the biggest latency bottleneck until inference moved to the GPU
  • Query expansion with flipped images improved recall by 3%
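
Query expansion with flipped images can be sketched as follows: search with both the query image and its mirror, then keep the best distance per result. This assumes "flipped" means mirrored left to right. `toy_embed` is a hypothetical stand-in for the ResNet embedding and the brute-force `search` replaces the real index, so only the merge logic is the point here.

```python
def flip_horizontal(image):
    """Mirror a 2D image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def toy_embed(image):
    """Hypothetical stand-in for the ResNet embedding: flatten the pixels."""
    return [float(p) for row in image for p in row]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query_vec, index, k):
    """Brute-force nearest neighbours; returns (distance, image_id) pairs."""
    return sorted((sq_dist(query_vec, vec), image_id) for image_id, vec in index.items())[:k]

def search_with_flip_expansion(image, index, k):
    """Query with the image and its mirror, keeping the best distance per result."""
    best = {}
    for variant in (image, flip_horizontal(image)):
        for dist, image_id in search(toy_embed(variant), index, k):
            best[image_id] = min(dist, best.get(image_id, dist))
    return sorted((d, i) for i, d in best.items())[:k]

original = [[1, 2, 3], [4, 5, 6]]
index = {
    "mirrored_copy": toy_embed(flip_horizontal(original)),
    "unrelated": toy_embed([[9, 9, 9], [0, 0, 0]]),
}
print(search_with_flip_expansion(original, index, k=1))  # [(0.0, 'mirrored_copy')]
```

Without the expansion, the mirrored copy is found at distance 16.0 rather than 0.0, which is the kind of near-miss that costs recall on edited images.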

What We Would Do Differently

  • Implement multi-stage search earlier for better accuracy/latency trade-off
  • Use RAFT for faster IVF training on GPU

Role Relevance

FAISS experts were essential for designing compressed indexes that fit in GPU memory while maintaining recall for visual similarity.

Critical Skills Demonstrated

  • Product quantization optimization
  • GPU memory management
  • High-recall search design
  • Multi-stage retrieval

Frequently Asked Questions

How often do you rebuild the index?
Full rebuild weekly, incremental updates daily for new images.
What recall threshold was acceptable for copyright detection?
98.5% recall at top-100—false negatives cost more than false positives.
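
Recall at top-k, the metric quoted in the answer above, can be measured by comparing approximate results against ground truth from an exact search. The sketch below uses one common definition (the fraction of true neighbours found in the first k results); the source does not state which definition was used, and the image ids are made up.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the first k retrieved results."""
    if not relevant:
        raise ValueError("relevant must not be empty")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

# Hypothetical example: ground-truth neighbours from an exact search
# versus results returned by the approximate index.
ground_truth = ["img_1", "img_2", "img_3", "img_4"]
approx_results = ["img_2", "img_9", "img_1", "img_4", "img_7"]

print(recall_at_k(approx_results, ground_truth, k=5))  # 0.75
print(recall_at_k(approx_results, ground_truth, k=2))  # 0.25
```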