Executive Summary
Consultants at a global consulting firm with 50,000 employees spent an average of 4 hours per project on internal research: locating past deliverables, methodologies, and expertise. A production RAG system cut search time to 30 seconds, increased knowledge reuse by 300% (from 20% to 80%), and generated $50M in annual productivity savings.
Key Outcomes
- ▹ 4 hours → 30 seconds per research query
- ▹ 500,000 documents indexed and searchable
- ▹ $50M annual productivity savings
Client Situation
Consultants wasted hours searching across SharePoint, email, and file shares for prior work. Critical knowledge was siloed across 20+ systems.
Key Challenges
- ⚠ Average 4 hours per project finding relevant past work
- ⚠ 80% of knowledge never reused across teams
- ⚠ New consultants took 6 months to become productive
Existing Architecture
SharePoint search with keyword matching, plus manual knowledge management via Excel trackers.
- Keyword search missed semantic meaning
- No cross-system unified search
- No answer extraction—users still read full documents
Solution Design
Production RAG system with hybrid search, multi-document QA, and citation tracking for source verification.
Key Decisions
- ✓ Hybrid search (vector + keyword) for best recall
- ✓ Document chunking with overlap (256 tokens, 20% overlap)
- ✓ Citation tracking to original sources for trust
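The chunking decision above can be illustrated with a minimal sketch. It assumes the document has already been tokenized into a list; the function name `chunk_tokens` and the sliding-window approach are illustrative assumptions, not the firm's actual implementation. With 256-token chunks and 20% overlap, consecutive chunks share 51 tokens and the window advances 205 tokens at a time.

```python
def chunk_tokens(tokens, chunk_size=256, overlap_ratio=0.20):
    """Split a token list into fixed-size windows that share a fraction of tokens.

    Hypothetical helper: the source states only the chunk size and overlap,
    not how chunk boundaries were computed.
    """
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)  # 51 tokens for 256 at 20%
    stride = chunk_size - overlap              # 205 tokens between chunk starts

    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Usage: integers stand in for token ids from a real tokenizer.
example_chunks = chunk_tokens(list(range(600)))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing roughly 25% more tokens.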
Implementation
Pilot with 3 practice areas before firm-wide rollout, iterating on chunking strategy and search quality.
Phase 1: Data Ingestion
Indexed 500K internal documents across SharePoint, email, and file shares.
Phase 2: Search Interface
Built Slack bot and web interface for consultant search.
Phase 3: Production Scaling
Rolled out to 5,000 daily users with 99.9% uptime.
Technical Challenges
- Access control across document sources
Impact: The RAG system returned documents that users were not authorized to see
Resolution: Integrated with Okta to apply per-document ACL filtering after retrieval
- Hallucination in critical answers
Impact: The legal team rejected responses that lacked citations
Resolution: Required citations in every answer and added retrieval confidence scoring
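Both resolutions can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `RetrievedChunk` type, the group names, and the 0.5 score threshold are hypothetical, and the user's groups are assumed to have been fetched from the identity provider already (no Okta API calls are shown).

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    doc_id: str
    text: str
    score: float                              # retrieval confidence, assumed in [0, 1]
    allowed_groups: frozenset = frozenset()   # groups permitted to read the source doc


def filter_by_acl(chunks, user_groups):
    """Post-retrieval filter: keep only chunks the user is allowed to read."""
    user_groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & user_groups]


def passes_citation_gate(cited_doc_ids, visible_chunks, min_score=0.5):
    """Accept an answer only if it cites at least one document, and every cited
    document was both visible to the user and retrieved with enough confidence."""
    if not cited_doc_ids:
        return False
    best_score = {}
    for c in visible_chunks:
        best_score[c.doc_id] = max(c.score, best_score.get(c.doc_id, 0.0))
    return all(best_score.get(d, 0.0) >= min_score for d in cited_doc_ids)


# Usage with made-up documents and groups.
retrieved = [
    RetrievedChunk("d1", "pricing methodology", 0.9, frozenset({"strategy"})),
    RetrievedChunk("d2", "privileged memo", 0.8, frozenset({"legal"})),
    RetrievedChunk("d3", "loosely related note", 0.3, frozenset({"strategy"})),
]
visible = filter_by_acl(retrieved, ["strategy"])
```

One consequence of filtering after retrieval is that a user may receive fewer than the requested number of results, so it is common to over-fetch candidates before applying the filter.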
Results
- Time to find relevant document
Before: 4 hours; After: 30 seconds; Improvement: 99.8% reduction
- Knowledge reuse rate
Before: 20%; After: 80%; Improvement: 4x increase
- New consultant ramp-up time
Before: 6 months; After: 2 months; Improvement: 67% reduction
Lessons Learned
- 📘 Hybrid search (vector + keyword) improved recall by 40% vs pure vector
- 📘 Document chunking strategy significantly impacted answer quality—256 tokens with 20% overlap was optimal
- 📘 Consultants trusted RAG more when citations were clearly displayed
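The source does not say how the vector and keyword result lists were combined. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the method actually used; the function name and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in, so
    documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Usage: "a" and "c" appear in both lists and outrank single-list hits.
vector_hits = ["a", "b", "c"]
keyword_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF needs only ranks, not raw scores, which avoids having to normalize cosine similarities against keyword relevance scores.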
What We Would Do Differently
- 💡 Implement incremental indexing for real-time document updates
- 💡 Build feedback collection for continuous fine-tuning
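As a rough illustration of what incremental indexing could look like, the sketch below compares content hashes against those stored from the previous run and reports only the documents that need re-embedding or removal. The function name and the in-memory dictionaries are hypothetical; a real pipeline would persist the hashes and likely react to change events from each source system.

```python
import hashlib


def plan_index_updates(current_docs, indexed_hashes):
    """Decide which documents to re-index and which to delete.

    current_docs:   doc_id -> current text
    indexed_hashes: doc_id -> SHA-256 hex digest recorded at the last index run
    Returns (ids to upsert, ids to delete, new hash table).
    """
    to_upsert = []
    new_hashes = {}
    for doc_id, text in current_docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        new_hashes[doc_id] = digest
        if indexed_hashes.get(doc_id) != digest:  # new or changed document
            to_upsert.append(doc_id)
    to_delete = [d for d in indexed_hashes if d not in current_docs]
    return sorted(to_upsert), sorted(to_delete), new_hashes


# Usage: the first run indexes everything; later runs touch only the changes.
upsert, delete, hashes = plan_index_updates({"a": "v1", "b": "v1"}, {})
upsert2, delete2, hashes2 = plan_index_updates({"a": "v1", "b": "v2"}, hashes)
```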
Role Relevance
RAG engineers designed the retrieval pipeline that transformed enterprise knowledge access, balancing recall, latency, and security constraints.
Frequently Asked Questions
- How do you prevent RAG from returning outdated information?
Document metadata includes timestamps; users can filter by date or sort by recency.
- What embedding model and chunk size worked best?
text-embedding-3-large with 256-token chunks and 20% overlap provided the best recall/latency trade-off.
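The date filtering and recency sorting mentioned in the first answer could work along these lines. The result shape (a dict with a `modified` date) and the function name are assumptions for illustration only.

```python
from datetime import date


def filter_and_sort_by_recency(results, not_before=None):
    """Drop results older than `not_before` (if given) and list newest first.

    Each result is assumed to be a dict with 'doc_id' and a 'modified' date
    taken from the document's metadata.
    """
    kept = [r for r in results if not_before is None or r["modified"] >= not_before]
    return sorted(kept, key=lambda r: r["modified"], reverse=True)


# Usage with made-up search results.
hits = [
    {"doc_id": "old-playbook", "modified": date(2019, 3, 1)},
    {"doc_id": "new-playbook", "modified": date(2024, 6, 15)},
    {"doc_id": "mid-playbook", "modified": date(2022, 1, 10)},
]
recent = filter_and_sort_by_recency(hits, not_before=date(2021, 1, 1))
```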