OFFLINEPIXEL
Legal Technology

Improving Answer Accuracy with RAG Systems

A legal tech company improved RAG answer accuracy from 65% to 94% using advanced retrieval and self-critique techniques.

Executive Summary

A legal tech startup's RAG system answered only 65% of queries correctly, an unacceptable rate for legal professionals. By implementing multi-stage retrieval, self-critique, and citation verification, the team raised accuracy to 94% and passed 3 law firm pilot programs.

Key Outcomes

  • 65% → 94% answer accuracy
  • 0% hallucination rate on verified queries
  • 3 enterprise law firm contracts secured

Client Situation

Law firms testing the product found too many incorrect citations and hallucinated case law, making it unusable for client work.

Key Challenges

  • 65% accuracy meant roughly 1 in 3 answers was wrong
  • Hallucinated case citations damaged trust
  • Inability to cite specific paragraph numbers

Existing Architecture

Single-stage vector retrieval with naive concatenation of the retrieved chunks, followed by a single LLM call for answer generation.

  • No verification of retrieved documents
  • No multi-turn reasoning for complex queries
  • No citation granularity beyond document level

Solution Design

Multi-stage RAG with HyDE retrieval, self-critique verification, and paragraph-level citations.

Key Decisions

  • HyDE (Hypothetical Document Embeddings) for better retrieval
  • Self-critique step verifying answer against retrieved chunks
  • Paragraph-level citations for legal-grade references

Technologies: LangChain, Weaviate, GPT-4, Cohere, Legal BERT
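
To illustrate the HyDE decision, here is a minimal sketch of the idea: instead of embedding the user's question, the system first drafts a hypothetical answer and searches with the embedding of that draft. Everything in it is an assumption for illustration only. `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` stands in for the LLM call, and the three-sentence corpus is invented. It is not the team's actual implementation.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(query: str, corpus: List[str],
                draft_answer: Callable[[str], str], k: int = 2) -> List[str]:
    """Rank documents by similarity to a hypothetical answer, not to the query."""
    hypothetical = draft_answer(query)   # step 1: draft a plausible answer
    query_vec = embed(hypothetical)      # step 2: embed the draft
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]                    # step 3: nearest documents to the draft


def fake_llm(query: str) -> str:
    """Hypothetical stand-in for an LLM that drafts an answer in legal phrasing."""
    return ("a landlord must return the security deposit "
            "within thirty days of lease termination")


corpus = [
    "the tenant security deposit shall be returned within thirty days after lease termination",
    "a contract requires offer acceptance and consideration",
    "the statute of limitations for fraud claims is three years",
]

results = hyde_search("when do I get my deposit back", corpus, fake_llm, k=1)
print(results[0])
```

The plain question shares only the word "deposit" with the relevant passage, while the drafted answer shares most of its legal vocabulary, which is why the draft tends to retrieve better in this toy setup.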

Implementation

Iterative improvement with legal experts scoring 1,000 test queries after each change.

  1. Phase 1: Multi-Stage Retrieval

    Added HyDE and cross-encoder re-ranking, improving accuracy to 82%.

  2. Phase 2: Self-Critique

    LLM validates answer against retrieved chunks, reducing hallucinations to near zero.

  3. Phase 3: Citation Granularity

    Added paragraph citations and direct quotes for legal validation.
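
Phases 2 and 3 can be sketched together: each claim in a drafted answer carries a paragraph-level citation and a direct quote, and a critique step rejects any claim whose citation was not retrieved or whose quote is not found in the cited paragraph. This is a simplified sketch under stated assumptions. The `Paragraph` and `Claim` types and the document ids are hypothetical, and `quote_critic` is a deterministic stand-in for the LLM-based critic described above, which would judge support rather than match strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Paragraph:
    doc_id: str   # hypothetical document identifier
    number: int   # paragraph number within the document
    text: str


@dataclass(frozen=True)
class Claim:
    statement: str  # sentence from the drafted answer
    quote: str      # direct quote offered as support
    doc_id: str     # cited document
    paragraph: int  # cited paragraph number


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores layout differences."""
    return " ".join(text.lower().split())


def quote_critic(claim: Claim, paragraph: Paragraph) -> bool:
    """Stand-in critic: accept only a non-empty quote found verbatim in the paragraph."""
    quote = normalize(claim.quote)
    return bool(quote) and quote in normalize(paragraph.text)


def self_critique(claims: List[Claim], retrieved: List[Paragraph],
                  critic: Callable[[Claim, Paragraph], bool] = quote_critic
                  ) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (supported, rejected) against the retrieved paragraphs."""
    by_ref = {(p.doc_id, p.number): p for p in retrieved}
    supported, rejected = [], []
    for claim in claims:
        cited = by_ref.get((claim.doc_id, claim.paragraph))
        ok = cited is not None and critic(claim, cited)
        (supported if ok else rejected).append(claim)
    return supported, rejected


retrieved = [
    Paragraph("doc-1", 12,
              "The lessee shall give written notice at least thirty days before vacating."),
]
claims = [
    Claim("Thirty days of written notice is required.",
          "written notice at least thirty days", "doc-1", 12),
    Claim("Notice may be given orally.", "notice may be given orally", "doc-1", 12),
    Claim("A deposit is forfeited on late notice.", "deposit is forfeited", "doc-9", 3),
]

supported, rejected = self_critique(claims, retrieved)
print(len(supported), "supported;", len(rejected), "rejected")
```

Passing the critic in as a parameter keeps the citation bookkeeping separate from the judgment itself, so a model-based critic could replace the string match without changing the rest of the flow.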

Technical Challenges

Self-critique latency

Impact: Inference time doubled (5 seconds → 10 seconds), an unacceptable delay

Resolution: Ran verification checks in parallel and used a smaller critique model for speed
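
The parallel half of this resolution can be sketched with `asyncio`: when each claim is checked by an independent call, running the checks concurrently makes total latency track the slowest single check instead of the sum. The sketch below is an assumption about how such a step might look, not the team's code. `critique_claim` is a hypothetical function that simulates model latency with `asyncio.sleep` and uses a substring test in place of a real critique model.

```python
import asyncio
import time
from typing import List, Tuple


async def critique_claim(claim: str, chunk: str, latency: float = 0.1) -> bool:
    """Hypothetical critic call; the sleep simulates network and model latency."""
    await asyncio.sleep(latency)
    return claim.lower() in chunk.lower()  # toy check standing in for a model verdict


async def critique_all(pairs: List[Tuple[str, str]]) -> List[bool]:
    """Run every claim check concurrently and return verdicts in input order."""
    return await asyncio.gather(*(critique_claim(c, ch) for c, ch in pairs))


pairs = [
    ("written notice", "The lessee shall give written notice before vacating."),
    ("oral notice", "The lessee shall give written notice before vacating."),
    ("thirty days", "Notice is due at least thirty days in advance."),
    ("sixty days", "Notice is due at least thirty days in advance."),
]

start = time.perf_counter()
verdicts = asyncio.run(critique_all(pairs))
elapsed = time.perf_counter() - start

print(verdicts)
print(f"elapsed: {elapsed:.2f}s for {len(pairs)} checks of 0.10s each")
```

Run one after another, the four simulated checks would take about 0.4 seconds; run concurrently they finish in roughly the time of one. The smaller critique model mentioned in the resolution is a separate lever that this sketch does not model.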

Legal terminology embedding

Impact: Standard embeddings missed legal-specific relationships

Resolution: Fine-tuned Legal BERT on case law corpus

Results

Answer accuracy (legal expert evaluation)
Before: 65%
After: 94%
Improvement: 45% relative increase (29 percentage points)

Hallucination rate
Before: 15%
After: 0.5%
Improvement: 97% reduction

Citation precision
Before: N/A
After: paragraph-level
Improvement: court-admissible
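
The improvement figures above are relative changes, which are easy to confuse with percentage-point changes. The short calculation below reproduces them from the before and after values reported in this section; the helper name `relative_change` is just for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from before to after, expressed in percent."""
    return (after - before) / before * 100


# Answer accuracy: 65% -> 94%
accuracy_points = 94 - 65                      # change in percentage points
accuracy_relative = relative_change(65, 94)    # change relative to the baseline

# Hallucination rate: 15% -> 0.5%
hallucination_relative = relative_change(15, 0.5)

print(f"accuracy: +{accuracy_points} points, {accuracy_relative:.1f}% relative")
print(f"hallucination rate: {hallucination_relative:.1f}% relative")
```

Accuracy rose by 29 percentage points, which is about 44.6% relative to the 65% baseline and rounds to the 45% reported. The hallucination rate fell by about 96.7% relative to its 15% baseline, which rounds to the 97% reported.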

Lessons Learned

  • 📘 Self-critique reduced hallucinations from 15% to <1%—critical for legal use
  • 📘 HyDE retrieval improved recall by 25% for complex queries
  • 📘 Legal experts preferred lower accuracy with citations over high accuracy without them

What We Would Do Differently

  • 💡 Implement RAGAS evaluation framework from day one
  • 💡 Use DSPy for automated prompt optimization

Role Relevance

RAG engineers designed the verification pipeline that made legal-grade accuracy possible, transforming a toy demo into an enterprise product.

Critical Skills Demonstrated

  • Multi-stage retrieval
  • Self-critique pipelines
  • Citation granularity
  • Legal domain adaptation

Frequently Asked Questions

How do you define answer accuracy for legal queries?
Legal experts scored an answer as correct only if it correctly answered the question AND all of its citations matched the claims they supported.
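
As a sketch, that scoring rule is a strict conjunction, so a single mismatched citation makes the whole answer count as wrong. The `GradedAnswer` record and the sample grades below are hypothetical, and treating an answer with no citations as passing the citation check is an assumption the source does not address.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GradedAnswer:
    answers_question: bool                     # expert judgment on the answer itself
    citation_matches: List[bool] = field(default_factory=list)  # one verdict per citation


def is_correct(graded: GradedAnswer) -> bool:
    """Correct only if the answer is right AND every citation matches its claim.

    Assumption: all([]) is True, so an answer with no citations passes this part.
    """
    return graded.answers_question and all(graded.citation_matches)


def accuracy(results: List[GradedAnswer]) -> float:
    """Fraction of graded answers that meet the strict correctness rule."""
    return sum(is_correct(r) for r in results) / len(results)


# Hypothetical grades for four queries
sample = [
    GradedAnswer(True, [True, True]),    # right answer, all citations match
    GradedAnswer(True, [True, False]),   # right answer, one bad citation
    GradedAnswer(False, [True]),         # wrong answer, citation happens to match
    GradedAnswer(True, [True]),          # right answer, citation matches
]

print(f"accuracy: {accuracy(sample):.0%}")
```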
What was the toughest query type?
Questions requiring reasoning across multiple cases, which were solved with multi-turn retrieval.
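
The source does not describe how multi-turn retrieval was built, so the following is only one plausible shape for it: retrieve for the question, derive follow-up queries from what came back (here, cross-references to other cases), and repeat until nothing new appears or a turn limit is reached. The index, the case ids, and the `search` and `follow_up_queries` helpers are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical mini-index: each entry may reference another case by id.
INDEX: Dict[str, str] = {
    "case-A": "Notice is required before termination. See case-B.",
    "case-B": "Notice must be in writing. See case-C.",
    "case-C": "Written notice may be delivered by email.",
    "case-D": "Damages for breach are limited to direct losses.",
}


def search(query: str) -> List[str]:
    """Toy retriever: exact id lookup, otherwise any document sharing a word."""
    if query in INDEX:
        return [query]
    words = set(re.findall(r"[a-z]+", query.lower()))
    return [doc_id for doc_id, text in INDEX.items()
            if words & set(re.findall(r"[a-z]+", text.lower()))]


def follow_up_queries(text: str) -> List[str]:
    """Derive next-turn queries from cross-references found in a document."""
    return re.findall(r"case-[A-Z]", text)


def multi_turn_retrieve(question: str, max_turns: int = 3) -> List[str]:
    """Retrieve iteratively, following references until no new documents appear."""
    seen: List[str] = []
    queries = [question]
    for _ in range(max_turns):
        new_ids = [d for q in queries for d in search(q) if d not in seen]
        new_ids = list(dict.fromkeys(new_ids))  # drop duplicates, keep order
        if not new_ids:
            break
        seen.extend(new_ids)
        queries = [q for d in new_ids for q in follow_up_queries(INDEX[d])]
    return seen


chain = multi_turn_retrieve("termination requirements")
print(chain)
```

A single retrieval pass for this question finds only case-A; the later turns pull in case-B and case-C through the references, while the unrelated case-D is never fetched.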