Executive Summary
A quant fund's traditional factor models showed decaying alpha. By integrating satellite parking lot imagery and credit card transactions, the fund discovered novel signals that predicted retail earnings surprises, generating 12% annual alpha with a 0.3 correlation to existing factors.
Key Outcomes
- ▹ 12% annual alpha from alternative data signals
- ▹ 0.3 correlation to existing factors (highly diversifying)
- ▹ 3 new systematic strategies deployed
Client Situation
The Sharpe ratio of the fund's existing value/momentum factors had declined from 2.5 to 1.2 over 3 years. The fund needed novel data sources to differentiate its signals.
Key Challenges
- ⚠ Traditional factors crowded and decaying
- ⚠ Alternative datasets prohibitively expensive to evaluate
- ⚠ No infrastructure for processing non-standard data
Existing Architecture
Factor models using price/volume data from Bloomberg. Research in Python notebooks, production in C++. No alternative data pipeline.
- No capability to process image or unstructured text data
- Manual data vendor integration taking 3+ months
- No research-to-production handoff process for new data types
Solution Design
Alternative data platform with standardized ingestion, feature extraction, and backtesting for satellite, transaction, and web-scraped datasets.
Key Decisions
- ✓ Satellite imagery processing with computer vision for parking lot occupancy
- ✓ Credit card transaction aggregation by merchant category
- ✓ Unified feature store for all alternative signals
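As an illustration of what a unified feature store might look like, here is a minimal in-memory sketch. The class name, method names, and the point-in-time read (returning only values that were available on or before a given date, to avoid look-ahead in backtests) are assumptions for illustration, not details of the fund's platform.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


class FeatureStore:
    """Toy in-memory store keyed by (entity, feature), with point-in-time reads."""

    def __init__(self):
        # (entity, feature) -> list of (available_date, value), kept sorted by date
        self._rows = defaultdict(list)

    def write(self, entity, feature, available_date, value):
        rows = self._rows[(entity, feature)]
        rows.append((available_date, value))
        rows.sort(key=lambda row: row[0])

    def read_as_of(self, entity, feature, as_of):
        """Return the latest value available on or before `as_of`, else None."""
        rows = self._rows.get((entity, feature), [])
        dates = [d for d, _ in rows]
        idx = bisect_right(dates, as_of)
        return rows[idx - 1][1] if idx else None


# Hypothetical entity and feature names
store = FeatureStore()
store.write("RETAILER_A", "parking_occupancy", date(2024, 1, 5), 0.62)
store.write("RETAILER_A", "card_spend", date(2024, 1, 6), 1_200_000.0)
store.write("RETAILER_A", "parking_occupancy", date(2024, 1, 12), 0.71)

mid_week = store.read_as_of("RETAILER_A", "parking_occupancy", date(2024, 1, 10))
```

Satellite and transaction features share one read path here, so a backtest can request any signal by entity and date without knowing which vendor produced it.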
Implementation
3-month pilot with single data vendor before expanding to 5 sources over 12 months.
Phase 1: Satellite Pilot
Built pipeline for parking lot imagery, discovered strong signal for retail earnings.
Phase 2: Transaction Data
Aggregated anonymized card data at merchant-day level, found consumer spending signal.
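A merchant-day aggregation of this kind could be sketched as follows. The record layout, the merchant names, and the convention that refunds arrive as negative amounts are all assumptions made for the example; real vendor feeds will differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record layout: (merchant, transaction_date, amount).
# Refunds are assumed to arrive as negative amounts.
transactions = [
    ("RETAILER_A", date(2024, 3, 1), 40.0),
    ("RETAILER_A", date(2024, 3, 1), 25.0),
    ("RETAILER_A", date(2024, 3, 1), -25.0),  # refund of the previous purchase
    ("RETAILER_A", date(2024, 3, 2), 60.0),
    ("RETAILER_B", date(2024, 3, 1), 15.0),
]


def aggregate_merchant_day(rows):
    """Sum net spend and count purchases per (merchant, day)."""
    totals = defaultdict(lambda: {"net_spend": 0.0, "purchases": 0})
    for merchant, day, amount in rows:
        bucket = totals[(merchant, day)]
        bucket["net_spend"] += amount  # refunds reduce net spend
        if amount > 0:
            bucket["purchases"] += 1   # refunds are not counted as purchases
    return dict(totals)


daily = aggregate_merchant_day(transactions)
```

Netting refunds into spend while keeping them out of the purchase count is one plausible cleaning choice; whether it suits a given signal depends on how the vendor reports returns.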
Phase 3: Production Integration
Combined signals into multi-strategy portfolio with 1.8 Sharpe ratio.
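To make the Sharpe figure concrete, the sketch below computes an annualized Sharpe ratio and an equal-weight combination of two return streams. The return series are made-up toy numbers, the 252-day annualization and equal weighting are assumptions, and nothing here reproduces the fund's actual portfolio construction.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def annualized_sharpe(daily_returns):
    """Mean over sample standard deviation of daily excess returns, annualized."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(TRADING_DAYS)


def combine_equal_weight(*strategies):
    """Average several aligned daily return series, day by day."""
    return [statistics.fmean(day) for day in zip(*strategies)]


# Toy daily excess returns for two hypothetical strategies
satellite = [0.010, -0.004, 0.006, -0.002, 0.008]
card = [-0.003, 0.009, -0.001, 0.007, 0.000]

combined = combine_equal_weight(satellite, card)
sharpe_combined = annualized_sharpe(combined)
```

In this toy data the two streams tend to move in opposite directions, so the combined series is much smoother than either input and its Sharpe ratio is higher than both. Real signals with a correlation near 0.3 would show a smaller diversification benefit.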
Technical Challenges
- Satellite image processing at scale
Impact: 1TB daily imagery required 200+ GPU hours
Resolution: Pre-computed parking lot features, on-demand inference for new images
- Transaction data aggregation lag
Impact: 3-day delay made signals too stale
Resolution: Switched to same-day processing with Spark streaming
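The first resolution above (pre-computed features, with inference only for new images) amounts to a cache in front of the vision model. A minimal sketch follows; the class, the image ID format, and the `fake_model` stand-in are hypothetical.

```python
from typing import Callable, Dict


class OccupancyCache:
    """Serve pre-computed occupancy features; run the model only on cache misses."""

    def __init__(self, infer: Callable[[str], float]):
        self._infer = infer                      # expensive model call (hypothetical)
        self._features: Dict[str, float] = {}
        self.inference_calls = 0

    def preload(self, features: Dict[str, float]) -> None:
        """Load features computed ahead of time, e.g. by a batch job."""
        self._features.update(features)

    def occupancy(self, image_id: str) -> float:
        if image_id not in self._features:
            self.inference_calls += 1
            self._features[image_id] = self._infer(image_id)
        return self._features[image_id]


def fake_model(image_id: str) -> float:
    # Stand-in for a GPU-backed computer vision model
    return 0.5


cache = OccupancyCache(fake_model)
cache.preload({"lot_001/2024-03-01": 0.62})

known = cache.occupancy("lot_001/2024-03-01")  # served from pre-computed features
new = cache.occupancy("lot_001/2024-03-02")    # triggers one inference call
again = cache.occupancy("lot_001/2024-03-02")  # served from the cache this time
```

Only images the store has never seen reach the model, which is where the GPU savings would come from.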
Results
- Annual alpha (gross)
Before: 0%
After: 12%
Improvement: New alpha source
- Correlation to existing factors
Before: N/A
After: 0.31
Improvement: Highly diversifying
- Time to onboard new data vendor
Before: 3 months
After: 2 weeks
Improvement: 86% reduction
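For readers who want to reproduce a correlation figure like the one reported above, a Pearson correlation between a new signal's returns and an existing factor's returns can be computed as in this sketch. The function name is illustrative, and the source does not state which correlation measure or return frequency the fund used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Sanity checks on series with a known relationship
perfectly_aligned = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
perfectly_opposed = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value near 0.3 means the new signal shares only a small part of its variation with the existing factors, which is why it is described as diversifying.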
Lessons Learned
- 📘 Parking lot occupancy predicted retail earnings surprises with 3-week lead
- 📘 Credit card data required careful cleaning for returns/refunds
- 📘 Alternative data alpha decayed slower than traditional factors (9 months vs 3)
What We Would Do Differently
- 💡 Build synthetic data for backtesting new signals
- 💡 Implement automated data quality monitoring earlier
Role Relevance
Quant researchers with alternative data expertise identified signals that traditional quants had missed, discovering 12% annual alpha with low correlation to existing factors.
Frequently Asked Questions
- Which data vendors provided the most alpha?
- Satellite parking lot data (6% alpha) and credit card transactions (4% alpha) were top performers.
- How much did the data cost?
- $2M annually across 5 vendors, generating $50M+ PnL, a 25x ROI.