Reducing Overfitting in Algorithmic Trading Models

Executive Summary

A systematic fund's ML models performed well in backtests, but their Sharpe ratio decayed by roughly 50% out-of-sample due to overfitting. Implementing nested cross-validation, L1 regularization, and purged time series splits cut that decay from 52% to 8%, saving $15M in potential losses.

Key Outcomes

▹ Live vs backtest Sharpe decay reduced 52% → 8%
▹ Model feature count reduced 150 → 25 (83% reduction)
▹ $15M saved in avoided strategy failures

Client Situation

The fund's ML team built complex models with 150+ features that looked great in-sample but failed live—classic overfitting.

Key Challenges

⚠ 50% performance decay in live trading vs backtest
⚠ Feature engineering causing look-ahead bias
⚠ No rigorous out-of-sample validation framework

Existing Architecture

Random train/test split, no cross-validation, manual feature selection, no regularization.

In-sample Sharpe 2.5 → live Sharpe 1.2 (52% decay)
Model retrained rarely (quarterly)
No testing for feature stability
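
The 52% decay figure is the relative drop from in-sample Sharpe to live Sharpe. A minimal sketch of that calculation follows; the function name `performance_decay` is illustrative and not taken from the fund's codebase.

```python
def performance_decay(in_sample, out_of_sample):
    """Relative drop from an in-sample metric to its out-of-sample value."""
    if in_sample <= 0:
        raise ValueError("in-sample metric must be positive to compute decay")
    return 1.0 - out_of_sample / in_sample


# Figures quoted in this case study: in-sample Sharpe 2.5, live Sharpe 1.2.
decay = performance_decay(2.5, 1.2)
print(f"{decay:.0%}")  # prints 52%
```

The same ratio can be tracked for any metric that is positive in-sample, such as mean return per trade.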

Solution Design

Purged time series cross-validation, feature selection with L1 regularization, and walk-forward testing.

Key Decisions

✓ Nested cross-validation (5x5) for hyperparameter tuning
✓ Purged splits to prevent future data leakage
✓ Regularization (L1) reducing feature count 83%
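
As an illustration of the 5x5 nested scheme, the sketch below tunes a hyperparameter on inner folds and scores the chosen setting on outer folds it never saw. It is a toy under stated assumptions: the "model" is a shrunk mean, the folds are plain contiguous blocks with no purging, and every name (`contiguous_folds`, `fit_shrunk_mean`, `nested_cv`) is hypothetical rather than the fund's actual code.

```python
def contiguous_folds(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous blocks of indices."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_shrunk_mean(train_values, shrinkage):
    """Toy model: the training mean, shrunk toward zero by `shrinkage`."""
    return (sum(train_values) / len(train_values)) / (1.0 + shrinkage)


def mean_squared_error(prediction, values):
    return sum((v - prediction) ** 2 for v in values) / len(values)


def nested_cv(values, candidates, n_outer=5, n_inner=5):
    """Return (mean outer-fold error, number of model fits performed)."""
    outer_errors, n_fits = [], 0
    for outer_test in contiguous_folds(len(values), n_outer):
        held_out = set(outer_test)
        outer_train = [v for i, v in enumerate(values) if i not in held_out]

        # Inner loop: choose the hyperparameter using outer-train data only.
        best_candidate, best_error = None, float("inf")
        for candidate in candidates:
            inner_errors = []
            for inner_val in contiguous_folds(len(outer_train), n_inner):
                val_idx = set(inner_val)
                inner_train = [v for i, v in enumerate(outer_train) if i not in val_idx]
                prediction = fit_shrunk_mean(inner_train, candidate)
                n_fits += 1
                inner_errors.append(
                    mean_squared_error(prediction, [outer_train[i] for i in inner_val])
                )
            error = sum(inner_errors) / len(inner_errors)
            if error < best_error:
                best_candidate, best_error = candidate, error

        # Outer loop: refit with the chosen setting, score on untouched data.
        prediction = fit_shrunk_mean(outer_train, best_candidate)
        n_fits += 1
        outer_errors.append(mean_squared_error(prediction, [values[i] for i in outer_test]))
    return sum(outer_errors) / len(outer_errors), n_fits


# Synthetic returns, for illustration only.
returns = [0.01 * ((i * 7) % 11 - 5) for i in range(50)]
score, n_fits = nested_cv(returns, candidates=[0.0, 0.5, 2.0])
print(n_fits)  # 5 outer folds x (3 candidates x 5 inner folds + 1 refit) = 80
```

The fit count grows multiplicatively with folds and candidates, which is why the search strategy matters as much as the fold layout.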

Python, Scikit-learn, XGBoost, Optuna, Backtrader

Implementation

Validated on historical data first, then paper traded for 3 months before live deployment.

Phase 1: Validation Framework
Built purged time series CV (200 splits, 6 years of data).
Phase 2: Feature Reduction
L1 regularization reduced 150 features to 25, improved stability.
Phase 3: Live Deployment
Deployed 12 robust models with monthly retraining.
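
To show why an L1 penalty removes features outright rather than merely shrinking them, here is a small coordinate-descent Lasso in plain Python. It is a teaching sketch, not the project's pipeline: the function names and the toy data are assumptions, and a production setup would more likely rely on a library implementation.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=50):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iters):
        for j in range(n_features):
            rho, norm = 0.0, 0.0
            for row, target in zip(X, y):
                # Prediction from every feature except j.
                partial = sum(row[k] * weights[k] for k in range(n_features) if k != j)
                rho += row[j] * (target - partial)
                norm += row[j] ** 2
            if norm == 0.0:
                weights[j] = 0.0
            else:
                weights[j] = soft_threshold(rho / n_samples, alpha) / (norm / n_samples)
    return weights


# Toy data (illustrative): feature 0 carries the signal, feature 1 barely matters.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]  # y = 2.0 * feature0 + 0.1 * feature1

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected)  # [0]: the weak feature's weight is exactly zero
```

Raising `alpha` zeroes out more coefficients, so the surviving feature count is a direct function of the penalty strength.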

Technical Challenges

Time series leakage in cross-validation

Impact: Future data leaking into training folds

Resolution: Purged splits with gap between train and validation (20 periods)
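
A purged split with a 20-period gap can be sketched as an expanding-window generator that drops the observations immediately before each validation block. The expanding-window layout and the name `purged_walk_forward_splits` are assumptions made for illustration; the case study does not say how its splits were laid out.

```python
def purged_walk_forward_splits(n_samples, n_splits, gap):
    """Yield (train_indices, validation_indices) pairs in time order.

    The `gap` observations just before each validation block are excluded
    from training, so features or labels that span several periods cannot
    leak validation information into the training set.
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size <= gap:
        raise ValueError("gap must be smaller than the fold size")
    for k in range(1, n_splits + 1):
        val_start = k * fold_size
        # The last validation block absorbs any remainder.
        val_end = n_samples if k == n_splits else val_start + fold_size
        train_end = val_start - gap  # purge the gap before validation
        yield list(range(train_end)), list(range(val_start, val_end))


splits = list(purged_walk_forward_splits(n_samples=300, n_splits=5, gap=20))
for train_idx, val_idx in splits:
    print(f"train 0..{train_idx[-1]}  validate {val_idx[0]}..{val_idx[-1]}")
```

Training always ends at least `gap` periods before validation begins, so each validation block is scored on data the model could not have seen, directly or through overlapping windows.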

Hyperparameter explosion

Impact: 5×5 nested CV = 25 train/validation fits per hyperparameter set × 20 models = 500 training runs per set

Resolution: Bayesian optimization (Optuna) reduced iterations by 90%

Results

Live vs backtest Sharpe decay: Before 52%, After 8%, Improvement 84% reduction
Model features: Before 150, After 25, Improvement 83% reduction
Monthly retraining time: Before 8 hours, After 45 minutes, Improvement 91% reduction

Lessons Learned

📘 Purged cross-validation is essential for preventing look-ahead bias
📘 Regularization reduced overfitting more than adding data did
📘 Fewer, more stable features outperformed complex models in live trading

What We Would Do Differently

💡 Implement Shapley values for feature interpretability earlier
💡 Use model stacking for diversification

Role Relevance

Validation experts overhauled the model development process, cutting live vs backtest Sharpe decay from 52% to 8% and saving $15M in avoided strategy failures.

Critical Skills Demonstrated

Time series cross-validation, Regularization techniques, Feature selection, Model validation frameworks

Frequently Asked Questions

What is a purged time series split?
A split that removes the observations between the training and validation sets so that information cannot leak from one into the other.

How do you measure overfitting?
By the performance decay between in-sample cross-validation results and out-of-sample walk-forward results.