Notebook-Based Models to Production MLOps
A comprehensive guide to migrating Jupyter notebook ML models to production MLOps pipelines with versioning, testing, and monitoring.
Executive Summary
A data science team had 50+ models running in Jupyter notebooks with no versioning, no testing, and a manual deployment process that took 2 weeks per model. Over 7 months, they migrated to production MLOps pipelines built on MLflow tracking, Kubeflow orchestration, and CI/CD. Deployment time dropped to 2 hours, and 40% of model failures are now caught before production. This guide covers notebook refactoring, pipeline automation, and production monitoring.
Why Migrate from Notebook-Based Models
Notebook-based models caused production failures, untracked experiments, and slow manual deployments. The team could not reproduce results from three months earlier.
- 2-week deployment time per model (20 models/year)
- 40% of models failed in production because of notebook issues
- No experiment tracking (duplicate work, 30% wasted effort)
- Inability to reproduce results because of missing dependencies
MLOps Migration Readiness
The team spent 2 months on preparation: auditing the existing notebooks (50 models), selecting an MLOps stack (MLflow, Kubeflow, Docker), and training data scientists on software engineering practices.
- Notebook audit (50 models, 100 notebooks)
- MLflow tracking server (experiments, models)
- Kubeflow cluster (training pipelines)
- Docker registry (model containers)
- CI/CD (GitHub Actions for tests)
- Model monitoring (Evidently, WhyLabs)
Notebook-Based Assessment
The team had 50 models spread across 100 notebooks, with no consistent structure, hardcoded paths, and missing dependency lists. The biggest pain points were model retraining (manual, 3 days) and deployment (handoff to engineering, 2 weeks).
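An audit like the one described can be partly automated. The sketch below is one possible approach, not the team's actual tooling: it parses a notebook's JSON and flags string literals in code cells that look like absolute paths. The regular expression and the directory prefixes it checks (`/home`, `/Users`, `/mnt`, `/data`, Windows drive letters) are assumptions chosen for illustration.

```python
import json
import re

# Heuristic (assumed for this sketch): quoted absolute POSIX paths under a few
# common prefixes, or quoted Windows drive paths.
HARDCODED_PATH = re.compile(
    r"""["'](/(?:home|Users|mnt|data)/[^"']*|[A-Za-z]:\\[^"']*)["']"""
)


def find_hardcoded_paths(notebook_json):
    """Return (cell_index, path) pairs for code cells that contain absolute paths."""
    notebook = json.loads(notebook_json)
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for match in HARDCODED_PATH.finditer(text):
            hits.append((index, match.group(1)))
    return hits


# A tiny in-memory notebook standing in for a real .ipynb file.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Churn model"]},
        {"cell_type": "code", "source": [
            "import pandas as pd\n",
            "df = pd.read_csv('/home/alice/data/churn.csv')\n",
        ]},
        {"cell_type": "code", "source": "model_dir = 'models'\n"},
    ]
})
print(find_hardcoded_paths(example))
```

Running the same function over every `.ipynb` file in a repository gives a rough inventory of where hardcoded paths need to be replaced before migration.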
Technical Debt
- Hardcoded paths (models break when files move)
- No unit tests (0% coverage)
- Missing dependency lists (no environment.yaml)
- Manual retraining (ad hoc cron jobs, no monitoring)
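The hardcoded-path problem is usually the cheapest item on this list to fix. One possible pattern, sketched here under assumptions, is to derive every location from a single configurable root. The `ProjectPaths` class, the `load_paths` helper, and the `ML_PROJECT_ROOT` environment variable are hypothetical names introduced for this example.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProjectPaths:
    """Filesystem locations derived from one configurable root directory."""
    root: Path

    @property
    def raw_data(self) -> Path:
        return self.root / "data" / "raw"

    @property
    def models(self) -> Path:
        return self.root / "models"


def load_paths(env: Optional[Mapping[str, str]] = None) -> ProjectPaths:
    """Read the root from ML_PROJECT_ROOT (hypothetical), defaulting to the current directory."""
    env = os.environ if env is None else env
    return ProjectPaths(root=Path(env.get("ML_PROJECT_ROOT", ".")))


# Passing a dict instead of os.environ keeps the example deterministic.
paths = load_paths({"ML_PROJECT_ROOT": "/srv/churn"})
print(paths.raw_data / "churn.csv")
```

Because the environment is passed in as a parameter, the same code can run on a laptop, in CI, and inside a container without edits, and tests can supply their own root.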
Risks
- Model reproducibility issues (different results each run)
- Production vs training data mismatch (data drift)
- Team resistance to software engineering practices
- Infrastructure cost increase (Kubeflow cluster)
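A common source of "different results each run" is unseeded randomness in data splitting. The following is a minimal sketch of a deterministic split that uses only the standard library. The function name and its parameters are illustrative, and real training code would also need to seed the ML framework in use.

```python
import random


def train_test_split_ids(ids, test_fraction, seed):
    """Split ids deterministically: the same seed always yields the same split."""
    rng = random.Random(seed)      # local generator, so no global state is touched
    shuffled = sorted(ids)         # sort first so the input order does not matter
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


first = train_test_split_ids(range(10), test_fraction=0.2, seed=42)
second = train_test_split_ids(reversed(range(10)), test_fraction=0.2, seed=42)
print(first == second)
```

Recording the seed alongside each run makes it possible to rebuild the exact split later.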
Target MLOps Architecture
The target was an automated pipeline: data validation → training → model registry → deployment → monitoring.
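To make the stage order concrete, here is a deliberately simplified sketch in plain Python. It is not Kubeflow code: each stage is an ordinary function, the registry and serving layer are dicts, and the "model" is just the most frequent label. All names are hypothetical. The point it illustrates is that validation acts as a gate, so a bad dataset never reaches the registry.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Tracks which stages ran and what they produced."""
    stages: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)


def validate_data(rows, record):
    # Gate: stop the run before training if the data is unusable.
    if not rows or any(row.get("label") is None for row in rows):
        raise ValueError("data validation failed: empty dataset or missing labels")
    record.stages.append("data_validation")


def train(rows, record):
    # Placeholder model: the most frequent label. A real stage would fit an estimator.
    labels = [row["label"] for row in rows]
    record.artifacts["model"] = max(sorted(set(labels)), key=labels.count)
    record.stages.append("training")


def register(record, registry):
    # The registry is a plain dict here: version number -> model artifact.
    version = len(registry) + 1
    registry[version] = record.artifacts["model"]
    record.artifacts["model_version"] = version
    record.stages.append("model_registry")


def deploy(record, serving):
    serving["live_version"] = record.artifacts["model_version"]
    record.stages.append("deployment")


def monitor(record, serving):
    # Stand-in for attaching drift and performance checks to the live version.
    serving["monitored"] = True
    record.stages.append("monitoring")


def run_pipeline(rows, registry, serving):
    record = RunRecord()
    validate_data(rows, record)
    train(rows, record)
    register(record, registry)
    deploy(record, serving)
    monitor(record, serving)
    return record


registry, serving = {}, {}
record = run_pipeline([{"label": 1}, {"label": 1}, {"label": 0}], registry, serving)
print(record.stages)
```

In an orchestrator such as Kubeflow, each function would become its own containerized step, but the ordering and the fail-fast behavior stay the same.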
7-Month MLOps Migration
Phase 1: Foundation (Months 1-2)
Set up MLflow, Kubeflow, and a Docker registry. Trained 10 data scientists on MLOps practices.
Phase 2: Simple Models (Months 3-4)
Migrated 10 simple models to Kubeflow pipelines, which proved the architecture.
Phase 3: Complex Models (Months 5-6)
Migrated 30 complex models (deep learning, ensembles).
Phase 4: Production Monitoring (Month 7)
Added Evidently for data drift and performance monitoring.
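For intuition about what a drift check computes, the sketch below implements the Population Stability Index (PSI), a widely used drift statistic, in plain Python. It is not Evidently's API or implementation. The equal-width binning, the `1e-6` floor for empty bins, and the 0.2 alert threshold are assumptions based on common rules of thumb.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for value in values:
            # Clamp out-of-range values into the first or last bin.
            index = min(max(int((value - lo) / width), 0), bins - 1)
            counts[index] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


training_feature = [i / 100 for i in range(1000)]     # values from 0.00 to 9.99
shifted_feature = [x + 5 for x in training_feature]   # same shape, shifted upward

DRIFT_THRESHOLD = 0.2  # rule-of-thumb value, assumed for this example
print(population_stability_index(training_feature, training_feature) > DRIFT_THRESHOLD)
print(population_stability_index(training_feature, shifted_feature) > DRIFT_THRESHOLD)
```

A monitoring job would run a check like this per feature on a schedule and raise an alert when the score crosses the chosen threshold.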
Data Versioning and Lineage
Notebooks used ad hoc, unversioned data snapshots. The MLOps pipeline uses DVC for data versioning and MLflow for dataset tracking.
- DVC for data versioning (10TB training data)
- Dataset hash logged to MLflow for reproducibility
- Data validation (Great Expectations) before training
- Data drift detection post-deployment
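The dataset hash mentioned above can be as simple as a SHA-256 digest of the training file. The following is a minimal sketch using only the standard library. The function name is hypothetical, and how the digest gets attached to a run is left open (for example, it could be recorded as a run parameter with `mlflow.log_param`).

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file: changing one label changes the fingerprint.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "train.csv"
    data_file.write_bytes(b"id,label\n1,0\n2,1\n")
    before = dataset_fingerprint(data_file)
    data_file.write_bytes(b"id,label\n1,0\n2,0\n")
    after = dataset_fingerprint(data_file)

print(before != after)
```

If two training runs report the same fingerprint, they read byte-identical data, which rules out the dataset as the cause of any difference in results.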
Common Notebook to MLOps Mistakes
Not refactoring notebooks before migration
Impact: MLOps pipeline inherits notebook bugs (30% failure rate)
Prevention: Refactor to Python modules with tests first
Ignoring data versioning
Impact: Models not reproducible (different results each retrain)
Prevention: DVC + MLflow dataset tracking
No model monitoring
Impact: Silent model degradation (weeks before detection)
Prevention: Evidently + alerts on drift
Over-engineering pipeline (too complex)
Impact: 6-month delay, team frustrated
Prevention: Start simple: MLflow + Docker + GitHub Actions
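The first prevention step, refactoring to Python modules with tests, might look like the sketch below. The feature function and its bucket boundaries are invented for illustration. The tests are plain functions with `assert` statements, written so that a runner such as pytest could collect them in CI.

```python
def add_tenure_bucket(row):
    """Hypothetical feature function lifted out of a notebook cell into a module."""
    months = row["tenure_months"]
    if months < 0:
        raise ValueError("tenure_months must be non-negative")
    if months < 12:
        bucket = "new"
    elif months < 36:
        bucket = "established"
    else:
        bucket = "loyal"
    # Return a new dict so the caller's row is left untouched.
    return {**row, "tenure_bucket": bucket}


def test_bucket_boundaries():
    assert add_tenure_bucket({"tenure_months": 0})["tenure_bucket"] == "new"
    assert add_tenure_bucket({"tenure_months": 11})["tenure_bucket"] == "new"
    assert add_tenure_bucket({"tenure_months": 12})["tenure_bucket"] == "established"
    assert add_tenure_bucket({"tenure_months": 36})["tenure_bucket"] == "loyal"


def test_input_is_not_mutated():
    row = {"tenure_months": 5}
    add_tenure_bucket(row)
    assert row == {"tenure_months": 5}


def test_negative_tenure_is_rejected():
    try:
        add_tenure_bucket({"tenure_months": -1})
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative tenure")
```

Once logic lives in functions like this, the notebook only imports and calls it, and every pull request can run the tests before a pipeline is ever built.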
Migration Success Metrics
- Deployment time per model: 2 weeks before migration, 2 hours after
- Model failures caught before production: 40%
Who Should Lead MLOps Migration
Required Experience
- 2+ years of MLOps experience (MLflow, Kubeflow)
- Python software engineering (refactoring, testing)
- Kubernetes and Docker
- Understanding of data science workflows
Frequently Asked Questions
- Can we keep using notebooks for exploration?
- Yes. Keep notebooks for exploration, but put production code in Python modules. Notebooks can import and call those modules.
- MLflow vs Weights & Biases?
- MLflow is open source and can be self-hosted, which usually keeps costs lower. Weights & Biases has a more polished UI but is a commercial product. Choose based on budget.
- How do we handle GPU training in pipelines?
- Kubeflow can schedule work on GPU nodes. In Kubeflow Pipelines, use node selectors to place training steps on GPU nodes.