Trigger-based retraining pipelines that rebuild models when drift thresholds are crossed, not on a fixed calendar schedule. Pipeline orchestration is implemented in Apache Airflow (DAG-based, schedule- and event-triggered, with per-task retries), AWS SageMaker Pipelines (fully managed with no infrastructure to maintain, and native integration with SageMaker training jobs and Model Registry), or Prefect (Python-native, a strong fit for ML teams that prefer code-first workflow definitions). The choice is made during scoping based on your existing infrastructure and your team's familiarity; we are not opinionated about orchestrators, only about the outcome.
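The drift-or-SLA trigger can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not a prescribed implementation: it assumes feature drift is measured with the Population Stability Index (PSI), one common drift statistic that the text does not mandate, and that the business SLA is a precision floor. The thresholds (`PSI_THRESHOLD = 0.2`, `SLA_MIN_PRECISION = 0.85`) and the names `psi`, `evaluate_trigger` and `TriggerDecision` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence

# Hypothetical thresholds; real values would be agreed per model during scoping.
PSI_THRESHOLD = 0.2        # common rule of thumb for a "significant" distribution shift
SLA_MIN_PRECISION = 0.85   # hypothetical business-metric SLA (precision floor)


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `live` relative to `reference`.

    Bins are equal-width over the reference range; live values outside that
    range are clipped into the edge bins. A small floor avoids log(0).
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, live_p = proportions(reference), proportions(live)
    return sum((lp - rp) * math.log(lp / rp) for rp, lp in zip(ref_p, live_p))


@dataclass
class TriggerDecision:
    retrain: bool
    reasons: List[str] = field(default_factory=list)


def evaluate_trigger(reference: Sequence[float], live: Sequence[float],
                     live_precision: float) -> TriggerDecision:
    """Fire a retraining run if feature drift OR a business-metric SLA breach is detected."""
    reasons: List[str] = []
    score = psi(reference, live)
    if score > PSI_THRESHOLD:
        reasons.append(f"drift: PSI={score:.3f} > {PSI_THRESHOLD}")
    if live_precision < SLA_MIN_PRECISION:
        reasons.append(f"SLA breach: precision={live_precision:.3f} < {SLA_MIN_PRECISION}")
    return TriggerDecision(retrain=bool(reasons), reasons=reasons)
```

Returning the reasons alongside the boolean means the same object can feed both the pipeline decision and the alert text. In Airflow, a callable returning `evaluate_trigger(...).retrain` could back a `ShortCircuitOperator`, which skips downstream tasks when its callable returns a falsy value; Prefect and SageMaker Pipelines offer their own conditional constructs.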
Pipeline stages:
(1) Trigger evaluation: a drift threshold is crossed, or a business-metric SLA breach is confirmed.
(2) Data pull: fresh labelled data is extracted from your data warehouse (BigQuery, Redshift, Snowflake), combined with historical training data, and versioned with DVC (Data Version Control) so the exact dataset that produced any given model can be reconstructed.
(3) Training: runs execute in a reproducible Docker container with pinned dependency versions; MLflow Projects or SageMaker Training Jobs log every run with its parameters, metrics, and environment hash.
(4) Validation: the retrained model is evaluated against a held-out test set for accuracy metrics, against a suite of business-logic assertions (e.g., "high-risk customer segments must be flagged at recall >= 0.90"), and against an adversarial edge-case set covering the failure modes your team knows matter.
(5) Promotion: a model that passes all validation gates is registered in the Model Registry with candidate status, promoted to shadow mode (it receives production traffic, but its predictions are not served), and then promoted to production once shadow-mode accuracy confirms that real-world performance matches offline validation.
Failed validation triggers a Slack or PagerDuty alert to the model owner naming the specific assertion that failed and the metric delta, and the current champion stays in production. After a successful promotion, the previous champion version is retained for immediate rollback.
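The validation gates and the failure alert can be sketched as follows. This is a minimal illustration assuming binary labels, a per-row segment tag where `"high_risk"` marks the segment from the business assertion, and an "accuracy must not fall below the current champion" rule, which is an assumption rather than something the pipeline description specifies. The adversarial edge-case set is omitted, and Slack/PagerDuty delivery is not shown; `GateResult`, `run_gates` and `alert_message` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

HIGH_RISK_RECALL_MIN = 0.90  # threshold from the business-logic assertion


@dataclass
class GateResult:
    name: str
    passed: bool
    value: float
    threshold: float

    @property
    def delta(self) -> float:
        """Signed gap between the observed metric and its threshold."""
        return self.value - self.threshold


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    positives = sum(1 for t in y_true if t == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # No positives means recall is undefined; NaN makes the gate fail (conservative).
    return true_pos / positives if positives else math.nan


def run_gates(y_true: Sequence[int], y_pred: Sequence[int],
              segments: Sequence[str], champion_accuracy: float) -> List[GateResult]:
    acc = accuracy(y_true, y_pred)
    idx = [i for i, s in enumerate(segments) if s == "high_risk"]
    seg_recall = recall([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return [
        GateResult("accuracy >= champion", acc >= champion_accuracy, acc, champion_accuracy),
        GateResult("high_risk recall >= 0.90", seg_recall >= HIGH_RISK_RECALL_MIN,
                   seg_recall, HIGH_RISK_RECALL_MIN),
    ]


def alert_message(results: List[GateResult]) -> Optional[str]:
    """Alert text naming each failed assertion and its metric delta; None if all pass."""
    failed = [r for r in results if not r.passed]
    if not failed:
        return None
    return "; ".join(
        f"{r.name} FAILED: value={r.value:.3f}, delta={r.delta:+.3f}" for r in failed
    )
```

For example, a model that catches 3 of 4 high-risk positives passes an overall accuracy gate of 0.85 (accuracy 0.875 on eight rows) but fails the segment gate, and the alert reports a delta of -0.150 against the 0.90 recall floor.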
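The candidate, shadow, production lifecycle and the rollback path can be modelled as a small state machine. This sketch uses a hypothetical in-memory `Registry`; real registries expose the same ideas through their own APIs (for example MLflow model version aliases, or SageMaker model package approval status), which are not used here. The rule that shadow accuracy may trail offline accuracy by at most `tolerance = 0.02` is likewise an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModelVersion:
    version: int
    offline_accuracy: float
    # candidate -> shadow -> production (or rejected); replaced champions -> archived
    stage: str = "candidate"


@dataclass
class Registry:
    versions: Dict[int, ModelVersion] = field(default_factory=dict)
    champion: Optional[int] = None
    previous_champion: Optional[int] = None

    def register(self, version: int, offline_accuracy: float) -> ModelVersion:
        mv = ModelVersion(version, offline_accuracy)
        self.versions[version] = mv
        return mv

    def to_shadow(self, version: int) -> None:
        mv = self.versions[version]
        if mv.stage != "candidate":
            raise ValueError(f"v{version} is {mv.stage}, expected candidate")
        mv.stage = "shadow"

    def promote(self, version: int, shadow_accuracy: float, tolerance: float = 0.02) -> bool:
        """Promote a shadow model only if live accuracy stays within `tolerance` of offline."""
        mv = self.versions[version]
        if mv.stage != "shadow":
            raise ValueError(f"v{version} is {mv.stage}, expected shadow")
        if shadow_accuracy < mv.offline_accuracy - tolerance:
            mv.stage = "rejected"  # current champion is untouched and keeps serving
            return False
        if self.champion is not None:
            self.versions[self.champion].stage = "archived"  # kept, not deleted, for rollback
        self.previous_champion, self.champion = self.champion, version
        mv.stage = "production"
        return True

    def rollback(self) -> int:
        """Restore the previous champion and archive the current one."""
        if self.champion is None or self.previous_champion is None:
            raise RuntimeError("no previous champion to roll back to")
        self.versions[self.champion].stage = "archived"
        self.versions[self.previous_champion].stage = "production"
        self.champion, self.previous_champion = self.previous_champion, None
        return self.champion
```

Keeping exactly one `previous_champion` pointer makes rollback a constant-time swap; clearing it after a rollback prevents an accidental second rollback onto the version that was just pulled.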