Bridging the Gap: Diagnosing Online–Offline Discrepancy in Pinterest’s L1 Conversion Models
Summary
The article describes how Pinterest diagnosed a gap between the offline and online performance of its L1 conversion models: candidates with strong offline evaluation results produced disappointing online A/B test results. The authors structured their investigation into three layers: model evaluation, serving features, and funnel design. They identified feature coverage gaps and embedding version skew as significant contributors to the discrepancy. The article emphasizes aligning training and serving environments and understanding how far offline metrics can be trusted to predict online performance.
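To make "embedding version skew" concrete: in a two-tower setup, a score computed from embeddings produced by different model versions compares vectors that were not trained together. The following is a minimal sketch of a guard against that, not the article's method. The `VersionedEmbedding` type, its `model_version` tag, and the `score` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VersionedEmbedding:
    """An embedding plus the (hypothetical) tag of the model that produced it."""
    vector: Sequence[float]
    model_version: str


class VersionSkewError(ValueError):
    """Raised when two embeddings come from different model versions."""


def score(query: VersionedEmbedding, candidate: VersionedEmbedding) -> float:
    """Dot-product score that refuses to mix embedding versions."""
    if query.model_version != candidate.model_version:
        raise VersionSkewError(
            f"query is {query.model_version}, "
            f"candidate is {candidate.model_version}"
        )
    if len(query.vector) != len(candidate.vector):
        raise ValueError("embedding dimensions differ")
    return float(sum(q * c for q, c in zip(query.vector, candidate.vector)))
```

Raising an error is only one possible policy. A production system might instead log the mismatch and fall back to a default score, a choice this sketch leaves open.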
Key Learnings
1. Offline evaluation metrics should be trusted only after confirming that evaluation conditions match serving conditions.
2. Feature coverage discrepancies can lead to significant performance gaps between offline training and online execution.
3. Embedding version skew can degrade model performance, necessitating careful management of model versions in production.
4. Funnel alignment is crucial; even with improved model metrics, overall system performance may not improve if the funnel is already saturated.
5. Understanding the differences between offline and online metrics is essential for accurate performance assessment.
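The feature coverage point can be checked with a simple comparison of how often each feature is populated in offline training rows versus logged serving requests. The following is a minimal sketch under stated assumptions, not the article's tooling: rows are plain dictionaries, a missing key or a `None` value counts as "not covered", and the function names and the 5% threshold are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional

Row = Mapping[str, Optional[float]]


def feature_coverage(rows: Iterable[Row]) -> Dict[str, float]:
    """Fraction of rows in which each feature is present and not None."""
    materialized = list(rows)
    if not materialized:
        return {}
    counts: Dict[str, int] = {}
    for row in materialized:
        for name, value in row.items():
            counts.setdefault(name, 0)
            if value is not None:
                counts[name] += 1
    return {name: n / len(materialized) for name, n in counts.items()}


def coverage_gaps(
    offline_rows: Iterable[Row],
    online_rows: Iterable[Row],
    threshold: float = 0.05,  # assumed tolerance, chosen for illustration
) -> Dict[str, float]:
    """Features whose offline-minus-online coverage exceeds the threshold."""
    offline = feature_coverage(offline_rows)
    online = feature_coverage(online_rows)
    gaps: Dict[str, float] = {}
    for name in sorted(set(offline) | set(online)):
        diff = offline.get(name, 0.0) - online.get(name, 0.0)
        if abs(diff) > threshold:
            gaps[name] = diff
    return gaps
```

A positive gap means the feature is populated more often offline than at serving time, which is the pattern that can make offline metrics look better than online results.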
Who Should Read This
Senior Machine Learning Engineers analyzing model performance discrepancies in production environments
Test Your Knowledge
What are the implications of feature coverage discrepancies on model performance in production?
How does embedding version skew affect the reliability of predictions in a two-tower architecture?
Why is funnel alignment critical in ensuring that improved L1 model metrics translate into better CPA outcomes?
What strategies can be employed to mitigate the effects of exposure bias in A/B testing?
How can offline metrics be interpreted correctly in the context of real-world auction behavior?