GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction
Summary
The article describes an update to Pinterest's ads recommendation system: a two-tower model for lightweight ranking that is served on GPU. The architecture combines Multi-gate Mixture-of-Experts (MMOE) with Deep & Cross Networks (DCN) and is designed to improve model quality while keeping serving latency low. The transition from CPU to GPU serving is reported to reduce offline loss for click-through rate (CTR) prediction by 5-10%. The article also covers training efficiency work, including dataloader optimizations and more efficient model code, which together shorten training time and improve resource utilization. The evaluation reports gains in both offline and online metrics, which the authors present as evidence that the new architecture helps scale Pinterest's recommender systems.
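To make the two-tower idea concrete, here is a minimal sketch in plain Python. It is not Pinterest's implementation: the layer sizes, weights, feature vectors, and the names `tower`, `predict_ctr`, `USER_LAYERS`, and `AD_LAYERS` are all illustrative assumptions. It shows only the general pattern, in which a user tower and an ad tower each produce an embedding and the engagement score is the sigmoid of their dot product.

```python
import math

def dense(x, weights, bias):
    """One fully connected layer followed by ReLU. `weights` is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def tower(features, layers):
    """Run a feature vector through a stack of dense layers to get an embedding."""
    out = features
    for weights, bias in layers:
        out = dense(out, weights, bias)
    return out

def predict_ctr(user_emb, ad_emb):
    """Score a (user, ad) pair: sigmoid of the dot product of the two embeddings."""
    logit = sum(u * a for u, a in zip(user_emb, ad_emb))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy, hand-picked weights (assumptions for illustration only).
USER_LAYERS = [([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])]
AD_LAYERS = [([[0.4, 0.4], [-0.3, 0.9]], [0.05, 0.0])]

# The user tower runs once per request.
user_emb = tower([1.0, 0.5, 2.0], USER_LAYERS)

# The ad tower does not depend on the user, so its outputs can be computed separately.
ad_features = {"ad_a": [1.0, 0.0], "ad_b": [0.2, 1.5]}
ad_embs = {ad_id: tower(f, AD_LAYERS) for ad_id, f in ad_features.items()}

# Scoring each candidate is then a single dot product per ad.
scores = {ad_id: predict_ctr(user_emb, emb) for ad_id, emb in ad_embs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ad_b ranks above ad_a with these toy weights
```

Because the two towers interact only through the final dot product, the per-candidate cost at ranking time stays small. That is the property that generally makes this family of models suitable for a lightweight ranking stage.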
Key Learnings
1. The transition to GPU-serving for the two-tower model significantly enhances the efficiency of ad engagement prediction by reducing latency while maintaining performance.
2. Incorporating MMOE with DCN allows for better handling of multi-domain multi-task challenges without relying on domain-specific modules.
3. Optimizations such as GPU prefetching and BF16 precision training can drastically improve training times and resource utilization.
4. Segregating ad scenarios during training leads to improved model performance and faster iteration speeds, demonstrating the importance of tailored data handling.
5. Evaluation metrics like cost-per-click (CPC) and click-through rate (CTR) are critical for assessing the success of the model in real-world applications.
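The MMOE idea in the list above can be sketched in a few lines of plain Python. This is a generic illustration of multi-gate mixture-of-experts, not the model from the article: the experts are single linear maps, the task names `click` and `long_click` are hypothetical, and all weights are toy values chosen so the gating behavior is easy to follow. Every task shares the same experts, and each task has its own softmax gate that decides how to mix the expert outputs.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, weights):
    """Matrix-vector product. `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def mmoe(x, expert_weights, gate_weights):
    """Return one mixed representation per task.

    expert_weights: list of weight matrices, one per shared expert.
    gate_weights:   dict mapping task name to a matrix with one row per expert.
    """
    expert_outs = [linear(x, w) for w in expert_weights]  # shared across tasks
    dim = len(expert_outs[0])
    task_outputs = {}
    for task, gw in gate_weights.items():
        gate = softmax(linear(x, gw))  # one mixing weight per expert
        task_outputs[task] = [
            sum(g * out[d] for g, out in zip(gate, expert_outs))
            for d in range(dim)
        ]
    return task_outputs

# Toy setup (all values are assumptions for illustration).
x = [1.0, 2.0]
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],  # expert 1: swaps the two inputs
]
gates = {
    "click": [[10.0, 0.0], [0.0, 0.0]],      # strongly prefers expert 0
    "long_click": [[0.0, 0.0], [0.0, 0.0]],  # weights both experts equally
}

out = mmoe(x, experts, gates)
print(out["click"], out["long_click"])
```

In a full model each per-task output would feed a task-specific head. The point of the sketch is that tasks or domains can specialize through their gates while still sharing expert parameters, which is one way to avoid separate domain-specific modules.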
Who Should Read This
Senior Machine Learning Engineers focusing on optimizing ad recommendation systems and improving model serving efficiency.
Test Your Knowledge
What are the trade-offs between using CPU and GPU for serving models in terms of latency and performance?
How does the MMOE architecture improve upon the previous MTMD model in handling multi-domain tasks?
What specific optimizations were implemented to enhance training efficiency, and how do they impact overall model performance?
In what scenarios might the two-tower model fail to perform as expected, and how could these be mitigated?
Why is it important to segment ad scenarios during training, and what effects does this have on model accuracy?