Snap (Snapchat) • 13 min read • September 5, 2025

Building a Spark-Powered Platform for ML Data Needs at Snap

Summary

The article outlines the development of 'Prism', a Spark-powered platform designed to meet the unique data processing needs of machine learning (ML) teams at Snap. It highlights the limitations of traditional Spark implementations in handling the iterative and flexible nature of ML workflows, emphasizing the necessity for a tailored data platform that supports rapid experimentation and production stability. The platform aims to abstract away infrastructure complexities, allowing ML engineers to focus on model innovation rather than data processing challenges. Key features of Prism include a user-friendly interface, configurable templates for job authoring, and a robust control plane for managing Spark jobs at scale.

Key Learnings

1. Prism provides a unified interface that simplifies the Spark job lifecycle, enhancing usability for ML engineers.
2. The platform addresses the iterative nature of ML development by allowing flexible data access and rapid experimentation.
3. By centralizing metrics and automating job management, Prism improves reliability and scalability for ML data processing.
4. The introduction of configuration-driven templates reduces the learning curve and operational overhead for Spark job authoring.
5. Prism integrates with existing tools like Airflow and Kubeflow, ensuring seamless scheduling and monitoring of ML workflows.
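
To make the idea of configuration-driven job authoring more concrete, here is a minimal sketch in Python. It is not Prism's actual API or schema, which this summary does not describe. The `JobTemplate` class, the `feature-join` job, the class name `com.example.FeatureJoin`, and the jar name are all hypothetical; only the `spark-submit` flags (`--name`, `--class`, `--conf`) and the Spark configuration keys are real. The sketch assumes a template holds platform-owned defaults and that a user supplies only the few values they want to change.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobTemplate:
    """Hypothetical template: platform-owned defaults for one kind of Spark job."""

    name: str
    main_class: str
    app_jar: str
    default_conf: Dict[str, str] = field(default_factory=dict)

    def render(self, overrides: Dict[str, str], app_args: List[str]) -> List[str]:
        """Merge user overrides onto the defaults and build a spark-submit argv."""
        conf = {**self.default_conf, **overrides}  # user-supplied values win
        cmd = ["spark-submit", "--name", self.name, "--class", self.main_class]
        for key in sorted(conf):  # sorted so the command line is deterministic
            cmd += ["--conf", f"{key}={conf[key]}"]
        return cmd + [self.app_jar] + app_args


# Hypothetical template maintained by the platform team.
feature_join = JobTemplate(
    name="feature-join",
    main_class="com.example.FeatureJoin",  # assumed name, for illustration only
    app_jar="feature-join.jar",            # assumed artifact, for illustration only
    default_conf={
        "spark.executor.memory": "8g",
        "spark.dynamicAllocation.enabled": "true",
    },
)

# An ML engineer overrides a single setting and passes job arguments.
command = feature_join.render(
    overrides={"spark.executor.memory": "16g"},
    app_args=["--date", "2025-09-05"],
)
print(" ".join(command))
```

In this sketch the tuning knowledge lives in the template rather than in each job, which is one plausible way a template could lower the learning curve the list above mentions. A real platform would likely add validation and submit through a control plane instead of building a command line directly.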

Who Should Read This

Senior Data Engineers designing scalable ML data platforms leveraging Apache Spark

Test Your Knowledge

What are the specific challenges that traditional Spark implementations face in ML data processing?

How does Prism's architecture support both pre-production experimentation and post-production stability?

What trade-offs did the team consider when designing the user interface for Prism?

In what ways does the control plane of Prism enhance the reliability and scalability of Spark job management?

How does Prism handle diverse data formats and what impact does this have on ML workflows?

Topics

Apache Spark, Data Lake, Data Quality, ETL Pipelines, Data Governance
