Snap (Snapchat) • 12 min read • August 12, 2021

Applying GPU to Snap - Snap Engineering

Summary

The article describes how Snap applies GPUs to accelerate machine learning model inference, emphasizing the role of deep neural networks (DNNs) in delivering personalized content to users. It details the challenges of inference workloads and reports that adopting NVIDIA T4 GPUs significantly improved throughput and latency. It also covers the engineering work done to optimize GPU utilization, including automated model optimization and custom scheduling for inference workloads, and concludes that GPU acceleration is cost-effective in a cloud environment.
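The cost-effectiveness claim comes down to simple arithmetic: a VM's hourly price divided by the number of inferences it serves per hour. The sketch below illustrates that calculation only. The function name, the prices, and the throughput figures are all hypothetical placeholders, not numbers from the article.

```python
def cost_per_million(hourly_price_usd: float, throughput_qps: float) -> float:
    """USD to serve one million inferences on one VM running at full utilization."""
    if hourly_price_usd < 0 or throughput_qps <= 0:
        raise ValueError("price must be non-negative and throughput positive")
    inferences_per_hour = throughput_qps * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


# Hypothetical figures chosen only to show the calculation.
cpu_cost = cost_per_million(hourly_price_usd=1.00, throughput_qps=500)
gpu_cost = cost_per_million(hourly_price_usd=1.50, throughput_qps=5000)

print(f"CPU VM: ${cpu_cost:.4f} per million inferences")
print(f"GPU VM: ${gpu_cost:.4f} per million inferences")
print(f"CPU/GPU cost ratio: {cpu_cost / gpu_cost:.2f}x")
```

With these placeholder inputs the GPU VM costs more per hour but less per inference, because the throughput gain outweighs the price difference. A real comparison would also need to hold latency targets constant and account for utilization below 100%.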

Key Learnings

1. The integration of NVIDIA T4 GPUs can enhance ML inference performance significantly, achieving up to 15x throughput improvements with low-precision arithmetic.
2. Automated model optimization workflows can streamline the process of adapting DNN models for GPU acceleration, ensuring efficient resource utilization.
3. Custom scheduling of GPU operations can lead to better throughput and reduced latency by grouping operations from the same model request to the same device.
4. Understanding the computational characteristics of different model architectures is crucial for optimizing performance on GPUs, particularly for matrix multiplication-dominated models.
5. The cost-effectiveness of GPU VMs compared to CPU VMs can lead to substantial savings while maintaining high throughput in production environments.
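To give a feel for what low-precision arithmetic trades away, here is a minimal sketch that rounds the inputs of a dot product to IEEE 754 half precision (FP16) and compares the result with the full-precision answer. The choice of FP16, the vector size, and the value range are assumptions for illustration; the summary does not say which precision or models Snap used. Accumulation here stays in double precision, so only input rounding is modeled.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (stdlib 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot(a, b):
    """Plain dot product, accumulated in Python's double-precision floats."""
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
# Hypothetical positive weights and activations; positivity avoids cancellation,
# which would otherwise inflate the relative error.
weights = [random.uniform(0.1, 1.0) for _ in range(1024)]
inputs = [random.uniform(0.1, 1.0) for _ in range(1024)]

exact = dot(weights, inputs)
low = dot([to_fp16(w) for w in weights], [to_fp16(x) for x in inputs])
rel_err = abs(low - exact) / exact

print(f"relative error from FP16 inputs: {rel_err:.2e}")
```

Each FP16 rounding introduces a relative error of at most 2^-11, so the product of two rounded values is off by under 0.1%. Whether that is acceptable depends on the model, which is why accuracy is normally validated after any precision change.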

Who Should Read This

Senior Machine Learning Engineers implementing GPU acceleration for large-scale inference workloads

Test Your Knowledge

What are the key performance metrics that indicate the effectiveness of GPU acceleration in ML inference workloads?

How does the choice of low-precision arithmetic impact the accuracy and performance of DNN models on GPUs?

What engineering challenges arise when integrating GPU acceleration into existing ML inference stacks, and how can they be addressed?

In what scenarios might CPU operations be preferred over GPU operations in the context of ML inference, and why?

How does the design of a custom GPU operation scheduler improve overall system performance in a cloud-based environment?
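As a starting point for the scheduling question, the idea of grouping operations from the same model request onto the same device can be sketched as follows. This is a toy model under stated assumptions, not Snap's scheduler: the `AffinityScheduler` class, the least-loaded placement rule for new requests, and the request and operation names are all hypothetical.

```python
class AffinityScheduler:
    """Toy scheduler: every op of a request runs on the device chosen for its first op."""

    def __init__(self, num_devices: int):
        if num_devices < 1:
            raise ValueError("need at least one device")
        self.queues = [[] for _ in range(num_devices)]  # pending ops per device
        self.request_device = {}  # request_id -> device index

    def submit(self, request_id: str, op: str) -> int:
        """Queue an op and return the index of the device it was placed on."""
        device = self.request_device.get(request_id)
        if device is None:
            # Assumption: a new request goes to the device with the shortest queue.
            device = min(range(len(self.queues)), key=lambda d: len(self.queues[d]))
            self.request_device[request_id] = device
        self.queues[device].append((request_id, op))
        return device


sched = AffinityScheduler(num_devices=2)
ops = [
    ("req-a", "embed"),
    ("req-b", "embed"),
    ("req-a", "matmul"),
    ("req-b", "matmul"),
    ("req-a", "softmax"),
]
placements = [sched.submit(request_id, op) for request_id, op in ops]
print(placements)  # → [0, 1, 0, 1, 0]
```

Keeping a request on one device means intermediate tensors never have to move between devices, which is one plausible reason this kind of grouping helps throughput and latency. A production scheduler would also need to drain queues and handle device failures, neither of which is modeled here.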

Topics

CUDA, Deep Learning, Machine Learning, TensorRT, TensorFlow, XLA


