Netflix

•

8 min read

•October 17, 2025

How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data…

Summary

The article outlines Netflix's journey in developing a Real-Time Distributed Graph (RDG) to enhance data processing and analysis across its diverse services. It highlights the challenges posed by a microservices architecture, where data is siloed, and the need for a unified approach to analyze member interactions in real-time. The RDG architecture is built on a stream processing model using Apache Kafka for event ingestion and Apache Flink for processing, enabling low-latency updates and flexible querying capabilities. The article emphasizes the importance of relationship-centric queries and the ability to adapt to evolving data relationships, which are crucial for delivering personalized user experiences.

Key Learnings

1The transition to a Real-Time Distributed Graph allows for immediate insights into member interactions across multiple devices and services.
2Utilizing Apache Kafka as the ingestion backbone enables durable and replayable streams, essential for real-time data processing.
3Apache Flink's capabilities for near-real-time event processing make it an ideal choice for managing high-throughput data streams.
4The decision to create a 1:1 mapping of Kafka topics to Flink jobs simplifies maintenance and tuning, addressing operational challenges associated with monolithic job structures.
5Graph representations provide significant advantages in flexibility and relationship-centric querying compared to traditional data models.

Who Should Read This

Senior Data Engineers designing scalable real-time data processing systems using Kafka and Flink

Test Your Knowledge

What are the key advantages of using a graph representation for data storage and querying in Netflix's architecture?

How does the choice of Apache Kafka as the ingestion mechanism impact the overall data processing pipeline?

What challenges did Netflix face when initially implementing a monolithic Flink job, and how were these addressed?

In what ways does the RDG architecture facilitate real-time analysis of member interactions across different platforms?

Why is it important to tailor retention policies for Kafka topics based on their throughput and record size?

Topics

Apache Kafka Apache Flink Data Lake Data Quality Data Warehousing

Read Full Article at Netflix

More from Netflix Engineering

View Netflix engineering blogs →

Netflix

10m

ML Observability: Bringing Transparency to Payments and Beyond

The article explores the critical role of ML observability in enhancing the performance and reliability of machine learning models, particularly in payment processing at Netflix. It emphasizes the...

Netflix

From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix

The article outlines the transformation of data engineering at Netflix, emphasizing the shift from traditional data practices to a new specialization known as Media ML Data Engineering. This...

Netflix

Empowering Netflix Engineers with Incident Management

The article outlines Netflix's journey to democratize incident management, shifting from a centralized model to empowering engineering teams across the organization. It emphasizes the importance of a...

Netflix

10m

Scaling Muse: How Netflix Powers Data-Driven Creative Insights at Trillion-Row Scale

The article discusses Netflix's Muse application, which aims to deliver data-driven insights for content discovery. It highlights the evolution of Muse's architecture from a simple dashboard to a...

Netflix

15m

Building a Resilient Data Platform with Write-Ahead Log at Netflix

The article details Netflix's approach to building a resilient data platform using a Write-Ahead Log (WAL) system to address challenges such as data loss, corruption, and system entropy across...

How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data…

Summary

Key Learnings

Who Should Read This

Test Your Knowledge

Topics

More articles about Apache Kafka

Next Generation DB Ingestion at Pinterest

More from Netflix Engineering

ML Observability: Bringing Transparency to Payments and Beyond

From Facts & Metrics to Media Machine Learning: Evolving the Data Engineering Function at Netflix

Empowering Netflix Engineers with Incident Management

Scaling Muse: How Netflix Powers Data-Driven Creative Insights at Trillion-Row Scale

Building a Resilient Data Platform with Write-Ahead Log at Netflix

Related topics