Pinterest
16 min read

Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 2 of 2)

Read Full Article

Summary

The article discusses Pinterest's transition from a Hadoop-based data processing platform to Moka, a next-generation system designed for massive-scale data processing. It highlights the deployment of Moka on AWS Elastic Kubernetes Service (EKS), detailing the use of Terraform for infrastructure management and the implementation of a comprehensive logging and observability framework. The article also covers the challenges and solutions in managing container images and ensuring effective metrics collection and analysis for operational efficiency.

Key Learnings

  • 1Moka's deployment on AWS EKS is structured into multiple environments (test, dev, staging, production) to ensure isolation and security.
  • 2The logging infrastructure leverages Fluent Bit for efficient log management, enabling the aggregation of Spark application logs and system pod logs in Amazon S3.
  • 3Observability is enhanced through a combination of Prometheus and OpenTelemetry, allowing for detailed insights into the performance of EKS clusters.
  • 4The article emphasizes the importance of containerization in Moka, ensuring full isolation and compatibility across different architectures (Intel and ARM).
  • 5The use of Terraform modules facilitates a modular and reusable approach to infrastructure as code, streamlining the deployment process.

Who Should Read This

Senior Data Engineers implementing scalable data processing solutions on cloud platforms like AWS

Test Your Knowledge

?

What are the trade-offs of using AWS EKS for deploying Moka compared to traditional Hadoop clusters?

?

How does Fluent Bit enhance the logging capabilities of Spark applications running on Moka?

?

What design decisions were made to ensure observability in the Moka platform, and what challenges did they address?

?

In what ways does the containerization strategy in Moka differ from the previous Monarch platform, and why is this significant?

?

How does the architecture of Moka support scalability and reliability in data processing workloads?

Topics

Read Full Article at Pinterest