Salesforce
Migration at Scale: Moving Marketing Cloud Caching from Memcached to Redis at 1.5M RPS Without Downtime
Summary
The article describes the migration of the Marketing Cloud caching infrastructure from Memcached to Redis, completed with zero downtime while serving approximately 1.5 million cache events per second. It covers the challenges involved, including requirements for high availability, security, and performance, and the Dynamic Cache Router built to manage traffic between the two caching systems. The migration was planned to preserve behavioral parity and operational control, with a focus on keeping latency low and performance metrics stable throughout. The article also describes the proactive measures taken to identify and mitigate hot-key issues so that the Redis Cluster remained reliable under production conditions.
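The summary does not say how the Dynamic Cache Router decides which backend serves a request. As a minimal sketch of one common approach, assuming a percentage-based rollout keyed on a stable hash, the router below sends a configurable share of keys to the new backend. The names `DynamicCacheRouter`, `InMemoryBackend`, and `redis_percent` are hypothetical and do not come from the article; the in-memory backends stand in for real Memcached and Redis clients.

```python
import zlib
from typing import Dict, Optional


class InMemoryBackend:
    """Stand-in for a real Memcached or Redis client (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        # TTL is ignored in this stand-in; a real client would pass it through.
        self._data[key] = value


class DynamicCacheRouter:
    """Hypothetical router: sends a percentage of keys to the new backend.

    Applications call get/set on the router only, so the split can change
    at runtime without application code changes.
    """

    def __init__(self, legacy: InMemoryBackend, target: InMemoryBackend,
                 redis_percent: int = 0) -> None:
        self.legacy = legacy
        self.target = target
        self.redis_percent = 0
        self.set_redis_percent(redis_percent)

    def set_redis_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.redis_percent = percent

    def _backend_for(self, key: str) -> InMemoryBackend:
        # A stable hash keeps each key on the same backend for a given
        # percentage, and raising the percentage only moves keys one way.
        bucket = zlib.crc32(key.encode("utf-8")) % 100
        return self.target if bucket < self.redis_percent else self.legacy

    def get(self, key: str) -> Optional[bytes]:
        return self._backend_for(key).get(key)

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._backend_for(key).set(key, value, ttl_seconds)
```

In this sketch, a key that moves to the new backend starts cold there, so a gradual increase in `redis_percent` limits how many misses reach the backing store at once. Whether the actual router works this way is not stated in the summary.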
Key Learnings
1. Achieving zero-downtime migration requires careful planning and the implementation of a Dynamic Cache Router to manage traffic between legacy and new systems.
2. Maintaining behavioral parity during migration is critical to avoid application disruptions, necessitating a compatibility layer for TTL and key handling.
3. Proactive hot-key detection and mitigation strategies are essential for maintaining performance and stability in high-throughput caching scenarios.
4. The use of real production traffic for validation is crucial to accurately assess performance metrics and ensure that the new system meets operational expectations.
5. Segregating read and write traffic can significantly enhance overall cluster stability and performance, especially under high load conditions.
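One concrete reason a TTL compatibility layer is needed: Memcached treats an expiration value of 0 as "never expire", a negative value as already expired, and a value above 30 days (2,592,000 seconds) as an absolute Unix timestamp, whereas Redis expiry options such as `EX` take a positive relative number of seconds. The sketch below shows one way such a translation could look. The function name and return convention are assumptions for illustration, not the article's implementation.

```python
import time
from typing import Optional

# Memcached interprets expiration values above this limit as absolute
# Unix timestamps rather than relative seconds.
MEMCACHED_RELATIVE_LIMIT = 60 * 60 * 24 * 30  # 30 days in seconds


def memcached_exptime_to_redis_ttl(exptime: int,
                                   now: Optional[float] = None) -> Optional[int]:
    """Translate a Memcached expiration value to a relative TTL in seconds.

    Returns None when the key should not expire, 0 when it is already
    expired (the caller should skip the write or delete the key), and a
    positive number of seconds otherwise.
    """
    if now is None:
        now = time.time()
    if exptime == 0:
        return None  # Memcached: 0 means no expiration
    if exptime < 0:
        return 0  # Memcached: negative means immediately expired
    if exptime <= MEMCACHED_RELATIVE_LIMIT:
        return exptime  # already relative seconds
    # Otherwise the value is an absolute Unix timestamp.
    return max(exptime - int(now), 0)
```

A caller would then choose the Redis command from the result: a plain `SET` for `None`, a `SET` with an expiry for a positive value, and no write (or a delete) for 0.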
Who Should Read This
Senior Data Engineers managing high-throughput caching solutions in production environments
Test Your Knowledge
What are the implications of Memcached's lack of replication, compared with Redis's built-in primary-replica replication, during a migration?
How does the Dynamic Cache Router facilitate a seamless transition between caching systems without requiring application code changes?
What specific challenges arise from maintaining Time-to-Live (TTL) semantics when migrating from Memcached to Redis?
In what ways can proactive hot-key detection improve the performance of a Redis Cluster under high request rates?
Why is it important to validate performance metrics using real production traffic rather than synthetic benchmarks during a migration?