Salesforce
Inside Salesforce Edge: Automating Global Rollback for 1.5 Trillion Requests in 10 Minutes
Summary
The article describes how the Salesforce Edge team reduced global rollback time from hours to minutes by adopting a blue-green deployment strategy. By re-architecting its Kubernetes deployments and automating traffic cutover, the team keeps its global services highly available and secure. The article also covers the problems with traditional rollback methods and the solutions devised to maintain operational efficiency and minimize downtime during critical incidents.
Key Learnings
1. Implementing a blue-green deployment model allows for rapid traffic redirection without extensive downtime or resource rebuilding.
2. Automated traffic cutover and TCP connection draining are essential for maintaining service availability during rollback events.
3. The design of deployment pipelines must account for the unique requirements of global services to ensure consistent performance and reliability.
4. Maintaining identical scaling for blue and green deployments is critical to avoid capacity issues during traffic transitions.
5. Utilizing existing Kubernetes constructs for deployment automation can provide better control over complex deployment scenarios.
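The cutover ideas in the learnings above (identical scaling, traffic redirection, connection draining) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Salesforce Edge implementation: the `Deployment` class, `cut_over`, `drain`, and all numbers are hypothetical names and values chosen for illustration, and no real Kubernetes API is called.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Deployment:
    """Hypothetical stand-in for one colour of a blue-green pair."""
    color: str
    replicas: int
    open_connections: int = 0


class CutoverError(RuntimeError):
    """Raised when the standby colour cannot safely take the traffic."""


def cut_over(active: Deployment, standby: Deployment) -> Tuple[Deployment, Deployment]:
    """Swap roles and return (new_active, old_active).

    The swap is refused unless the standby is scaled at least as large
    as the active side, so it can absorb the full load.
    """
    if standby.replicas < active.replicas:
        raise CutoverError(
            f"{standby.color} has {standby.replicas} replicas, "
            f"needs {active.replicas}"
        )
    # From here on, new connections would be routed to the standby colour.
    return standby, active


def drain(old: Deployment, closed_per_tick: int, max_ticks: int) -> int:
    """Let existing connections on the old colour finish; return ticks used.

    Assumes a fixed number of connections close per tick, which is a
    simplification of real TCP connection draining.
    """
    ticks = 0
    while old.open_connections > 0 and ticks < max_ticks:
        old.open_connections = max(0, old.open_connections - closed_per_tick)
        ticks += 1
    return ticks


if __name__ == "__main__":
    blue = Deployment("blue", replicas=10, open_connections=250)
    green = Deployment("green", replicas=10)
    new_active, old_active = cut_over(blue, green)
    ticks = drain(old_active, closed_per_tick=100, max_ticks=10)
    print(new_active.color, ticks, old_active.open_connections)
```

The capacity check runs before any traffic moves, which mirrors the point that both colours must be able to handle full load; draining is bounded by `max_ticks` so a stuck connection cannot block the rollback indefinitely in this model.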
Who Should Read This
Senior Site Reliability Engineers designing high-availability systems with complex rollback requirements
Test Your Knowledge
What are the trade-offs of using a blue-green deployment model in a global service architecture?
How does the implementation of automated rollback mechanisms impact overall system reliability?
What design decisions were made to ensure that both blue and green deployments could handle full load during traffic cutover?
In what scenarios might the automated connection draining approach fail, and how can those be mitigated?
Why was Argo deemed unsuitable for Salesforce Edge's requirements, and what alternative solutions were implemented?