Salesforce
7 min read

Inside Salesforce Edge: Automating Global Rollback for 1.5 Trillion Requests in 10 Minutes

Summary

The article describes how the Salesforce Edge team reduced global rollback time from hours to minutes by adopting a blue-green deployment strategy. The team re-architected its Kubernetes deployments and automated traffic cutover, so a rollback redirects traffic to an already-running deployment instead of rebuilding resources. The article also covers the limitations of the earlier rollback method and the measures taken to keep Salesforce Edge available and secure while minimizing downtime during critical incidents.

Key Learnings

  • Implementing a blue-green deployment model allows for rapid traffic redirection without the need for extensive downtime or resource rebuilding.
  • Automated traffic cutover and TCP connection draining are essential for maintaining service availability during rollback events.
  • The design of deployment pipelines must account for the unique requirements of global services to ensure consistent performance and reliability.
  • Maintaining identical scaling for blue and green deployments is critical to avoid capacity issues during traffic transitions.
  • Utilizing existing Kubernetes constructs for deployment automation can provide better control over complex deployment scenarios.
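
The cutover, draining, and equal-scaling points above can be illustrated with a minimal sketch. It is a toy in-memory model, not Salesforce Edge's actual Kubernetes implementation, which this summary does not describe. The names `Deployment`, `BlueGreenRouter`, and `drain_step` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical stand-in for one color of a blue-green pair."""
    name: str
    replicas: int
    open_connections: int = 0


class BlueGreenRouter:
    """Toy traffic selector that points at exactly one of two deployments."""

    def __init__(self, blue: Deployment, green: Deployment, active: str = "blue"):
        self.deployments = {"blue": blue, "green": green}
        self.active = active

    def standby(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def rollback(self, drain_step: int = 100) -> list:
        """Switch traffic to the standby color, then drain the old one."""
        old, new = self.active, self.standby()
        # Refuse the cutover if the standby side is scaled below the active
        # side, since it could not absorb the full load.
        if self.deployments[new].replicas < self.deployments[old].replicas:
            raise RuntimeError(f"{new} is under-scaled; refusing cutover")
        self.active = new  # new connections now land on the standby color
        log = [f"cutover {old}->{new}"]
        # Drain: existing connections on the old color close in batches
        # (assumed behavior; a real drain would wait on live TCP sessions).
        dep = self.deployments[old]
        while dep.open_connections > 0:
            dep.open_connections = max(0, dep.open_connections - drain_step)
            log.append(f"draining {old}: {dep.open_connections} left")
        return log


router = BlueGreenRouter(
    Deployment("blue", replicas=10, open_connections=250),
    Deployment("green", replicas=10),
)
for line in router.rollback():
    print(line)
```

Because both colors stay running at the same scale, the rollback in this model is only a pointer switch followed by a drain, which is the property the first and fourth points describe.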

Who Should Read This

Senior Site Reliability Engineers designing high-availability systems with complex rollback requirements

Test Your Knowledge

  • What are the trade-offs of using a blue-green deployment model in a global service architecture?
  • How does the implementation of automated rollback mechanisms impact overall system reliability?
  • What design decisions were made to ensure that both blue and green deployments could handle full load during traffic cutover?
  • In what scenarios might the automated connection draining approach fail, and how can those be mitigated?
  • Why was Argo deemed unsuitable for Salesforce Edge's requirements, and what alternative solutions were implemented?
