An analysis of the Square and Cash App outage
Summary
The article outlines a service disruption experienced by Square and Cash App on February 26, 2025, caused by a security certificate validation issue that disrupted payment processing. The incident was detected by monitoring systems, prompting immediate investigation and response from engineering teams. A rollback of the problematic certificate was executed, restoring functionality shortly thereafter. The postmortem highlights the importance of automated processes in managing security certificates and outlines future improvements to enhance service resilience and offline payment capabilities.
Key Learnings
- Understanding the critical role of automated processes in certificate management to prevent service disruptions.
- The significance of real-time monitoring and rapid response in incident management to minimize downtime.
- The necessity of having robust offline payment mechanisms to ensure business continuity during outages.
- The importance of conducting thorough postmortems to identify root causes and implement preventive measures.
- The value of clear communication with stakeholders during incidents to maintain trust and transparency.
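The point about automated certificate management can be illustrated with a minimal sketch. The summary does not say what the validation issue actually was, so the example below assumes just one common kind of automated check: classifying a certificate by how much time remains before it expires. The function names and the 30-day and 7-day thresholds are hypothetical and are not taken from Square's postmortem.

```python
from datetime import datetime, timedelta, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> float:
    """Return the number of days between `now` and the certificate's expiry time."""
    return (not_after - now).total_seconds() / 86400


def classify_certificate(
    not_after: datetime,
    now: datetime,
    warn_days: int = 30,      # hypothetical threshold for an early warning
    critical_days: int = 7,   # hypothetical threshold for paging someone
) -> str:
    """Classify a certificate as 'ok', 'warning', 'critical', or 'expired'."""
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= critical_days:
        return "critical"
    if remaining <= warn_days:
        return "warning"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Example input: a certificate that expires in 10 days.
    print(classify_certificate(now + timedelta(days=10), now))
```

A scheduled job could run a check like this against every deployed certificate and alert well before the "critical" window, rather than relying on a manual calendar reminder.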
Who Should Read This
Senior Site Reliability Engineers analyzing incident response strategies and improving service resilience.
Test Your Knowledge
What are the trade-offs between manual and automated processes in managing security certificates?
How can incident severity escalation impact the response time and effectiveness of the resolution?
What design decisions can be made to enhance the resilience of payment systems against similar outages?
In what ways can offline payment systems be improved to ensure smoother transitions during service disruptions?
Why is it critical to conduct postmortems after incidents, and what key elements should be included in the analysis?