Removing dependency tangles in the Atlassian Platform for increased reliability and recoverability
Summary
The article outlines Atlassian's Continuous PaaS Recovery (CPR) program, which aims to enhance platform reliability and recoverability by addressing complex service dependencies. It details the identification and elimination of circular dependencies and other architectural tangles that impede recovery efforts. The CPR initiative involved surveying service owners, categorizing dependencies, and implementing a layered architecture to minimize hard dependencies. By rearchitecting the platform and fostering a culture of dependency awareness, Atlassian has significantly improved its cloud resilience and operational practices.
Key Learnings
- Understanding the impact of circular dependencies on platform reliability and the necessity of addressing them for effective recovery.
- The importance of categorizing dependencies into hard and soft types to prioritize risk reduction efforts.
- Implementing a layered architecture to isolate dependencies and improve recoverability across services.
- The role of education and cultural shifts in minimizing future dependency tangles within engineering teams.
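To make the first three learnings concrete, here is a minimal Python sketch of how hard-dependency cycles and layering violations could be detected in a service graph. It is not Atlassian's tooling: the service names, the `hard`/`soft` labels as strings, the layer numbers, and the helper functions `hard_graph`, `find_cycle`, and `layer_violations` are all hypothetical, chosen only to illustrate the idea that hard dependencies should point strictly downward through the layers.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: service -> list of (dependency, kind),
# where kind is "hard" (needed to start or recover) or "soft" (degradable).
Deps = Dict[str, List[Tuple[str, str]]]


def hard_graph(deps: Deps) -> Dict[str, List[str]]:
    """Keep only hard edges; soft dependencies are ignored for recovery ordering."""
    graph = {svc: [d for d, kind in edges if kind == "hard"]
             for svc, edges in deps.items()}
    # Make sure every dependency also appears as a node.
    for targets in list(graph.values()):
        for dep in targets:
            graph.setdefault(dep, [])
    return graph


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search; returns one cycle as a path, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is already on the current path
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def layer_violations(graph: Dict[str, List[str]],
                     layer: Dict[str, int]) -> List[Tuple[str, str]]:
    """A hard dependency must target a strictly lower layer than its caller."""
    return [(svc, dep) for svc, targets in graph.items()
            for dep in targets if layer[dep] >= layer[svc]]


# Illustrative services only; none of these names come from the article.
deps: Deps = {
    "deploy": [("config", "hard"), ("metrics", "soft")],
    "config": [("identity", "hard")],
    "identity": [("deploy", "hard")],  # closes a circular hard dependency
    "metrics": [],
}
layers = {"identity": 0, "config": 1, "metrics": 1, "deploy": 2}

graph = hard_graph(deps)
print(find_cycle(graph))                # ['deploy', 'config', 'identity', 'deploy']
print(layer_violations(graph, layers))  # [('identity', 'deploy')]
```

In this toy graph, downgrading the `identity` to `deploy` edge from hard to soft removes both the cycle and the layer violation, which mirrors the summary's point about minimizing hard dependencies rather than all dependencies.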
Who Should Read This
Senior Platform Engineers focusing on enhancing service reliability and recoverability in cloud architectures.
Test Your Knowledge
What are the trade-offs involved in prioritizing hard dependencies over soft dependencies in a large-scale platform?
How does the layered architecture approach mitigate the risks associated with circular dependencies?
What specific strategies were employed to educate engineers about dependency risks and foster a culture of awareness?
In what scenarios might 'break glass' solutions be necessary, and how can they be integrated into normal operations?
What metrics or indicators can be used to assess the effectiveness of the CPR program in improving platform reliability?