How we catch and mitigate performance regressions at scale in Jira Cloud
Summary
The article discusses the complexities of detecting and mitigating performance regressions in Jira Cloud, a multi-tenant product. It highlights the challenges posed by diverse tenant configurations and traffic patterns, which can obscure performance issues in aggregated metrics. The authors describe their approach of implementing a per-tenant, per-endpoint detection system integrated with Rovo Dev CLI, enabling precise identification of regressions. The system employs statistical process control techniques to alert teams of performance issues and utilizes an automated root cause analysis agent to diagnose the underlying causes of regressions, significantly improving response times and reducing customer impact.
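To make the point about aggregated metrics concrete, here is a minimal sketch of how a regression confined to one small tenant can vanish in a pooled metric yet stand out per tenant. The tenant names, latency values, and the 1.5x ratio threshold are all invented for illustration; they do not come from the article and this is not the detection logic Atlassian describes.

```python
from statistics import median

# Hypothetical latency samples (ms) for a single endpoint, keyed by tenant.
# Two large tenants dominate traffic; one small tenant regresses 4x.
before = {
    "tenant-a": [100] * 1000,
    "tenant-b": [120] * 1000,
    "tenant-c": [200] * 20,
}
after = {
    "tenant-a": [100] * 1000,
    "tenant-b": [120] * 1000,
    "tenant-c": [800] * 20,
}


def aggregate_median(samples_by_tenant):
    """Median latency over all requests, ignoring which tenant sent them."""
    pooled = [v for samples in samples_by_tenant.values() for v in samples]
    return median(pooled)


def regressed_tenants(before, after, ratio=1.5):
    """Tenants whose own median latency grew by more than `ratio` (assumed threshold)."""
    return [
        tenant
        for tenant in before
        if median(after[tenant]) > ratio * median(before[tenant])
    ]


# The pooled median does not move, because tenant-c is about 1% of traffic.
print(aggregate_median(before), aggregate_median(after))
# The per-tenant comparison singles out the affected tenant.
print(regressed_tenants(before, after))
```

The pooled median stays the same in both windows, while the per-tenant comparison flags only the tenant that actually slowed down.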
Key Learnings
- Implementing per-tenant metrics is crucial for accurately detecting performance regressions in multi-tenant applications.
- Statistical process control techniques can enhance alert quality by filtering out noise from outlier traffic.
- Automated root cause analysis can drastically reduce the time required to diagnose performance issues, enabling quicker mitigations.
- Understanding the unique data and traffic patterns of each tenant is essential for effective performance monitoring.
- Continuous improvement of tools and processes for regression detection is necessary to adapt to the evolving nature of software development.
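The summary does not say which statistical process control rules the system uses, so the following is only a sketch under stated assumptions: a Shewhart-style control chart with an upper control limit of mean plus three standard deviations, and a run rule that alerts only after several consecutive points exceed that limit. The function names, the (tenant, endpoint) keys, and the sample values are hypothetical.

```python
from statistics import mean, stdev


def upper_control_limit(baseline, sigmas=3.0):
    """Upper control limit from a baseline window (assumed: mean + 3 sigma)."""
    return mean(baseline) + sigmas * stdev(baseline)


def detect_regression(baseline, recent, sigmas=3.0, consecutive=3):
    """Return the index in `recent` where a run of `consecutive` points
    above the control limit begins, or None if no such run exists.

    Requiring a run (an assumed rule) means one outlier does not alert.
    """
    ucl = upper_control_limit(baseline, sigmas)
    run = 0
    for i, value in enumerate(recent):
        run = run + 1 if value > ucl else 0
        if run >= consecutive:
            return i - consecutive + 1
    return None


# Hypothetical latency readings (ms), one series per (tenant, endpoint) pair.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
recent_by_key = {
    ("tenant-a", "/search"): [101, 150, 99, 100, 102],   # single spike
    ("tenant-b", "/search"): [101, 110, 112, 115, 111],  # sustained shift
}

for key, recent in recent_by_key.items():
    start = detect_regression(baseline, recent)
    print(key, "regression starts at index" if start is not None else "ok", start)
```

With these numbers the single spike for the first key is ignored, while the sustained shift for the second key is reported. A production system would likely maintain a separate baseline per key and tune both the sigma multiplier and the run length.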
Who Should Read This
Senior DevOps Engineers implementing performance monitoring solutions for large-scale multi-tenant applications.
Test Your Knowledge
What are the trade-offs between using aggregated metrics versus per-tenant metrics in performance monitoring?
How does the implementation of statistical process control techniques improve alerting for performance regressions?
What challenges might arise when diagnosing performance regressions in a multi-tenant environment?
Why is automated root cause analysis critical in the context of high-frequency deployments and numerous feature flag changes?
How can the unique characteristics of tenant data shapes influence the performance of a multi-tenant application?