Atlassian
14 min read

How we catch and mitigate performance regressions at scale in Jira Cloud

Summary

The article discusses how Atlassian detects and mitigates performance regressions in Jira Cloud, a multi-tenant product. Tenant configurations and traffic patterns vary widely, so a regression that affects only some tenants can be hidden in aggregated metrics. The authors describe a per-tenant, per-endpoint detection system, integrated with Rovo Dev CLI, that identifies precisely where a regression occurs. The system uses statistical process control techniques to alert teams to performance issues and an automated root cause analysis agent to diagnose their underlying causes, significantly improving response times and reducing customer impact.
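To make the per-tenant, per-endpoint idea concrete, here is a minimal sketch of a control-chart style check. It is not the article's implementation: the 3-sigma limit, the "three consecutive points" rule, the function names, and the sample latencies are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def upper_control_limit(baseline, sigmas=3.0):
    """Upper limit = baseline mean + `sigmas` sample standard deviations."""
    return mean(baseline) + sigmas * stdev(baseline)

def detect_regressions(baseline_by_key, recent_by_key, sigmas=3.0, min_run=3):
    """Flag (tenant, endpoint) keys whose last `min_run` points all exceed the limit.

    Requiring a run of points, rather than one, is an assumed rule that
    keeps a single outlier request from raising an alert.
    """
    flagged = []
    for key, baseline in baseline_by_key.items():
        recent = recent_by_key.get(key, [])
        if len(baseline) < 2 or len(recent) < min_run:
            continue  # not enough data to judge this key
        limit = upper_control_limit(baseline, sigmas)
        if all(x > limit for x in recent[-min_run:]):
            flagged.append(key)
    return flagged

# Hypothetical latencies in milliseconds, keyed by (tenant, endpoint).
baseline = {
    ("tenant-a", "/issues/search"): [120, 118, 125, 122, 119, 121],
    ("tenant-b", "/issues/search"): [300, 310, 295, 305, 298, 302],
}
recent = {
    ("tenant-a", "/issues/search"): [123, 180, 185, 190],  # sustained shift
    ("tenant-b", "/issues/search"): [301, 420, 299, 304],  # one-off spike
}

print(detect_regressions(baseline, recent))
```

In this sketch only tenant-a is flagged: its last three points all sit above its own limit, while tenant-b's single spike is treated as noise. Each key is compared against its own baseline, so a slow but stable tenant does not alert just for being slower than others.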

Key Learnings

  • Implementing per-tenant metrics is crucial for accurately detecting performance regressions in multi-tenant applications.
  • Statistical process control techniques can enhance alert quality by filtering out noise from outlier traffic.
  • Automated root cause analysis can drastically reduce the time required to diagnose performance issues, enabling quicker mitigations.
  • Understanding the unique data and traffic patterns of each tenant is essential for effective performance monitoring.
  • Continuous improvement of tools and processes for regression detection is necessary to adapt to the evolving nature of software development.
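The first learning can be illustrated with a small synthetic example of an aggregate percentile hiding a regression that is obvious per tenant. The numbers, tenant names, and the nearest-rank percentile helper are assumptions for illustration, not data or code from the article.

```python
def p90(samples):
    """Nearest-rank 90th percentile, using integer math for the rank."""
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n)
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds. The large tenant dominates traffic.
before = {"large-tenant": [100] * 95, "small-tenant": [200] * 5}
after = {"large-tenant": [100] * 95, "small-tenant": [800] * 5}  # small tenant regresses 4x

def aggregate(by_tenant):
    """Pool every tenant's samples into one list, as an aggregated metric would."""
    return [x for samples in by_tenant.values() for x in samples]

print("aggregate p90:", p90(aggregate(before)), "->", p90(aggregate(after)))
for tenant in before:
    print(tenant, "p90:", p90(before[tenant]), "->", p90(after[tenant]))
```

With these made-up numbers the aggregate p90 stays at 100 ms before and after, while the small tenant's own p90 moves from 200 ms to 800 ms. The regression is visible only when the metric is split by tenant.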

Who Should Read This

Senior DevOps Engineers implementing performance monitoring solutions for large-scale multi-tenant applications.

Test Your Knowledge

What are the trade-offs between using aggregated metrics versus per-tenant metrics in performance monitoring?

How does the implementation of statistical process control techniques improve alerting for performance regressions?

What challenges might arise when diagnosing performance regressions in a multi-tenant environment?

Why is automated root cause analysis critical in the context of high-frequency deployments and numerous feature flag changes?

How can the unique characteristics of tenant data shapes influence the performance of a multi-tenant application?
