Scaling Jira Cloud Migrations, One Bottleneck at a Time
Summary
The article chronicles how the Jira Migrations team scaled its migration platform from handling 20,000 to 50,000 Monthly Paid Enabled Users (PEUs). It covers the transition from an API-driven architecture to a Kafka-based ETL model and the problems that motivated it, such as API timeouts and database lock contention. The team adopted a pull-based model, in which consumers fetch work at their own pace, to raise throughput without overloading the target system. They also tuned worker node configurations, polling timeouts, and entity processing strategies, which increased migration throughput and reliability for large-scale customers.
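To make the pull-based idea concrete, here is a minimal sketch using only the Python standard library. It is not the team's implementation: an in-memory `queue.Queue` stands in for the Kafka topic, and `run_pull_workers`, `write_fn`, `num_workers`, and `poll_timeout` are hypothetical names chosen for illustration. The point it shows is that each worker fetches its next record only after finishing the previous write, so the target never sees more than `num_workers` concurrent writes.

```python
import queue
import threading


def run_pull_workers(records, write_fn, num_workers=4, poll_timeout=0.1):
    """Drain `records` with workers that pull work at their own pace.

    Assumption: `write_fn` is a stand-in for a write to the target system.
    At most `num_workers` calls to `write_fn` are in flight at any time.
    """
    source = queue.Queue()  # stand-in for a topic or work queue
    for record in records:
        source.put(record)

    written = []
    written_lock = threading.Lock()

    def worker():
        while True:
            try:
                # Pull the next record only when this worker is free.
                record = source.get(timeout=poll_timeout)
            except queue.Empty:
                return  # nothing left to pull, so the worker exits
            result = write_fn(record)
            with written_lock:
                written.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

In a push-based design the producer decides the rate, so a slow target accumulates timeouts. Here the slow target simply slows the pull loop, which is the backpressure property the summary attributes to the new architecture.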
Key Learnings
1. Transitioning from a push-based to a pull-based architecture can significantly improve system throughput and reduce bottlenecks.
2. Optimizing worker node configurations and autoscaling rules is critical for maintaining high throughput during migrations.
3. Addressing misconfigurations in timeout settings can lead to immediate performance improvements in data processing.
4. Implementing micro-batching and per-entity parallel processing can enhance efficiency and reduce network overhead.
5. Understanding the distribution of project sizes is essential for optimizing concurrency and resource allocation during migrations.
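The micro-batching and per-entity parallelism named in the learnings above can be sketched as follows. This is an illustration under stated assumptions, not the article's code: records are assumed to be `(entity_type, payload)` pairs, and `micro_batches`, `process_by_entity`, `write_batch`, and the batch size are hypothetical. Each entity type is drained by its own task, and each task sends payloads in small batches so that one network call carries many records.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def micro_batches(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process_by_entity(records, write_batch, batch_size=100, max_workers=4):
    """Group records by entity type, then drain each group in parallel.

    Assumption: `write_batch(entity_type, batch)` is a stand-in for one
    bulk write to the target. Returns the number of write calls made
    per entity type.
    """
    by_entity = defaultdict(list)
    for entity_type, payload in records:
        by_entity[entity_type].append(payload)

    def drain(entity_type):
        calls = 0
        # Batches within one entity type are written in order.
        for batch in micro_batches(by_entity[entity_type], batch_size):
            write_batch(entity_type, batch)
            calls += 1
        return entity_type, calls

    # Different entity types are processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(drain, list(by_entity)))
```

With 250 records of one type and a batch size of 100, the sketch makes three write calls instead of 250, which is where the reduction in network overhead comes from. Keeping each entity type on a single task preserves ordering within that type while still using several workers overall.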
Who Should Read This
Senior Software Engineers specializing in distributed systems and data migration strategies, particularly those involved in scaling cloud-based applications.
Test Your Knowledge
What are the trade-offs between a push-based and a pull-based migration architecture in terms of throughput and system load?
How did the team identify and resolve the issue of database lock contention during the migration process?
What specific metrics were used to benchmark the performance of the new migration architecture compared to the old one?
In what ways did the team ensure that the migration system could handle the increased concurrency required for 50K-scale migrations?
What lessons were learned from the initial performance benchmarks that informed subsequent architectural decisions?