How we rebuilt the search architecture for high availability in GitHub Enterprise Server
Summary
The article discusses the architectural improvements made to the search functionality in GitHub Enterprise Server to enhance high availability (HA). It highlights the transition from a clustered Elasticsearch setup to a more robust architecture utilizing Cross Cluster Replication (CCR). This change allows for independent single-node Elasticsearch clusters, improving data replication and reducing the risk of locked states during maintenance. The article outlines the challenges faced with previous Elasticsearch integrations, particularly in maintaining a leader/follower pattern, and details the new workflows developed to support the CCR feature, ensuring that critical data remains accessible and durable.
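To make the leader/follower relationship concrete, the sketch below builds the request that Elasticsearch's CCR follow API expects (`PUT /<follower_index>/_ccr/follow`), plus an auto-follow pattern request. It only constructs the method, path, and JSON body; it does not contact a cluster. The index name `issues`, the remote cluster alias `primary-node`, and the helper function names are hypothetical and chosen for illustration. The summary does not describe the exact calls GitHub Enterprise Server makes, so treat this as one plausible shape rather than the actual implementation.

```python
import json


def build_follow_request(follower_index, leader_index, remote_cluster):
    """Return (method, path, body) for Elasticsearch's CCR follow API.

    remote_cluster is the alias under which the leader's cluster has been
    registered as a remote cluster on the follower node.
    """
    return (
        "PUT",
        f"/{follower_index}/_ccr/follow",
        {"remote_cluster": remote_cluster, "leader_index": leader_index},
    )


def build_auto_follow_request(pattern_name, remote_cluster, leader_patterns):
    """Return (method, path, body) for a CCR auto-follow pattern.

    Auto-follow makes the follower cluster start following leader indices
    created after the pattern exists, if their names match a pattern.
    """
    return (
        "PUT",
        f"/_ccr/auto_follow/{pattern_name}",
        {
            "remote_cluster": remote_cluster,
            "leader_index_patterns": list(leader_patterns),
            # Elasticsearch template placeholder: reuse the leader's name.
            "follow_index_pattern": "{{leader_index}}",
        },
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    method, path, body = build_follow_request("issues", "issues", "primary-node")
    print(method, path)
    print(json.dumps(body, indent=2))
```

Because each node is its own single-node cluster in this design, the follower holds a full copy of the index rather than a shard of a shared cluster, which is what lets one node be taken down for maintenance without the other depending on it.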
Key Learnings
1. Understanding the limitations of clustered Elasticsearch setups in high availability scenarios.
2. The importance of Cross Cluster Replication (CCR) in maintaining data integrity and availability.
3. How to implement workflows for managing Elasticsearch index lifecycles in a high availability context.
4. The trade-offs between using a clustered architecture versus independent single-node clusters.
5. The necessity of custom solutions for failover and index management in distributed systems.
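As an illustration of why failover needs custom workflow code under CCR: a follower index is read-only, so promoting it requires an explicit sequence of API calls. The sketch below lists the sequence Elasticsearch documents for converting a follower into a regular index (pause following, close, unfollow, reopen). The function name and the index name are hypothetical, and the summary does not state that GitHub's failover workflow uses exactly these steps.

```python
def promote_follower_steps(follower_index):
    """Return the ordered (method, path) calls that turn a CCR follower
    index into a regular, writable index.

    The order matters: Elasticsearch rejects the unfollow call unless the
    index has first been paused and closed.
    """
    return [
        ("POST", f"/{follower_index}/_ccr/pause_follow"),  # stop replicating
        ("POST", f"/{follower_index}/_close"),             # required before unfollow
        ("POST", f"/{follower_index}/_ccr/unfollow"),      # drop follower metadata
        ("POST", f"/{follower_index}/_open"),              # reopen as a normal index
    ]


if __name__ == "__main__":
    # Hypothetical index name, for illustration only.
    for method, path in promote_follower_steps("issues"):
        print(method, path)
```

Unfollowing is one-way: a promoted index cannot be turned back into a follower, so re-establishing replication after a failover means creating a new follower from scratch. That is part of the index management burden the learnings above refer to.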
Who Should Read This
Senior Site Reliability Engineers (SREs) implementing high availability architectures for enterprise applications, particularly those utilizing Elasticsearch.
Test Your Knowledge
What are the primary challenges associated with using clustered Elasticsearch in a high availability setup?
How does Cross Cluster Replication (CCR) improve data management in GitHub Enterprise Server?
What design decisions led to the transition from a clustered architecture to independent single-node Elasticsearch clusters?
In what scenarios might a leader/follower pattern fail, and how does the new architecture mitigate these risks?
What custom workflows are necessary to manage Elasticsearch index lifecycles effectively in a high availability environment?