How we rebuilt the search architecture for high availability in GitHub Enterprise Server

Summary

This article describes how GitHub rebuilt the search architecture in GitHub Enterprise Server to improve high availability (HA). It covers the move from a clustered Elasticsearch setup to independent single-node Elasticsearch clusters linked by Cross Cluster Replication (CCR), a change that improves data replication and reduces the risk of locked states during maintenance. It also outlines the difficulty earlier Elasticsearch integrations had in maintaining a leader/follower pattern, and details the new workflows built to support CCR so that critical data remains accessible and durable.
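To make the CCR idea concrete, here is a minimal Python sketch of the request that tells a follower index to replicate a leader index on another cluster. The endpoint (`PUT /<follower_index>/_ccr/follow`) and the `remote_cluster` and `leader_index` body fields come from Elasticsearch's CCR API. The index names, the remote cluster alias, and the helper function are hypothetical, and nothing here is taken from GitHub's own implementation. The sketch only builds the request and does not send it.

```python
import json
from typing import Any


def build_follow_request(
    follower_index: str, leader_index: str, remote_cluster: str
) -> dict[str, Any]:
    """Describe the CCR 'create follower' call as plain data.

    remote_cluster is the alias under which the leader's cluster has been
    registered on the follower's cluster (an assumption of this sketch).
    """
    return {
        "method": "PUT",
        "path": f"/{follower_index}/_ccr/follow",
        "body": {
            "remote_cluster": remote_cluster,
            "leader_index": leader_index,
        },
    }


if __name__ == "__main__":
    # Hypothetical names: an index on the primary node, followed on a replica.
    request = build_follow_request(
        follower_index="issues",
        leader_index="issues",
        remote_cluster="primary-node",
    )
    print(request["method"], request["path"])
    print(json.dumps(request["body"], indent=2))
```

Because each node runs its own single-node cluster, replication is configured per index in this way rather than handled by shard allocation inside one shared cluster.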

Key Learnings

1. Understanding the limitations of clustered Elasticsearch setups in high availability scenarios.
2. The importance of Cross Cluster Replication (CCR) in maintaining data integrity and availability.
3. How to implement workflows for managing Elasticsearch index lifecycles in a high availability context.
4. The trade-offs between using a clustered architecture versus independent single-node clusters.
5. The necessity of custom solutions for failover and index management in distributed systems.
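As one illustration of why failover needs custom handling under CCR: a follower index is read-only, so promoting it means converting it into a regular index. The sketch below lists the sequence Elasticsearch documents for that conversion (pause following, close, unfollow, reopen). The helper name and index name are hypothetical, and this is not a description of the workflow GitHub actually built. It only produces the ordered calls and does not execute them.

```python
def promotion_steps(index: str) -> list[tuple[str, str]]:
    """Return the ordered (method, path) calls that turn a CCR follower
    index into a regular, writable index."""
    return [
        ("POST", f"/{index}/_ccr/pause_follow"),  # stop pulling changes from the leader
        ("POST", f"/{index}/_close"),             # unfollow requires a closed index
        ("POST", f"/{index}/_ccr/unfollow"),      # drop the follower metadata
        ("POST", f"/{index}/_open"),              # reopen as a normal index
    ]


if __name__ == "__main__":
    # "issues" is a hypothetical index name.
    for method, path in promotion_steps("issues"):
        print(method, path)
```

A real failover workflow would also need to repeat this for every replicated index and later re-establish replication in the opposite direction, which is where most of the custom tooling effort tends to go.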

Who Should Read This

Senior Site Reliability Engineers (SREs) implementing high availability architectures for enterprise applications, particularly those utilizing Elasticsearch.

Test Your Knowledge

What are the primary challenges associated with using clustered Elasticsearch in a high availability setup?

How does Cross Cluster Replication (CCR) improve data management in GitHub Enterprise Server?

What design decisions led to the transition from a clustered architecture to independent single-node Elasticsearch clusters?

In what scenarios might a leader/follower pattern fail, and how does the new architecture mitigate these risks?

What custom workflows are necessary to manage Elasticsearch index lifecycles effectively in a high availability environment?

Topics

High Availability, Replication, Leader Election, Failover, Service Discovery

Read Full Article at GitHub
