Engineering posts about Failover
Curated summaries and key learnings for engineers working with Failover.
Databricks
5m
Zero-Downtime Patching in Lakebase Part 1: Prewarming
The article examines planned maintenance in database systems, focusing on the performance degradation caused by cold restarts. It introduces Lakebase's...
GitHub
6m
How we rebuilt the search architecture for high availability in GitHub Enterprise Server
The article describes architectural changes to search in GitHub Enterprise Server that improve high availability (HA). It highlights the transition from a clustered...
Meta (Facebook)
5m
Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters
The article describes how backend aggregation (BAG) is implemented in Meta's Prometheus AI clusters and its role in interconnecting thousands of GPUs across multiple data centers. BAG...