Engineering posts about Alerting

Curated summaries and key learnings for engineers working with Alerting.

The article emphasizes the importance of using observability data to transition from reactive incident response to proactive reliability intelligence. It outlines how engineering teams can leverage...

Airbnb

Monitoring reliably at scale

The article outlines the challenges of maintaining reliable observability in systems that are heavily dependent on shared infrastructure, such as Kubernetes and service meshes. It highlights the...

Meta (Facebook)

Trust But Canary: Configuration Safety at Scale

In the Meta Tech Podcast episode featuring Pascal Hartig, the discussion covers how Meta's Configurations team keeps configuration rollouts safe at scale. The...

Atlassian

How we catch and mitigate performance regressions at scale in Jira Cloud

The article discusses the complexities of detecting and mitigating performance regressions in Jira Cloud, a multi-tenant product. It highlights the challenges posed by diverse tenant configurations...

Airbnb

It Wasn’t a Culture Problem: Upleveling Alert Development at Airbnb

The article outlines Airbnb's transformation of its Observability as Code (OaC) alert review process, which significantly reduced development cycles from weeks to minutes. By implementing a system...

Using observability data to prevent incidents
