Staff Software Engineer - Alerting Platform

Min Experience

2 years

Location

Remote: Spain, Germany, Ireland, Italy, Sweden, France, Israel

Job Type

full-time

About the job

About the role

We are looking for a Staff Engineer to help us scale Datadog's Alerting Platform, which is responsible for the core systems that define and schedule monitors, create alerts, and ensure the accuracy and timeliness of the end-to-end alerting process across the platform. This is a unique opportunity to contribute to one of the most critical platforms at Datadog.

Customers can configure monitors and generate alerts for virtually every product in our unified platform, and it's imperative that we maintain their trust by delivering these notifications reliably. In practice, this means the Alerting Platform has to be the most reliable platform at Datadog. As we grow, we must design systems that can degrade gracefully while still ensuring the best customer experience, without breaking.

This Staff Engineer will focus on two critical components: the alerting scheduler, responsible for scheduling the timely evaluation of millions of monitors each minute, and the state processor, which makes the critical decision about when a transition in monitor state has occurred. These distributed systems are tightly coupled: the state processor (a state machine) consumes the output of the scheduler. The reliability and fault tolerance of these systems together, and across the entire Alerting Platform, are critical to Datadog's customer trust and business success. Upcoming initiatives to achieve an order-of-magnitude increase in reliability will require deep changes to these complex systems.

About the company

Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers' entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another.

Skills

backend programming
distributed systems
reliability
architecture