Jade Global
Website: jadeglobal.com
Job details:
Senior Software Engineer / Site Reliability Engineer (SRE) – Observability & Platform Engineering
Must-Have Skills (Required)
Core Engineering & Platform Skills
- Strong proficiency in at least one of the following: Python, JavaScript (Node.js), or Java
- Hands-on experience with API integrations (designing, consuming, and integrating APIs)
- Strong experience working in Kubernetes environments, including deployment, operations, and monitoring
Observability & Monitoring
- Experience with DataDog (preferred) or similar tools such as Prometheus or Grafana
- Ability to configure dashboards, alerts, and APM (tracing, metrics, logging)
- Experience monitoring containerized and microservices architectures
Cloud & Infrastructure
- Hands-on experience with AWS
- Experience integrating observability tools into cloud environments
SRE & Operations
- Experience with CI/CD integrations for observability (e.g., DataDog in pipelines)
- Ability to automate monitoring and operational tasks using scripting (Python preferred)
Strongly Preferred Skills
- Experience owning and operating an internal engineering platform
- Deep experience with observability platforms
- Demonstrated ownership of reliability, scalability, and performance
- Proven ability to proactively lead maintenance efforts and platform improvements
- Experience installing and configuring DataDog agents and integrations
- Experience managing API keys and secure configurations
- Experience managing user roles and access controls within observability platforms
Nice-to-Have Skills (Preferred)
- Familiarity with Go (Golang)
- Experience with additional observability tools such as New Relic, Dynatrace, Elastic, or Splunk Observability
Description
Project Overview:
We are seeking a Senior Software Engineer / SRE with an Observability focus to support platform reliability, monitoring, and modernization initiatives. This role blends software engineering (60–70%) with site reliability engineering (30–40%), with a strong emphasis on Kubernetes and observability platforms.
Key Responsibilities
- Support platform reliability, monitoring, and modernization initiatives
- Provide operational and training support for DataDog, the Observability Platform for R&D
- Enhance observability, reliability, and performance across engineering platforms
- Drive automation and operational excellence for monitoring and alerting frameworks
- Support Kubernetes-based platform operations and monitoring integrations
Timezone Coverage
Click on Apply to know more.