AgileEngine
Website:
agileengine.com
Job details:
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for a
Senior Site Reliability Engineering to strengthen our platform reliability and observability capabilities. You will own the design and operation of monitoring infrastructure — including Datadog APM, alerting, and distributed tracing — across Kubernetes-based microservices on AWS. The role spans backend engineering and SRE practice in roughly a 65/35 split, with direct involvement in CI/CD integration and observability automation. You will also support internal teams in adopting monitoring best practices as we modernize our R&D platform.
WHAT YOU WILL DO
- Design, build, and maintain scalable backend and platform components;
- Implement and manage observability solutions across distributed systems;
- Configure dashboards, alerts, and APM for tracing, metrics, and logging;
- Monitor and improve system reliability, scalability, and performance;
- Deploy, operate, and maintain services in Kubernetes environments;
- Integrate observability tools into CI/CD pipelines and cloud infrastructure;
- Automate monitoring and operational workflows using scripting;
- Provide operational and training support for observability platforms, especially Datadog;
- Collaborate with engineering teams to improve system visibility and reliability practices.
MUST HAVES
-
4+ years of experience with
Python, Node.js, or Java;
- Hands-on experience with
API integrations;
- Strong experience in
Kubernetes environments;
- Experience with
Datadog or similar tools such as
Prometheus and Grafana;
- Ability to configure dashboards, alerts, and APM;
- Experience monitoring containerized and microservices architectures;
- Hands-on experience with
AWS;
- Experience integrating observability tools into cloud environments;
- Experience with
CI/CD integrations for observability;
- Ability to automate monitoring and operational tasks using scripting;
- Upper-intermediate English level.
NICE TO HAVES
- Experience owning and operating an internal engineering platform, especially observability platforms;
- Demonstrated ownership of reliability, scalability, and performance;
- Ability to proactively lead maintenance and platform improvements;
- Experience installing and configuring Datadog agents and integrations;
- Experience managing API keys and secure configurations;
- Experience managing user roles and access controls;
- Familiarity with Go (Golang);
- Experience with additional observability tools such as New Relic, Dynatrace, Elastic Stack, or Splunk.
PERKS AND BENEFITS
-
Remote work & Local connection: Work where you feel most productive and connect with your team in periodic meet-ups to strengthen your network and connect with other top experts.
-
Legal presence in India: We ensure full local compliance with a structured, secure work environment tailored to Indian regulations.
-
Competitive Compensation in INR: Fair compensation in INR with dedicated budgets for your personal growth, education, and wellness.
-
Innovative Projects: Leverage the latest tech and create cutting-edge solutions for world-recognized clients and the hottest startups.
Click on Apply to know more.