Jobgether
Website: jobgether.com
Job details:
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Staff Site Reliability Engineer in India.
In this high-impact role, you will help define and scale reliability standards for mission-critical SaaS platforms used by global enterprise customers. You will work at the intersection of engineering, infrastructure, and operations, ensuring systems remain highly available, performant, and resilient at scale. The role goes beyond traditional operations, focusing on building automated, self-healing systems and reliability platforms. You will lead incident response for critical outages while driving long-term architectural improvements. Acting as a technical force multiplier, you will influence engineering practices across teams and mentor engineers on SRE excellence. This is a highly collaborative environment where reliability, observability, and automation are core engineering principles. You will play a central role in enabling 99.99% uptime objectives in a fast-scaling global organization.
Accountabilities
- Lead the design and implementation of reliability frameworks and self-service platforms that enable engineering teams to own the stability of their services, including “You Build It, You Run It” models.
- Act as Incident Commander during high-severity incidents, coordinating cross-functional response efforts, ensuring rapid resolution, and driving effective blameless postmortems.
- Architect and enhance observability solutions using modern tooling such as Prometheus, Grafana, and OpenTelemetry to improve detection and performance insights.
- Drive automation and AIOps initiatives to enable proactive detection, diagnostics, and remediation of system failures.
- Establish reliability engineering best practices across teams through production readiness reviews, design collaboration, and operational standards.
- Mentor engineers across SRE and product teams, strengthening technical capabilities and promoting a culture of operational excellence.
- Continuously improve system scalability, resilience, and performance across distributed cloud environments.
Requirements
The ideal candidate brings:
- Extensive experience in Site Reliability Engineering or DevOps within high-scale SaaS environments.
- Strong programming or scripting skills in languages such as Python or Go.
- Hands-on experience with cloud infrastructure such as Amazon Web Services, Google Cloud Platform, or Microsoft Azure.
- Deep understanding of distributed systems, networking fundamentals (TCP/IP, DNS, HTTP/S), and Kubernetes (essential).
- Proven experience managing production incidents, leading postmortems, and improving system reliability.
- Strong observability experience with logging, monitoring, and tracing tools.
- Excellent communication skills, problem-solving ability, and calm decision-making under pressure.
- Prior experience mentoring engineers and influencing architecture decisions in complex environments (highly valued).
Benefits
- Competitive compensation aligned with senior-level impact and market benchmarks
- Fully remote-friendly and flexible work arrangements
- Opportunity to work on large-scale, globally distributed systems
- Strong focus on learning, mentorship, and technical growth
- Exposure to cutting-edge SRE, observability, and automation practices
- Inclusive and collaborative engineering culture
- Health, wellness, and employee support programs (vary by location)
How Jobgether Works
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Click Apply to learn more.