Director, Site Reliability (SRE, SLI/SLO, Monitoring, Automation)

Vertafore

Location: Greater Hyderabad Area
Job type: Full-time

Required skills

AWS
Ansible
capacity planning
CloudFormation
cross-functional
DevOps
GitLab
incident response
information security
infrastructure-as-code
Jenkins
load balancing
SaaS
SRE
Terraform

About the role

Website: vertafore.com
Job details:
The Director, Site Reliability Engineering (SRE) will lead reliability, performance, and observability initiatives for a portfolio of Vertafore products. This role owns SLIs/SLOs, incident response, automation, and CI/CD practices for assigned product families. Directors will manage multiple teams and collaborate with Product Development, Cloud Operations, Information Security, and other SRE leaders to ensure operational excellence.

Key Responsibilities

Product Reliability Leadership
Define and enforce SLIs/SLOs for a subset of Vertafore flagship products.
Drive observability strategy across application and infrastructure layers.
Release Engineering & Automation
Oversee CI/CD pipelines for product deployments using tools such as GitLab, Jenkins, Ansible, and LaunchDarkly.
Implement Infrastructure-as-Code (Terraform, AWS CloudFormation/CDK) for application provisioning.
Incident Management
Define 24x7 on-call rotations for assigned products; ensure rapid resolution and blameless postmortems.
Cross-Functional Collaboration
Partner with Cloud Ops on capacity planning, OS patching (app tier), and load balancing (ALB, F5).
Align reliability goals with product roadmaps and customer SLAs.
Team Leadership
Manage a group of managers and engineers; mentor teams on automation, observability, and reliability best practices.

Qualifications

Bachelor’s degree in Computer Science, Information Systems, or related field.
18+ years in Software Engineering, SRE, DevOps, or reliability roles; 5+ years in leadership (Director).
Proven ability to leverage software engineering principles and practices to solve reliability and operational challenges.
Expertise in CI/CD, observability, and incident response.
Strong AWS knowledge and experience with container orchestration.
Proven ability to lead reliability programs across multiple SaaS products.
Experience architecting applications or infrastructure for high-growth cloud platforms.
Experience in B2B SaaS environments involving large-scale distributed systems.
Proven leadership communicating and influencing at team, peer, and leadership levels.
Demonstrated experience driving operational excellence through metrics and KPIs.
(Preferred) Background supporting financial services, healthcare, or regulated industries.
