Vertafore
Website: vertafore.com
Job details:
The Director, Site Reliability Engineering (SRE) will lead reliability, performance, and observability initiatives for a portfolio of Vertafore products. This role owns SLIs/SLOs, incident response, automation, and CI/CD practices for assigned product families. Directors will manage multiple teams and collaborate with Product Development, Cloud Operations, Information Security, and other SRE leaders to ensure operational excellence.
Key Responsibilities
- Product Reliability Leadership
  - Define and enforce SLIs/SLOs for a subset of Vertafore's flagship products.
  - Drive observability strategy across application and infrastructure layers.
- Release Engineering & Automation
  - Oversee CI/CD pipelines for product deployments using tools such as GitLab, Jenkins, Ansible, and LaunchDarkly.
  - Implement Infrastructure-as-Code (Terraform, AWS CloudFormation/CDK) for application provisioning.
- Incident Management
  - Define 24x7 on-call rotations for assigned products; ensure rapid resolution and blameless postmortems.
- Cross-Functional Collaboration
  - Partner with Cloud Ops on capacity planning, OS patching (app tier), and load balancing (ALB, F5).
  - Align reliability goals with product roadmaps and customer SLAs.
- Team Leadership
  - Manage a group of managers and engineers; mentor teams on automation, observability, and reliability best practices.
Qualifications
- Bachelor’s degree in Computer Science, Information Systems, or related field.
- 18+ years in Software Engineering, SRE, DevOps, or reliability roles; 5+ years in leadership (Director).
- Proven ability to leverage software engineering principles and practices to solve reliability and operational challenges.
- Expertise in CI/CD, observability, and incident response.
- Strong AWS knowledge and experience with container orchestration.
- Proven ability to lead reliability programs across multiple SaaS products.
- Experience architecting applications or infrastructure for high-growth cloud platforms.
- Experience in B2B SaaS environments involving large-scale distributed systems.
- Proven ability to communicate with and influence stakeholders at team, peer, and leadership levels.
- Demonstrated experience driving operational excellence through metrics and KPIs.
- (Preferred) Background supporting financial services, healthcare, or regulated industries.