- Location: Mumbai, Maharashtra, India
- Job type: Full-time
Required skills
- banking
- capacity planning
- compliance
- incident response
- SRE
About the role
HDB Financial Services Ltd.
Website: hdbfs.com
Job details:
- Lead, mentor, and grow a team of 3 to 4 site reliability engineers
- Manage chaos engineering and DR schedules
- Crisis management
- Define, implement, and advocate site reliability engineering (SRE) best practices such as SLAs, SLOs, SLIs, and error budgets
- Validate capacity planning
- Own the observability stack for applications and infrastructure (monitoring, alerting, logging, and tracing back to root cause)
- Baseline performance, identify bottlenecks, and automate responses wherever feasible
- Manage the incident response and problem management practice, including rosters, on-call rotation, runbooks, and ruthlessly objective postmortems
- Contribute to EA NFRs from a performance perspective
- Requires an engineering graduate with hands-on support and troubleshooting experience across infrastructure and applications, a logical and analytical approach, skill in correlation and elimination, and good stakeholder communication.
- 12 to 15 years of experience in a demanding setup (banking, e-commerce). Candidates from small startups are not preferred, as they may be low on process and compliance and inclined to make things work anyhow.
- Team size is 3 to 4 people to start with; additionally, a few resources from the current incident management team and even the DR management team may roll up to this role.