Website:
callquestsolution.com
Job details:
Job Summary
We are looking for a Site Reliability Engineer to improve system reliability, scalability, and operational efficiency across cloud-based platforms. The ideal candidate will have strong expertise in automation, monitoring, infrastructure engineering, and incident management.
Key Responsibilities
- Build and maintain scalable infrastructure.
- Improve platform reliability and uptime.
- Automate operational workflows.
- Implement monitoring and alerting systems.
- Support incident response and root cause analysis.
- Collaborate with development and DevOps teams.
- Optimize cloud performance and cost.
- Develop reliability metrics and SLAs.
- Support disaster recovery initiatives.
- Maintain operational documentation.
Required Skills
- Experience with Kubernetes and Docker.
- Strong Linux administration skills.
- Expertise in monitoring tools.
- Knowledge of scripting languages.
- Familiarity with cloud platforms.
- Understanding of networking concepts.
- Experience with CI/CD pipelines.