CoffeeBeans
Website: coffeebeans.io
Job details:
Experience - 3-5 years
Location - Hyderabad
Mode of work - Onsite (5 days WFO)
Role Overview
We are looking for skilled engineers to design and build automation agents that significantly reduce manual effort in L1/L2 operations for a high-scale UPI SRE team.
This role focuses on developing resilient, intelligent systems capable of handling incident triage, routine operational checks, and workflow automation across bare metal infrastructure. The systems will integrate closely with ticketing and CRM platforms such as Zoho CRM and BMC Remedy.
Key Responsibilities
- Design and develop automation agents to streamline L1/L2 operational tasks
- Build intelligent systems for incident triage, alert correlation, and workflow automation
- Develop scripts and tools to automate infrastructure checks, restarts, and configurations
- Integrate automation solutions with Zoho CRM and BMC Remedy
- Collaborate with SRE teams to improve system reliability and reduce manual intervention
- Implement observability and monitoring solutions using Prometheus, Grafana, and ELK
- Contribute to CI/CD pipelines for deploying and validating automation agents
- Ensure security best practices including RBAC, patching, and vulnerability scanning
Required Skills
- Strong programming experience in Python, Go, or Java
- Hands-on experience with Linux systems and shell scripting
- Experience with automation tools such as Ansible, Jenkins, or custom scripting
- Knowledge of observability tools such as Prometheus, Grafana, and ELK
- Understanding of incident management and ITIL workflows
- Familiarity with ticketing systems (BMC Remedy, Zoho CRM, or similar)
- Basic understanding of AI/ML concepts such as NLP, anomaly detection, or predictive modeling
Good to Have
- Experience with Terraform or advanced orchestration frameworks
- Exposure to AIOps or MLOps practices
- Knowledge of advanced tracing tools like Jaeger or OpenTelemetry
- Familiarity with security frameworks and compliance standards (ISO 27001, PCI DSS)
- Experience in building chatbots or automation assistants
Sample Use Cases You Will Work On
- Automating duplicate ticket detection and closure in ticketing systems
- Building NLP-based ticket classification and routing systems
- Developing proactive monitoring agents for infrastructure health checks
- Implementing anomaly detection models for system and transaction data
- Reducing alert noise through intelligent alert correlation
- Creating predictive systems for failure forecasting and preventive actions
- Developing chatbot assistants for operational support
- Automating CI/CD pipelines for deployment of automation tools
- Enhancing security through automated vulnerability scanning and audit logging
Why Join
- Opportunity to work on national-scale fintech infrastructure (UPI ecosystem)
- Build cutting-edge AI-driven automation systems (AIOps)
- Collaborate with a highly skilled SRE team
- Strong learning and growth opportunities in SRE, AIOps, and platform engineering