- Location: Mumbai Metropolitan Region
- Job type: Full-time
Required skills
- Python
- backend
- communication skills
- compliance
- database
- GCP
- Golang
- JS
- MySQL
- Node
- NoSQL
- PostgreSQL
- Root Cause Analysis
- SaaS
- SRE
About the role
IDfy
Website: idfy.com
Job details:
We are the Perfect match if you
- Have 10+ years of experience owning production support for cloud-based SaaS products. You gain a deep understanding of the platform from both the technology and business perspectives.
- Are well versed in the ELK stack and troubleshoot production issues by tracing through logs, databases, dashboards, and messaging queues.
- Have a problem-solving mindset and go deep into problems. You bring strong L2/L3-level technical expertise and can independently troubleshoot, debug, and guide resolution for complex production issues.
- Have strong communication skills and can collaborate with the Product, Engineering, Infrastructure, Infosec, and Customer Satisfaction teams.
- Have built and led an L2/L3 support team. You are a hands-on people manager who actively coaches, reviews work, and builds technical depth within the team, rather than one who just task-manages.
- Are comfortable stepping in as a technical escalation point while also enabling and upskilling your team to reduce dependency over time.
Here’s What Your Day Would Look Like...
- Lead by example: build a deep understanding of the platform, troubleshoot issues, and guide and own the resolution of complex L2/L3 production issues that bubble up from our L1 and CSM teams.
- Be the gatekeeper for the Production Environment and ensure the stability of the platform. Work closely with our platform team to understand the deployment process and various flavors of deployments.
- Investigate issues by reviewing dashboards, logs, and alerts across cloud-native systems, including Kubernetes-based deployments
- Own the root cause analysis and long-term fixes, not just quick resolutions
- Coach team members through live problem-solving, code/config analysis, and incident retrospectives
- Balance team capacity, incident load, and operational priorities to ensure sustainable delivery and minimal burnout
- Partner with Engineering and SRE teams on stability improvements, automation, and reliability initiatives
- Ensure documentation, runbooks, and SOPs are kept current and actually usable.
Tech Stack You’ll Get to Work On
- Cloud-native architecture on AWS, GCP
- Kubernetes for container orchestration and deployment
- Backend services built using Python, Golang, Node.js, Elixir
- RESTful APIs and microservices-based systems
- Databases including PostgreSQL, MySQL, and NoSQL stores
- Messaging and async systems (queues, event-driven workflows)
- CI/CD pipelines and release automation
- Observability and monitoring tools (logs, metrics, alerts)
- Security, compliance, and reliability tooling aligned with enterprise standards