Website:
neurogent.ai
Job details:
About the role
We're looking for a software engineer who wants to move beyond just writing features - someone who gets excited about distributed systems, loves digging into a gnarly incident and fixing it at the root, and wants to build the automation and observability tooling that prevents it from ever happening again.
You'll write code and fix defects, but you'll also shape CI/CD pipelines, build self-healing automation, define SLOs, and work directly with cloud infrastructure. If you've been a solid software engineer and want your next role to be a launchpad into SRE, DevOps, or cloud engineering, this is it.
What you'll do
Engineering & automation
- Write, debug, and deploy code fixes in Python, C#, or Java directly to production, rather than just raising tickets.
- Build automation that eliminates operational toil: self-service tooling, auto-remediation scripts, and self-healing workflows.
- Contribute to and maintain CI/CD pipelines, making deployments faster, safer, and more reliable.
- Use AI-assisted development tools to write smarter, faster automation and accelerate your own DevOps workflows.
Observability & reliability
- Build and own the observability layer - dashboards, alerts, log queries, and distributed tracing - using tools like Dynatrace, Datadog, or similar platforms.
- Reduce alert noise and improve signal quality so the team acts on what matters.
- Define SLOs and SLIs for critical services and drive engineering improvements based on error budget burn.
- Analyse logs, metrics, and traces to investigate performance and availability issues at the system level.
Incident ownership & continuous improvement
- Lead end-to-end incident response: from detection and triage to root cause analysis and long-term fix.
- Run post-incident reviews and translate learnings into architectural or process improvements.
- Identify recurring patterns and drive problem management - not just fix symptoms.
- Collaborate with product engineering, QA, and infrastructure teams to ship reliability improvements.
Documentation & knowledge
- Write runbooks, SOPs, and post-mortems that are actually useful - concise, actionable, and maintained.
- Document automation solutions and contribute to a shared engineering knowledge base.
What we're looking for
Must-haves
- 3+ years of experience in software development, application engineering, or a related technical role.
- Strong programming skills in at least one of: Python, C#, or Java - and comfort reading code you didn't write.
- Solid SQL skills for data analysis and troubleshooting production issues.
- Experience writing automation scripts in Python, Bash, or PowerShell.
- Familiarity with Git and standard version control workflows.
- Comfortable working in Linux environments and using command-line tools.
- Strong analytical mindset: you dig for root causes, not just symptom fixes.
- Basic understanding of CI/CD pipelines and deployment processes.
Nice to have
- SLOs / SLIs / error budgets
- Dynatrace / Datadog / Splunk
- AWS / Azure / GCP
- Docker / Kubernetes
- Terraform or IaC
- GitHub Actions / Jenkins
- AI dev tools (Claude Code, Cursor)
- ITIL basics
- Jira / ServiceDesk+
Education
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field, or equivalent hands-on engineering experience.
Work details
Location: Hybrid (Gurgaon)
Hours: 5:30 PM – 2:30 AM IST
Shift alignment: US client hours
Type: Full-Time