- Location
- Pune City, Maharashtra, India
- Job type
- Full-time
Required skills
- Python
- Assembly
- Azure
- backend
- compliance
- microservices
- regression
- task automation
- uptime
- user feedback
- Vertex
About the role
Website: neurealm.com
Job details:
What success looks like:
- Ship AI capabilities that measurably improve user outcomes (quality, time saved, throughput)
- Build systems that are reliable by design: evals, observability, safety, and cost/latency controls from day one
- Iterate quickly using a tight loop of instrument → evaluate → improve → deploy
Responsibilities
- Agentic AI Feature & Workflow Development
- Build and integrate AI-driven features using LLM APIs (OpenAI / Azure OpenAI, Anthropic, Gemini on Vertex AI)
- Design and implement tool-using agents (structured function calling, schema validation, retries, fallbacks)
- Build multi-agent workflows when appropriate (e.g., planner/worker, reviewer/critic, specialist routing) and know when a simpler architecture is better
- Create agentic workflows such as document understanding, extraction, reasoning over evidence, task automation, and multi-step decision support
- Own context engineering end-to-end:
  - dynamic context assembly (retrieval + state + tool outputs)
  - context budgeting and compression/summarization
  - grounding strategies to reduce hallucinations and improve consistency
- Implement retrieval-augmented generation (RAG) and search workflows using off-the-shelf vector stores and embedding services
- Evaluation, Quality & Iteration (Core)
- Establish evaluation frameworks for accuracy, reliability, and output quality
- Build task-specific eval suites: golden datasets, adversarial cases, regression tests, and rubric-based scoring
- Set up automated evaluation pipelines and release gates (CI/CD-friendly) tied to prompt/model/version changes
- Define and monitor online metrics (e.g., task success rate, human override rate, safety flags, latency, cost) and run experiments and A/B tests where appropriate
- Use LLM-as-judge responsibly: calibrate, validate, and pair with human labels when needed
- Engineering, Integration & Observability
- Develop scalable backend services and APIs that incorporate AI functionality
- Integrate AI pipelines into existing cloud, microservices, and event-driven architectures
- Implement observability and analytics for all AI features (tracing, evaluations, prompt versioning, cost tracking). Example tooling: Langfuse and/or OpenTelemetry-compatible stacks
- Ensure reliability, uptime, performance, and security of AI services
- Build internal tooling for evaluation, testing, prompt/version management, and safe deployment
- Product & Collaboration
- Partner with product managers, designers, the Chief Architect, and domain SMEs to shape AI-first solutions
- Rapidly prototype concepts and iterate based on user feedback and measurable eval results
- Translate business problems into well-structured AI workflows without requiring ML model training
- Document system behavior, known failure modes, and operational playbooks
- Governance & Safety
- Implement guardrails, checks, and fallback logic for safe and predictable AI behavior
- Help define and follow compliance, privacy, and responsible AI guidelines
- Design for safe tool execution (bounded actions, permissions, escalation paths, human-in-the-loop review)
Qualifications
- Strong software engineering background (Python preferred) and experience shipping backend services
Required Skills
- Strong software engineering background (Python preferred)
Preferred Skills
- Experience with AI-driven features and workflows