Website:
epitria.com
Job details:
We are looking for a Senior Agentic AI & Data Science Engineer with a deep product engineering background to architect, develop, deploy, and operate production-grade AI systems.
The role requires end-to-end ownership of AI products, covering agent design, ML modeling, system architecture, MLOps, multi-cloud deployment, security, and scalability. The ideal candidate combines strong AI research intuition with real-world engineering excellence.
7–10 years total experience in Data Science, AI/ML Engineering, and Product Engineering
Strong hands-on experience in building, deploying, and scaling Agentic AI systems in production
Location - Bengaluru
Work Timings - 12 PM - 9 PM IST
Salary Range: INR 40-41 LPA
Core Responsibilities
Agentic AI & LLM Systems
- Design, implement, and optimize Agentic AI architectures involving planning, reasoning, memory, tool-use, and orchestration.
- Build and manage multi-agent systems for complex workflows, automation, and decision intelligence.
- Implement Retrieval-Augmented Generation (RAG) pipelines with structured and unstructured data sources.
- Integrate AI agents with enterprise APIs, databases, SaaS platforms, and internal tools.
- Develop robust prompt strategies, agent workflows, fallback mechanisms, and evaluation pipelines.
- Deploy and operate LLM-based systems with cost, latency, reliability, and safety considerations.
Data Science & Machine Learning
- Build, train, evaluate, and deploy ML/DL models across NLP, structured data, time-series, recommendation, and predictive analytics.
- Perform data exploration, feature engineering, statistical analysis, and hypothesis testing.
- Design scalable training pipelines, experiment tracking, and model versioning.
- Monitor model performance, drift, bias, and data quality in production environments.
- Apply explainability and interpretability techniques where required.
Product Engineering & System Design
- Own the full AI product lifecycle: problem definition → design → development → deployment → monitoring → iteration.
- Translate business and product requirements into scalable, modular, and maintainable AI solutions.
- Design distributed, fault-tolerant, and extensible architectures for AI platforms.
- Collaborate closely with product managers, UX, backend, frontend, and platform teams.
- Enforce engineering best practices including code quality, testing, documentation, and performance optimization.
Multi-Cloud & Infrastructure Engineering
- Design, deploy, and operate AI systems across AWS, Azure, and GCP (multi-cloud or hybrid).
- Use Docker, Kubernetes, Helm, and cloud-native services for scalable deployments.
- Implement Infrastructure as Code (IaC) using Terraform / CloudFormation.
- Leverage managed AI/ML services where appropriate (SageMaker, Vertex AI, Azure ML).
- Optimize cloud resource utilization and cost across environments.
Security, Governance & Reliability
- Ensure data security, privacy, and compliance across AI systems.
- Implement secure access control, secrets management, and encrypted data pipelines.
- Apply Responsible AI practices: bias detection, fairness, explainability, auditability.
- Design systems for high availability, disaster recovery, and fault tolerance.
- Establish governance standards for models, data, and AI agents.
Technical Leadership & Collaboration
- Provide technical guidance and mentorship to junior engineers and data scientists.
- Lead architecture discussions, technical reviews, and best-practice adoption.
- Drive innovation in AI/Agentic systems aligned with product and business goals.
- Communicate complex technical concepts clearly to both technical and non-technical stakeholders.
Cloud, DevOps & MLOps
- Strong hands-on experience with AWS, Azure, and/or GCP (at least two preferred)
- Docker, Kubernetes, Helm
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- MLOps tools: MLflow, Kubeflow, cloud-native ML platforms
- Monitoring and observability tools
Architecture & Distributed Systems
- Distributed systems and event-driven architectures
- Asynchronous processing and workflow orchestration
- Scalability, reliability, and performance engineering