Shine.com
Website: shine.com
Job details:
AI PRODUCTION ENGINEER
We are looking for a Senior AI Production Engineer with 5+ years of experience who can build AI/ML models and manage their deployment, monitoring, and lifecycle in production. The role requires strong experience in LLMs, RAG architectures, observability principles, cloud platforms, automation, and MLOps practices to ensure scalable and reliable AI solutions.
Key Responsibilities
Design and develop AI/machine learning models, including LLM pipelines, RAG, agent-based systems, and vector database integration.
Understand and optimize latency across RAG and model calls.
Deploy AI models into production using MLOps frameworks and CI/CD pipelines.
Optimize models for scalability, reliability, and performance.
Respond to incidents and troubleshoot AI systems, including slow RAG performance, poor retrieval quality, API rate-limit issues, vector store indexing failures, and agent infinite loops.
Diagnose and manage cost-related issues, including unexpected spend spikes caused by inefficient prompts, retries, or misconfigured AI workloads.
Collaborate with developers, DevOps, and application teams to build AI-enabled systems.
Required Skills
Strong understanding of modern LLM technologies, RAG patterns, and agent frameworks.
Solid experience with Python.
Experience with observability tools covering traces, logs, and metrics (OpenTelemetry preferred).
Hands-on experience with FastAPI and Gunicorn in containerised deployments.
Familiarity with LangChain, LlamaIndex, or similar AI orchestration frameworks.
Knowledge of vector databases (Pinecone, AWS Knowledgebase, Chroma, etc.).
Experience with cloud platforms (AWS).
Experience with CI/CD pipelines and MLOps practices.
Ability to handle OpenAI outages, for example with fallbacks to smaller models.
This job is provided by Shine.com