Stellaspire
Website: stellaspire.com
Job details:
COMPANY DESCRIPTION
Our client is a growth investment firm that turns strong ideas into enduring businesses. They partner with ambitious founders and growing enterprises to scale faster, strengthen market presence, and create lasting value. With strategic capital and hands-on expertise, they help technology and service-driven companies reach their full potential.
ROLE SUMMARY
We are seeking an experienced AI Lead to own the design, delivery, and scaling of production AI systems powered by Generative AI, Agentic AI, and Retrieval-Augmented Generation. You will lead cross-functional engineering teams through agile sprints, drive enterprise AI adoption, and build intelligent systems that autonomously reason, retrieve, and act across complex workflows.
KEY RESPONSIBILITIES
• Lead end-to-end delivery of AI/ML projects using Agile/Scrum—sprint planning, backlog management, retrospectives, and on-time releases across multiple workstreams.
• Architect and deploy production GenAI applications: AI copilots, chatbots, content generators, summarisation engines, and code assistants using GPT, Claude, Gemini, LLaMA, and Mistral.
• Design and scale enterprise RAG pipelines—document ingestion, semantic chunking, embedding generation, vector search (Pinecone, Weaviate, ChromaDB), hybrid retrieval with BM25 + re-ranking, and advanced patterns like GraphRAG, corrective RAG, and agentic RAG.
• Build and orchestrate autonomous Agentic AI systems with multi-agent collaboration, ReAct reasoning loops, plan-and-execute architectures, persistent memory, tool-use/function-calling, and human-in-the-loop checkpoints using LangChain, LangGraph, CrewAI, or AutoGen.
• Design integration architectures connecting AI services to enterprise platforms (Salesforce, SAP, ServiceNow, Slack) via REST, GraphQL, gRPC, Kafka, MCP servers, and webhook-based event-driven triggers.
• Drive LLM fine-tuning using LoRA, QLoRA, PEFT, instruction tuning, and RLHF/DPO for domain-specific model adaptation; implement guardrails, hallucination detection, and output validation.
• Own MLOps infrastructure—CI/CT/CD pipelines, model monitoring, drift detection, A/B testing, automated retraining, and containerised serving on Docker/Kubernetes across AWS, Azure, or GCP.
• Establish agent observability and tracing (LangSmith, Arize Phoenix) and RAG evaluation frameworks (RAGAS) to ensure reliability, performance, and cost-efficiency at scale.
• Manage GPU/TPU compute, cost optimisation, auto-scaling, and multi-region high-availability deployments for mission-critical AI services.
• Design and deploy time series forecasting models for demand planning, anomaly detection, predictive maintenance, and financial forecasting using statistical methods (ARIMA, Prophet), deep learning (LSTM, Temporal Fusion Transformers, N-BEATS), and time series foundation models.
• Build end-to-end ML pipelines for classical AI/ML use cases—classification, regression, clustering, recommendation systems, and computer vision—ensuring robust feature engineering, model selection, and hyperparameter optimisation at scale.
REQUIRED QUALIFICATIONS
• 8–12 years of AI/ML engineering experience, including 3+ years in a technical lead or architect role.
• Bachelor’s or Master’s degree in Computer Science, ML, AI, Data Science, or a related quantitative field.
• Deep hands-on expertise in Python, TensorFlow, PyTorch, Hugging Face Transformers, and scikit-learn.
• Proven production experience with GenAI applications, LLM fine-tuning, RAG systems (vector DBs, embedding pipelines, hybrid search), and Agentic AI (multi-agent orchestration, tool use, memory).
• Strong experience integrating AI systems with enterprise platforms via APIs, event-driven architectures, and middleware.
• Proficiency with cloud platforms (AWS SageMaker/Bedrock, Azure ML, GCP Vertex AI) and infrastructure as code (Terraform, CloudFormation).
• Track record of running agile sprints and delivering complex AI projects on schedule.
• Hands-on experience building time series forecasting and anomaly detection models using statistical and deep learning approaches (ARIMA, Prophet, LSTM, Transformer-based models).
PREFERRED QUALIFICATIONS
• Ph.D. in ML, NLP, Computer Vision, or Reinforcement Learning.
• Experience with knowledge graphs (Neo4j), GraphRAG, edge AI deployment, and model optimisation (quantisation, distillation).
• Contributions to AI research (publications, patents, open-source) and certifications in cloud AI/ML services.
• Familiarity with responsible AI—guardrails, bias detection, content moderation, and explainability (SHAP, LIME).
TECHNICAL STACK
GenAI & LLMs: GPT, Claude, Gemini, LLaMA, Mistral, Cohere; LoRA, QLoRA, RLHF, PEFT
Agentic AI: LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, LangSmith
RAG & Search: Pinecone, Weaviate, Milvus, ChromaDB, Qdrant, pgvector, FAISS, Cohere Rerank
ML Frameworks: TensorFlow, PyTorch, Keras, Hugging Face, scikit-learn, JAX, Prophet, statsmodels
Integrations: REST, GraphQL, gRPC, Kafka, WebSockets, MCP Servers, OAuth 2.0
MLOps & Infra: MLflow, Kubeflow, Airflow, Docker, Kubernetes, Terraform, GitHub Actions
Cloud: AWS (SageMaker, Bedrock), Azure (Azure ML, OpenAI Service), GCP (Vertex AI)
WHAT WE OFFER
• Competitive salary with performance bonuses and equity; access to cutting-edge GPU infrastructure.
• Generous L&D budget for conferences (NeurIPS, ICML), certifications, and advanced training.
• Flexible hybrid work, comprehensive benefits, and the opportunity to shape company-wide AI strategy.
Click Apply to learn more.