Antal International
Website:
antal.com
Job details:
Summary role description:
Hiring a Machine Learning (ML) Engineer for one of the leaders in embedded manufacturing.
Company description:
Our client is a high-end technology company whose products form part of the backbone of enterprise IT infrastructure. The company is headquartered in the US and has a global footprint. Its technology is integrated into millions of devices worldwide, including servers, desktops, and embedded systems. With decades of experience in low-level systems development, the company plays a critical role in shaping the foundational software that powers modern computing platforms.
Role details:
• Title / Designation: Machine Learning (ML) Engineer
• Location: Kolkata
• Experience: 7+ years
Role & responsibilities:
• Build, train, and fine-tune LLMs using techniques like SFT, LoRA/QLoRA, and RLHF, and evaluate their performance.
• Optimize models for local or edge environments by improving speed and efficiency through quantization, pruning, and distillation.
• Deploy models into production on-premises or at the edge using frameworks such as PyTorch, ONNX, TensorRT, vLLM, or llama.cpp, and integrate them into applications via APIs and internal services.
• Design and maintain scalable training and inference pipelines to ensure reproducibility and efficiency.
• Monitor model performance in production, including accuracy, drift, latency, and resource utilization, and continuously optimize outcomes.
• Ensure models meet security, privacy, and compliance requirements, especially in restricted or offline environments.
• Collaborate with software engineers, infrastructure teams, and domain experts to deliver end-to-end AI solutions.
• Document model architectures, training processes, and deployment workflows for clarity and future use.
Candidate requirements:
• Master’s or Ph.D. in Computer Science, Engineering, or a related field, or equivalent practical experience, along with 7+ years of experience in AI/ML, including at least 2 years working on LLMs, large-scale neural networks, RAG, or AI-driven automation.
• Strong hands-on experience with LLMs such as LLaMA, Mistral, Falcon, or similar open-weight models, along with proficiency in Python and frameworks like PyTorch or TensorFlow.
• Expertise in vector databases and retrieval systems (FAISS, Weaviate, Chroma, Pinecone, Milvus) and experience building RAG-based solutions.
• Experience developing and deploying models in local, on-premises, or resource-constrained environments, with a solid understanding of model optimization techniques like quantization, batching, and memory optimization.
• Hands-on experience with multi-agent AI systems (LangGraph, CrewAI, AutoGen, OpenAI Assistants API) and building autonomous or AI-driven workflows.
• Strong experience in end-to-end model development, working with business stakeholders to define KPIs and delivering multi-modal (text and image) or ensemble models.
• Familiarity with Linux, Docker, and basic cloud or on-prem infrastructure concepts.
• Experience with distributed training, multi-GPU systems, and handling large-scale models (10B+ parameters or multi-billion token datasets) is a plus.
• Knowledge of inference optimization tools such as vLLM, TensorRT-LLM, and ONNX, along with exposure to MLOps tools for model versioning and monitoring.
• Experience working in security-sensitive or regulated environments (such as finance, healthcare, or government) is preferred.
Selection Process:
• Two technical rounds
• One HR round
Recruiter Details:
• Swetha.Suresh@antal.com
Click on Apply to know more.