XaasIO
Website: xaasio.com
Job details:
- Job Title: Data Engineering, Data Science & AI Engineer
- Primary Location: Coimbatore, Tamil Nadu
- Work Mode: On-site / Hybrid
- Company: XaasIO Systems Private Limited
- Role Type: Full-time
- Experience: 2 to 10 years preferred
- About the Role
- XaasIO is looking for a Data Engineering, Data Science & AI Engineer to work on the XaasIO Private AI Factory platform.
The role involves building enterprise-grade data pipelines, lakehouse platforms, AI/ML workflows, model training pipelines, inference services, RAG pipelines, and AI application stacks for private cloud, sovereign cloud, enterprise, BFSI, government, healthcare, manufacturing, telecom, and research environments.
The candidate should have strong exposure to data engineering, data science, machine learning, MLOps, GenAI, vector databases, data lakehouse platforms, and distributed computing.
This role is ideal for engineers who can work across data platforms, AI platforms, ML pipelines, model deployment, analytics, and customer-facing solution delivery.
- Key Responsibilities
- The candidate will be responsible for:
- Designing and building data engineering pipelines for batch, streaming, and near-real-time data processing.
- Working on the XaasIO Private AI Factory platform for enterprise AI, GenAI, RAG, MLOps, and AI application deployment.
- Designing and implementing data lakehouse architectures using open-source technologies.
- Building pipelines for ingestion, transformation, validation, cataloging, governance, and serving of data.
- Working with structured, semi-structured, and unstructured data sources including databases, logs, documents, files, images, APIs, and streaming data.
- Designing and implementing AI/ML workflows for:
- Data preparation
- Feature engineering
- Model training
- Model evaluation
- Model registry
- Model deployment
- Model monitoring
- Model retraining
- Building RAG pipelines using document ingestion, chunking, embeddings, vector databases, retrievers, rerankers, LLMs, and agentic workflows.
- Developing and integrating AI applications using open-source models, APIs, and private inference platforms.
- Supporting model serving and inference platforms for enterprise use cases.
- Working with distributed computing platforms such as Apache Spark, Kubernetes, and GPU-enabled infrastructure.
- Integrating data and AI pipelines with CI/CD, DevSecOps, observability, and governance workflows.
- Supporting customer-facing workshops, requirement gathering, solution architecture discussions, PoCs, demos, and implementation planning.
- Creating technical documentation, architecture diagrams, data flow diagrams, runbooks, SOPs, test cases, and operational handover documents.
- Troubleshooting data pipeline failures, model performance issues, infrastructure bottlenecks, GPU utilization issues, and AI platform integration problems.
- Required Skills
- The candidate should have hands-on experience in:
- Data engineering and ETL/ELT pipeline development
- Python programming
- SQL and database fundamentals
- Data modeling and schema design
- Batch and streaming data processing
- Data lake or data lakehouse platforms
- Machine learning workflow development
- Data science lifecycle
- Feature engineering and model evaluation
- MLOps concepts
- REST API integration
- Linux fundamentals
- Git-based development workflow
- Docker and containerized application deployment
- Kubernetes basics
- Strong problem-solving and analytical skills
- Data Engineering Skills
- The candidate should have experience in one or more of the following:
- Apache Spark
- Apache Airflow
- Apache Kafka
- Apache Flink
- Apache NiFi
- dbt
- Trino / Presto
- Hive
- HBase
- PostgreSQL
- MySQL / MariaDB
- MongoDB
- Object storage such as S3, MinIO, or Ceph RGW
- Data validation and data quality frameworks
- Metadata management and data cataloging
- Data Lakehouse Exposure
- The candidate should have exposure to one or more of the following:
- Apache Iceberg
- Delta Lake
- Apache Hudi
- Apache Parquet
- Apache Arrow
- Trino / Presto
- Spark SQL
- Hive Metastore
- Nessie catalog
- Data partitioning and compaction
- Data lineage and governance
- Object-storage-backed lakehouse architecture
- Data Science and Machine Learning Skills
- The candidate should have exposure to:
- Python-based data science stack
- Pandas, NumPy, Scikit-learn
- Jupyter Notebook / JupyterLab
- Model training and evaluation
- Classification, regression, clustering, forecasting, and anomaly detection
- Feature engineering
- Model explainability
- Model performance metrics
- ML experiment tracking
- Model registry and lifecycle management
- Responsible AI and model governance concepts
- AI, GenAI and RAG Skills
- The candidate should have working knowledge of:
- Large Language Models
- Open-source LLMs
- Prompt engineering
- Embedding models
- Vector databases
- Retrieval-Augmented Generation
- Document ingestion and chunking
- Semantic search
- Reranking
- AI agents and workflow orchestration
- Private AI deployment patterns
- Model inference and serving
- GPU-based AI workloads
- Preferred AI Factory Platform Exposure
- The candidate should have exposure to one or more of the following platforms and tools:
- Kubeflow
- MLflow
- JupyterHub / JupyterLab
- vLLM
- KServe
- Seldon
- BentoML
- LangChain
- LangGraph
- LlamaIndex
- OpenWebUI
- Milvus
- Qdrant
- Weaviate
- ChromaDB
- Feast Feature Store
- Ray
- Dask
- NVIDIA GPU Operator
- NVIDIA DCGM Exporter
- Prometheus and Grafana for AI workload observability
- Cloud, Infrastructure and Platform Exposure
- The candidate should have experience or working knowledge in:
- Kubernetes-based AI platforms
- OpenStack-based private cloud environments
- GPU infrastructure for AI workloads
- Ceph / S3 object storage
- Public cloud AI services from AWS, Azure, or GCP
- Hybrid cloud data movement
- Data security and access control
- IAM, SSO, RBAC, and multi-tenant environments
- Secrets management
- Network and storage considerations for AI workloads
- Backup, restore, and disaster recovery for data platforms
- DevOps, MLOps and DevSecOps Exposure
- The candidate should have exposure to:
- CI/CD pipelines for data and AI applications
- GitHub Actions, GitLab CI/CD, Jenkins, Argo CD, or Tekton
- GitOps-based deployment
- Container image building and scanning
- SAST, SCA, and container security scanning
- Data pipeline testing
- Model validation gates
- Infrastructure-as-Code
- OpenTofu or Terraform
- Ansible
- Helm charts
- Kubernetes manifests
- Observability and alerting for data and AI workloads
- Customer-Facing and Delivery Responsibilities
- The candidate should be able to:
- Participate in customer-facing technical discussions and solution workshops.
- Understand customer data, analytics, AI, compliance, and infrastructure requirements.
- Convert requirements into:
- Solution design documents
- Data architecture diagrams
- AI workflow diagrams
- PoC plans
- Implementation plans
- Test cases
- Runbooks
- Operational handover documents
- Support Day-0 discovery workshops for data and AI use cases.
- Support Day-1 implementation of data platforms, AI pipelines, and Private AI Factory components.
- Help define Day-2 operations parameters for AI and data platforms, including:
- Data pipeline monitoring
- Model monitoring
- Inference monitoring
- SLA and SLO parameters
- Backup and restore process
- Access control and governance
- Incident management process
- Capacity planning
- GPU utilization monitoring
- Security and compliance checks
- Present technical findings, PoC results, architecture options, risks, and recommendations to internal and customer stakeholders.
- Good-to-Have Skills
- The following skills will be an added advantage:
- Experience with enterprise AI Factory or MLOps platforms
- Experience building private GenAI platforms
- Experience with LLM fine-tuning or LoRA/QLoRA
- Experience with model quantization
- Experience with GPU scheduling and optimization
- Experience with multi-GPU or distributed training
- Experience with NVIDIA CUDA ecosystem
- Experience with PyTorch or TensorFlow
- Experience with Hugging Face models and libraries
- Experience with data governance and data catalog tools
- Experience with Apache Atlas or DataHub
- Experience with OpenMetadata
- Experience with OpenSearch or Elasticsearch
- Experience with BI and dashboarding tools
- Experience with Superset, Metabase, or Grafana dashboards
- Experience with workflow automation and AI agents
- Experience with secure AI deployment in regulated environments
- Active GitHub profile, open-source contributions, notebooks, model demos, blogs, or technical portfolio
- Preferred Technical Stack
- Programming: Python, SQL
- Data Engineering: Spark, Airflow, Kafka, Flink, NiFi, dbt
- Lakehouse: Apache Iceberg, Parquet, Trino, Hive Metastore, Ceph S3 / MinIO
- Data Science: Pandas, NumPy, Scikit-learn, JupyterLab
- ML / Deep Learning: PyTorch, TensorFlow, Hugging Face
- MLOps: MLflow, Kubeflow, KServe, Seldon, BentoML
- GenAI / RAG: LangChain, LangGraph, LlamaIndex, vLLM, OpenWebUI
- Vector Databases: Milvus, Qdrant, Weaviate, ChromaDB
- Feature Store: Feast
- Cloud / Infra: Kubernetes, OpenStack, Ceph, S3, GPU infrastructure
- DevOps: Git, Docker, Helm, Argo CD, GitHub Actions, GitLab CI/CD, Jenkins
- IaC / Automation: Ansible, OpenTofu, Terraform
- Observability: Prometheus, Grafana, OpenSearch, NVIDIA DCGM
- Security: RBAC, IAM, SSO, secrets management, Trivy, OpenSCAP, DevSecOps gates
- Required Soft Skills
- The candidate should have:
- Strong problem-solving and analytical thinking
- Strong communication skills
- Ability to explain data and AI concepts clearly
- Ability to work with infrastructure, application, and business teams
- Ability to document architecture and implementation decisions
- Ability to participate in customer workshops and technical discussions
- Ability to work independently and as part of a distributed team
- Ownership mindset for delivery, quality, and customer success
- Curiosity to learn new open-source AI and data platforms
- Candidate Profile
- We are looking for someone who:
- Can build enterprise-grade data and AI platforms.
- Understands both data engineering and the AI/ML lifecycle.
- Can work with open-source AI and data platforms.
- Can design and implement RAG, MLOps, and lakehouse pipelines.
- Can work on Kubernetes and private cloud infrastructure.
- Can troubleshoot data, model, pipeline, and infrastructure issues.
- Can support customer-facing workshops, PoCs, demos, and implementation projects.
- Can convert business use cases into technical data and AI workflows.
- Can contribute to building the XaasIO Private AI Factory as a scalable, secure, production-grade platform.
- Education
Bachelor’s or Master’s degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Information Technology, Engineering, Mathematics, Statistics, or equivalent practical experience.
Certifications in data engineering, machine learning, cloud, Kubernetes, Linux, NVIDIA, or AI platforms will be an added advantage.
Summary
This is a data and AI platform engineering role based primarily in Coimbatore for engineers who want to build the XaasIO Private AI Factory using open-source technologies.
The role is ideal for candidates who can work across data engineering, data science, GenAI, RAG, MLOps, Kubernetes, OpenStack, Ceph S3, GPU infrastructure, DevSecOps, and enterprise AI operations.