Data Engineering, Data Science & AI Engineer

XaasIO

Location: Greater Coimbatore Area
Job type: Full-time

Required skills

LangChain
Python
Airflow
AWS
Ansible
Apache
Apache Airflow
Apache Flink
Apache Kafka
Apache Spark
Artificial Intelligence
Azure
capacity planning
cloud infrastructure
clustering
communication skills
compliance
CUDA
data architecture
data modeling
data science
DevOps
Docker
Elasticsearch
ETL
Flink
forecasting
GCP
GitHub
HBase
Helm
Hive
hybrid cloud
infrastructure-as-code
Jenkins
Jupyter Notebook
Kafka
Kubeflow
Kubernetes
Linux
machine learning
metadata management
multi-tenant
MySQL
NumPy
Pandas
Parquet
PostgreSQL
Ray
regression
Spark
SQL
SSO
statistics
TensorFlow
Terraform
PyTorch

About the role

Website: xaasio.com
Job details:

Job Description: Data Engineering, Data Science & AI Engineer
Primary Location: Coimbatore, Tamil Nadu
Work Mode: On-site / Hybrid
Company: XaasIO Systems Private Limited
Role Type: Full-time
Experience: 2 - 10 years preferred

About the Role
XaasIO is looking for a Data Engineering, Data Science & AI Engineer to work on the XaasIO Private AI Factory platform.

The role involves building enterprise-grade data pipelines, lakehouse platforms, AI/ML workflows, model training pipelines, inference services, RAG pipelines, and AI application stacks for private cloud, sovereign cloud, enterprise, BFSI, government, healthcare, manufacturing, telecom, and research environments.

The candidate should have strong exposure to data engineering, data science, machine learning, MLOps, GenAI, vector databases, data lakehouse platforms, and distributed computing.

This role is ideal for engineers who can work across data platforms, AI platforms, ML pipelines, model deployment, analytics, and customer-facing solution delivery.

Key Responsibilities
The candidate will be responsible for:
Designing and building data engineering pipelines for batch, streaming, and near-real-time data processing.
Working on the XaasIO Private AI Factory platform for enterprise AI, GenAI, RAG, MLOps, and AI application deployment.
Designing and implementing data lakehouse architectures using open-source technologies.
Building pipelines for ingestion, transformation, validation, cataloging, governance, and serving of data.

Working with structured, semi-structured, and unstructured data sources including databases, logs, documents, files, images, APIs, and streaming data.

Designing and implementing AI/ML workflows for:
Data preparation
Feature engineering
Model training
Model evaluation
Model registry
Model deployment
Model monitoring
Model retraining

Building RAG pipelines using document ingestion, chunking, embeddings, vector databases, retrievers, rerankers, LLMs, and agentic workflows.

Developing and integrating AI applications using open-source models, APIs, and private inference platforms.
Supporting model serving and inference platforms for enterprise use cases.
Working with distributed computing platforms such as Apache Spark, Kubernetes, and GPU-enabled infrastructure.
Integrating data and AI pipelines with CI/CD, DevSecOps, observability, and governance workflows.

Supporting customer-facing workshops, requirement gathering, solution architecture discussions, PoCs, demos, and implementation planning.

Creating technical documentation, architecture diagrams, data flow diagrams, runbooks, SOPs, test cases, and operational handover documents.

Troubleshooting data pipeline failures, model performance issues, infrastructure bottlenecks, GPU utilization issues, and AI platform integration problems.

Required Skills
The candidate should have hands-on experience in:
Data engineering and ETL/ELT pipeline development
Python programming
SQL and database fundamentals
Data modeling and schema design
Batch and streaming data processing
Data lake or data lakehouse platforms
Machine learning workflow development
Data science lifecycle
Feature engineering and model evaluation
MLOps concepts
REST API integration
Linux fundamentals
Git-based development workflow
Docker and containerized application deployment
Kubernetes basics
Strong problem-solving and analytical skills

Data Engineering Skills
The candidate should have experience in one or more of the following:
Apache Spark
Apache Airflow
Apache Kafka
Apache Flink
Apache NiFi
dbt
Trino / Presto
Hive
HBase
PostgreSQL
MySQL / MariaDB
MongoDB
Object storage such as S3, MinIO, or CEPH RGW
Data validation and data quality frameworks
Metadata management and data cataloging

Data Lakehouse Exposure
The candidate should have exposure to one or more of the following:
Apache Iceberg
Delta Lake
Apache Hudi
Apache Parquet
Apache Arrow
Trino / Presto
Spark SQL
Hive Metastore
Nessie catalog
Data partitioning and compaction
Data lineage and governance
Object-storage-backed lakehouse architecture

Data Science and Machine Learning Skills
The candidate should have exposure to:
Python-based data science stack
Pandas, NumPy, Scikit-learn
Jupyter Notebook / JupyterLab
Model training and evaluation
Classification, regression, clustering, forecasting, and anomaly detection
Feature engineering
Model explainability
Model performance metrics
ML experiment tracking
Model registry and lifecycle management
Responsible AI and model governance concepts

AI, GenAI and RAG Skills
The candidate should have working knowledge of:
Large Language Models
Open-source LLMs
Prompt engineering
Embedding models
Vector databases
Retrieval-Augmented Generation
Document ingestion and chunking
Semantic search
Reranking
AI agents and workflow orchestration
Private AI deployment patterns
Model inference and serving
GPU-based AI workloads

Preferred AI Factory Platform Exposure
The candidate should have exposure to one or more of the following platforms and tools:
Kubeflow
MLflow
JupyterHub / JupyterLab
vLLM
KServe
Seldon
BentoML
LangChain
LangGraph
LlamaIndex
OpenWebUI
Milvus
Qdrant
Weaviate
ChromaDB
Feast Feature Store
Ray
Dask
NVIDIA GPU Operator
NVIDIA DCGM Exporter
Prometheus and Grafana for AI workload observability

Cloud, Infrastructure and Platform Exposure
The candidate should have experience or working knowledge in:
Kubernetes-based AI platforms
OpenStack-based private cloud environments
GPU infrastructure for AI workloads
CEPH / S3 object storage
Public cloud AI services from AWS, Azure, or GCP
Hybrid cloud data movement
Data security and access control
IAM, SSO, RBAC, and multi-tenant environments
Secrets management
Network and storage considerations for AI workloads
Backup, restore, and disaster recovery for data platforms

DevOps, MLOps and DevSecOps Exposure
The candidate should have exposure to:
CI/CD pipelines for data and AI applications
GitHub Actions, GitLab CI/CD, Jenkins, Argo CD, or Tekton
GitOps-based deployment
Container image building and scanning
SAST, SCA, and container security scanning
Data pipeline testing
Model validation gates
Infrastructure-as-Code
OpenTofu or Terraform
Ansible
Helm charts
Kubernetes manifests
Observability and alerting for data and AI workloads

Customer-Facing and Delivery Responsibilities
The candidate should be able to:
Participate in customer-facing technical discussions and solution workshops.
Understand customer data, analytics, AI, compliance, and infrastructure requirements.
Convert requirements into:
Solution design documents
Data architecture diagrams
AI workflow diagrams
PoC plans
Implementation plans
Test cases
Runbooks
Operational handover documents
Support Day-0 discovery workshops for data and AI use cases.
Support Day-1 implementation of data platforms, AI pipelines, and Private AI Factory components.
Help define Day-2 operations parameters for AI and data platforms, including:
Data pipeline monitoring
Model monitoring
Inference monitoring
SLA and SLO parameters
Backup and restore process
Access control and governance
Incident management process
Capacity planning
GPU utilization monitoring
Security and compliance checks

Present technical findings, PoC results, architecture options, risks, and recommendations to internal and customer stakeholders.

Good-to-Have Skills
The following skills will be an added advantage:
Experience with enterprise AI Factory or MLOps platforms
Experience building private GenAI platforms
Experience with LLM fine-tuning or LoRA/QLoRA
Experience with model quantization
Experience with GPU scheduling and optimization
Experience with multi-GPU or distributed training
Experience with NVIDIA CUDA ecosystem
Experience with PyTorch or TensorFlow
Experience with Hugging Face models and libraries
Experience with data governance and data catalog tools
Experience with Apache Atlas or DataHub
Experience with OpenMetadata
Experience with OpenSearch or Elasticsearch
Experience with BI and dashboarding tools
Experience with Superset, Metabase, or Grafana dashboards
Experience with workflow automation and AI agents
Experience with secure AI deployment in regulated environments
Active GitHub profile, open-source contributions, notebooks, model demos, blogs, or technical portfolio

Preferred Technical Stack
Programming: Python, SQL
Data Engineering: Spark, Airflow, Kafka, Flink, NiFi, dbt
Lakehouse: Apache Iceberg, Parquet, Trino, Hive Metastore, CEPH S3 / MinIO
Data Science: Pandas, NumPy, Scikit-learn, JupyterLab
ML / Deep Learning: PyTorch, TensorFlow, Hugging Face
MLOps: MLflow, Kubeflow, KServe, Seldon, BentoML
GenAI / RAG: LangChain, LangGraph, LlamaIndex, vLLM, OpenWebUI
Vector Databases: Milvus, Qdrant, Weaviate, ChromaDB
Feature Store: Feast
Cloud / Infra: Kubernetes, OpenStack, CEPH, S3, GPU infrastructure
DevOps: Git, Docker, Helm, Argo CD, GitHub Actions, GitLab CI/CD, Jenkins
IaC / Automation: Ansible, OpenTofu, Terraform
Observability: Prometheus, Grafana, OpenSearch, NVIDIA DCGM
Security: RBAC, IAM, SSO, secrets management, Trivy, OpenSCAP, DevSecOps gates

Required Soft Skills
The candidate should have:
Strong problem-solving and analytical thinking
Strong communication skills
Ability to explain data and AI concepts clearly
Ability to work with infrastructure, application, and business teams
Ability to document architecture and implementation decisions
Ability to participate in customer workshops and technical discussions
Ability to work independently and as part of a distributed team
Ownership mindset for delivery, quality, and customer success
Curiosity to learn new open-source AI and data platforms

Candidate Profile
We are looking for someone who:
Can build enterprise-grade data and AI platforms.
Understands both data engineering and the AI/ML lifecycle.
Can work with open-source AI and data platforms.
Can design and implement RAG, MLOps, and lakehouse pipelines.
Can work on Kubernetes and private cloud infrastructure.
Can troubleshoot data, model, pipeline, and infrastructure issues.
Can support customer-facing workshops, PoCs, demos, and implementation projects.
Can convert business use cases into technical data and AI workflows.
Can contribute to the XaasIO Private AI Factory as a scalable, secure, production-grade platform.

Education

Bachelor’s or Master’s degree in Computer Science, Data Science, Artificial Intelligence, Machine Learning, Information Technology, Engineering, Mathematics, Statistics, or equivalent practical experience.

Certifications in data engineering, machine learning, cloud, Kubernetes, Linux, NVIDIA, or AI platforms will be an added advantage.

Summary

This is a data and AI platform engineering role based primarily in Coimbatore for engineers who want to build the XaasIO Private AI Factory using open-source technologies.

The role is ideal for candidates who can work across data engineering, data science, GenAI, RAG, MLOps, Kubernetes, OpenStack, CEPH S3, GPU infrastructure, DevSecOps, and enterprise AI operations.
