Data Engineer - Lead

Iris Software Inc.

Location: Noida, Uttar Pradesh, India
Job type: Full-time

Required skills

Python
Airflow
AWS
Azure
Bash
communication skills
compliance
cross-functional
database
DevOps
ETL
GCP
Hadoop
Kafka
metadata management
Shell Script
Spark
SQL
TensorFlow
value proposition
BI tools

About the role

Website: irissoftware.com

Why Join Iris?

Are you ready to do the best work of your career at one of India’s Top 25 Best Workplaces in the IT industry? Do you want to grow in an award-winning culture that truly values your talent and ambitions?

Join Iris Software — one of the fastest-growing IT services companies — where you own and shape your success story.

About Us

At Iris Software, our vision is to be our clients’ most trusted technology partner, and the first choice for the industry’s top professionals to realize their full potential.

With over 4,300 associates across India, the U.S.A., and Canada, we help our enterprise clients thrive with technology-enabled transformation across financial services, healthcare, transportation & logistics, and professional services.

Our work covers complex, mission-critical applications built with the latest technologies, spanning high-value Application & Product Engineering, Data & Analytics, Cloud, DevOps, Data & MLOps, Quality Engineering, and Business Automation.

Working with Us

At Iris, every role is more than a job — it’s a launchpad for growth.

Our Employee Value Proposition, “Build Your Future. Own Your Journey.” reflects our belief that people thrive when they have ownership of their career and the right opportunities to shape it.

We foster a culture where your potential is valued, your voice matters, and your work creates real impact. With cutting-edge projects, personalized career development, continuous learning, and mentorship, we help you grow and become your best, both personally and professionally.

Curious what it’s like to work at Iris? Head to this video for an inside look at the people, the passion, and the possibilities.

Job Description

PRIMARY RESPONSIBILITIES

Design and implement robust, scalable data pipelines to support AI/ML model development and deployment
Clean, transform, and curate structured and unstructured data from diverse sources to ensure model-ready quality
Collaborate with data scientists, ML engineers, and business teams to ensure data readiness, usability, and alignment with AI objectives
Develop and maintain metadata management, data lineage, and data quality frameworks to support AI governance and compliance
Enable advanced feature engineering capabilities and implement real-time data streaming solutions for AI applications
Design and deploy data collection instruments and oversee data aggregation processes
Ensure data compliance, privacy, and ethical use standards across all AI workflows
Support enterprise-wide data initiatives including business glossary development, taxonomy creation, and the DARE program's automation goals

Knowledge/Skills

Programming & Technical Skills: Advanced proficiency in Python, SQL, and distributed computing frameworks (Spark, Hadoop)
Cloud Platforms: Hands-on experience with cloud data platforms (AWS, Azure, GCP) and their data services
Data Engineering Tools: Expertise with data orchestration tools (Airflow, Prefect), ETL/ELT frameworks, and data pipeline optimization
AI/ML Data Specialization: Experience with ML feature stores, data versioning, model-ready data design, and real-time streaming technologies
Data Management: Strong understanding of data governance, metadata management, data quality frameworks, and lineage tracking
Compliance & Ethics: Knowledge of data privacy regulations, compliance standards, and ethical AI data practices
Collaboration: Excellent communication skills and ability to work effectively with cross-functional teams including data scientists, ML engineers, and business stakeholders

Required

EDUCATION AND EXPERIENCE

Bachelor's degree in Computer Science, Data Engineering, Computer Engineering, or related technical field
3+ years of experience in data engineering with demonstrated experience in AI/ML environments
Proficiency in Python and SQL for data manipulation and pipeline development
Experience with cloud data platforms (AWS, Azure, or GCP)
Hands-on experience with data orchestration tools (e.g., Airflow, Prefect) and ETL frameworks

Preferred

Master's degree in Computer Science, Data Engineering, or related field
5+ years of experience in data engineering, preferably in AI/ML environments
Experience with real-time data processing (Kafka, Kinesis)
Exposure to LLMs and generative AI data preparation
Knowledge of MLOps and integration with ML lifecycle tools
Familiarity with BI tools and semantic layer design

Must Have

Python & advanced SQL
Spark / distributed processing
Cloud platforms (AWS)
Airflow / orchestration tools
Data pipeline automation
Data governance & quality

Nice to Have

Kafka/Kinesis streaming
Feature stores
LLM‑ready data pipelines
MLOps exposure

Mandatory Competencies

Cloud - AWS - AWS S3, S3 Glacier, AWS EBS

Programming Language - Python - Python Scripting

DevOps/Configuration Mgmt - DevOps/Configuration Mgmt - Basic Bash/Shell script writing

Data Governance - Data Governance - Solidatus

Database - Database Programming - SQL

Big Data - Big Data - PySpark

Cloud - AWS - TensorFlow on AWS, AWS Glue, AWS EMR, AWS Data Pipeline, Amazon Redshift

Behavioral - Communication and collaboration

Perks And Benefits For Irisians

Iris provides world-class benefits for a personalized employee experience. These benefits are designed to support the financial, health, and well-being needs of Irisians for holistic professional and personal growth.
