TEST MODULE LEAD - Python ETL

Happiest Minds Technologies

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Looker
Airflow
AWS
Azure
batch data
BigQuery
cross-functional
data ingestion
data lake
data modeling
data models
data pipeline
end-to-end
ETL
GCP
GCS
GitHub
Google Cloud
Kafka
machine learning
metadata management
Snowflake
SQL
test automation

About the role

Website: happiestminds.com
Job details:
Job Title: Data Pipeline / Test Engineer - AI-Enhanced Data Quality & Automation

Department: Quality Engineering / Engineering Excellence

Employment Type: Full-Time / Contract to Hire / Contractor

Role

Required Experience Range :

About The Role

We are looking for a Data Pipeline / Test Engineer with deep expertise in testing large-scale batch and streaming data processing systems. The role involves ensuring end-to-end data quality, reliability, and integrity across modern data pipelines on GCP, AWS, or Azure ecosystems.

You'll design and implement Python-based test automation frameworks that validate ingestion, transformation, and aggregation of data in ETL/ELT pipelines, leveraging AI-assisted testing approaches (Prompt Engineering, MCP, Copilot/Cursor) to improve the speed and accuracy of validation.

This role is ideal for someone who can bridge data engineering, automation, and AI, ensuring that streaming and batch data flows powering analytics, ML models, and business dashboards are tested comprehensively and intelligently.

Role Impact

Your work will directly ensure the accuracy and reliability of high-volume data pipelines that feed analytics, machine learning, and campaign optimization platforms. By building intelligent validation frameworks and synthetic data systems, you will play a pivotal role in enabling trusted, production-grade data across the organization's AI and decision intelligence ecosystems.

Key Responsibilities

Data Pipeline & ETL Testing

Design, implement, and maintain automated test suites for validating batch and streaming data ingestion pipelines.
Test data ingestion from Google Cloud Storage (GCS), APIs, and real-time streaming sources (e.g., Pub/Sub, Kafka, Kinesis).
Validate data transformations, schema evolution, and data aggregation logic in ETL/ELT pipelines using Python and SQL.
Build monitoring and alerting frameworks to proactively detect data drift, schema changes, and anomalies in data flow.
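As a rough illustration of the kind of check these responsibilities describe, the sketch below flags schema drift in an ingested batch and recomputes an aggregate for comparison with a pipeline's output. The column names, the EXPECTED_SCHEMA mapping, and both function names are hypothetical; they are not taken from any Happiest Minds system.

```python
from collections import defaultdict

# Hypothetical expected schema for an ingested batch (column name -> Python type).
EXPECTED_SCHEMA = {"campaign_id": str, "clicks": int, "spend": float}


def find_schema_drift(rows, expected=EXPECTED_SCHEMA):
    """Return a sorted list of drift issues found across all rows."""
    issues = set()
    for row in rows:
        # Columns the schema requires but the row lacks.
        for col in expected.keys() - row.keys():
            issues.add(f"missing column: {col}")
        # Columns the row carries but the schema does not know.
        for col in row.keys() - expected.keys():
            issues.add(f"unexpected column: {col}")
        # Columns present with a value of the wrong type.
        for col, typ in expected.items():
            if col in row and not isinstance(row[col], typ):
                issues.add(f"wrong type for {col}: {type(row[col]).__name__}")
    return sorted(issues)


def aggregate_clicks(rows):
    """Independently recompute total clicks per campaign from source rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["campaign_id"]] += row["clicks"]
    return dict(totals)
```

In a test suite, the result of aggregate_clicks on the source rows would be compared with what the pipeline wrote to the warehouse, and a non-empty result from find_schema_drift would fail the run or raise an alert.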

Data Systems & Tooling

Perform validation and performance testing on data stored in BigQuery, Snowflake, or equivalent data warehouses, and on the reports built on them in Looker or similar BI tools.
Test integration between data lake, warehouse, and reporting layers, ensuring accurate metrics propagation.
Work with Airflow DAGs to validate job dependencies, schedule integrity, and pipeline orchestration.
Collaborate with Data Engineers to improve observability and testability of ETL/ELT processes.

Test Automation & AI-Augmented Frameworks

Develop Python-based test automation frameworks for validating batch and streaming data pipelines.
Use Prompt Engineering to accelerate test case generation, data transformation validation, and anomaly detection logic.
Leverage Codex / GitHub Copilot / Cursor to implement high-quality, context-aware test scripts with defined guardrails and context rules.
Utilize MCP (Model Context Protocol) for dynamic data validation workflows and contextual test management (MCP usage is mandatory).
(Optional) Integrate OpenAI APIs or other LLMs for smart test analysis, log summarization, or auto-healing test design.

Test Data Modelling & Synthetic Data Generation

Design and maintain test data models aligned with production data schemas and business metrics.
Generate synthetic data using configurable rules, statistical patterns, and metric definitions derived from production datasets.
Validate data distribution, quality metrics, and referential integrity for both synthetic and real datasets.
Work with analytics teams to ensure data used in testing accurately represents production scale, shape, and edge cases.
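The sketch below shows one possible shape for rule-based synthetic data generation with a referential integrity check. It assumes a hypothetical two-table model (customers and orders), and the log-normal amount distribution and its parameters are placeholders for patterns that would in practice be derived from production datasets.

```python
import random


def generate_synthetic(n_customers, n_orders, seed=42):
    """Generate reproducible synthetic customers and orders (hypothetical model)."""
    rng = random.Random(seed)  # a fixed seed keeps test data repeatable
    customers = [
        {"customer_id": i, "segment": rng.choice(["retail", "enterprise"])}
        for i in range(1, n_customers + 1)
    ]
    orders = [
        {
            "order_id": j,
            # Foreign key always drawn from the generated customer ids.
            "customer_id": rng.randint(1, n_customers),
            # Skewed amounts; mu and sigma are assumed values.
            "amount": round(rng.lognormvariate(3.0, 0.5), 2),
        }
        for j in range(1, n_orders + 1)
    ]
    return customers, orders


def orphaned_orders(customers, orders):
    """Return orders whose customer_id has no matching customer row."""
    known = {c["customer_id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known]
```

The same orphaned_orders check can be run against both synthetic and real extracts, so a referential integrity failure is reported the same way for either dataset.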

Required Skills & Qualifications

Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
Strong hands-on experience in testing batch and streaming data pipelines.
Expertise with BigQuery or equivalent (Redshift, Synapse, etc.).
Experience with Snowflake, Looker, or similar analytics systems on AWS or Azure.
Advanced SQL skills: joins, window functions, CTEs, query optimization, and complex validation queries.
Solid knowledge of Python for data validation and test automation.
Experience working with Airflow DAGs and data orchestration frameworks.
Proven experience in data modeling, synthetic data generation, and defining data validation rulesets.
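As an example of the kind of validation query the SQL requirement refers to, the sketch below combines a CTE with a window function to find records loaded more than once. The events table and its columns are hypothetical, and SQLite (version 3.25 or later, for window function support) stands in for a warehouse such as BigQuery or Snowflake.

```python
import sqlite3

# CTE + window function: number each copy of an event, then count the extras.
DUPLICATE_CHECK = """
WITH ranked AS (
    SELECT event_id,
           ROW_NUMBER() OVER (
               PARTITION BY event_id ORDER BY loaded_at
           ) AS rn
    FROM events
)
SELECT event_id, COUNT(*) AS extra_copies
FROM ranked
WHERE rn > 1
GROUP BY event_id
ORDER BY event_id
"""


def find_duplicates(conn: sqlite3.Connection):
    """Return (event_id, extra_copies) for every event loaded more than once."""
    return conn.execute(DUPLICATE_CHECK).fetchall()
```

An empty result means every event_id appears exactly once; an automated test would assert that after each load.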

AI & Automation-Specific Expertise (Mandatory / Optional / Nice to Have)

Skill Area | Requirement Level | Details
Prompt Engineering | Mandatory | Strong ability to design prompts for data validation, test generation, and automation assistance.
Codex / GitHub Copilot / Cursor | Mandatory | Skilled in AI-assisted scripting with accurate context, guardrails, and coding rules.
MCP Implementation & Usage | Mandatory | Hands-on experience configuring and leveraging MCP for intelligent test data and metadata management.
OpenAI APIs | Optional | Experience in integrating APIs for test data synthesis, log analysis, or anomaly detection.
LLM Configuration | Optional | Exposure to fine-tuning or parameter-level adjustments to enhance model-assisted validation.
LLM Fundamentals | Nice to Have | Understanding of prompt context length, tokenization, and reasoning limits in LLMs.
AI Agents for Data Validation | Nice to Have | Experience in using or developing AI Agents for pipeline health checks, dynamic test selection, or automated data integrity scoring.

Communication & Soft Skills

Excellent communication and cross-functional collaboration skills, especially with Data Engineering, ML, and Analytics teams.
Strong ability to translate data validation results into actionable insights for technical and business stakeholders.
Demonstrated problem-solving and analytical thinking in complex data environments.
Mentorship experience and the ability to foster a culture of quality-first data engineering.
Self-driven, adaptable, and comfortable with fast-paced, evolving AI and data ecosystems.
