- Location
- Bengaluru, Karnataka, India
- Job type
- Full-time
Required skills
- Looker
- Airflow
- AWS
- Azure
- batch data
- BigQuery
- cross-functional
- data ingestion
- data lake
- data modeling
- data models
- data pipeline
- end-to-end
- ETL
- GCP
- GCS
- GitHub
- Google Cloud
- Kafka
- machine learning
- metadata management
- Snowflake
- SQL
- test automation
About the role
Happiest Minds Technologies
Website: happiestminds.com
Job details:
Job Title: Data Pipeline / Test Engineer - AI-Enhanced Data Quality & Automation
Department: Quality Engineering / Engineering Excellence
Employment Type: Full-Time / Contract to Hire / Contractor
Role
Required Experience Range :
About The Role
We are looking for a Data Pipeline / Test Engineer with deep expertise in testing large-scale batch and streaming data processing systems. The role involves ensuring end-to-end data quality, reliability, and integrity across modern data pipelines on GCP, AWS, or Azure ecosystems.
You'll design and implement Python-based test automation frameworks that validate ingestion, transformation, and aggregation of data in ETL/ELT pipelines, leveraging AI-assisted testing approaches (Prompt Engineering, MCP, Copilot/Cursor) to improve the speed and accuracy of validation.
This role is ideal for someone who can bridge data engineering, automation, and AI, ensuring that streaming and batch data flows powering analytics, ML models, and business dashboards are tested comprehensively and intelligently.
Role Impact
Your work will directly ensure the accuracy and reliability of high-volume data pipelines that feed analytics, machine learning, and campaign optimization platforms. By building intelligent validation frameworks and synthetic data systems, you will play a pivotal role in enabling trusted, production-grade data across the organization's AI and decision intelligence ecosystems.
Key Responsibilities
Data Pipeline & ETL Testing
- Design, implement, and maintain automated test suites for validating batch and streaming data ingestion pipelines.
- Test data ingestion from Google Cloud Storage (GCS), APIs, and real-time streaming sources (e.g., Pub/Sub, Kafka, Kinesis).
- Validate data transformations, schema evolution, and data aggregation logic in ETL/ELT pipelines using Python and SQL.
- Build monitoring and alerting frameworks to proactively detect data drift, schema changes, and anomalies in data flow.
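As an illustration of the schema-change detection mentioned above, here is a minimal sketch in plain Python. The `EXPECTED_SCHEMA` columns, the `SchemaDrift` class, and the `detect_schema_drift` function are hypothetical names chosen for this example and are not taken from any specific pipeline or library; a real framework would more likely read the expected schema from the warehouse or a schema registry.

```python
from dataclasses import dataclass

# Hypothetical expected schema for an illustrative "events" feed (assumption).
EXPECTED_SCHEMA = {"event_id": str, "user_id": str, "amount": float}


@dataclass
class SchemaDrift:
    missing: set            # expected columns absent from the record
    unexpected: set         # columns present but not in the expected schema
    type_mismatches: dict   # column name -> name of the actual type found

    @property
    def has_drift(self) -> bool:
        return bool(self.missing or self.unexpected or self.type_mismatches)


def detect_schema_drift(record: dict, expected: dict = EXPECTED_SCHEMA) -> SchemaDrift:
    """Compare one ingested record against the expected column names and types."""
    missing = set(expected) - set(record)
    unexpected = set(record) - set(expected)
    type_mismatches = {
        col: type(record[col]).__name__
        for col, typ in expected.items()
        if col in record and not isinstance(record[col], typ)
    }
    return SchemaDrift(missing, unexpected, type_mismatches)
```

A check like this could run per batch or on a sample of streamed messages, with `has_drift` feeding whatever alerting mechanism the team already uses.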
Data Systems & Tooling
- Perform validation and performance testing on data stored in BigQuery, Snowflake, or equivalent data warehouses, and on the metrics surfaced through Looker or similar BI tools.
- Test integration between data lake, warehouse, and reporting layers, ensuring accurate metrics propagation.
- Work with Airflow DAGs to validate job dependencies, schedule integrity, and pipeline orchestration.
- Collaborate with Data Engineers to improve observability and testability of ETL/ELT processes.
Test Automation & AI-Augmented Frameworks
- Develop Python-based test automation frameworks for validating batch and streaming data pipelines.
- Use Prompt Engineering to accelerate test case generation, data transformation validation, and anomaly detection logic.
- Leverage Codex / GitHub Copilot / Cursor to implement high-quality, context-aware test scripts with defined guardrails and context rules.
- Utilize MCP (Model Context Protocol) for dynamic data validation workflows and contextual test management (MCP usage is mandatory).
- (Optional) Integrate OpenAI APIs or other LLMs for smart test analysis, log summarization, or auto-healing test design.
Test Data Modelling & Synthetic Data Generation
- Design and maintain test data models aligned with production data schemas and business metrics.
- Generate synthetic data using configurable rules, statistical patterns, and metric definitions derived from production datasets.
- Validate data distribution, quality metrics, and referential integrity for both synthetic and real datasets.
- Work with analytics teams to ensure data used in testing accurately represents production scale, shape, and edge cases.
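The rule-based generation and referential-integrity validation described above might look roughly like the following sketch. It uses only the Python standard library; the table shapes (`users`, `orders`), the column names, and the log-normal parameters are assumptions made for illustration, not values derived from any production dataset.

```python
import random


def generate_synthetic(n_users: int, n_orders: int, seed: int = 42):
    """Generate hypothetical users and orders tables from simple configurable rules."""
    rng = random.Random(seed)  # seeded so repeated test runs see identical data
    users = [
        {"user_id": f"u{i}", "segment": rng.choice(["free", "paid"])}
        for i in range(n_users)
    ]
    orders = [
        {
            "order_id": f"o{i}",
            # Pick the foreign key from existing users so integrity holds by construction.
            "user_id": rng.choice(users)["user_id"],
            # Log-normal draw gives positive, right-skewed amounts (assumed shape).
            "amount": round(rng.lognormvariate(3.0, 0.5), 2),
        }
        for i in range(n_orders)
    ]
    return users, orders


def orphaned_orders(users, orders):
    """Return orders whose user_id has no matching row in users."""
    known = {u["user_id"] for u in users}
    return [o for o in orders if o["user_id"] not in known]
```

The same `orphaned_orders` check can be pointed at real extracts, which keeps the validation logic identical for synthetic and production-like data.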
Required Skills & Qualifications
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- Strong hands-on experience in testing batch and streaming data pipelines.
- Expertise with BigQuery or equivalent (Redshift, Synapse, etc.).
- Experience with Snowflake, Looker, or similar analytics systems on AWS or Azure.
- Advanced SQL skills: joins, window functions, CTEs, query optimization, and complex validation queries.
- Solid knowledge of Python for data validation and test automation.
- Experience working with Airflow DAGs and data orchestration frameworks.
- Proven experience in data modeling, synthetic data generation, and defining data validation rulesets.
AI & Automation-Specific Expertise (Mandatory / Optional / Nice to Have)
| Skill Area | Requirement Level | Details |
| --- | --- | --- |
| Prompt Engineering | Mandatory | Strong ability to design prompts for data validation, test generation, and automation assistance. |
| Codex / GitHub Copilot / Cursor | Mandatory | Skilled in AI-assisted scripting with accurate context, guardrails, and coding rules. |
| MCP Implementation & Usage | Mandatory | Hands-on experience configuring and leveraging MCP for intelligent test data and metadata management. |
| OpenAI APIs | Optional | Experience in integrating APIs for test data synthesis, log analysis, or anomaly detection. |
| LLM Configuration | Optional | Exposure to fine-tuning or parameter-level adjustments to enhance model-assisted validation. |
| LLM Fundamentals | Nice to Have | Understanding of prompt context length, tokenization, and reasoning limits in LLMs. |
| AI Agents for Data Validation | Nice to Have | Experience in using or developing AI Agents for pipeline health checks, dynamic test selection, or automated data integrity scoring. |
Communication & Soft Skills
- Excellent communication and cross-functional collaboration skills, especially with Data Engineering, ML, and Analytics teams.
- Strong ability to translate data validation results into actionable insights for technical and business stakeholders.
- Demonstrated problem-solving and analytical thinking in complex data environments.
- Mentorship experience and the ability to foster a culture of quality-first data engineering.
- Self-driven, adaptable, and comfortable with fast-paced, evolving AI and data ecosystems.
Python ETL, SQL, AI Prompt & Response
Click on Apply to know more.