Python, PySpark, ETL Developer

Infosys

full-time

Required skills

caching
cross-functional
data ingestion
data pipeline
data structures
ETL

About the role

Website: infosys.com
Job details:
Technology->Analytics - Packages->Python - Big Data
Technology->Big Data - Data Processing->PySpark, ETL

Data Pipeline Development

Develop and maintain scalable batch ETL pipelines using Python and PySpark for data ingestion, transformation, and loading.
Implement reusable transformation logic, ensuring pipelines are modular, testable, and easy to maintain.
Optimize Spark jobs for performance (partitioning, caching, joins, shuffles) and cost efficiency.

Data Quality & Reliability

Apply data validation checks, handle schema evolution, and ensure accuracy and completeness of processed datasets.
Troubleshoot pipeline failures, analyze logs, and implement robust error handling and retry mechanisms.
Monitor job runs and support operational stability through alerts, runbooks, and timely incident resolution.

Collaboration & Delivery

Work with cross-functional teams to gather requirements, define data mappings, and deliver datasets aligned to business needs.
Participate in code reviews, follow engineering best practices, and contribute to continuous improvement of standards and tooling.
Document pipeline logic, dependencies, and operational procedures for smooth handovers and long-term maintainability.

Qualifications

Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field (or equivalent practical experience).
2–5 years of hands-on experience building data pipelines using Python and PySpark.
Strong understanding of ETL concepts, data transformations, and handling large-scale datasets.
Proficiency in writing clean, maintainable code and debugging production issues.
Working knowledge of data structures, algorithms, and software development best practices.
