- Location
- Chennai, Tamil Nadu, India
- Job type
- Full-time
Required skills
- Python
- caching
- CI
- data engineer
- database
- end-to-end
- ETL
- Hadoop
- Hive
- SQL
- Yarn
About the role
Tata Consultancy Services
Website: tcs.com
Job details:
Job Title: PySpark Data Engineer
Experience Level: 5 to 8 years
Location: Pan India
Job Requirements
- Good work experience with big data technologies such as Hadoop, PySpark, Scala, Hive, Impala, SQL, and Python
- Good experience with Python, PySpark, and big data
- Spark UI, optimization, and debugging techniques
- Good Python scripting skills
- Intermediate SQL exposure: subqueries, joins, CTEs
- Knowledge of database technologies
Key Responsibilities
- Design scalable PySpark-based test architectures for ETL/data pipelines, including modular frameworks for batch processing.
- Architect end-to-end data validation systems in Hadoop environments, covering data lineage and schema evolution.
- Lead system design for Hadoop/Hive test environments, including YARN resource management and dynamic partitioning.
- Exposure to integrated test management systems (Zephyr, Jira, ServiceNow), with experience in API-driven automation.
- Design CI/CD test pipelines for PySpark/Hadoop jobs, incorporating artifact management, parallel execution, and blue-green deployments.
- Create data quality system designs using PySpark integrated with Hive metadata services.
- Design testing platforms and test data generators.
- Mentor junior engineers on PySpark testing basics and contribute to testing strategy discussions.
- Configure Spark sessions for memory and core allocation in both local mode and cluster manager deployments.
- Handle data on distributed file systems such as HDFS and write results back to Hive tables.
- Implement partitioning and caching techniques when organizing code for transformation pipelines.
- Implement performance tuning techniques such as salting and minimizing shuffles.