PySpark Developer

Tata Consultancy Services

Website: tcs.com
Job details:

Job Title: PySpark Data Engineer

Experience Level: 5 to 8 years

Location: Pan India

Job Requirements

Good working experience with Big Data platforms and tools such as Hadoop, PySpark, Scala, Hive, Impala, SQL, and Python
Good Python, PySpark, and Big Data experience
Spark UI, optimization, and debugging techniques
Good Python scripting skills
Intermediate SQL exposure: subqueries, joins, CTEs
Database technologies

Key Responsibilities

Design scalable PySpark-based test architectures for ETL/data pipelines, including modular frameworks for batch processing.
Architect end-to-end data validation systems in a Hadoop environment for lineage and schema evolution.
Lead system design for Hadoop/Hive test environments, including YARN resource management and dynamic partitioning.
Exposure to Zephyr-Jira-ServiceNow integrated test management systems, with experience in API-driven automation.
Design CI/CD test pipelines for PySpark/Hadoop jobs, incorporating artifact management, parallel execution, and blue-green deployments.
Create data quality system designs using PySpark integrated with Hive metadata services.
Design testing platforms and test data generators.
Mentor juniors on PySpark testing basics and contribute to testing strategy discussions.
Configure Spark sessions for memory and core allocation in both local and cluster manager settings.
Handle data on distributed file systems such as HDFS and write results back to Hive tables.
Implement partitioning and caching techniques when organizing code for transformation pipelines.
Implement performance tuning techniques such as salting and minimizing shuffling.

Click on Apply to know more.
