C5i
Website:
c5i.ai
Job details:
We are looking for a skilled Data Engineer with strong expertise in Python, PySpark, and Scala to build and manage scalable data pipelines and to support data processing across large datasets.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines using PySpark
- Work with the Hadoop ecosystem for distributed data processing and storage
- Develop and optimize Python-based data workflows
- Schedule, monitor, and manage workflows using Airflow
- Collaborate with cross-functional teams to ensure data availability and reliability
Must-have Skills:
- Strong hands-on experience with PySpark
- Good knowledge of the Hadoop ecosystem (HDFS, Hive, etc.)
- Proficiency in Python programming
- Experience with Apache Airflow for workflow orchestration
- Understanding of data processing, ETL concepts, and large-scale data systems