Data Engineer - Python/PySpark

Min Experience

6 years

Location

Hyderabad, Telangana, India

Job Type

Full-time

About the job

Key Responsibilities

  • Design, develop, and optimize large-scale data pipelines using PySpark and Apache Spark (see the pipeline sketch after this list).
  • Build scalable and robust ETL workflows leveraging AWS services such as EMR, S3, Lambda, and Glue.
  • Collaborate with data scientists, analysts, and other engineers to gather requirements and deliver clean, well-structured data solutions.
  • Integrate data from various sources, ensuring high data quality, consistency, and reliability.
  • Manage and schedule workflows using Apache Airflow (see the DAG sketch after this list).
  • Work on ML model deployment pipelines using tools like SageMaker and Anaconda.
  • Write efficient and optimized SQL queries for data processing and validation.
  • Develop and maintain technical documentation for data pipelines and architecture.
  • Participate in Agile ceremonies, sprint planning, and code reviews.
  • Troubleshoot and resolve issues in production environments with minimal supervision.
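
For illustration, a minimal sketch of the kind of S3-to-S3 batch pipeline this role involves. The bucket paths, column names, and cleaning rules are hypothetical placeholders, not details from this posting:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical S3 locations; real buckets and schemas are project-specific.
RAW_PATH = "s3://example-raw-bucket/events/"
CURATED_PATH = "s3://example-curated-bucket/daily_events/"

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Read raw JSON events landed in S3 (e.g., by a Lambda or a Glue job).
raw = spark.read.json(RAW_PATH)

# Basic cleaning: drop malformed rows, derive a partition column,
# and deduplicate on the event key.
cleaned = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_date", F.to_date(F.col("event_ts")))
       .dropDuplicates(["event_id"])
)

# Write partitioned Parquet for downstream analytics.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(CURATED_PATH)

spark.stop()
```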
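
And a minimal Airflow DAG sketch for scheduling such a job daily; the dag_id, owner, and spark-submit command are placeholders, and the `schedule` argument assumes Airflow 2.4+:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",  # hypothetical team name
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword
    catchup=False,
    default_args=default_args,
) as dag:
    # In production this might be an EMR or Spark-submit operator instead
    # of a plain Bash call.
    run_etl = BashOperator(
        task_id="run_pyspark_etl",
        bash_command="spark-submit etl_job.py",
    )
```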

Required Skills and Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

6-8 years of experience in data engineering with a strong focus on:

  • Python
  • PySpark
  • SQL
  • AWS (EMR, EC2, S3, Lambda, Glue)
  • Experience in developing and orchestrating pipelines using Apache Airflow.
  • Familiarity with SageMaker for ML deployment and Anaconda for environment management.
  • Proficiency in working with large datasets and optimizing Spark jobs (see the tuning sketch after this list).
  • Experience in building data lakes and data warehouses on AWS.
  • Strong understanding of data governance, data quality, and data lineage.
  • Excellent documentation and communication skills.
  • Comfortable working in a fast-paced Agile environment.
  • Experience with Kafka or other real-time streaming platforms.
  • Familiarity with DevOps practices and tools (e.g., Terraform, CloudFormation).
  • Exposure to NoSQL databases such as DynamoDB or MongoDB.
  • Knowledge of data security and compliance standards (GDPR, HIPAA).
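
As one concrete example of the Spark-job optimization called out above: broadcasting a small lookup table avoids a shuffle-heavy sort-merge join against a large fact table. A minimal sketch, with hypothetical table paths and join key:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-optimization").getOrCreate()

# Hypothetical inputs: a large fact table and a small reference table.
facts = spark.read.parquet("s3://example-curated-bucket/daily_events/")
countries = spark.read.parquet("s3://example-ref-bucket/countries/")

# broadcast() ships the small table to every executor, so the join
# runs map-side instead of shuffling the large table across the cluster.
enriched = facts.join(broadcast(countries), on="country_code", how="left")

# Repartitioning by the partition column before a wide write keeps
# output file counts predictable.
(
    enriched.repartition("event_date")
            .write.mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3://example-curated-bucket/enriched/")
)
```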

What We Offer

  • Work with cutting-edge technologies in a collaborative and innovative environment.
  • Opportunity to influence large-scale data infrastructure.
  • Competitive salary, benefits, and professional development support.
  • Be part of a growing team solving real-world data challenges.

(ref:hirist.tech)

Skills

python
pyspark
sql
aws
apache airflow
sagemaker
anaconda
spark
data lake
data warehouse
data governance
data quality
data lineage
kafka
nosql
dynamodb
mongodb
gdpr
hipaa