Data Engineer - Python/PySpark

Min Experience

6 years

Location

Hyderabad, Telangana, India

Job Type

Full-time

About the job

Key Responsibilities

  • Design, develop, and optimize large-scale data pipelines using PySpark and Apache Spark (see the pipeline sketch after this list).
  • Build scalable and robust ETL workflows leveraging AWS services such as EMR, S3, Lambda, and Glue.
  • Collaborate with data scientists, analysts, and other engineers to gather requirements and deliver clean, well-structured data solutions.
  • Integrate data from various sources, ensuring high data quality, consistency, and reliability.
  • Manage and schedule workflows using Apache Airflow (see the DAG sketch after this list).
  • Work on ML model deployment pipelines using tools like SageMaker and Anaconda.
  • Write efficient and optimized SQL queries for data processing and validation.
  • Develop and maintain technical documentation for data pipelines and architecture.
  • Participate in Agile ceremonies, sprint planning, and code reviews.
  • Troubleshoot and resolve issues in production environments with minimal supervision.
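
For illustration, a minimal sketch of the kind of S3-to-S3 batch pipeline this role involves. The bucket paths, column names, and cleaning rules are hypothetical placeholders, not details from this posting:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical S3 locations; real buckets and schemas are project-specific.
RAW_PATH = "s3://example-raw-bucket/events/"
CURATED_PATH = "s3://example-curated-bucket/daily_events/"

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Read raw JSON events landed in S3 (e.g., by a Lambda or a Glue job).
raw = spark.read.json(RAW_PATH)

# Basic cleaning: drop malformed rows, derive a partition column,
# and deduplicate on the event key.
cleaned = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_date", F.to_date(F.col("event_ts")))
       .dropDuplicates(["event_id"])
)

# Write partitioned Parquet for downstream analytics.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(CURATED_PATH)

spark.stop()
```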
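
And a minimal Airflow DAG sketch for scheduling such a job daily; the dag_id, owner, and spark-submit command are placeholders, and the `schedule` argument assumes Airflow 2.4+:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",  # hypothetical team name
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ keyword
    catchup=False,
    default_args=default_args,
) as dag:
    # In production this might be an EMR or Spark-submit operator instead
    # of a plain Bash call.
    run_etl = BashOperator(
        task_id="run_pyspark_etl",
        bash_command="spark-submit etl_job.py",
    )
```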

Required Skills and Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

6-8 years of experience in data engineering with a strong focus on:

  • Python
  • PySpark
  • SQL
  • AWS (EMR, EC2, S3, Lambda, Glue)
  • Experience in developing and orchestrating pipelines using Apache Airflow.
  • Familiarity with SageMaker for ML deployment and Anaconda for environment management.
  • Proficiency in working with large datasets and optimizing Spark jobs (see the tuning sketch after this list).
  • Experience in building data lakes and data warehouses on AWS.
  • Strong understanding of data governance, data quality, and data lineage.
  • Excellent documentation and communication skills.
  • Comfortable working in a fast-paced Agile environment.
  • Experience with Kafka or other real-time streaming platforms.
  • Familiarity with DevOps practices and tools (e.g., Terraform, CloudFormation).
  • Exposure to NoSQL databases such as DynamoDB or MongoDB.
  • Knowledge of data security and compliance standards (GDPR, HIPAA).
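
As one concrete example of the Spark-job optimization called out above: broadcasting a small lookup table avoids a shuffle-heavy sort-merge join against a large fact table. A minimal sketch, with hypothetical table paths and join key:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-optimization").getOrCreate()

# Hypothetical inputs: a large fact table and a small reference table.
facts = spark.read.parquet("s3://example-curated-bucket/daily_events/")
countries = spark.read.parquet("s3://example-ref-bucket/countries/")

# broadcast() ships the small table to every executor, so the join
# runs map-side instead of shuffling the large table across the cluster.
enriched = facts.join(broadcast(countries), on="country_code", how="left")

# Repartitioning by the partition column before a wide write keeps
# output file counts predictable.
(
    enriched.repartition("event_date")
            .write.mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3://example-curated-bucket/enriched/")
)
```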

What We Offer

  • Work with cutting-edge technologies in a collaborative and innovative environment.
  • Opportunity to influence large-scale data infrastructure.
  • Competitive salary, benefits, and professional development support.
  • Be part of a growing team solving real-world data challenges.

(ref:hirist.tech)

Skills

python
pyspark
sql
aws
apache airflow
sagemaker
anaconda
spark
data lake
data warehouse
data governance
data quality
data lineage
kafka
nosql
dynamodb
mongodb
gdpr
hipaa