Senior PySpark Developer

Min Experience

5 years

Location

Hyderabad

Job Type

Contract - Hybrid

About the job

About the role

We are looking for a highly skilled Senior PySpark Developer with strong experience in Apache Spark, Python, and Databricks. The ideal candidate will design, develop, and optimize big data pipelines, working closely with data engineers, data scientists, and other stakeholders to deliver scalable data solutions.

Key Responsibilities

- Design and develop large-scale distributed data processing systems using PySpark.
- Build and optimize ETL pipelines in Databricks.
- Work with structured and unstructured data, ensuring data quality and consistency.
- Collaborate with cross-functional teams to gather requirements and deliver solutions.
- Optimize Spark jobs for performance and scalability.
- Implement data governance, security, and compliance best practices.
- Debug and resolve complex data processing and transformation issues.
- Participate in code reviews and design discussions, and mentor junior developers.

Required Skills

- 5–7 years of experience in Python and PySpark.
- Hands-on experience with the Databricks platform (Delta Lake, Jobs, Workflows, Clusters).
- Expertise in Spark SQL, RDDs, DataFrames, streaming, and optimization techniques.
- Proficiency in building ETL/ELT pipelines using Spark and Databricks.
- Strong understanding of data modeling, data warehousing, and big data architecture.
- Experience with cloud platforms such as Azure, AWS, or GCP (Azure Databricks in particular).
- Solid knowledge of SQL, Delta Lake, and the Parquet and Avro file formats.
- Familiarity with CI/CD, version control (e.g., Git), and Agile methodologies.

Preferred Skills

- Experience with MLflow, Airflow, or dbt.
- Knowledge of Scala and Java.
- Exposure to DevOps, containerization (Docker/Kubernetes), or Terraform.
- Experience with BI tools such as Power BI or Tableau.

Skills

PySpark
ETL
Databricks
Spark SQL
Data Modeling
Data Warehousing
Big Data
SQL
Delta Lake
Parquet
Avro
CI/CD
Git
Agile
MLflow
Airflow
dbt
Scala
Java
DevOps
Docker
Kubernetes
Terraform
Power BI
Tableau