Senior PySpark Developer

Min Experience

5 years

Location

Hyderabad

Job Type

Contract - Hybrid

About the job

About the role

We are looking for a highly skilled Senior PySpark Developer with strong experience in Apache Spark, Python, and Databricks. The ideal candidate will design, develop, and optimize big data pipelines, working closely with data engineers, data scientists, and other stakeholders to deliver scalable data solutions.

Key Responsibilities

- Design and develop large-scale distributed data processing systems using PySpark.
- Build and optimize ETL pipelines in Databricks.
- Work with structured and unstructured data, ensuring data quality and consistency.
- Collaborate with cross-functional teams to gather requirements and deliver solutions.
- Optimize Spark jobs for performance and scalability.
- Implement data governance, security, and compliance best practices.
- Debug and resolve complex data processing and transformation issues.
- Participate in code reviews and design discussions, and mentor junior developers.

Required Skills

- 5–7 years of experience in Python and PySpark.
- Hands-on experience with the Databricks platform (Delta Lake, Jobs, Workflows, Clusters).
- Expertise in Spark SQL, RDDs, DataFrames, streaming, and optimization techniques.
- Proficiency in building ETL/ELT pipelines using Spark and Databricks.
- Strong understanding of data modeling, data warehousing, and big data architecture.
- Experience with cloud platforms such as Azure, AWS, or GCP (Azure Databricks in particular).
- Solid knowledge of SQL, Delta Lake, and the Parquet and Avro file formats.
- Familiarity with CI/CD, version control (e.g., Git), and Agile methodologies.

Preferred Skills

- Experience with MLflow, Airflow, or dbt.
- Knowledge of Scala and Java.
- Exposure to DevOps, containerization (Docker/Kubernetes), or Terraform.
- Experience with BI tools such as Power BI or Tableau.

Skills

PySpark
ETL
Databricks
Spark SQL
Data Modeling
Data Warehousing
Big Data
SQL
Delta Lake
Parquet
Avro
CI/CD
Git
Agile
MLflow
Airflow
dbt
Scala
Java
DevOps
Docker
Kubernetes
Terraform
Power BI
Tableau