Python + PySpark Developer (Immediate Joiners Only)

Innovya Technologies

Location: India
Job type: Full-time

Required skills

Python
AWS
Apache
Apache Spark
cross-functional
data lake
data solutions
Databricks
JSON
Lambda
Root Cause Analysis

About the role

Website: innovyatech.com
Job details:

Company Overview:

Innovya Technologies is a dynamic and growing software consulting firm that drives business automation with cutting-edge solutions. We help businesses quickly realize value from their technology and digital initiatives by delivering expert, context-driven product development.

Cultural Fit:

At Innovya, we thrive in a fast-paced, supportive environment where continuous learning, teamwork, and disciplined execution are key. If you're results-driven, growth-minded, and eager to make an impact, Innovya is the place for you!

About the Role:

We are looking for an experienced PySpark Developer with deep expertise in data streaming development and deployment to join our dynamic team. As a Python developer, you will play a key role in maintaining, enhancing, and modernizing critical high-volume, real-time data pipelines. The ideal candidate will have a strong background in building scalable, fault-tolerant, high-performance ELT applications and a keen interest in AI-enabled development.

You will work closely with cross-functional teams to design, analyse, build and deploy complex systems, providing technical direction and expertise to ensure the delivery of robust, efficient, and scalable solutions.

Key Responsibilities:

Design, develop, and optimize scalable data pipelines using PySpark
Process and analyze large datasets in distributed environments
Collaborate with data engineers, analysts, and stakeholders to deliver data solutions
Write efficient, reusable, and reliable PySpark code
Perform data cleansing, transformation, and validation
Optimize Spark jobs for performance and cost efficiency
Integrate data from various sources (databases, APIs, streaming platforms, etc.)
Troubleshoot and debug data pipeline issues
Maintain documentation for data workflows and processes

Required Skills:

4+ years of total experience in Python development, with at least 2 years in Apache Spark (PySpark).
Hands-on experience with Python and PySpark in production data engineering environments. 
Experience using Spark DataFrames for large-scale data transformation and processing.
Experience with complex aggregations, window functions, conditional aggregations, and performance tuning in Spark. 
Good understanding of Spark execution concepts such as partitions, shuffles, join behavior, and optimization techniques. 
Experience with API integrations, including pagination, request handling, response transformation, and fault tolerance. 
Working knowledge of AWS, preferably with services such as S3, Lambda, Glue, EMR, Athena, Step Functions, or similar. 
Experience handling JSON and schema-based parsing in distributed data pipelines. 
Familiarity with SQL-based data extraction and transformation. 
Ability to write production-quality, maintainable, and testable code. 
Good analytical and problem-solving skills, with the ability to understand complex data flows and business rules. 
Experience identifying bugs, troubleshooting processing issues, performing root cause analysis, and implementing fixes in distributed systems. 
Ability to create and maintain unit tests to improve code quality and reduce regressions.
Preferred: 2+ years of Databricks and data lake development experience.

Click on Apply to know more.