Innovya Technologies
Website: innovyatech.com
Company Overview:
Innovya Technologies is a dynamic and growing software consulting firm that drives business automation with cutting-edge solutions. We help businesses quickly realize value from their technology and digital initiatives by delivering expert, context-driven product development.
Cultural Fit:
At Innovya, we thrive in a fast-paced, supportive environment where continuous learning, teamwork, and disciplined execution are key. If you're results-driven, growth-minded, and eager to make an impact, Innovya is the place for you!
About the Role:
We are looking for an experienced PySpark Developer with deep expertise in data streaming development and deployment to join our dynamic team. As a Python developer, you will play a key role in maintaining, enhancing, and modernizing critical high-volume, real-time data pipelines. The ideal candidate has a strong background in building scalable, fault-tolerant, high-performance ELT applications and a keen interest in AI-enabled development.
You will work closely with cross-functional teams to design, analyse, build, and deploy complex systems, providing technical direction and expertise to ensure the delivery of robust, efficient, and scalable solutions.
Key Responsibilities:
- Design, develop, and optimize scalable data pipelines using PySpark
- Process and analyze large datasets in distributed environments
- Collaborate with data engineers, analysts, and stakeholders to deliver data solutions
- Write efficient, reusable, and reliable PySpark code
- Perform data cleansing, transformation, and validation
- Optimize Spark jobs for performance and cost efficiency
- Integrate data from various sources (databases, APIs, streaming platforms, etc.)
- Troubleshoot and debug data pipeline issues
- Maintain documentation for data workflows and processes
Required Skills:
- 4+ years of total experience in Python development, with at least 2 years in Apache Spark (PySpark).
- Hands-on experience with Python and PySpark in production data engineering environments.
- Experience using Spark DataFrames for large-scale data transformation and processing.
- Experience with complex aggregations, window functions, conditional aggregations, and performance tuning in Spark.
- Good understanding of Spark execution concepts such as partitions, shuffles, join behavior, and optimization techniques.
- Experience with API integrations, including pagination, request handling, response transformation, and fault tolerance.
- Working knowledge of AWS, preferably with services such as S3, Lambda, Glue, EMR, Athena, Step Functions, or similar.
- Experience handling JSON and schema-based parsing in distributed data pipelines.
- Familiarity with SQL-based data extraction and transformation.
- Ability to write production-quality, maintainable, and testable code.
- Good analytical and problem-solving skills, with the ability to understand complex data flows and business rules.
- Experience identifying bugs, troubleshooting processing issues, performing root cause analysis, and implementing fixes in distributed systems.
- Ability to create and maintain unit tests to improve code quality and reduce regressions.
- Preferred: 2+ years of Databricks and data lake development experience.