Infosys
Website: infosys.com
Job details:
Technology->Analytics - Packages->Python - Big Data, Technology->Big Data - Data Processing->PySpark
Responsibilities:
- Design, develop, and maintain scalable batch and streaming data pipelines using Python and PySpark in distributed environments.
- Implement efficient transformations, aggregations, and joins on large datasets while ensuring performance and cost optimization.
- Write optimized SQL for data extraction, validation, and reconciliation across multiple sources.
- Build reusable, testable modules and follow engineering best practices (code reviews, unit testing, documentation).
- Troubleshoot production issues, perform root-cause analysis, and implement long-term fixes and monitoring improvements.
- Collaborate with stakeholders to translate requirements into technical designs, delivery plans, and measurable outcomes.
- Ensure data quality through validation checks, anomaly detection patterns, and consistent schema management.
- Contribute to continuous improvement of development standards, performance benchmarks, and pipeline reliability.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 5–9 years of hands-on experience in software development and/or data engineering roles.
- Strong proficiency in Python with experience building production-grade applications or data workflows.
- Strong proficiency in PySpark, including DataFrame APIs, optimization techniques, and distributed processing concepts.
- Working knowledge of SQL for complex queries, data analysis, and validation.
- Experience delivering reliable solutions with attention to performance, scalability, and maintainability.