Valethi Technologies
Website: valethi.com
Job details:
Job Description: Senior Data Engineer – Data Ingestion & Platforms
Role Overview
We are seeking a seasoned Senior Data Engineer with a strong software engineering mindset to design, build, and optimize our next-generation Ingest Factory and Data Processing Frameworks.
In this role, you will go beyond traditional ETL scripting to build scalable, metadata-driven pipelines and reusable data frameworks. The ideal candidate possesses deep expertise in Python, PySpark, and the Databricks Lakehouse ecosystem (including LakeFlow and Delta Lake), combined with rigorous software engineering discipline (SOLID, CI/CD, and infrastructure as code). You will work both independently and collaboratively within an Agile environment to build production-grade software that ensures data quality, governance, and seamless orchestration.
Key Responsibilities
Architecture & Pipeline Engineering
• Ingest Factory Design: Design, develop, and maintain robust data ingestion frameworks leveraging Databricks LakeFlow, managed connectors, and declarative pipelines.
• Data Lakehouse Patterns: Implement repeatable ETL/ELT patterns within a Delta Lake architecture, ensuring optimized storage, table design, and strict data lineage enforcement.
• Metadata-Driven Orchestration: Build parameterized notebooks and end-to-end orchestration flows to automate ingestion across diverse source system patterns.
Software Craftsmanship & Automation
• Advanced Python Development: Write clean, modular, and maintainable Python code that applies SOLID and DRY principles. Move beyond basic PySpark scripting to contribute to and publish reusable internal Python packages (e.g., to a PyPI-compatible package index).
• Framework Creation: Define reusable functions and framework-level abstractions to dramatically improve development efficiency across the data team.
• Testing & Quality Assurance: Implement rigorous data quality checks, monitoring, and alerting frameworks. Lead testing practices, including unit, integration, and end-to-end (E2E) testing.
Optimization & Troubleshooting
• Performance Tuning: Optimize complex distributed Spark workloads, Databricks compute configurations, and SQL queries (efficient filtering, indexing, and joins) to reduce processing costs.
• Advanced Troubleshooting: Dig into the Spark UI and logs to diagnose and resolve performance bottlenecks, data skew, and serialization issues.
DevOps & Collaboration
• CI/CD & IaC: Own the deployment lifecycle by building and maintaining GitHub Actions / GitLab CI pipelines and provisioning infrastructure with Terraform (IaC).
• Agile Delivery: Actively participate in Scrum ceremonies, design solutions for specific user stories, vet architectures with the team, and deliver retro demos before production deployment.
Technical Skillset & Qualifications
Must-Have Core Skills:
• Cloud Data Experience: 5+ years of production experience in cloud data engineering and building enterprise-grade software.
• Databricks Ecosystem: Deep hands-on experience with Databricks Notebooks, Jobs optimization, Delta Lake, Connectors, and LakeFlow (jobs, tasks, flows).
• Advanced Python & Spark: Mastery of Core Python, Data Structures & Algorithms (DSA), and package management. Clear understanding of distributed workloads (Spark vs. single-node processing).
• Software Engineering Disciplines: 3–5+ years of experience applying SOLID principles, Git workflows (PR reviews, branching strategies), and modern IDE features (Cursor/VS Code).
• DevOps & Automation: Production experience with Terraform and CI/CD tools (GitHub Actions or GitLab CI).
• Advanced SQL: Proficiency in writing and optimizing moderately to highly complex queries, ensuring efficient data processing and sound data model design.
Soft Skills & Operational Traits:
• Ability to work independently as a self-starter while being a highly collaborative team player.
• Strong business and data literacy—understanding not just how to move data, but the business purpose behind it.
• Excellent communication skills for vetting solutions with peer engineers and presenting work during retro demos.