Headsup Corporation
Website: headsupcorporation.com
Job details:
Job Description: Senior Data Engineer – Data Ingestion & Platforms
Role Overview
We are seeking a seasoned Senior Data Engineer with a strong software engineering mindset to design,
build, and optimize our next-generation Ingest Factory and Data Processing Frameworks. In this role,
you will go beyond traditional ETL scripting to build scalable, metadata-driven pipelines and reusable
data frameworks.
The ideal candidate possesses deep expertise in Python, PySpark, and the Databricks Lakehouse
ecosystem (including LakeFlow and Delta Lake), combined with rigorous software engineering
discipline (SOLID, CI/CD, and infrastructure as code). You will work both independently and
collaboratively within an Agile environment to build production-grade software that ensures data quality,
governance, and seamless orchestration.
Key Responsibilities
Architecture & Pipeline Engineering
- Ingest Factory Design: Design, develop, and maintain robust data ingestion frameworks
leveraging Databricks LakeFlow, managed connectors, and declarative pipelines.
- Data Lakehouse Patterns: Implement repeatable ETL/ELT patterns within a Delta Lake
architecture, ensuring optimized storage, table design, and strict data lineage enforcement.
- Metadata-Driven Orchestration: Build parameterized notebooks and end-to-end orchestration
flows to automate ingestion across diverse source system patterns.
Software Craftsmanship & Automation
- Advanced Python Development: Write clean, modular, and maintainable Python code applying
SOLID and DRY principles. Move beyond basic PySpark scripting to contribute to and publish
reusable internal Python packages (e.g., PyPI-style distributions).
- Framework Creation: Define reusable functions and framework-level abstractions that
improve development efficiency across the data team.
- Testing & Quality Assurance: Implement rigorous data quality checks, monitoring, and alerting
frameworks. Lead test practices including Unit, Integration, and End-to-End (E2E) testing.
Optimization & Troubleshooting
- Performance Tuning: Optimize complex distributed Spark workloads, Databricks compute
configurations, and SQL queries (efficient filtering, indexing, and joins) to reduce processing
costs.
- Advanced Troubleshooting: Dive deep into the Spark UI and logs to diagnose and resolve
performance bottlenecks, data skew, and serialization issues.
DevOps & Collaboration
- CI/CD & IaC: Own the deployment lifecycle by building and maintaining GitHub Actions / GitLab CI
pipelines and provisioning infrastructure with Terraform (IaC).
- Agile Delivery: Actively participate in Scrum ceremonies, design solutions for specific user
stories, vet architectures with the team, and deliver retro demos prior to production deployment.
Technical Skillset & Qualifications
Must-Have Core Skills:
- Cloud Data Experience: 5+ years of production experience in cloud data engineering and building
enterprise-grade software.
- Databricks Ecosystem: Deep hands-on experience with Databricks Notebooks, Jobs
optimization, Delta Lake, Connectors, and LakeFlow (jobs, tasks, flows).
- Advanced Python & Spark: Mastery of Core Python, Data Structures & Algorithms (DSA), and
package management. Clear understanding of distributed workloads (Spark vs. single-node
processing).
- Software Engineering Disciplines: 3–5+ years of applying SOLID coding principles, Git controls
(PR reviews, branching strategies), and modern IDE features (Cursor/VS Code).
- DevOps & Automation: Production experience with Terraform and CI/CD tools (GitHub Actions
or GitLab CI).
- Advanced SQL: Proficiency in writing and optimizing moderately to highly complex queries, ensuring
efficient data processing and model design.
Soft Skills & Operational Traits
- Ability to work independently as a self-starter while being a highly collaborative team player.
- Strong business and data literacy: understanding not just how to move data, but the business
purpose behind it.
- Excellent communication skills for vetting solutions with peer engineers and presenting work
during retro demos.
Skills: data structures & algorithms, PySpark, Python, data ingestion & platforms, data engineer, SQL