Senior Data Engineer – Databricks & Modern Data Platforms
Experience: 6–8 years (including 4–6 years of hands-on data engineering experience with Databricks)
Role Summary
We are seeking a Senior Data Engineer with deep expertise in
Databricks-based data engineering to design, build, and optimize scalable data
platforms. The role focuses on building reliable batch and streaming pipelines,
analytics-ready data models, and modern lakehouse architectures. In addition to
strong data engineering fundamentals, the ideal candidate will have working
knowledge of emerging AI and analytics tools, enabling collaboration with data
science, AI, and advanced analytics teams.
Databricks & Lakehouse Data Engineering
- Design, build, and optimize high-performance batch and streaming pipelines using Databricks (Apache Spark, PySpark, Delta Lake).
- Implement a medallion architecture (Bronze, Silver, Gold layers) with clear data contracts.
- Ensure reliable, scalable, performant, and cost-efficient processing.
- Leverage Delta Lake capabilities such as ACID transactions, schema evolution, and time travel.
Advanced Data Modeling & Analytics Enablement
- Design curated, analytics-ready datasets including dimensional and fact-based models.
- Apply best practices for slowly changing dimensions (SCDs), effective dating, and aggregations.
- Build reusable transformations and semantic layers for BI and analytics.
- Make informed decisions on table materialization vs. views.
Customer & Event Data Engineering
- Engineer scalable customer, identity, household, and event-based data models.
- Support Customer 360-style datasets with consistent metric definitions.
- Ensure alignment between operational data models and reporting consumption.
Legacy Data Modernization & Migration
- Reverse engineer legacy reporting logic and migrate workloads to Databricks.
- Support data reconciliation, validation, and parallel run strategies.
- Document and communicate metric changes during modernization.
Data Engineering Best Practices
- Treat data pipelines and SQL as production-grade code with Git-based version control.
- Implement CI/CD pipelines, peer reviews, data quality checks, and monitoring.
- Maintain clear documentation, lineage awareness, and ownership definitions.
Collaboration & Stakeholder Engagement
- Collaborate with Analytics, BI, Product, Marketing, and Data Science teams.
- Translate business requirements into scalable technical solutions.
- Clearly explain data logic and platform changes to non-technical stakeholders.
Required Qualifications
- 6–8 years of experience in Data Engineering.
- 4–6 years of hands-on experience with Databricks, Apache Spark, PySpark, and Delta Lake.
- Strong SQL expertise on distributed data platforms.
- Solid understanding of data modeling, lakehouse architecture, and analytics enablement.
- Experience with orchestration tools such as Azure Data Factory, Airflow, or similar.
- Experience working with cloud platforms (Azure, AWS, or GCP).
Good to Have – AI & Modern Data Tooling
- Working knowledge of AI/ML fundamentals and data engineering support for ML workloads.
- Exposure to Databricks MLflow, Feature Store, or Model Serving.
- Familiarity with GenAI / LLM ecosystems such as OpenAI, AWS Bedrock, LangChain, and LangGraph.
- Basic understanding of prompt engineering, embeddings, or AI-assisted analytics.