Website:
resourcetree.co.in
Job details:
Key Responsibilities
● Lakehouse Implementation: Build and optimize Medallion architectures using Databricks and Fabric Data Engineering (Lakehouses and Notebooks).
● Governance & Cataloging: Implement data discovery, lineage, and classification using Microsoft Purview.
● Hybrid Orchestration: Design complex workflows using Azure Data Factory and Synapse Pipelines (Good to have).
● Data Transformation: Develop highly efficient ETL processes using PySpark and Spark SQL.
● High-Performance ETL: Architect and maintain robust ETL/ELT processes using Delta Live Tables (DLT) and Spark SQL, ensuring data is ready for downstream ML and BI workloads.
● Performance Engineering: Dive deep into the Spark UI and query plans to eliminate bottlenecks, optimize shuffles, and reduce cloud compute costs.
Technical Requirements
● SQL: Expert-level skill with complex joins, window functions, and query optimization.
● Python/PySpark: Advanced capability in writing modular, unit-tested PySpark code and in tuning jobs (e.g., shuffle partitions) using the Spark UI.
● Deep experience in Databricks (Unity Catalog, Delta Live Tables, Databricks Workflows/Lakeflow).
● Deep expertise in the Azure cloud, including Azure Databricks, Microsoft Fabric, Synapse, and ADF.
● Experience with Microsoft Purview (formerly Azure Purview) for enterprise-grade data governance.
● Big Data: Proficiency with technologies like Spark, Delta Lake, and Structured Streaming.
● Strong understanding of Data Warehouse/Lake/Lakehouse architecture and concepts.
● Programming: Strong Python/PySpark skills for data engineering tasks.
Good to have:
● Certifications:
○ DP-600/DP-700 - Microsoft Certified: Fabric Analytics Engineer Associate / Microsoft Certified: Fabric Data Engineer Associate
○ DP-203 - Data Engineering on Microsoft Azure
○ Databricks Certified Data Engineer
● Exposure to Databricks AI agents and AgentBricks