Cloudera Data Engineer

McLaren Strategic Solutions (MSS)

full-time

Required skills

Python
Agile
AWS
Azure
clustering
CMS
compliance
Databricks
DevOps
end-to-end
ETL
GCP
integration testing
Jenkins
multi-tenant
SQL
Unity Catalog

About the role

Website: mclarensv.com
Job details:
About Us

Next Generation of Technology Consulting

Our approach is built on delivering value by combining our powerful ecosystem of platforms with capital-efficient execution.

We bring together deep domain expertise and our strength in technology to help the world’s leading businesses build their digital core, optimize operations, accelerate revenue growth and deliver tangible outcomes at speed and scale.

Job Description

Key Responsibilities

Design, build, and optimize end-to-end ETL/ELT pipelines in Databricks using Delta Lake, Delta Live Tables (DLT), Auto Loader, PySpark, and Spark SQL for high-volume, multi-format partner ingestion.
Implement a Medallion (zoned) architecture: Raw (bronze); Standardized (silver) with advanced validation, quarantine/reject logic, and schema enforcement; and Curated (gold) consumer-ready datasets optimized for downstream COB/PI analytics.
Leverage Unity Catalog for data governance, access control, lineage, and secure multi-tenant data management.
Develop incremental processing, change data capture (CDC), backfill strategies, late-arriving data handling, and partitioning/optimization techniques (Z-Ordering, Liquid Clustering, Auto-Optimize) to eliminate performance bottlenecks.
Build robust data quality frameworks using Delta constraints, expectations, and monitoring to ensure clean, reliable data for downstream consumption.
Create production-grade Databricks Workflows, Jobs, and orchestration for reliable batch and near-real-time processing using Spark Structured Streaming.
Perform data profiling, mapping, reconciliation, and performance tuning of large-scale Spark jobs on Databricks clusters.
Collaborate with the Senior Data Architect and Data Modeller to translate the target-state lakehouse design into implementable, testable increments.
Deliver shippable, production-ready increments in Agile sprints within the implementation window, including CI/CD integration, unit/integration testing, and operational runbooks.
Establish comprehensive observability using Databricks Lakehouse Monitoring, SQL Alerts, and dashboards for pipeline health and SLA compliance.

Requirements

Required Qualifications & Experience

8+ years of hands-on data engineering experience
5+ years building enterprise-scale solutions on Databricks (Unity Catalog, Delta Lake, Delta Live Tables)
Proven track record delivering Medallion/zonal lakehouse architectures in production
Strong experience with high-volume, regulated data workloads (claims, financial, or healthcare data highly preferred)

Technical Skills – Databricks Expertise (Core)

Databricks Platform: Unity Catalog, Delta Lake, Delta Live Tables (DLT), Auto Loader, Workflows, Jobs, Repos, Lakehouse Monitoring
Core Technologies: PySpark, Spark SQL, Spark Structured Streaming, Delta constraints & expectations
Optimization & Performance: Liquid Clustering, Z-Ordering, Auto-Optimize, Dynamic Partition Overwrite, Photon engine
Governance & Quality: Unity Catalog ACLs, data lineage, schema evolution, Great Expectations (or equivalent)
Orchestration & CI/CD: Databricks Workflows, dbt on Databricks, Git integration, Azure DevOps / Jenkins
Languages: Expert Python (PySpark), SQL
Cloud: AWS/Azure/GCP (Databricks on any cloud)

Preferred Qualifications

Prior experience modernizing healthcare claims data lakes (COB, Payment Integrity, Medicaid/Medicare)
Exposure to partner ingestion patterns, multi-format data (EDI, flat files, APIs), and downstream analytical workloads
Familiarity with CMS/HIPAA data handling and compliance in Databricks environments

