MSR Technology Group
Website: msrtechnologies.com
Job details:
Title: Data Platform Engineer with Databricks
Location: Remote
Duration: 6+ Months
What you'll do:
- Design and build the DKB entity schema on Delta Lake (entity tables governed by Unity Catalog; edge tables deferred to Phase 2 Graph Extension)
- Build ingestion connectors for Unity Catalog metadata, GitHub, Confluence, and dbt, each producing a standard changeset applied with idempotent upserts
- Implement graph traversal queries using Databricks SQL recursive CTEs on a Serverless SQL Warehouse
- Enrich entity metadata from the Unity Catalog Lineage API (table-to-table and column-to-column runtime lineage stored as entity properties)
- Build the Graph Repository abstraction layer (Python interface over Databricks SQL)
- Set up dev/staging/prod environments using Unity Catalog catalog-level isolation, Terraform (IaC), and Databricks Asset Bundles (deployment)
- Own data quality: Pydantic validation at ingestion boundary, dead letter queues, graph integrity tests, connector integration tests with golden datasets
Must-have skills:
- 4+ years with Databricks (Delta Lake, Unity Catalog, SQL Warehouse, Jobs Compute)
- Strong SQL including recursive CTEs, window functions, complex joins
- Python for data engineering (Pydantic, pytest, API clients)
- Experience building data ingestion pipelines from REST APIs (GitHub-, Confluence-, or Jira-style)
- Comfortable with IaC (Terraform) and CI/CD for Databricks (Databricks Asset Bundles)
- Understanding of graph data modeling (entity-relationship, property graphs); graph database expertise is not required
Nice-to-have:
- Experience with Unity Catalog Lineage API or External Lineage
- Experience with dbt (manifest.json, lineage metadata)
- Familiarity with Mosaic AI Vector Search
- Prior work with knowledge graphs or metadata platforms
Equal Opportunity Employer (EOE)