Solution Architect – Data Modeling (Databricks, Azure/GCP)
Experience Level: 12+ years
About the Role:
We are seeking a highly skilled Solution Architect with expertise in data modeling and cloud-based data platforms (Databricks, Azure, GCP) to design and implement enterprise-grade data solutions. This role focuses on developing data models within a modern data lakehouse architecture, leveraging Databricks alongside Azure and GCP services. The ideal candidate has extensive experience leading large-scale data initiatives, a strong technical foundation, and a deep understanding of data modeling principles and cloud best practices.
Key Responsibilities:
Data Modeling and Architecture:
Design and implement logical and physical data models based on modern architecture principles (Data Mesh, Data Vault 2.0, Medallion), adapting Kimball methodologies as needed (see the sketch after this list).
Develop reusable data models and data products, optimizing intermediate layers and physical views/tables within Databricks and BigQuery.
Identify and normalize data grain from source systems, data warehouses, and data lakes, ensuring data consistency and quality.
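For illustration only, a minimal PySpark sketch of the Medallion-style layering referenced above, assuming Delta Lake is available; the paths, column names, and grain are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Bronze: land raw source data as-is, tagging each load with ingestion metadata.
    bronze = (spark.read.json("/lake/landing/raw_orders")  # hypothetical path
              .withColumn("_ingested_at", F.current_timestamp()))
    bronze.write.format("delta").mode("append").save("/lake/bronze/orders")

    # Silver: conform columns and normalize the grain to one row per order line.
    silver = (spark.read.format("delta").load("/lake/bronze/orders")
              .select("order_id", "line_id", "sku", "qty", "unit_price")
              .dropDuplicates(["order_id", "line_id"]))
    silver.write.format("delta").mode("overwrite").save("/lake/silver/order_lines")

    # Gold: a reusable, consumption-ready data product.
    gold = (silver.groupBy("sku")
            .agg(F.sum(F.col("qty") * F.col("unit_price")).alias("revenue")))
    gold.write.format("delta").mode("overwrite").save("/lake/gold/revenue_by_sku")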
Cloud Data Platform Expertise (Databricks, Azure/GCP):
Architect scalable data solutions on Databricks, leveraging GCP services (Google Cloud Storage, Google Compute Engine) and Azure services as needed.
Apply an in-depth understanding of Databricks Delta Lake, Unity Catalog, and related features for data management, governance, and performance optimization (see the sketch after this list).
Implement DataOps frameworks and CI/CD pipelines for automated data model deployment and management.
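As a concrete example of the Delta Lake and Unity Catalog features referenced above, here is a minimal sketch, assuming a Unity Catalog-enabled Databricks workspace; the catalog "main", schema "supply_chain", and group "analysts" are hypothetical names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Register a governed Delta table under Unity Catalog's three-level namespace.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.supply_chain.order_lines (
            order_id   STRING,
            line_id    STRING,
            sku        STRING,
            qty        INT,
            unit_price DECIMAL(10, 2)
        ) USING DELTA
    """)

    # Governance: grant read access to an analyst group.
    spark.sql("GRANT SELECT ON TABLE main.supply_chain.order_lines TO `analysts`")

    # Performance: compact small files and co-locate rows for common filters.
    spark.sql("OPTIMIZE main.supply_chain.order_lines ZORDER BY (sku)")

    # Auditability: review the table's transaction history (time travel).
    spark.sql("DESCRIBE HISTORY main.supply_chain.order_lines").show(truncate=False)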
Data Governance and Quality:
Implement and enforce data governance policies, including data quality monitoring, metadata management, and data lineage tracking.
Establish data quality and observability frameworks using appropriate tooling and techniques (see the sketch after this list).
Ensure data security best practices, including access control, encryption, and data masking.
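The following sketch illustrates one way such quality checks and masking could look in PySpark; the table path, column names, and thresholds are assumptions for illustration, not a prescribed framework:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format("delta").load("/lake/silver/customers")  # hypothetical path

    # Observability metrics: row count, null rate on a key column, duplicate keys.
    metrics = df.agg(
        F.count("*").alias("row_count"),
        F.avg(F.col("email").isNull().cast("int")).alias("email_null_rate"),
        (F.count("*") - F.countDistinct("customer_id")).alias("duplicate_keys"),
    ).first()

    # Fail the pipeline if quality thresholds are breached (illustrative thresholds).
    if metrics["email_null_rate"] > 0.01 or metrics["duplicate_keys"] > 0:
        raise ValueError(f"Data quality check failed: {metrics}")

    # Masking: hash a sensitive column before publishing a broader-access copy.
    masked = df.withColumn("email", F.sha2(F.col("email"), 256))
    masked.write.format("delta").mode("overwrite").save("/lake/gold/customers_masked")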
Business Collaboration and Stakeholder Management:
Work closely with business and product teams to gather requirements and translate them into technical specifications.
Communicate complex technical concepts to both technical and non-technical stakeholders.
Lead workshops and design sessions to gather requirements and validate data models.
Integration and Program Leadership:
Lead integration efforts between systems and data sources, including APIs and event-driven architectures.
Oversee large-scale data implementation programs, ensuring on-time and on-budget delivery.
Maintain comprehensive documentation, including data dictionaries, data flow diagrams, and architectural diagrams.
Qualifications:
Experience:
12+ years of experience in data projects (data lakes, data warehouses, data lakehouses).
Experience with Supply Chain, Consumer Packaged Goods (CPG), and Retail data systems.
Proven track record of leading large-scale data migration and implementation projects.
Technical Skills:
Expert-level proficiency in Databricks.
Strong understanding of Azure and/or GCP cloud platforms and data services (e.g., Azure Data Lake Storage Gen2, Azure Synapse Analytics, Google BigQuery, Google Cloud Storage).
Expertise in SQL, Python, and Spark (PySpark).
Deep understanding of DevOps, DataOps, and Agile methodologies.
Experience with data modeling tools (e.g., erwin Data Modeler, DataGrip).
Soft Skills:
Excellent communication, presentation, and stakeholder management skills.
Strong analytical and problem-solving abilities.
Ability to lead and mentor technical teams.
Preferred Qualifications:
Databricks certifications (e.g., Databricks Certified Professional Data Engineer).
Azure or GCP certifications (e.g., Azure Data Engineer Associate, Google Cloud Professional Data Engineer).
Experience with data governance tools (e.g., Collibra, Alation).
Familiarity with SAP, Oracle, or other enterprise and advanced planning systems.