Semantic Data Engineer
About the Opportunity:
This is the most strategically critical technical hire on the team. You will transform raw operational data into a trusted, agent-consumable knowledge layer: the difference between an agent that hallucinates metrics and one that answers with full lineage and trust.
What you will do:
▸ Author, maintain, and evolve the schema registry covering all core operational data tables and relationships
▸ Build and govern the domain glossary: KPI definitions, filter syntax, entity relationships, and calculation logic
▸ Own the NL-to-SQL pipeline: accuracy benchmarking, query validation, dry-run testing, and trust scoring
▸ Design and deploy the Semantic Metric Layer as an MCP server for consistent cross-agent metric resolution
▸ Manage cloud data warehouse integration, query optimisation, and data residency governance
▸ Own the BI tool MCP server and its integration with the agent orchestration and streaming layer
▸ Maintain data contracts between the platform and consuming agents — manage schema change processes end-to-end
The skills you bring:
◦ Advanced SQL and cloud data warehouse expertise — BigQuery, Snowflake, Databricks, or Redshift
◦ Semantic layer and data modelling: dbt, LookML, Cube.dev, or equivalent metric-layer tooling
◦ MCP (Model Context Protocol): server implementation, tool and resource primitive design patterns
◦ NL-to-SQL techniques: text-to-SQL models, schema linking, query correction, and validation loop design
◦ Python data engineering: pandas, SQLAlchemy, and pipeline orchestration (Airflow, Prefect, or equivalent)
◦ BI tool integration (Apache Superset, Looker, Power BI) at the API and embedded level
◦ Telecom domain knowledge — network KPIs, operational data schemas, CDR/EDR structures (strongly desirable)