Summary:
At Pearson we ‘add life to a lifetime of learning’ so everyone can realise the life they imagine. We do this by creating vibrant and enriching learning experiences designed for real-life impact. Pearson was founded in 1844 and has been built on our ability to grow with and adapt to a constantly evolving market. Our employees are dedicated to creating high-quality, digital-first, accessible and sustainable resources for lifelong learning.
About the job:
The Data Quality Engineer operates at IC20 level as an individual contributor within delivery teams, supporting the build and operation of data integration platforms. The role focuses on ensuring the reliability, accuracy and consistency of data flowing through the Technical Hub and downstream systems.
This role sits within a data integration function supporting an Azure-based Technical Hub, with data pipelines spanning ingestion, transformation and distribution layers, alongside event-driven integration with downstream platforms.
Working closely with Data Engineers, Integration Engineers and platform teams, the Data Quality Engineer is responsible for validating data transformations, implementing automated data quality checks, and supporting reconciliation processes to ensure data integrity across systems.
About You:
You have experience working in data-focused QA or data engineering environments, with a strong focus on validating data pipelines and ensuring data quality at scale.
You are comfortable working hands-on with SQL and automated validation frameworks and can investigate data issues across complex pipelines. You understand how data flows across ingestion, transformation and distribution layers and can work closely with engineering teams to resolve defects.
You bring a structured, detail-oriented approach to testing and are focused on automation over manual validation wherever possible.
Key Responsibilities
- Design and implement data validation checks across ingestion, transformation and distribution pipelines
- Develop automated data quality tests to validate canonical data transformations
- Implement reconciliation processes between the Technical Hub and downstream platforms
- Validate event-driven data propagation and ensure completeness and correctness
- Monitor pipelines for data integrity issues, missing records and unexpected changes
- Investigate and support resolution of data quality issues with engineering teams
- Define and maintain data quality metrics and dashboards
- Contribute to testing strategy (unit, integration and end-to-end validation of data flows)
- Support data replay, recovery and backfill validation during incidents
- Embed data validation into CI/CD pipelines where possible
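To give a flavour of the kind of automated checks the responsibilities above describe, here is a minimal sketch of SQL-based reconciliation and completeness checks that could be wrapped in pytest and run from a CI/CD pipeline. The table and column names (`hub_orders`, `downstream_orders`, `order_id`) are hypothetical stand-ins, and an in-memory SQLite database stands in for the real platform:

```python
import sqlite3

def reconcile_row_counts(conn, source_table, target_table):
    """Return True if source and target hold the same number of rows.
    Table names are assumed to come from trusted configuration."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src == tgt

def count_null_keys(conn, table, key_column):
    """Count rows whose business key is missing (a completeness check)."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]

# Hypothetical in-memory stand-ins for a hub table and its downstream copy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hub_orders (order_id TEXT, amount REAL)")
conn.execute("CREATE TABLE downstream_orders (order_id TEXT, amount REAL)")
conn.executemany("INSERT INTO hub_orders VALUES (?, ?)",
                 [("A1", 10.0), ("A2", 20.0)])
conn.executemany("INSERT INTO downstream_orders VALUES (?, ?)",
                 [("A1", 10.0), ("A2", 20.0)])

# These checks pass when the downstream copy is complete and keys are populated.
assert reconcile_row_counts(conn, "hub_orders", "downstream_orders")
assert count_null_keys(conn, "downstream_orders", "order_id") == 0
```

In practice each check would be a named pytest test against the real source and target connections, so a failed reconciliation fails the pipeline build rather than surfacing later as a data incident.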
Key Skills & Experience
- Strong SQL skills for validation, reconciliation and analysis
- Experience building automated data validation rather than relying on manual data checks
- Hands-on experience with automated data testing approaches (e.g. pytest or similar frameworks)
- Experience validating ETL / ELT pipelines and transformations
- Experience in data QA, data quality engineering, or testing data pipelines
- Understanding of data quality dimensions (accuracy, completeness, consistency, timeliness)
- Experience working with event-driven or integration-based systems
- Strong analytical and problem-solving skills
Data Quality and Testing
- SQL-based validation and reconciliation
- Automated testing using pytest or equivalent frameworks
Data & Integration
- Azure Data Factory (ADF)
- SQL (T-SQL)
DevOps & Tooling
- GitHub
- Visual Studio Code
- Jira
- Basic CI/CD awareness
Desirable Skills & Experience
- Experience in data integration or platform-based environments
- Exposure to canonical data models and data standardisation
- Experience with data observability or monitoring tools
- Understanding of data lineage and traceability
- Experience working in Agile delivery environments
- Azure certification
- Great Expectations
- Soda / Soda SQL
- dbt (tests)
- Azure Service Bus (topics, queues)
- Event-driven integration patterns
- Azure Monitor / Log Analytics / Application Insights
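As a flavour of the declarative checks that tools such as dbt provide, a schema test asserting key uniqueness and completeness might look like the fragment below. The model and column names (`orders`, `order_id`) are illustrative only:

```yaml
# schema.yml — illustrative dbt tests on a hypothetical `orders` model
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique    # no duplicate keys after transformation
          - not_null  # completeness of the business key
```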
#LI-P1