Software Engineer, Research Data Pipelines

Nucleus AI

Location: India
Job type: Full-time

Required skills

Python
backend
compliance

About the role

Website: withnucleus.ai

High-quality models depend on high-quality data systems, especially when the work involves human feedback, evolving datasets, and fast-moving research loops.

We’re hiring a Software Engineer, Research Data Pipelines to build pipelines and tooling for human data, labeling, and research datasets that feed model development at Nucleus. This role is focused on the operational and technical systems that turn raw inputs, annotations, and research workflows into dependable datasets for training, evaluation, and iteration.

You’ll work closely with research, data, and operations teams to create robust pipelines for collecting, processing, validating, and managing critical research data. The work requires both strong engineering judgment and care for data quality, lineage, and usability.

In this role, you will

Build and maintain pipelines for human data, annotations, labeling workflows, and research datasets used in model development.
Design systems for ingestion, validation, transformation, versioning, and quality control across diverse data sources.
Develop tooling that supports data curation, task routing, schema evolution, auditing, and provenance tracking.
Improve reliability, observability, and reproducibility across research data workflows and pipeline operations.
Partner with researchers and data/operations teams to translate evolving workflow needs into scalable systems and internal tools.
Support datasets used for supervised fine-tuning, evaluation, preference data, and other human-in-the-loop research processes.
Help define best practices around dataset quality, privacy, compliance, and lifecycle management.

You may be a good fit if you

Have strong software engineering experience building data pipelines, backend systems, or workflow tooling in production.
Have worked on annotation systems, labeling platforms, research data pipelines, or other data-intensive internal products.
Are comfortable with Python and one or more additional backend or systems languages.
Care deeply about data correctness, traceability, and building systems that can evolve with research needs.
Can work effectively across engineering, research, and operational stakeholders with different priorities and time horizons.
Enjoy building the less visible but deeply important systems that make model development more reliable and scalable.

What makes Nucleus different

Nucleus sees research data infrastructure as core to progress, not peripheral to it. In this role, you’ll build systems that influence how datasets are created, trusted, and used across the model development lifecycle. Your work will help turn human insight and structured feedback into durable technical advantage.

If you’re motivated by building the pipelines behind better models, we’d love to meet you.
