Data Engineer

Neuracore

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Python
Airflow
AWS
Apache
Apache Airflow
Azure
BigQuery
data engineer
data lake
end-to-end
GCP
GitHub
infrastructure-as-code
Kafka
Parquet
Ray
Spark
SQL
Terraform

About the role

Website: neuracore.com

About Us

At Neuracore, we're building the world's first robot learning cloud service (https://github.com/NeuracoreAI/neuracore).

Our platform eliminates the complexity of traditional robotics development by providing a complete end-to-end solution for data collection, model training, and deployment that works across different robot types and configurations.

Our multidisciplinary team is at the forefront of making robot learning accessible to organisations worldwide, from manufacturing and logistics to healthcare and research institutions. We're transforming how robotics teams develop, train, and deploy intelligent systems by providing cloud-native infrastructure that scales from small research projects to enterprise-wide robot fleets.

About the Role

We are seeking a Data Engineer to design and build the data infrastructure that powers our robot learning platform. You'll architect scalable pipelines for ingesting, processing, and serving massive volumes of multi-modal robotics data, from sensor streams and telemetry to video and point clouds. This role offers the opportunity to build the foundational data layer that enables training across diverse robot embodiments and accelerates AI development for the entire robotics industry.

Key Responsibilities

Design and build scalable data pipelines for ingesting, transforming, and storing high-volume multi-modal robotics data, including sensor streams, video, and telemetry

Architect and maintain data lake and warehouse infrastructure optimised for large-scale ML training workloads

Build real-time and batch processing systems for robot data collection across distributed fleets

Develop data quality frameworks, including validation, monitoring, and lineage tracking, to ensure reliability across the platform

Optimise data storage and retrieval for performance and cost efficiency at petabyte scale

Collaborate with ML engineers to ensure training datasets are properly versioned, reproducible, and efficiently served to distributed training jobs

Required Skills

Bachelor's degree or higher in Computer Science, Data Engineering, Software Engineering, or a related field

Strong experience with data pipeline orchestration tools such as Apache Airflow, Dagster, or Prefect

Proficiency in Python and SQL, with experience processing large-scale datasets using frameworks like Spark, Dask, or Ray

Cloud platform experience with AWS, GCP, or Azure, including services such as S3, BigQuery, Redshift, or their equivalents

Experience with data modelling, schema design, and storage formats optimised for analytical and ML workloads (Parquet, Arrow, Delta Lake)

Solid understanding of distributed systems and event-driven architectures

Preferred Skills

Experience with streaming data systems such as Kafka, Kinesis, or Pulsar

Familiarity with ML data tooling, including feature stores, dataset versioning (DVC, lakeFS), and experiment tracking

Knowledge of time-series data and sensor data processing

Experience handling multi-modal data types such as video, point clouds, or IMU data

Exposure to robotics data formats, ROS bag files, or similar

Infrastructure-as-code experience with Terraform, Pulumi, or similar tools
