Website: neuracore.com
Job details:
About Us
At Neuracore, we're building the world's first robot learning cloud service (https://github.com/NeuracoreAI/neuracore).
Our platform eliminates the complexity of traditional robotics development by providing a complete end-to-end solution for data collection, model training, and deployment that works across different robot types and configurations.
Our multidisciplinary team is at the forefront of making robot learning accessible to organisations worldwide, from manufacturing and logistics to healthcare and research institutions. We're transforming how robotics teams develop, train, and deploy intelligent systems by providing cloud-native infrastructure that scales from small research projects to enterprise-wide robot fleets.
About the Role
We are seeking a Data Engineer to design and build the data infrastructure that powers our robot learning platform. You'll architect scalable pipelines for ingesting, processing, and serving massive volumes of multi-modal robotics data, from sensor streams and telemetry to video and point clouds. This role offers the opportunity to build the foundational data layer that enables training across diverse robot embodiments and accelerates AI development for the entire robotics industry.
Key Responsibilities
Design and build scalable data pipelines for ingesting, transforming, and storing high-volume multi-modal robotics data including sensor streams, video, and telemetry
Architect and maintain data lake and warehouse infrastructure optimised for large-scale ML training workloads
Build real-time and batch processing systems for robot data collection across distributed fleets
Develop data quality frameworks including validation, monitoring, and lineage tracking to ensure reliability across the platform
Optimise data storage and retrieval for performance and cost efficiency at petabyte scale
Collaborate with ML engineers to ensure training datasets are properly versioned, reproducible, and efficiently served to distributed training jobs
Required Skills
Bachelor's degree or higher in Computer Science, Data Engineering, Software Engineering, or a related field
Strong experience with data pipeline orchestration tools such as Apache Airflow, Dagster, or Prefect
Proficiency in Python and SQL with experience processing large-scale datasets using frameworks like Spark, Dask, or Ray
Cloud platform experience with AWS, GCP, or Azure, including services such as S3, BigQuery, Redshift, or equivalents
Experience with data modelling, schema design, and storage formats optimised for analytical and ML workloads (Parquet, Arrow, Delta Lake)
Solid understanding of distributed systems and event-driven architectures
Preferred Skills
Experience with streaming data systems such as Kafka, Kinesis, or Pulsar
Familiarity with ML data tooling including feature stores, dataset versioning (DVC, LakeFS), and experiment tracking
Knowledge of time-series data and sensor data processing
Experience handling multi-modal data types such as video, point clouds, or IMU data
Exposure to robotics data formats such as ROS bag files or similar
Infrastructure-as-code experience with Terraform, Pulumi, or similar tools