Data Engineer

FetchJobs.co

Location: India
Job type: Full-time

Required skills

Python
AWS
Apache
Artificial Intelligence
caching
Celery
compliance
data analytics
data architecture
data ingestion
data models
data science
database
machine learning
NLP
PostgreSQL
Redis

About the role

Website: fetchjobs.co
About The Company

Turing is a leading technology company dedicated to advancing artificial intelligence and machine learning solutions across various industries. Our mission is to develop innovative platforms that empower organizations to harness the full potential of data-driven insights. With a focus on cutting-edge research and development, Turing combines expertise in AI, data engineering, and software development to deliver scalable, secure, and impactful solutions. Our team comprises talented professionals committed to pushing the boundaries of technological innovation and creating products that make a meaningful difference in the world.

About The Role

The Turing Data Engineering team is seeking a highly experienced Senior Data Engineer to support our Tradecraft Evaluation Platform, a sophisticated system that leverages multi-modal expertise capture pipelines and Graph Retrieval-Augmented Generation (RAG) architectures. In this role, you will be responsible for building and maintaining complex data pipelines that ingest, process, and organize vast amounts of multi-format documents, including transcripts, memos, reports, and code. You will work closely with the IC6 Data Engineer to implement data architecture designs and operate the data infrastructure throughout the project lifecycle.

This position involves developing scalable ingestion pipelines, semantic chunking engines, and vector database indexing systems using Weaviate or similar technologies. You will also oversee the management of golden datasets, ensuring full lineage tracking and versioning, and implement data pool routing infrastructure to enforce strict data separation for evaluation, training, and holdout datasets. Additionally, you will be instrumental in integrating synthetic data augmentation pipelines, entity subgraph extraction, and cost-tracking systems, contributing to the overall robustness and efficiency of our platform.

Qualifications

The ideal candidate will have 7 to 10 years of professional experience in data engineering, with a proven track record of developing and operating large-scale production data pipelines. You should have expert-level proficiency in Python (3.11+), with extensive experience in asynchronous programming and data transformation logic. Hands-on experience with vector databases such as Weaviate, Pinecone, or Qdrant, including schema design, batch ingestion, and retrieval optimization, is essential.

Strong expertise in PostgreSQL, including schema design, indexing, triggers, and operational management, is required. Experience with document processing tools like Apache Tika and PyMuPDF for multi-format ingestion (PDF, DOCX, audio transcriptions) is also necessary. Familiarity with object storage solutions such as S3, caching systems like Redis, and task orchestration frameworks like Celery is expected. Candidates should have a solid understanding of data versioning, lineage tracking, and compliance infrastructure, especially in sensitive or regulated environments.

Additional preferred qualifications include experience with knowledge graph data models, entity resolution pipelines, and time-series databases like TimescaleDB for metrics and cost tracking. Certifications such as AWS Data Analytics Specialty and familiarity with NLP preprocessing, PII detection, and anonymization will be advantageous.

Responsibilities

Standard Responsibilities

Design, build, and maintain reliable, scalable data ingestion pipelines for multiple file formats and data sources.
Operate and optimize relational, vector, and cache databases for performance and stability.
Implement data quality checks, validation rules, and monitoring systems to ensure pipeline health and data integrity.
Develop and maintain data transformation logic in Python, ensuring comprehensive test coverage and documentation.

Project-Specific Responsibilities

Develop batch extraction pipelines to ingest historical expertise artifacts across various formats, utilizing tools such as Apache Tika and PyMuPDF, storing data in S3 with metadata in PostgreSQL, and routing through chunking engines.
Create semantic chunking engines tailored to document types, generating embeddings for vector database indexing to support retrieval-augmented evaluation.
Build and operate Weaviate-based vector indexing pipelines, including schema configuration, batch embedding ingestion, and hybrid search (vector + keyword) retrieval systems.
Design and implement a golden dataset management system with versioning, full lineage tracking, and immutable snapshots to support ML evaluation and validation processes.
Develop data pool routing services that enforce strict physical separation between datasets used for evaluation, training, and holdout, utilizing separate storage buckets and database schemas with audit logging.
Implement synthetic data augmentation pipelines, leveraging LLMs to generate variants of validated human examples, and manage their review and validation workflows.
Build the Graph Context Retriever to extract entity subgraphs from knowledge graphs via read-only APIs, supporting contextual evaluation scenarios.
Implement cost-tracking pipelines using TimescaleDB to monitor LLM API usage, attribution, and per-evaluation metrics.

Benefits

At Turing, we offer a comprehensive benefits package designed to support our employees’ well-being and professional growth. This includes competitive compensation packages, health insurance plans, and retirement savings options. We promote a flexible work environment that encourages work-life balance, with options for remote work and flexible hours. Additionally, employees have access to continuous learning opportunities, professional development programs, and participation in cutting-edge projects that impact the future of AI and data science. We foster an inclusive culture that values diversity, collaboration, and innovation, ensuring our team members feel valued and empowered to excel in their roles.

Equal Opportunity

Turing is an equal opportunity employer committed to fostering an inclusive environment for all employees. We celebrate diversity and are dedicated to creating a workplace free from discrimination and harassment. All qualified applicants will receive consideration for employment.