Data Engineer

Min Experience

10 years

Location

Abu Dhabi, Abu Dhabi Emirate, United Arab Emirates

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

A reputable and well-established technology company is actively recruiting a Data Engineer for its team in Abu Dhabi.


***Please take the time to read the job description; you must meet all the criteria set out below for your application to be considered. We check all applications, and suitable candidates will be contacted within 5 working days. If you are not contacted by us within that time, please consider your application unsuccessful on this occasion.***


The main responsibilities will include, but are not limited to:

  • Prepare and manage the datasets that power fine-tuning and other AI workflows.
  • Build ingestion pipelines for structured and unstructured data using Python.
  • Clean and normalize data, and convert it into formats suitable for LLM fine-tuning (e.g., JSONL, CSV).
  • Create high-quality, task-specific datasets for training and evaluation.
  • Version datasets using DVC or lakeFS for reproducibility.
  • Generate embeddings using Hugging Face models or Sentence Transformers.
  • Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows.
  • Tokenize and chunk long-form data for context window optimization.


To be successful you will need to meet the following:

  • 10+ years of experience in a Data Engineering role.
  • 2+ years of experience in an AI-adjacent data role.
  • Experience managing datasets on object and network storage (e.g., MinIO, NFS).
  • Proficiency in Python, pandas, and text processing tools.
  • Familiarity with tokenization libraries (e.g., Hugging Face Tokenizers, SentencePiece).
  • Understanding of LLM data constraints (context windows, formatting, prompt injection).
  • Applicants should be available for face-to-face interviews in the location mentioned above.


Hiring? If you need help filling a similar position in your company, please contact us at +971 (0)4 433 4579.

About the company

A reputable and well-established technology company.

Skills

Python
pandas
text processing
tokenization
Hugging Face
SentencePiece
LLM
data engineering
data management
object storage