RAG & Agentic AI Evaluation Engineer (Freelancer)

Deccan AI Experts

Location: India
Job type: Part-time

Website: deccanexperts.ai
Job details:

About Us:

Deccan AI Experts is a pioneering company founded by IIT Bombay and IIM Ahmedabad alumni, with a strong founding team from IITs, NITs, and BITS. We specialize in delivering high-quality human-curated data, AI-first scaled operations services, and more. Based in San Francisco and Hyderabad, we are a young, fast-moving team on a mission to build AI for Good, driving innovation and positive societal impact.

About the Role:

As a RAG & Agentic AI Evaluation Engineer, you will work on cutting-edge Retrieval-Augmented Generation (RAG) systems and Agentic AI workflows.

Your primary task is to evaluate agent behavior, annotate model reasoning, validate retrieval quality, and provide detailed feedback to improve AI decision-making and multi-step workflows. This role suits individuals with strong analytical thinking, experience with LLMs, and familiarity with multi-step agent systems or RAG pipelines.

Responsibilities:

Annotate model responses, reasoning steps, tool usage, and agent actions
Evaluate RAG output quality: relevance, accuracy, grounding, hallucinations
Review agent workflows and end-to-end task execution
Identify incorrect reasoning, missing retrievals, or flawed tool calls
Provide structured feedback to improve agent behavior and RAG performance
Validate retrieved documents, sources, and context relevance
Review multi-hop reasoning chains for correctness and completeness
Tag error types such as bias, hallucination, logical gaps, and retrieval mismatch
Follow detailed annotation guidelines to ensure consistent evaluations
Document insights, issues, and improvement suggestions for AI research teams
Collaborate with model developers to refine prompts, retrieval logic, and agent strategies

Skills & Experience Required:

1–4 years of experience working with RAG or other agentic systems
Understanding of LLMs, RAG pipelines, embeddings, and retrieval workflows
Ability to judge correctness of model outputs and identify subtle issues
Familiarity with agentic workflows (tool use, multi-step tasks, reasoning traces)
Experience in documentation, evaluation, or quality review processes
High attention to detail and ability to follow structured guidelines
Experience with vector databases, prompt engineering, or LLM tools
Knowledge of multi-hop reasoning, chain-of-thought, or tool invocation
Prior experience in annotation or AI evaluation projects

Why join us?

Competitive hourly pay: up to ₹1500 per hour.
Fully remote and flexible work schedule.
Opportunity to contribute to the advancement of AI technology.

NOTE: Pay varies by project and is typically up to ₹1500 per hour. If you work an average of 3 hours every day, you could earn up to ₹90,000 per month once you clear our screening process.

Click Apply to learn more.
