Generalist - English & Hindi

Weekday (YC W21)

Location: India
Job type: Part-time

About the role

Website: weekday.works
Job details:
This role is for one of our clients

Compensation: $12.19 per hour

This role supports the development of high-quality, reliable conversational AI systems by improving how large language models (LLMs) respond to real-world user queries. The focus is on evaluating and enhancing general chat behavior to ensure responses are accurate, clear, and aligned with human expectations across a wide range of topics and use cases.

Requirements

What You'll Do

Evaluate AI-generated responses for accuracy, relevance, and effectiveness in answering user queries.
Perform fact-checking using trusted public sources and external tools.
Create high-quality evaluation data by identifying strengths, weaknesses, and factual inaccuracies in responses.
Assess reasoning quality, clarity, tone, and completeness of outputs.
Ensure responses align with expected conversational standards and system guidelines.
Apply consistent annotations using defined taxonomies, benchmarks, and evaluation frameworks.

Who You Are

Bachelor's degree in any discipline.
Native-level fluency in Hindi (ILR 5 / CEFR C2) and strong proficiency in English.
Hands-on experience using large language models (LLMs) and an understanding of their real-world applications.
Excellent writing skills with the ability to provide clear, structured, and nuanced feedback.
Strong attention to detail with the ability to identify subtle issues in content.
Comfortable working across diverse topics, domains, and requirements.
Background in fields requiring structured analytical thinking such as research, analytics, policy, linguistics, or engineering.
Strong college-level mathematics and reasoning skills.

Nice-to-Have Skills

Experience with RLHF (Reinforcement Learning from Human Feedback), model evaluation, or data annotation.
Background in content writing, editing, or quality review.
Experience comparing multiple outputs and making detailed qualitative judgments.
Familiarity with evaluation rubrics, benchmarks, or quality scoring frameworks.

What Success Looks Like

Identify factual errors, reasoning gaps, and communication issues in AI responses.
Deliver consistent, high-quality evaluation outputs.
Provide feedback that directly improves model performance and user experience.
Contribute to building reliable and trustworthy AI systems used at scale.

Why Join This Opportunity

Work at the forefront of human-in-the-loop AI development.
Contribute to improving conversational AI systems used globally.
Flexible, remote contract work with the ability to manage your own schedule.
Competitive compensation aligned with expertise and contribution level.

Contract & Payment Terms

Engagement on an independent contractor basis.
Fully remote with flexible working hours.
Project duration may vary based on performance and business needs.
Work involves only publicly available information, with no access to confidential or proprietary data.
Payments processed weekly via Stripe or Wise based on completed work.
Candidates requiring H-1B or STEM OPT sponsorship are not eligible.

Core Skills

LLM Evaluation | Content Analysis | Bilingual Communication (English & Hindi)
