Weekday (YC W21)
Website: weekday.works
Job details:
This role is for one of our clients
Compensation: $12.19 per hour
This role supports the development of high-quality, reliable conversational AI systems by improving how large language models (LLMs) respond to real-world user queries. The focus is on evaluating and enhancing general chat behavior to ensure responses are accurate, clear, and aligned with human expectations across a wide range of topics and use cases.
Requirements
What You'll Do
- Evaluate AI-generated responses for accuracy, relevance, and effectiveness in answering user queries.
- Perform fact-checking using trusted public sources and external tools.
- Create high-quality evaluation data by identifying strengths, weaknesses, and factual inaccuracies in responses.
- Assess reasoning quality, clarity, tone, and completeness of outputs.
- Ensure responses align with expected conversational standards and system guidelines.
- Apply consistent annotations using defined taxonomies, benchmarks, and evaluation frameworks.
Who You Are
- Bachelor's degree in any discipline.
- Native-level fluency in Hindi (ILR 5 / CEFR C2) and strong proficiency in English.
- Hands-on experience using large language models (LLMs) and an understanding of their real-world applications.
- Excellent writing skills with the ability to provide clear, structured, and nuanced feedback.
- Strong attention to detail with the ability to identify subtle issues in content.
- Comfortable working across diverse topics, domains, and requirements.
- Background in fields requiring structured analytical thinking such as research, analytics, policy, linguistics, or engineering.
- Strong college-level mathematics and reasoning skills.
Nice-to-Have Skills
- Experience with RLHF (Reinforcement Learning from Human Feedback), model evaluation, or data annotation.
- Background in content writing, editing, or quality review.
- Experience comparing multiple outputs and making detailed qualitative judgments.
- Familiarity with evaluation rubrics, benchmarks, or quality scoring frameworks.
What Success Looks Like
- Identify factual errors, reasoning gaps, and communication issues in AI responses.
- Deliver consistent, high-quality evaluation outputs.
- Provide feedback that directly improves model performance and user experience.
- Contribute to building reliable and trustworthy AI systems used at scale.
Why Join This Opportunity
- Work at the forefront of human-in-the-loop AI development.
- Contribute to improving conversational AI systems used globally.
- Flexible, remote contract work with the ability to manage your own schedule.
- Competitive compensation aligned with expertise and contribution level.
Contract & Payment Terms
- Engagement on an independent contractor basis.
- Fully remote with flexible working hours.
- Project duration may vary based on performance and business needs.
- Work involves only publicly available information, with no access to confidential or proprietary data.
- Payments processed weekly via Stripe or Wise based on completed work.
- Candidates requiring H-1B or STEM OPT sponsorship are not eligible.
Core Skills
LLM Evaluation | Content Analysis | Bilingual Communication (English & Hindi)
Click Apply to learn more.