Researcher - AI

The Caliper Lab

Location: India
Job type: Part-time

Required skills

data science
founder
statistics

About the role

What We're Building

Enterprise AI products make capability claims that drive hundreds of millions of dollars in deal value and procurement decisions. Nobody independently verifies them.

The Caliper Lab is building the infrastructure that changes that — structured, empirical assessment of whether AI product capability claims hold up against frontier model baselines, and how durable those claims are as the frontier improves.

The output is institutional-grade research that PE investors, procurement teams, and serious practitioners rely on to make decisions. This is early-stage, which means you would be building the research function, not slotting into one.

The Problem Is Genuinely Hard and Interesting

Evaluating AI capability is not advisory judgment. It requires task design, verified ground truth construction, controlled baseline comparisons, and scoring methodology that holds up to scrutiny from both investment professionals and academics. The academic foundations exist. The independent institutional home for applying them to commercial products does not yet exist. We are building it.

What You Will Work On

Design and execute benchmark evaluations of AI products against frontier model baselines
Build and maintain verified ground truth datasets across legal, financial, and knowledge worker AI categories
Synthesise signal from practitioner panels, public sources, and live benchmarks into published research reports
Contribute to methodology development — how the Lab measures what it measures, and how that evolves as the field develops
Work at the cutting edge of AI evaluation alongside some of the field's strongest research talent

This is not a literature review role. You will produce findings that go in front of PE investors and enterprise buyers.

Who We Are Looking For

2-4 years producing structured research on technology markets or AI systems
Could come from a research role at IDC, Gartner, Forrester, or similar; a research or analytics function inside a consulting firm; an applied ML or data science role where you produced findings, not just models; or an independent research track with published output
Quantitative background — statistics, mathematics, economics, engineering, or similar

What we need to see:

You read AI evaluation methodology papers and have opinions about what they get right and wrong
You understand what LLMs actually do well and poorly on real tasks — from working with them, not reading about it
You can write a clear finding from a messy dataset and defend it to a sceptical audience
You have shipped work under time pressure for a real stakeholder

The setup:

3-month contract to start, with monthly compensation
Can convert to a full-time founding team role with a compensation increase (and/or equity) for the right person
Remote - work directly with the founder (former Bain Principal with a strong private equity and tech advisory background) and senior methodology advisors (leading academics in the field of LLM evaluations)

To apply, please send your resume and a short write-up (300-400 words) on a specific AI product you have used for real work. What did it actually do well, what did it fail at, and how would you design a test to verify whether that failure is systematic or incidental? Please send all applications to dhruv@thecaliperlab.com
