Live Connections
Website: liveconnections.in
Job details:
AI Testing – LLM Evaluation Engineer
Experience: 4-10 yrs
CTC: Up to 18 LPA
Key Responsibilities
1) AI/LLM Evaluation & Test Design
· Define evaluation strategies (golden sets, adversarial suites, regressions), pass/fail gates, and SLOs for quality, safety, latency, and cost.
· Establish rubric-based human reviews (usefulness, faithfulness, safety, clarity) and calibrate annotators.
· Instrument LLM-as-judge evaluation where appropriate, with calibration and spot checks.
2) RAG, Retrieval, & Grounding
· Measure retrieval precision/recall, MRR/nDCG, and answer faithfulness to sources; detect hallucination and citation errors.
· Test chunking, prompt templates, filters, and policy chains; monitor stale/poisoned content.
3) Agentic & Tool-Use Scenarios
· Validate multi-step plans, tool selection, error recovery, retries, and idempotency for functions with side effects.
· Contract-test JSON schemas and structured outputs across services.
4) Team & Standards
· Adopt and improve test standards/methodology; share practices, train teams, participate in peer reviews, and pursue self-directed learning.
Required Qualifications
· 4-6 years of software development experience, with programming proficiency (e.g., Python/TypeScript) and experience with CI/CD and version control.
· Cloud basics (AWS/Azure/GCP) and microservices fundamentals.
· Degree/Diploma in CS/IT or equivalent.
Required (AI/ML Focus)
· Understanding of ML concepts and MLOps; experience with model validation and monitoring in production.
· Familiarity with evaluation/observability tools (any of): LangSmith, Weights & Biases, RAGAS, TruLens, Promptfoo, DeepEval, Guardrails/LlamaGuard, Presidio; plus OpenTelemetry-style LLM traces.
· Practical exposure to Azure OpenAI/Bedrock/Vertex and model gateways; quota & token accounting know-how.
Tooling & Automation (Preferred)
· Data evaluation pipelines for RAG (embedding validation, filtering, drift detection).
Traits
· Outcome-oriented, high standards; strong communication and collaboration; customer-focused; proficient in written and spoken English.
Telco Context (Nice-to-Have)
· Experience testing copilots/agents for BSS/OSS, NOC analytics, and enterprise care; ability to tie eval KPIs to CSAT, AHT, FCR, MTTR.