Website:
superreply.ai
Job details:
2-Week Freelance Contract - LLM Ops Engineer
We're an AI-powered sales automation platform running GPT-4.1 agents on Instagram, WhatsApp, and Facebook DMs for 200+ businesses. We need someone to come in, audit our current setup, and build a clean LLM Ops foundation.
---
What you'll set up:
→ Langfuse tracing instrumentation across our AWS Lambda agent pipeline - named spans per agent step, tied to our existing userId/sessionId convention
→ Offline eval framework using Langfuse Datasets - curated from historical conversation data, with an LLM-as-judge scoring pipeline
→ Model comparison harness - run the same dataset against GPT-4.1, GPT-4o-mini, and Claude Sonnet, and produce a structured cost-vs-quality report
→ Prompt caching audit - identify why our cache hit rate is low and fix it
→ Fallback and rate-limiting layer in Lambda - a model failover chain
---
You're a fit if:
✅ You've worked with Langfuse or LangSmith before
✅ You understand LLM eval design: not just running evals, but knowing what makes a good dataset
✅ You're comfortable with Node.js or Python, and with AWS Lambda
✅ You can work independently, communicate async, and ship fast
✅ You're doing this kind of work in your current role
---
Drop a DM with:
Proof of an outcome you've delivered in the past. It must be for a live app with at least 10,000 users.
Click Apply to learn more.