About the opportunity:
This role raises the intelligence ceiling of the platform. You will be responsible for making agents measurably smarter over time through rigorous evaluation, advanced retrieval architecture, and a self-improving feedback loop.
What you will do:
▸ Own the end-to-end design, development, and continuous improvement of the Deep Research Agent
▸ Design and maintain the RAG pipeline: chunking strategy, embedding models, retrieval, and re-ranking
▸ Implement and optimise context compression to reduce overhead on long-horizon, multi-hop queries
▸ Build and operate the model evaluation harness: benchmark design, regression tracking, and A/B testing
▸ Lead the agent self-improvement loop: prompt proposal pipeline and benchmark-gated merge governance
▸ Track frontier model research and assess production applicability for the platform intelligence roadmap
▸ Advise on fine-tuning, prompt optimisation, and model selection strategy across model generations
The skills you bring:
◦ Deep expertise in LLMs: transformer architecture, fine-tuning (LoRA/QLoRA), RLHF, and alignment techniques
◦ RAG system design: vector databases (Pinecone, Weaviate, pgvector), embedding models, and hybrid search strategies
◦ ML experimentation tooling: MLflow, Weights & Biases, Vertex AI Experiments, or equivalent platforms
◦ Python ML stack: PyTorch or JAX, Hugging Face Transformers, and LangChain or equivalent orchestration libraries
◦ Statistical evaluation methods: benchmark design, significance testing, and evaluation dataset curation
◦ Context compression, KV cache optimisation, and quantisation basics (GPTQ, AWQ)