Website:
cc33.co.uk
Job details:
About the Role
We are building AI-powered Voice Bots and intelligent automation products for the contact centre industry. Our stack spans Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLMs), with a strong preference for self-hosted, open-source models deployed on our own infrastructure. We're looking for a Data Scientist / Research Scientist who can own the model layer end-to-end — from scouting and benchmarking open-source models to deploying and fine-tuning them on-premises.
What You'll Do
- Continuously evaluate and benchmark open-source STT models, TTS models, and LLMs to identify the best fit for production workloads.
- Recommend and spec out the GPU/hardware requirements for on-premises model deployments, balancing cost, latency, and throughput.
- Fine-tune and adapt foundation models (LLMs, STT, TTS) on domain-specific and multilingual datasets when off-the-shelf performance isn't sufficient.
- Build and deploy traditional ML and Deep Learning models for tasks where a lightweight, purpose-built model outperforms a general-purpose LLM — classification, intent detection, entity extraction, anomaly detection, etc.
- Collaborate closely with the engineering team to integrate models into production voice bot and automation pipelines.
- Stay current with the rapidly evolving open-source AI landscape and bring new techniques, architectures, and tools to the team.
Requirements
- Strong foundation in Machine Learning and Deep Learning — hands-on experience building, training, and evaluating models beyond just calling APIs.
- Experience working with open-source LLMs (Llama, Mistral, Qwen, Gemma, etc.) — loading, quantizing, benchmarking, and serving them.
- Familiarity with STT and TTS model ecosystems (Whisper, Coqui, VITS, Piper, etc.) and their performance trade-offs.
- Ability to recommend and justify hardware configurations (GPU type, VRAM, compute requirements) for on-premises model hosting.
- Experience fine-tuning models using techniques like LoRA, QLoRA, or full fine-tuning on custom datasets.
- Proficiency in Python and the core ML/DL toolkit — PyTorch, Hugging Face Transformers, scikit-learn, NumPy, pandas.
- Experience with model serving and inference optimization (vLLM, TGI, ONNX Runtime, TensorRT, or similar).
- Understanding of when to use a large model vs. a small, task-specific model and the ability to make that trade-off pragmatically.
- Strong analytical and problem-solving skills with the ability to read and implement ideas from research papers.
Bonus / Good-to-Have (Senior Consideration)
Candidates with the following skills will be considered for a senior position:
- Experience with Python web frameworks such as FastAPI or Flask for building model-serving APIs or backend services.
- Hands-on experience with agentic orchestration frameworks — LangChain, LangGraph, CrewAI, or similar.
- Prior experience hosting and managing LLMs on bare-metal or cloud GPU infrastructure in production (not just experimentation).
- Experience fine-tuning STT and TTS models specifically (e.g., fine-tuning Whisper on domain-specific audio, training custom TTS voices).
- Familiarity with MLOps practices — experiment tracking, model versioning, CI/CD for model pipelines.