Meditab Software
Website: meditab.com
Job details:
We are looking for candidates who have 0–3 years of experience in an AI/ML role.
Core Responsibilities:
- Multimodal Solution Design: Architect end-to-end AI systems that integrate Speech-to-Text (STT), Text-to-Speech (TTS), and Computer Vision (OCR/Object Detection).
- Agentic Workflow Development: Design and implement LLM-based agents capable of tool use, reasoning, and multi-turn interaction for IVR and automated voice bots.
- Vision & Document Intelligence: Develop high-accuracy OCR pipelines and vision models to extract structured data from complex, unstructured documents or real-world imagery.
- Audio Pipeline Optimization: Build and tune low-latency audio processing pipelines specifically for real-time IVR and voice-bot responsiveness (VAD, Echo Cancellation, Diarization).
- Model Fine-Tuning & RAG: Implement Retrieval-Augmented Generation (RAG) and fine-tune foundation models (LLMs, Whisper, Florence-2) for domain-specific accuracy.
- Performance Engineering: Optimize model inference for production (quantization, pruning) to meet the strict latency requirements of live voice and video streams.
Technical Requirements:
1. Generative AI & NLP
- LLM Frameworks: Expertise in LangChain, LlamaIndex, or CrewAI for building complex agentic workflows.
- Prompt Engineering: Mastery of advanced prompting techniques (Chain-of-Thought, ReAct) and evaluation frameworks (Ragas, TruLens).
- Vector Databases: Proficiency with Pinecone, Milvus, or Weaviate for efficient context retrieval.
2. Audio & Conversational AI (IVR/Bots)
- Speech Tech: Experience with OpenAI Whisper, ElevenLabs, or Deepgram.
- Telephony Integration: Knowledge of Twilio, Asterisk, or Vapi for deploying voice bots into IVR systems.
- Signal Processing: Familiarity with WebRTC and real-time streaming protocols.
3. Computer Vision & OCR
- OCR Engines: Advanced use of Tesseract, PaddleOCR, or cloud-native vision APIs (Azure Document Intelligence, AWS Textract).
- Visual LLMs: Experience with multimodal models like GPT-4o, Claude 3.5 Sonnet, or LLaVA for "Chat-with-Image" capabilities.
4. Engineering & Infrastructure
- Deep Learning: Advanced PyTorch or TensorFlow; experience with Hugging Face transformers and diffusers libraries.
- Deployment: Containerization (Docker/K8s) and GPU optimization (NVIDIA Triton, vLLM, or TensorRT).
- API Design: Building robust FastAPI/gRPC endpoints for high-throughput multimodal data.
Soft Skills:
- Analytical Mindset: Ability to debug non-deterministic AI behaviors (hallucinations in LLMs or "drift" in vision models).
- Domain Agility: Ability to pivot quickly between audio sampling rates, pixel normalization, and token limits.
- Communication: Translating complex "black-box" model behaviors into actionable business insights for stakeholders.