AI/ML (0-3 years)

Meditab Software

Website: meditab.com
Job details:

We are looking for candidates who have 0–3 years of experience in an AI/ML role.

Core Responsibilities:

Multimodal Solution Design: Architect end-to-end AI systems that integrate Speech-to-Text (STT), Text-to-Speech (TTS), and Computer Vision (OCR/Object Detection).
Agentic Workflow Development: Design and implement LLM-based agents capable of tool use, reasoning, and multi-turn interaction for IVR and automated voice bots.
Vision & Document Intelligence: Develop high-accuracy OCR pipelines and vision models to extract structured data from complex, unstructured documents or real-world imagery.
Audio Pipeline Optimization: Build and tune low-latency audio processing pipelines specifically for real-time IVR and voice-bot responsiveness (VAD, Echo Cancellation, Diarization).
Model Fine-Tuning & RAG: Implement Retrieval-Augmented Generation (RAG) and fine-tune foundation models (LLMs, Whisper, Florence-2) for domain-specific accuracy.
Performance Engineering: Optimize model inference for production (quantization, pruning) to meet the strict latency requirements of live voice and video streams.

Technical Requirements:

1. Generative AI & NLP

LLM Frameworks: Expertise in LangChain, LlamaIndex, or CrewAI for building complex agentic workflows.
Prompt Engineering: Mastery of advanced prompting techniques (Chain-of-Thought, ReAct) and evaluation frameworks (Ragas, TruLens).
Vector Databases: Proficiency with Pinecone, Milvus, or Weaviate for efficient context retrieval.

2. Audio & Conversational AI (IVR/Bots)

Speech Tech: Experience with OpenAI Whisper, ElevenLabs, or Deepgram.
Telephony Integration: Knowledge of Twilio, Asterisk, or Vapi for deploying voice bots into IVR systems.
Signal Processing: Familiarity with WebRTC and real-time streaming protocols.

3. Computer Vision & OCR

OCR Engines: Advanced use of Tesseract, PaddleOCR, or cloud-native vision APIs (Azure Document Intelligence, AWS Textract).
Visual LLMs: Experience with multimodal models like GPT-4o, Claude 3.5 Sonnet, or LLaVA for "Chat-with-Image" capabilities.

4. Engineering & Infrastructure

Deep Learning: Advanced PyTorch or TensorFlow; experience with Hugging Face transformers and diffusers libraries.
Deployment: Containerization (Docker/K8s) and GPU optimization (NVIDIA Triton, vLLM, or TensorRT).
API Design: Building robust FastAPI/gRPC endpoints for high-throughput multimodal data.

Soft Skills:

Analytical Mindset: Ability to debug non-deterministic AI behaviors (hallucinations in LLMs or "drift" in vision models).
Domain Agility: Ability to pivot quickly between audio sampling rates, pixel normalization, and token limits.
Communication: Translating complex "black-box" model behaviors into actionable business insights for stakeholders.

