AI Engineer

Unico Connect

Location: Mumbai, Maharashtra, India
Job type: Full-time

Required skills

Python
AWS
API development
Caching
EC2
FastAPI

About the role

Website: unicoconnect.com

AI Engineer: LLMs, Agents & AI Services

Mumbai (On-site) | Full-time | 2-4 years of experience

Unico Connect is an AI-first technology partner that builds custom mobile, web, and AI products for clients across multiple geographies. AI is core to how we design, deliver, and scale software for our customers, and is increasingly the substance of what those customers ask us to build. We are hiring an AI Engineer who will help us build the AI capabilities, agentic solutions, and AI-enabled platforms that ship inside live customer products.

The mandatory requirement for this role is at least one AI feature that you personally shipped to production for real users and owned operationally. The role suits someone who solutions quickly, can take an ambiguous problem to a working POC in days, and has the discipline to carry that POC through to a production system with predictable economics. You will work alongside delivery pods, product managers, other AI engineers, and customers across active engagements, with full ownership of the AI surface area of the work.

Responsibilities

Solutioning and POCs: Translate ambiguous customer problems into working POCs at speed. Pick the right model, framework, and architecture for the use case, and demonstrate value early before scaling investment.
LLM application development: Build AI features and services using LLM APIs from OpenAI, Anthropic, and Google, as well as self-hosted open-weight models (Llama, Qwen, Mistral). Choose the right model for each use case based on cost, latency, capability, and context-window trade-offs.
Agentic system design: Design and implement agentic workflows using LangGraph, CrewAI, AutoGen, LlamaIndex Agents, or custom orchestration. Cover tool use, planning, memory, and multi-step reasoning patterns appropriate to the problem.
API and service development: Build production AI services and APIs using Python and FastAPI. Handle streaming responses, async processing, structured outputs, retries, and graceful degradation when models or tools fail.
Retrieval and tool integration: Implement RAG pipelines with vector databases (Pinecone, Weaviate, Qdrant, pgvector, Chroma), embeddings, chunking strategies, hybrid search, and reranking. Integrate external tools, internal APIs, and document sources through tool-calling and MCP-style patterns.
Cost analysis and unit economics: Model the per-request and per-user cost of every AI feature before it ships. Track token usage and apply cost levers such as prompt caching, batching, and model routing. Drive measurable improvements in unit economics.
Production hardening: Add observability and tracing (LangSmith, Langfuse, OpenTelemetry), guardrails, content safety checks, prompt injection defences, and fallback behaviour.
Prompt engineering and evaluation: Design, test, and iterate prompts with measured outcomes. Build evaluation harnesses for accuracy, hallucination, latency, and cost. Run benchmarks across models and prompt variants before locking in a design.

Requirements

AI feature shipped to production (mandatory). Must have personally built and shipped at least one AI feature that runs in production for real users, with operational ownership. POCs, internal demos, and one-off scripts do not qualify.
2 to 4 years of professional software or AI engineering experience, with at least one production AI feature owned end to end.
Strong Python proficiency and experience building APIs with FastAPI. Comfort with type hints, async, packaging, testing, streaming responses, and authentication. Production-grade Python, not notebook-only code.
Hands-on depth across the LLM and agent stack. Working experience with at least two of OpenAI, Anthropic Claude, Google Gemini, or open-weight models (self-hosted with vLLM or Ollama, or run on hosted platforms such as Together or Replicate). Working familiarity with at least one agent framework (LangGraph, CrewAI, AutoGen, LlamaIndex Agents) or a hand-rolled equivalent. Working knowledge of RAG, embeddings, and vector databases (Pinecone, Weaviate, Qdrant, pgvector, Chroma).
Solutioning speed and POC velocity. Demonstrated ability to move from a fuzzy problem statement to a working prototype in days, not weeks. Strong instinct for what to build first, what to defer, and what to throw away.
Cost discipline for production AI. Ability to calculate, monitor, and optimise the cost of LLM APIs, tokens, embeddings, vector store usage, and infrastructure. Treats unit economics as a first-class engineering concern.
AWS familiarity. Working knowledge of EC2, S3, IAM, and at least one of Bedrock, SageMaker, or equivalent.
Comfortable in a fast-moving environment. Self-directed, at ease with ambiguity, takes ownership without being asked, and ships under shifting priorities.
Strong written and spoken English communication. Able to explain trade-offs to non-AI engineers, designers, product managers, and clients in plain language.

Nice to have: exposure to fine-tuning (LoRA, QLoRA, PEFT); MCP server authoring; experience with eval frameworks (LangSmith, Promptfoo, Ragas, DeepEval); open-source AI contributions; experience with multi-modal models (vision, audio).
