Alternative Path
Website: alternative-path.com
Job details:
We're building agentic AI systems that autonomously reason, plan, and act across complex workflows. You'll be a core contributor designing and shipping production-grade AI agents, integrating with LLM providers, orchestrating multi-step workflows, and building the infrastructure that makes agents reliable, observable, and scalable. This is a deeply technical role — you'll work at the intersection of LLM engineering, backend systems, and agent architecture.
Job Title - Senior AI & Agentic System Engineer
Location - Gurgaon, India (this role is fully remote)
Level - Senior (5+ years)
Team - AI Platform / Engineering
WHAT YOU'LL DO
• Design and build AI agents — autonomous systems that use tools, browse the web, call APIs, manage memory, and complete multi-step tasks with minimal human intervention
• Architect agentic workflows — orchestrate multi-agent pipelines (planner → executor → verifier patterns), handle tool-use loops, HITL checkpoints, retries, and failure recovery
• Integrate LLM providers — work with Anthropic Claude, OpenAI, Gemini; implement prompt caching, structured outputs, and context management at scale; select the right model for the right task
• Build knowledge pipelines — design RAG systems, hybrid retrieval, vector stores, embedding pipelines, and memory layers that give agents contextual awareness
• Own backend reliability — REST/GraphQL APIs, async job queues, observability, latency optimization, cost tracking for token usage
• Drive system design decisions — lead architecture discussions, define component boundaries, evaluate build vs. buy tradeoffs, and document decisions (ADRs, diagrams)
• Maintain CI/CD pipelines — keep deployments fast, safe, and automated across dev/staging/prod environments
• Evaluate agent behavior — build LLM tracing, regression frameworks, and evals pipelines to detect behavioral drift, intent failures, and quality degradation across model updates
• Stay ahead of the curve — evaluate emerging standards (MCP, A2A, reasoning models, new frameworks) and make adoption recommendations
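To make the planner → executor → verifier pattern above concrete, here is a minimal Python sketch under stated assumptions. All names (`run_pipeline`, `toy_planner`, `flaky_executor`, `non_empty_verifier`) are hypothetical stand-ins, not part of our codebase; a real agent would call an LLM and real tools where the toy functions are. It shows bounded retries per step and escalation to a human checkpoint when retries run out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    step: str
    output: str
    attempts: int


@dataclass
class PipelineReport:
    results: list[StepResult] = field(default_factory=list)
    needs_human: list[str] = field(default_factory=list)


def run_pipeline(
    goal: str,
    planner: Callable[[str], list[str]],
    executor: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> PipelineReport:
    """Plan once, then execute and verify each step with bounded retries."""
    report = PipelineReport()
    for step in planner(goal):
        for attempt in range(1, max_attempts + 1):
            output = executor(step)
            if verifier(step, output):
                report.results.append(StepResult(step, output, attempt))
                break
        else:
            # Retries exhausted: escalate to a human checkpoint instead of guessing.
            report.needs_human.append(step)
    return report


# Hypothetical stand-ins for an LLM planner, a tool-calling executor, and a verifier.
def toy_planner(goal: str) -> list[str]:
    return [f"{goal}: fetch", f"{goal}: summarize"]


_calls: dict[str, int] = {}


def flaky_executor(step: str) -> str:
    # Returns an empty output on the first call per step to exercise the retry path.
    _calls[step] = _calls.get(step, 0) + 1
    return "" if _calls[step] == 1 else f"done({step})"


def non_empty_verifier(step: str, output: str) -> bool:
    return bool(output)


report = run_pipeline("report", toy_planner, flaky_executor, non_empty_verifier)
for result in report.results:
    print(result.step, "succeeded after", result.attempts, "attempt(s)")
```

Keeping the planner, executor, and verifier as plain callables is one possible design choice; it lets each piece be mocked independently in tests.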
WHAT WE'RE LOOKING FOR
• Deep experience with at least one major LLM provider SDK (Anthropic, OpenAI, Gemini)
• Prompt engineering depth: few-shot design, chain-of-thought, structured outputs, tool/function calling, multi-turn context management, prompt caching, token budgeting
• Built real production agents — not just chatbots; systems that take actions, use tools, and operate autonomously
• Understands agent patterns: ReAct, plan-and-execute, multi-agent orchestration, reflection loops, supervisor/worker hierarchies
• Familiar with MCP (Model Context Protocol), A2A, and agent frameworks (LangGraph, CrewAI, AutoGen) — and when NOT to use them
• Knows how to handle non-determinism: retry strategies, output validation, graceful degradation
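As an illustration of handling non-determinism (retry strategies, output validation, graceful degradation), the sketch below validates a model's JSON output, retries on invalid responses, and falls back to a clearly flagged default. The schema, the function names, and the fake model are assumptions made for this example only; they do not describe any provider SDK.

```python
import json
from typing import Callable, Optional

ALLOWED_PRIORITIES = {"low", "medium", "high"}


def parse_ticket(raw: str) -> Optional[dict]:
    """Return the parsed payload if it matches the expected shape, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if data.get("priority") not in ALLOWED_PRIORITIES:
        return None
    if not isinstance(data.get("summary"), str):
        return None
    return data


def call_with_validation(
    model_call: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Retry until the output validates; otherwise degrade to a flagged default."""
    for _ in range(max_attempts):
        parsed = parse_ticket(model_call(prompt))
        if parsed is not None:
            return {**parsed, "degraded": False}
    # Graceful degradation: a safe default that downstream code can detect.
    return {"priority": "medium", "summary": prompt[:80], "degraded": True}


# Hypothetical fake model: two invalid responses, then a valid one.
responses = iter([
    "not json",
    '{"priority": "urgent", "summary": "x"}',
    '{"priority": "high", "summary": "DB down"}',
])
result = call_with_validation(lambda _prompt: next(responses), "Classify: DB down")
print(result)
```

The `degraded` flag is one way to let callers distinguish a validated answer from a fallback without raising an exception.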
Emerging AI Landscape
• Reasoning models — knows when to use o1/o3, Claude Extended Thinking vs. standard models; treats model selection as an architectural decision, not an afterthought
• Model routing & cascading — routes queries to right-sized models (Haiku/Flash for simple, Opus/GPT-4o for complex) based on cost/capability tradeoffs; doesn't default to the largest model for everything
• Compound AI systems — combine multiple specialized models and tools in pipelines rather than relying on a single general-purpose model
• Human-in-the-loop (HITL) — designs checkpointing for long-running agents, approval workflows, interrupt/resume patterns; knows when and where agents need human gates
• Persistent agent state — durable execution patterns (LangGraph checkpointers, Temporal-style workflows); agents that survive restarts and partial failures
• Prompt injection & AI security — understands attack surfaces unique to tool-using agents; builds with guardrails in mind (NeMo Guardrails, Llama Guard, output schemas)
• Semantic caching & batch inference — reduces LLM costs through response-level semantic caching and async batch APIs; thinks about cost at 10x before it becomes a problem
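The model routing and cascading idea above can be sketched as follows. This is a hedged illustration: the tier names, costs, and confidence values are invented, and how confidence is estimated in practice (a verifier, a self-check, log probabilities) is left open. The cascade tries the cheapest tier first and escalates only when the answer is not confident enough.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0, however the caller chooses to estimate it
    model: str


@dataclass
class Tier:
    name: str
    cost_per_call: float  # illustrative units, not real pricing
    call: Callable[[str], Answer]


def cascade(
    query: str, tiers: list[Tier], min_confidence: float = 0.8
) -> tuple[Answer, float]:
    """Try tiers cheapest first; stop at the first sufficiently confident answer."""
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    answer = None
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        answer = tier.call(query)
        spent += tier.cost_per_call
        if answer.confidence >= min_confidence:
            break
    return answer, spent


# Hypothetical stand-ins for a small and a large model.
def small_model(query: str) -> Answer:
    confidence = 0.9 if len(query.split()) <= 5 else 0.4
    return Answer(f"small:{query}", confidence, "small")


def large_model(query: str) -> Answer:
    return Answer(f"large:{query}", 0.95, "large")


tiers = [Tier("large", 10.0, large_model), Tier("small", 1.0, small_model)]
easy, easy_cost = cascade("What is 2 + 2?", tiers)
hard, hard_cost = cascade(
    "Compare three durable execution designs for long running agents", tiers
)
print(easy.model, easy_cost)
print(hard.model, hard_cost)
```

Note that an escalated query pays for both tiers, which is the tradeoff a cascade accepts in exchange for cheaper handling of simple queries.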
LLMOps & Evaluation
• LLM-specific observability: Langfuse, LangSmith, Helicone, or Arize Phoenix — prompt/response tracing, token cost per flow, per-step latency, session replay
• Evals methodology: RAGAS, DeepEval, Promptfoo, or Braintrust — systematic evaluation of LLM outputs, not vibes-based testing
• Treats prompts as code: versioning, A/B testing, rollback; understands how prompt changes ripple into downstream agent behavior
• Knows how to run AI/non-deterministic systems in CI: mock vs. live modes, cost controls, behavioral regression (not just output format matching)
• Cost tracking and optimization: token usage per workflow, model spend dashboards, alerting on cost regressions
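For a rough picture of token cost tracking with alerting on cost regressions, consider the sketch below. The prices are hypothetical (USD per million tokens) and do not reflect any provider's real pricing; `CostTracker` and the workflow names are likewise invented for illustration.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICES = {
    "small": {"input": 1.0, "output": 5.0},
    "large": {"input": 10.0, "output": 50.0},
}


class CostTracker:
    """Accumulates spend per workflow and flags workflows that exceed a baseline."""

    def __init__(self, prices: dict):
        self.prices = prices
        self.totals = defaultdict(float)

    def record(self, workflow: str, model: str, input_tokens: int, output_tokens: int) -> float:
        price = self.prices[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        self.totals[workflow] += cost
        return cost

    def regressions(self, baseline: dict, tolerance: float = 0.2) -> list:
        """Return workflows whose spend exceeds baseline by more than the tolerance."""
        return sorted(
            workflow
            for workflow, cost in self.totals.items()
            if workflow in baseline and cost > baseline[workflow] * (1 + tolerance)
        )


tracker = CostTracker(PRICES)
tracker.record("triage", "small", input_tokens=200_000, output_tokens=100_000)
tracker.record("research", "large", input_tokens=100_000, output_tokens=20_000)

baseline = {"triage": 0.65, "research": 1.0}
print(tracker.regressions(baseline))
```

In a CI setting, a check like `regressions` could fail the build or post an alert; which of those is appropriate is a team decision rather than something this sketch prescribes.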
Knowledge & Retrieval
• RAG pipeline depth: document ingestion, chunking strategies (fixed, semantic, agentic), embedding, retrieval, reranking
• Hybrid search — dense (vector) + sparse (BM25/keyword) retrieval; understands where pure vector search underperforms and how to combine approaches
• Awareness of GraphRAG and contextual retrieval (prepending context summaries to chunks) for structured knowledge domains
• Worked with vector databases: Pinecone, Weaviate, pgvector, or similar
• Understands knowledge graph concepts and when structured retrieval outperforms embedding-based approaches
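One common way to combine dense and sparse results in hybrid search is reciprocal rank fusion (RRF). It is shown here as one option, not as the method we use. The document IDs and the two input rankings are made up; in practice they would come from a vector store and a BM25 index. The constant `k=60` is a conventional default, treated here as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (vector) and a sparse (BM25) retriever.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF uses only ranks, not raw scores, which sidesteps the problem that vector similarities and BM25 scores live on different scales.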
System Design & Architecture
• Can design distributed systems end-to-end: from data flow diagrams to API contracts to deployment topology
• Experience designing for agentic-specific challenges: long-running tasks, partial failures, agent state persistence, tool result caching, idempotent retries
• Knows when to use async vs. sync, event-driven vs. request-response, monolith vs. microservices — and can justify the choice
• Comfortable with architecture patterns: event sourcing, CQRS, saga pattern for multi-step workflows, circuit breakers
• Documents decisions clearly (ADRs, architecture diagrams, onboarding docs): communicates the design, not just builds it
• Has opinions on API versioning, backward compatibility, and service contracts
• Thinks about scalability and cost together — not just 'will it scale' but 'what does it cost at 10x'
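Idempotent retries and tool result caching, both mentioned above, can be illustrated with a small sketch. It assumes an in-memory dictionary as the result store, which would not survive restarts; a durable store would be needed for real agents. `IdempotentToolRunner` and `send_email` are hypothetical names for this example.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotentToolRunner:
    """Runs a tool at most once per (tool name, arguments) pair."""

    def __init__(self) -> None:
        # In-memory store for illustration only; not durable across restarts.
        self._results: dict = {}

    @staticmethod
    def key(tool_name: str, args: dict) -> str:
        # sort_keys makes the key independent of argument order.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, tool_name: str, tool: Callable[..., Any], args: dict) -> Any:
        idempotency_key = self.key(tool_name, args)
        if idempotency_key not in self._results:
            # Only successful calls are cached; an exception leaves no entry,
            # so a later retry will execute the tool again.
            self._results[idempotency_key] = tool(**args)
        return self._results[idempotency_key]


# Hypothetical tool with a side effect we do not want to repeat on retry.
sent = []


def send_email(to: str, subject: str) -> str:
    sent.append((to, subject))
    return f"queued:{len(sent)}"


runner = IdempotentToolRunner()
first = runner.run("send_email", send_email, {"to": "a@example.com", "subject": "Hi"})
retry = runner.run("send_email", send_email, {"subject": "Hi", "to": "a@example.com"})
print(first, retry, len(sent))
```

Hashing the canonical JSON of the arguments is one simple keying scheme; it assumes the arguments are JSON-serializable.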
Backend Engineering
• Strong in Python and/or Node.js/TypeScript for backend services
• API design (REST, GraphQL, webhooks), async patterns, background job processing
• Database proficiency: PostgreSQL, DynamoDB, or equivalent
• Strong on observability: structured logging, distributed tracing, metrics
CI/CD & Developer Tooling
• Proficient with GitHub Actions — writing workflows from scratch, not just copy-pasting templates
• Understands CI/CD pipeline stages: lint → test → build → deploy with proper gate logic
• Can set up environment-specific deployments (dev/staging/prod) with appropriate safeguards
• Familiar with Docker and container-based build pipelines
• Knows how to handle secrets management in pipelines (AWS Secrets Manager, GitHub Secrets)
• Experience with deployment strategies: blue/green, canary, rolling updates
• Understands how to test AI/non-deterministic systems in CI — mock vs. live modes, cost controls, timeout handling
AWS (Core Services)
• Compute: Lambda (serverless functions, cold start optimization, layer management), ECS/Fargate (containerized long-running agents and services)
• API & Routing: API Gateway for REST endpoints, ALB for container-based routing
• Storage: S3 for artifacts/reports/model outputs, DynamoDB or RDS/PostgreSQL for structured data
• Async & Queuing: SQS for decoupled task queues, SNS for fan-out, EventBridge for scheduled or event-driven triggers
• IAM: Writing least-privilege policies, understanding role assumption, cross-service permissions — not just using admin credentials
• Observability: CloudWatch Logs and Metrics for baseline monitoring; experience connecting to Datadog or similar APM preferred
• Infrastructure as Code: Proficient with Terraform — writing modules, managing state (remote backends, workspaces), handling drift, and deploying infra changes safely without manual console work
Mindset
• Comfortable with high ambiguity: AI system behavior isn't always predictable
• Ships iteratively; knows when a prototype is good enough vs. when to harden
• Opinionated about quality but pragmatic about tradeoffs
NICE TO HAVE
• Experience with browser automation in agentic contexts (Playwright, Puppeteer, Anthropic computer use API)
• Multi-modal agent experience — vision inputs, document parsing, image understanding in agentic loops
• Familiarity with agent evaluation frameworks and behavioral regression testing
• Contributions to open-source AI tooling
• Background in ML/data science alongside software engineering
TECH STACK WE USE
• LLMs: Anthropic Claude, OpenAI
• Agent infra: Custom orchestration, MCP servers
• Backend: Node.js / Python, AWS Lambda + ECS/Fargate
• Browser automation: Playwright
• CI/CD: GitHub Actions
• Observability: Datadog, CloudWatch, Langfuse
• Data: PostgreSQL, S3, SQS, DynamoDB
• IaC: Terraform
WHY THIS ROLE
• You'll work on AI problems that are genuinely unsolved — not CRUD apps with an LLM wrapper
• Greenfield architecture decisions — your opinions shape the system
• Direct impact on product; small team, high ownership
• Exposure to the full agentic stack from infra to eval to UX
WHAT WE DON'T EXPECT
• A research background or ML theory depth — this is an engineering role
• Experience with every framework listed — curiosity and speed of learning matter more than coverage
• Knowledge of everything in the AI market, though you should have strong opinions about where it's heading