Senior AI & Agentic System Engineer

Alternative Path

full-time

Required skills

Python
AWS
backend
caching
CloudWatch
data science
Datadog
Docker
DynamoDB
ECS
GitHub
GraphQL
JS
Lambda
microservices
Node
PostgreSQL
React
regression
Serverless
Terraform
TypeScript
writing modules

About the role

Alternative Path

Website: alternative-path.com
Job details:

We're building agentic AI systems that autonomously reason, plan, and act across complex workflows. You'll be a core contributor designing and shipping production-grade AI agents, integrating with LLM providers, orchestrating multi-step workflows, and building the infrastructure that makes agents reliable, observable, and scalable. This is a deeply technical role — you'll work at the intersection of LLM engineering, backend systems, and agent architecture.

Job Title - Senior AI & Agentic System Engineer

Location - Gurgaon, India (this role is fully remote)

Level - Senior (5+)

Team - AI Platform / Engineering

WHAT YOU'LL DO

• Design and build AI agents — autonomous systems that use tools, browse the web, call APIs, manage memory, and complete multi-step tasks with minimal human intervention

• Architect agentic workflows — orchestrate multi-agent pipelines (planner → executor → verifier patterns), handle tool-use loops, HITL checkpoints, retries, and failure recovery

• Integrate LLM providers — work with Anthropic Claude, OpenAI, Gemini; implement prompt caching, structured outputs, and context management at scale; select the right model for the right task

• Build knowledge pipelines — design RAG systems, hybrid retrieval, vector stores, embedding pipelines, and memory layers that give agents contextual awareness

• Own backend reliability — REST/GraphQL APIs, async job queues, observability, latency optimization, cost tracking for token usage

• Drive system design decisions — lead architecture discussions, define component boundaries, evaluate build vs. buy tradeoffs, and document decisions (ADRs, diagrams)

• Maintain CI/CD pipelines — keep deployments fast, safe, and automated across dev/staging/prod environments

• Evaluate agent behavior — build LLM tracing, regression frameworks, and evals pipelines to detect behavioral drift, intent failures, and quality degradation across model updates

• Stay ahead of the curve — evaluate emerging standards (MCP, A2A, reasoning models, new frameworks) and make adoption recommendations

WHAT WE'RE LOOKING FOR

• Deep experience with at least one major LLM provider SDK (Anthropic, OpenAI, Gemini)

• Prompt engineering depth: few-shot design, chain-of-thought, structured outputs, tool/function calling, multi-turn context management, prompt caching, token budgeting

• Built real production agents — not just chatbots; systems that take actions, use tools, and operate autonomously

• Understands agent patterns: ReAct, plan-and-execute, multi-agent orchestration, reflection loops, supervisor/worker hierarchies

• Familiar with MCP (Model Context Protocol), A2A, and agent frameworks (LangGraph, CrewAI, AutoGen) — and when NOT to use them

• Knows how to handle non-determinism: retry strategies, output validation, graceful degradation

Emerging AI Landscape

• Reasoning models — knows when to use o1/o3, Claude Extended Thinking vs. standard models; treats model selection as an architectural decision, not an afterthought

• Model routing & cascading — routes queries to right-sized models (Haiku/Flash for simple, Opus/GPT-4o for complex) based on cost/capability tradeoffs; doesn't default to the largest model for everything

• Compound AI systems — combine multiple specialized models and tools in pipelines rather than relying on a single general-purpose model

• Human-in-the-loop (HITL) — designs checkpointing for long-running agents, approval workflows, interrupt/resume patterns; knows when and where agents need human gates

• Persistent agent state — durable execution patterns (LangGraph checkpointers, Temporal-style workflows); agents that survive restarts and partial failures

• Prompt injection & AI security — understands attack surfaces unique to tool-using agents; builds with guardrails in mind (NeMo Guardrails, Llama Guard, output schemas)

• Semantic caching & batch inference — reduces LLM costs through response-level semantic caching and async batch APIs; thinks about cost at 10x before it becomes a problem

LLMOps & Evaluation

• LLM-specific observability: Langfuse, LangSmith, Helicone, or Arize Phoenix — prompt/response tracing, token cost per flow, per-step latency, session replay

• Evals methodology: RAGAS, DeepEval, Promptfoo, or Braintrust — systematic evaluation of LLM outputs, not vibes-based testing

• Treats prompts as code: versioning, A/B testing, rollback; understands how prompt changes ripple into downstream agent behavior

• Knows how to run AI/non-deterministic systems in CI: mock vs. live modes, cost controls, behavioral regression (not just output format matching)

• Cost tracking and optimization: token usage per workflow, model spend dashboards, alerting on cost regressions

Knowledge & Retrieval

• RAG pipeline depth: document ingestion, chunking strategies (fixed, semantic, agentic), embedding, retrieval, reranking

• Hybrid search — dense (vector) + sparse (BM25/keyword) retrieval; understands where pure vector search underperforms and how to combine approaches

• Awareness of GraphRAG and contextual retrieval (prepending context summaries to chunks) for structured knowledge domains

• Worked with vector databases: Pinecone, Weaviate, pgvector, or similar

• Understands knowledge graph concepts and when structured retrieval outperforms embedding-based approaches

System Design & Architecture

• Can design distributed systems end-to-end: from data flow diagrams to API contracts to deployment topology

• Experience designing for agentic-specific challenges: long-running tasks, partial failures, agent state persistence, tool result caching, idempotent retries

• Knows when to use async vs. sync, event-driven vs. request-response, monolith vs. microservices — and can justify the choice

• Comfortable with architecture patterns: event sourcing, CQRS, saga pattern for multi-step workflows, circuit breakers

• Documents decisions clearly (ADRs, architecture diagrams, onboarding docs); doesn't just build, but also communicates the design

• Has opinions on API versioning, backward compatibility, and service contracts

• Thinks about scalability and cost together — not just 'will it scale' but 'what does it cost at 10x'

Backend Engineering

• Strong in Python and/or Node.js/TypeScript for backend services

• API design (REST, GraphQL, webhooks), async patterns, background job processing

• Database proficiency: PostgreSQL, DynamoDB, or equivalent

• Strong on observability: structured logging, distributed tracing, metrics

CI/CD & Developer Tooling

• Proficient with GitHub Actions — writing workflows from scratch, not just copy-pasting templates

• Understands CI/CD pipeline stages: lint → test → build → deploy with proper gate logic

• Can set up environment-specific deployments (dev/staging/prod) with appropriate safeguards

• Familiar with Docker and container-based build pipelines

• Knows how to handle secrets management in pipelines (AWS Secrets Manager, GitHub Secrets)

• Experience with deployment strategies: blue/green, canary, rolling updates

• Understands how to test AI/non-deterministic systems in CI — mock vs. live modes, cost controls, timeout handling

AWS (Core Services)

• Compute: Lambda (serverless functions, cold start optimization, layer management), ECS/Fargate (containerized long-running agents and services)

• API & Routing: API Gateway for REST endpoints, ALB for container-based routing

• Storage: S3 for artifacts/reports/model outputs, DynamoDB or RDS/PostgreSQL for structured data

• Async & Queuing: SQS for decoupled task queues, SNS for fan-out, EventBridge for scheduled or event-driven triggers

• IAM: Writing least-privilege policies, understanding role assumption, cross-service permissions — not just using admin credentials

• Observability: CloudWatch Logs and Metrics for baseline monitoring; experience connecting to Datadog or similar APM preferred

• Infrastructure as Code: Proficient with Terraform — writing modules, managing state (remote backends, workspaces), handling drift, and deploying infra changes safely without manual console work

Mindset

• Comfortable with high ambiguity, since AI system behavior isn't always predictable

• Ships iteratively; knows when a prototype is good enough vs. when to harden

• Opinionated about quality but pragmatic about tradeoffs

NICE TO HAVE

• Experience with browser automation in agentic contexts (Playwright, Puppeteer, Anthropic computer use API)

• Multi-modal agent experience — vision inputs, document parsing, image understanding in agentic loops

• Familiarity with agent evaluation frameworks and behavioral regression testing

• Contributions to open-source AI tooling

• Background in ML/data science alongside software engineering

TECH STACK WE USE

• LLMs: Anthropic Claude, OpenAI

• Agent infra: Custom orchestration, MCP servers

• Backend: Node.js / Python, AWS Lambda + ECS/Fargate

• Browser automation: Playwright

• CI/CD: GitHub Actions

• Observability: Datadog, CloudWatch, Langfuse

• Data: PostgreSQL, S3, SQS, DynamoDB

• IaC: Terraform

WHY THIS ROLE

• You'll work on AI problems that are genuinely unsolved — not CRUD apps with an LLM wrapper

• Greenfield architecture decisions — your opinions shape the system

• Direct impact on product; small team, high ownership

• Exposure to the full agentic stack from infra to eval to UX

WHAT WE DON'T EXPECT

• A research background or ML theory depth — this is an engineering role

• Experience with every framework listed — curiosity and speed of learning matter more than coverage

• Knowing everything about the AI market, though you should have strong opinions about where it's heading

