Software Engineer - AI Platform Services

Millennium

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

AWS
Azure
capacity planning
Datadog
FastAPI
GCP
incident response
Java
OAuth
platform services
Terraform
SDLC

About the role

Website: mlp.com
Job details:
We’re a high-impact platform team building the firm’s internal AI platform that bridges traditional enterprise platforms (identity, data, workflow, governance) with GenAI tools (agents, copilots, model providers).

This is a senior engineering role focused on designing and owning the core service layer that agentic tools run on: an AI gateway, model/provider routing, policy/guardrails, tool-execution interfaces, high-throughput async APIs, and production-grade observability. MCP (Model Context Protocol) services are part of the platform portfolio, enabling secure, governed connectivity between agent runtimes and enterprise tools/data. You’ll partner closely with AI Engineers building agent workflows; your focus is to make the underlying platform fast, reliable, secure, and easy to build on.

Key Responsibilities

Design, build, and operate core platform services (Python; REST + async; streaming where appropriate) powering firm-wide internal AI/agentic capabilities.
Own gateway/platform concerns end-to-end: routing, timeouts/retries, streaming, request shaping, rate limits/quotas, multi-tenancy, policy enforcement, provider abstraction, safe degradation, and robust client experience.
Build and operate MCP capabilities as part of the platform.
Build for scale and availability on Kubernetes: autoscaling, rollout strategies, capacity planning, performance tuning, and production debugging.
Raise reliability practices: define and manage SLOs/SLIs, instrumentation standards, incident response/runbooks, post-incident follow-ups, load/resilience testing, and operational excellence.
Improve delivery safety: CI/CD, environment promotion, IaC-driven repeatability, and secure SDLC practices.
Influence roadmap and technical strategy: prioritize foundational investments and reduce platform risk for a business-critical internal platform.

Required Qualifications

7+ years of professional software engineering experience (or equivalent practical experience).
Strong expertise in Python, Java, or Go, including async patterns, concurrency, and building high-throughput services (FastAPI or similar).
Solid distributed systems fundamentals: idempotency, backpressure, failure isolation, consistency tradeoffs, rate limiting, retries/timeouts.
Production experience operating services on Kubernetes (deployments, autoscaling, debugging, observability, performance).
Basic familiarity with LLM integration patterns (streaming responses, tool/function calling).
Demonstrated design leadership (RFCs, architecture reviews, leading cross-team initiatives).
Excellent communication skills—able to translate technical tradeoffs to stakeholders and partner teams.

Preferred Qualifications

Experience with service-to-service authentication patterns (API keys, OAuth/JWT, mTLS concepts).
Familiarity with observability tooling (structured logs, metrics, tracing; Datadog or OpenTelemetry a plus).
Strong fundamentals in AWS (or GCP/Azure) relevant to secure platforms (IAM, networking basics, compute, logging/monitoring patterns).
Working proficiency with Terraform and automation-first operations (repeatable environments, policy checks, safe rollouts).
Comfort using AI dev tools (Claude Code, Cursor, Gemini CLI) responsibly (tests, validation, secure coding).
