Senior ML Platform Architect

Minutes to Seconds

Location: Bangalore North Rural, Karnataka, India
Job type: Full-time

Required skills

AWS
Azure
GCP
Kafka
Kubernetes
microservices
Terraform
uptime

About the role

Website: minutestoseconds.com
Job details:
Role Overview

The Senior ML Platform Architect is responsible for the strategic design and execution of the foundational infrastructure powering global AI initiatives. This role focuses on building a unified, multi-cloud ecosystem that integrates seamless model orchestration, real-time data connectivity via high-throughput messaging, and context-aware AI services. The work ensures that the organization’s AI capabilities are vendor-agnostic, secure, and built on a resilient, high-performance core.

Core Responsibilities

Multi-Cloud Enablement & ML Strategy

Vendor-Agnostic Architecture: Design and implementation of a multi-cloud ML strategy (AWS, GCP, Azure) to prevent vendor lock-in and optimize for global availability and cost efficiency.
Unified Model Orchestration: Architecture of abstract infrastructure layers using Kubernetes to ensure ML training and inference workloads port across different cloud providers without code modification.
Global Connectivity: Establishment of cross-cloud networking and identity standards to maintain a consistent security posture and data access layer across all environments.

MCP (Model Context Protocol) Server Foundation

Contextual Architecture: Building and scaling the MCP Server foundation, enabling the decoupling of AI reasoning from tool execution for enhanced modularity.
Standardized Integration: Design of universal adapter layers that allow Large Language Models (LLMs) to securely access external databases, APIs, and internal file systems through standardized protocols.
Governance & Discovery: Architecture of centralized discovery services for MCP servers to allow AI agents to dynamically find and invoke capabilities with strict audit trails.

High-Throughput Message Bus & Data Flow

Event-Driven AI Backbone: Design and implementation of a low-latency, high-throughput message bus (e.g., Kafka, Pulsar) to handle real-time data streaming and asynchronous ML pipeline triggers.
Scalable Feature Distribution: Architecture of the backbone for streaming features and model events, ensuring high-volume inference logs and telemetry data are ingested with zero data loss.
System Decoupling: Utilization of the message bus to decouple ML microservices, increasing the horizontal scalability and fault tolerance of the AI platform.

Strengthening the Core Application Layer

Leadership of the Security, Resilience, and Quality of Release chapter:
Security: Implementation of Zero-Trust architecture for AI workloads, including model weight encryption, secure secret management, and protection against adversarial attacks.
Resilience: Design of self-healing systems, multi-region failover strategies, and high-availability ML services to ensure mission-critical uptime.
Quality of Release: Establishment of automated, architecture-level release gates including performance benchmarking, security scanning, and automated canary/blue-green deployment strategies.

Requirements

Technical Requirements

Experience: 10+ years of professional experience in Systems Architecture or Software Engineering, with at least 4 years specifically dedicated to ML Platform or AI Infrastructure.
Cloud Mastery: Expert-level proficiency in architecting for Multi-Cloud environments and managing distributed systems at scale.
Streaming & Messaging: Proven track record with Kafka, Pulsar, or similar high-throughput event-streaming technologies.
Protocols & AI Integration: Strong understanding of JSON-RPC, Server-Sent Events (SSE), and modern AI communication protocols like MCP.
Engineering Standards: Mastery of Kubernetes, Terraform (IaC), and service mesh technologies to maintain platform stability.

Click on Apply to know more.