Data Platform Architect

Minutes to Seconds

Location: Bangalore North Rural, Karnataka, India
Job type: Full-time

Required skills

Python
AWS
Apache
Apache Kafka
Azure
Backbone
BigQuery
capacity planning
clustering
cross-functional
data analytics
Databricks
Dataflow
end-to-end
Flink
GCP
infrastructure-as-code
IoT
JSON
Kafka
reference architecture
Spark
specification
Terraform
Unity

About the role

Website: minutestoseconds.com
Job details:
Role Overview

We are looking for a seasoned Data Platform Architect to own the design and delivery of three foundational pillars: a resilient multi-cloud data platform, an enterprise MCP (Model Context Protocol) Server layer that connects AI workloads to governed data assets, and a high-throughput message bus capable of sustaining millions of events per second across distributed consumers. This is a senior architecture role that combines deep hands-on engineering with cross-functional influence across data, AI, and infrastructure teams.

MUST HAVEs

Multi-Cloud Enablement
MCP Server Foundation
High-Throughput Message Bus

Key Responsibilities

Multi-Cloud Data Platform Enablement
Architect a cloud-agnostic data platform that operates seamlessly across AWS, Azure, and GCP, with unified identity, governance, and cost controls
Define the reference architecture for lakehouse deployments (Delta Lake / Iceberg / Hudi) on each cloud, ensuring format interoperability and zero-lock-in data portability
Design cross-cloud data movement patterns including replication, federation, and active-active topologies using tools such as Debezium, Airbyte, and cloud-native transfer services
Establish a cloud-agnostic Unity Catalog or open metadata layer for consistent lineage, access control, and discoverability across all cloud zones
Drive FinOps practices: right-sizing compute, storage tiering, and reserved capacity planning across cloud providers

MCP Server Foundation
Architect and build the enterprise MCP Server layer that exposes governed data assets, query interfaces, and tool APIs to LLM-driven agents and copilots
Define the MCP resource taxonomy: which data assets surface as Resources, which operations become Tools, and which contextual feeds become Prompts
Implement authentication and authorization at the MCP boundary, ensuring AI agents operate within row-level, column-level, and dataset-level access policies
Design the MCP Server for multi-tenancy, supporting concurrent agent workloads with rate limiting, audit logging, and observability hooks
Collaborate with AI/ML teams to validate that MCP-served context materially reduces hallucination rates and improves retrieval grounding quality
Produce the MCP Server SDK integration guide for internal engineering teams building AI-powered applications

High-Throughput Message Bus Architecture
Design and own the enterprise message bus architecture targeting sustained throughput of 1M+ events/sec with sub-50ms end-to-end latency at P99
Evaluate and select the appropriate messaging backbone (Apache Kafka, Confluent Platform, Redpanda, AWS Kinesis, or Azure Event Hubs) based on workload profiles
Define partitioning strategies, topic compaction policies, retention tiers, and tiered storage configurations aligned to IoT telemetry, CDC, and operational event patterns
Architect Schema Registry governance including schema evolution contracts (Avro / Protobuf / JSON Schema) and compatibility enforcement pipelines
Design consumer group topologies for stream processing frameworks (Flink, Spark Structured Streaming, Delta Live Tables) and ensure backpressure and offset management are production-grade
Integrate the message bus with the multi-cloud lakehouse as a bronze ingestion layer, enforcing idempotency and exactly-once delivery guarantees

Platform Governance & Engineering Excellence
Define and enforce platform-wide standards: naming conventions, tagging taxonomy, SLA tiers, DR objectives, and runbook templates
Champion Infrastructure-as-Code practices across Terraform, Pulumi, or Bicep for all cloud resources and data platform components
Lead architecture review boards (ARBs) and own the technical decision log (ADRs) for all major platform choices
Mentor senior data engineers and serve as the escalation point for platform-level production incidents

Requirements

Required Qualifications

10+ years in data engineering with at least 3 years in platform or solutions architecture roles
Hands-on experience architecting production lakehouse platforms on two or more of: AWS (S3, Glue, Athena, EMR, Kinesis), Azure (ADLS Gen2, Databricks, Event Hubs, Synapse), GCP (BigQuery, Dataflow, Pub/Sub)
Deep expertise in Apache Kafka or equivalent message bus: cluster sizing, partition leadership, consumer lag management, and MirrorMaker 2 / replication topologies
Strong command of open table formats: Delta Lake, Apache Iceberg, or Apache Hudi — including time travel, merge-on-read vs. copy-on-write trade-offs, and OPTIMIZE / VACUUM strategies
Proficiency in Python and PySpark for platform automation, ingestion framework development, and schema validation pipelines
Demonstrated experience with metadata management: Apache Atlas, Unity Catalog, DataHub, or equivalent open metadata solutions
Familiarity with MCP specification or equivalent AI tool-use protocols; experience building or integrating API layers consumed by LLM agents is a strong plus
Infrastructure-as-Code fluency (Terraform, Pulumi, or equivalent) and CI/CD pipeline design for data platform deployments
Strong written communication skills: ability to produce architecture decision records, RFP responses, and client-facing implementation guides

Preferred Qualifications

Experience with Delta Live Tables (DLT) in Databricks, including CDC pipeline design and Liquid Clustering optimization
Exposure to vector databases (Pinecone, Weaviate, pgvector) and RAG pipeline architecture for grounding LLMs in enterprise data
Familiarity with Redpanda or Confluent Cloud as managed Kafka alternatives and their cost/performance trade-offs at scale
Knowledge of data mesh operating models: domain ownership, data products, and federated governance
Experience in regulated industries (energy, manufacturing, IoT telemetry) where data quality, auditability, and retention policies are mission-critical
Cloud certifications: AWS Data Analytics Specialty, Azure Data Engineer Associate, GCP Professional Data Engineer, or Databricks Certified Data Engineer Professional
Prior consulting or multi-client engagement experience; comfort navigating multiple concurrent stakeholder environments

Click on Apply to know more.