Data Platform Architect
Minutes to Seconds
- Location: Bangalore North Rural, Karnataka, India
- Job type: Full-time
Required skills
- Python
- AWS
- Apache
- Apache Kafka
- Azure
- Backbone
- BigQuery
- capacity planning
- clustering
- cross-functional
- data analytics
- Databricks
- Dataflow
- end-to-end
- Flink
- GCP
- infrastructure-as-code
- IoT
- JSON
- Kafka
- reference architecture
- Spark
- specification
- Terraform
- Unity
About the role
Minutes to Seconds
Website: minutestoseconds.com
Job details:
Role Overview
- We are looking for a seasoned Data Platform Architect to own the design and delivery of three foundational pillars: a resilient multi-cloud data platform, an enterprise MCP (Model Context Protocol) Server layer that connects AI workloads to governed data assets, and a high-throughput message bus capable of sustaining millions of events per second across distributed consumers.
- This is a senior architecture role that combines deep hands-on engineering with cross-functional influence across data, AI, and infrastructure teams.
MUST HAVEs
- Multi-Cloud Enablement
- MCP Server Foundation
- High-Throughput Message Bus
Key Responsibilities
Multi-Cloud Data Platform Enablement
- Architect a cloud-agnostic data platform that operates seamlessly across AWS, Azure, and GCP, with unified identity, governance, and cost controls
- Define the reference architecture for lakehouse deployments (Delta Lake / Iceberg / Hudi) on each cloud, ensuring format interoperability and zero-lock-in data portability
- Design cross-cloud data movement patterns including replication, federation, and active-active topologies using tools such as Debezium, Airbyte, and cloud-native transfer services
- Establish a cloud-agnostic Unity Catalog or open metadata layer for consistent lineage, access control, and discoverability across all cloud zones
- Drive FinOps practices: right-sizing compute, storage tiering, and reserved capacity planning across cloud providers
MCP Server Foundation
- Architect and build the enterprise MCP Server layer that exposes governed data assets, query interfaces, and tool APIs to LLM-driven agents and copilots
- Define the MCP resource taxonomy: which data assets surface as Resources, which operations become Tools, and which contextual feeds become Prompts
- Implement authentication and authorization at the MCP boundary, ensuring AI agents operate within row-level, column-level, and dataset-level access policies
- Design the MCP Server for multi-tenancy, supporting concurrent agent workloads with rate limiting, audit logging, and observability hooks
- Collaborate with AI/ML teams to validate that MCP-served context materially reduces hallucination rates and improves retrieval grounding quality
- Produce the MCP Server SDK integration guide for internal engineering teams building AI-powered applications
High-Throughput Message Bus Architecture
- Design and own the enterprise message bus architecture targeting sustained throughput of 1M+ events/sec with sub-50ms end-to-end latency at P99
- Evaluate and select the appropriate messaging backbone (Apache Kafka, Confluent Platform, Redpanda, AWS Kinesis, or Azure Event Hubs) based on workload profiles
- Define partitioning strategies, topic compaction policies, retention tiers, and tiered storage configurations aligned to IoT telemetry, CDC, and operational event patterns
- Architect Schema Registry governance including schema evolution contracts (Avro / Protobuf / JSON Schema) and compatibility enforcement pipelines
- Design consumer group topologies for stream processing frameworks (Flink, Spark Structured Streaming, Delta Live Tables) and ensure backpressure and offset management are production-grade
- Integrate the message bus with the multi-cloud lakehouse as a bronze ingestion layer, enforcing idempotency and exactly-once delivery guarantees
Platform Governance & Engineering Excellence
- Define and enforce platform-wide standards: naming conventions, tagging taxonomy, SLA tiers, DR objectives, and runbook templates
- Champion Infrastructure-as-Code practices across Terraform, Pulumi, or Bicep for all cloud resources and data platform components
- Lead architecture review boards (ARBs) and own the architecture decision records (ADRs) for all major platform choices
- Mentor senior data engineers and serve as the escalation point for platform-level production incidents
Requirements
Required Qualifications
- 10+ years in data engineering with at least 3 years in platform or solutions architecture roles
- Hands-on experience architecting production lakehouse platforms on two or more of: AWS (S3, Glue, Athena, EMR, Kinesis), Azure (ADLS Gen2, Databricks, Event Hubs, Synapse), GCP (BigQuery, Dataflow, Pub/Sub)
- Deep expertise in Apache Kafka or equivalent message bus: cluster sizing, partition leadership, consumer lag management, and MirrorMaker 2 / replication topologies
- Strong command of open table formats: Delta Lake, Apache Iceberg, or Apache Hudi — including time travel, merge-on-read vs. copy-on-write trade-offs, and OPTIMIZE / VACUUM strategies
- Proficiency in Python and PySpark for platform automation, ingestion framework development, and schema validation pipelines
- Demonstrated experience with metadata management: Apache Atlas, Unity Catalog, DataHub, or equivalent open metadata solutions
- Familiarity with MCP specification or equivalent AI tool-use protocols; experience building or integrating API layers consumed by LLM agents is a strong plus
- Infrastructure-as-Code fluency (Terraform, Pulumi, or equivalent) and CI/CD pipeline design for data platform deployments
- Strong written communication skills: ability to produce architecture decision records, RFP responses, and client-facing implementation guides
Preferred Qualifications
- Experience with Delta Live Tables (DLT) in Databricks, including CDC pipeline design and Liquid Clustering optimization
- Exposure to vector databases (Pinecone, Weaviate, pgvector) and RAG pipeline architecture for grounding LLMs in enterprise data
- Familiarity with Redpanda or Confluent Cloud as managed Kafka alternatives and their cost/performance trade-offs at scale
- Knowledge of data mesh operating models: domain ownership, data products, and federated governance
- Experience in regulated industries (energy, manufacturing, IoT telemetry) where data quality, auditability, and retention policies are mission-critical
- Cloud certifications: AWS Data Analytics Specialty, Azure Data Engineer Associate, GCP Professional Data Engineer, or Databricks Certified Data Engineer Professional
- Prior consulting or multi-client engagement experience; comfort navigating multiple concurrent stakeholder environments