SarvaGram
Website:
sarvagram.com
Job details:
About the role
The Cloud-Native Platform Architect is the hands-on technical authority for SarvaGram's platform engineering function. You design and build - not just advise. You own the IAM/RBAC SDK that every squad uses for authentication. You design the AI SDK integration layer that allows lending engineers to call LLM APIs safely and consistently. You establish the infrastructure-as-code standards, the DevSecOps pipeline, and the observability framework that make the squads faster without accumulating security debt.
This is a platform-first identity role. You do not think of yourself as an infrastructure engineer who happens to write Terraform - you think of yourself as a platform engineer who builds the surfaces that product squads use. The distinction matters: your output is developer experience, not just uptime.
Key responsibilities
Platform Engineering - SDK & Shared Services
- Design and build the IAM/RBAC SDK - the authentication and authorisation layer that every squad integrates with.
- Design and build the AI SDK integration layer - a consistent, secure interface for squads to call LLM APIs (Claude, OpenAI) with rate limiting, cost tracking, fallback handling, and PII guardrails built in
- Own and evolve the API gateway configuration - routing rules, rate limiting, authentication enforcement, and observability instrumentation for all service-to-service and external API traffic
- Define and maintain the shared platform services catalogue - what your squad provides as a platform capability (logging, tracing, feature flags, config management) versus what squads build themselves
- Establish platform engineering standards and contribute to ADRs - documenting platform design decisions with context and rationale for the twelve squads that depend on these surfaces
Infrastructure as Code & Cloud Architecture
- Own SarvaGram's AWS architecture - VPC design, subnet segmentation, security groups, IAM roles, and the overall cloud topology that supports multi-environment (dev, staging, production) deployments
- Build and maintain the Terraform module library - reusable, versioned IaC modules that squads can use to provision their own resources consistently without requiring Atlas involvement for every change
- Design the environment promotion strategy - how code moves from development through staging to production with appropriate gates, approvals, and rollback capabilities
- Own RDS, ElastiCache, SQS, SNS, and S3 configuration standards - how databases are provisioned, backed up, and scaled; how message queues are configured for reliability and observability
- Design disaster recovery and business continuity architecture - RTO and RPO targets defined, tested, and documented for an RBI-regulated financial institution
DevSecOps & Security by Design
- Build CI/CD pipelines that embed security scanning (SAST, SCA, container vulnerability scanning) as non-negotiable gates - not optional bolt-ons that slow delivery
- Own secrets management architecture - how API keys, database credentials, and certificates are stored, rotated, and accessed across twelve squads without creating single points of compromise
- Implement infrastructure compliance scanning - AWS Config rules, Security Hub findings, and automated remediation for common misconfigurations
- Work with the CISO on application-layer security architecture - providing the infrastructure and SDK components that enforce the security policies.
- Establish the shift-left security culture on the platform side - every new infrastructure component reviewed for security implications before deployment
Observability & Platform Reliability
- Design and implement the observability stack - distributed tracing (OpenTelemetry), structured logging, and metrics collection that gives twelve squads visibility into their services without each building their own
- Define SLOs and SLAs for platform services - what reliability guarantees Phoenix and Atlas make to product squads, how they are measured, and what happens when they are breached
- Build the on-call runbook library - documented incident response procedures for platform components that any engineer can follow at 2am without needing the Architect on the phone
- Own the cost optimisation program - AWS cost visibility, rightsizing recommendations, reserved instance strategy, and FinOps practices appropriate for a growth-stage NBFC
AI-Native Platform Practice
- Pioneer AI-augmented platform engineering - use Claude for Terraform module generation, CI/CD pipeline configuration, runbook drafting, and infrastructure ADR authoring
- Design the automation integration - how n8n workflows and AppSmith dashboards connect to platform infrastructure APIs safely and with appropriate access controls
- Lead the Platform Guild - fortnightly sessions on cloud architecture, DevSecOps practices, and AI-augmented infrastructure engineering for DevOps engineers across Atlas
- Consult with the Lending Platform Architect on infrastructure implications of lending domain designs - API gateway configuration, database topology, and observability requirements for lending services
Requirements
Professional Experience
- 12 to 14 years in software/infrastructure engineering with at least 4 years in a platform or cloud architecture role - hands-on AWS experience is mandatory, not advisory
- Has designed and built shared platform services that other engineering teams consumed - IAM libraries, observability frameworks, or deployment platforms; understands that developer experience is a product
- Has built DevSecOps pipelines with embedded security scanning, not bolted-on audits - knows the difference between security theatre and shift-left practice
- Active AI tool user for infrastructure work - uses Claude or equivalent to generate Terraform, write runbooks, and explore architecture trade-offs today
- Fintech or regulated industry experience is a meaningful plus - understands that RBI compliance and data localisation constraints affect infrastructure decisions
Mindset & Soft Skills
- Platform-first identity: thinks 'how do I make 10 squads faster and safer' not 'how do I keep the servers running'
- Builder by default: reaches for code before PowerPoint - designs are expressed in Terraform, Go, and YAML, not just diagrams
- Security by design: treats security as a platform feature, not a review gate - every new service has IAM, logging, and secrets management built in from day one
- AI-native operator: uses Claude daily for IaC generation, runbook drafting, and architecture exploration - models this practice for the DevOps engineers in Atlas
- Lending domain curiosity: interested in understanding why the lending platform makes the architecture demands it does - not just executing infrastructure tickets
Technical Skills
AWS Platform - Mandatory
- VPC / subnet / security group design
- IAM / RBAC architecture
- EKS / ECS container orchestration
- RDS / ElastiCache / SQS / S3
- Terraform / IaC
- AWS Config / Security Hub
- CloudWatch / X-Ray / OpenTelemetry
Platform Engineering - Core
- SDK design and development
- API gateway (Kong / AWS API GW)
- CI/CD pipeline design (GitHub Actions / Jenkins)
- Secrets management (Vault / AWS Secrets Manager)
- Service mesh (Istio / Linkerd)
- Docker / Kubernetes
- Python / Go for platform tooling
DevSecOps - Core
- SAST / SCA integration
- Container security scanning
- Shift-left security practice
- Compliance-as-code
- DevSecOps pipeline design
- Incident response runbooks
AI Native - Core Expectation
- Claude / LLM API integration architecture
- AI SDK design patterns
- AI-augmented IaC generation
- Prompt engineering for infrastructure tooling
Benefits
SarvaGram is on a mission to revolutionize financial services for millions in rural India. We're building the nation's first data-driven platform that combines cutting-edge technology with a human touch to unlock financial possibilities for underserved households. This is your chance to be at the forefront of innovation. Join us and:
- Shape the future of FinTech: We're not just building a product, we're creating a new category. Be a part of defining the future of financial inclusion for rural India.
- Embrace a high-growth, high-impact environment: This is a non-linear growth opportunity. Build a platform used by millions and witness the network effect drive massive scale.
- Tackle real-world challenges: Apply your skills to solve critical problems and directly empower rural communities.
- Craft solutions that touch lives: Develop innovative products used by diverse household members, each with unique needs.
Click on Apply to know more.