Principal Cloud Engineer

SourcingXPress

Location: Bengaluru, Karnataka, India
Job type: Full-time

Required skills

Python
Amazon Web Services
Bash
cloud infrastructure
CloudFormation
Datadog
DevOps
Docker
GitHub
Helm
incident response
infrastructure management
Jenkins
Kubernetes
load balancing
Shell Scripting
SRE
Terraform
VPC
web services

About the role

Website: sourcingxpress.com
Job details:
Company: Urbint

Business Type: Startup

Company Type: Product

Business Model: B2B

Funding Stage: Acquired

Industry: Information Technology

Salary Range: ₹ 40-60 Lacs PA

Job Description

Role Overview

We are seeking a Principal CloudOps/Platform Engineer responsible for designing, operating, and scaling cloud infrastructure and internal platforms that support our engineering teams.

This role focuses on cloud infrastructure operations, Kubernetes platform engineering, and reliability practices to ensure highly available, scalable, and secure systems.

The ideal candidate has strong experience in AWS cloud environments, Kubernetes platforms, infrastructure automation, and production reliability.

Key Responsibilities

Cloud Infrastructure Operations

Design, implement, and operate scalable AWS cloud infrastructure
Build and manage highly available cloud environments across multiple services
Optimize cloud resources for performance, reliability, and cost efficiency
Implement cloud security and governance best practices
Support multi-environment (dev, staging, production) infrastructure

Kubernetes Platform Engineering

Build and operate production-grade Kubernetes clusters
Develop standardized deployment patterns for containerized applications
Manage cluster networking, ingress, and autoscaling
Provide developers with a consistent, reliable container platform

Infrastructure as Code

Develop and maintain infrastructure using Terraform
Build reusable infrastructure modules and automated environment provisioning
Implement Git-based workflows for infrastructure management

Reliability Engineering (SRE Practices)

Define and implement SLIs and SLOs for production services
Improve system reliability through proactive monitoring and automation
Lead incident response and post-incident reviews
Implement observability solutions for system monitoring and performance analysis

Observability & Monitoring

Implement monitoring and alerting using tools such as:

Prometheus
Grafana
ELK Stack
Datadog or Dynatrace

Build dashboards and alerting systems to improve operational visibility

CI/CD and Deployment Automation

Develop and maintain automated deployment pipelines
Enable consistent build, test, and release workflows
Support container image build and deployment automation


Required Skills & Experience

8+ years of experience in Cloud Infrastructure, DevOps, or SRE roles
5+ years working with AWS cloud infrastructure
4+ years operating Kubernetes in production environments
Experience managing large-scale cloud platforms and distributed systems

Technical Skills

Cloud Platforms

Amazon Web Services (AWS)
Cloud networking (VPC, subnets, routing, load balancing)
Cloud security and IAM

Containers & Orchestration

Kubernetes
Docker
Helm

Infrastructure as Code

Terraform (preferred)
CloudFormation (optional)

Observability

Prometheus
Grafana
ELK / OpenSearch
Datadog / Dynatrace (optional)

Automation & Scripting

Python
Bash / Shell scripting

CI/CD

Jenkins
GitHub Actions
GitLab CI

What We Value

Strong systems thinking and problem-solving ability
Ability to design reliable infrastructure platforms
Ownership of production reliability and operational excellence
Collaboration with engineering teams to improve platform capabilities
