SourcingXPress
Website: sourcingxpress.com
Job details:
Company: Urbint
Business Type: Startup
Company Type: Product
Business Model: B2B
Funding Stage: Acquired
Industry: Information Technology
Salary Range: ₹ 40-60 Lacs PA
Job Description
Role Overview
We are seeking a Principal CloudOps/Platform Engineer responsible for designing, operating, and scaling cloud infrastructure and internal platforms that support our engineering teams.
This role focuses on cloud infrastructure operations, Kubernetes platform engineering, and reliability practices to ensure highly available, scalable, and secure systems.
The ideal candidate has strong experience in AWS cloud environments, Kubernetes platforms, infrastructure automation, and production reliability.
Key Responsibilities
Cloud Infrastructure Operations
- Design, implement, and operate scalable AWS cloud infrastructure
- Build and manage highly available cloud environments across multiple services
- Optimize cloud resources for performance, reliability, and cost efficiency
- Implement cloud security and governance best practices
- Support multi-environment (dev, staging, production) infrastructure
Kubernetes Platform Engineering
- Build and operate production-grade Kubernetes clusters
- Develop standardized deployment patterns for containerized applications
- Manage cluster networking, ingress, and autoscaling
- Provide developers with consistent, reliable container platforms
Infrastructure as Code
- Develop and maintain infrastructure using Terraform
- Build reusable infrastructure modules and automated environment provisioning
- Implement Git-based workflows for infrastructure management
Reliability Engineering (SRE Practices)
- Define and implement SLIs and SLOs for production services
- Improve system reliability through proactive monitoring and automation
- Lead incident response and post-incident reviews
- Implement observability solutions for system monitoring and performance analysis
Observability & Monitoring
- Implement monitoring and alerting using tools such as:
  - Prometheus
  - Grafana
  - ELK Stack
  - Datadog or Dynatrace
- Build dashboards and alerting systems to improve operational visibility
CI/CD and Deployment Automation
- Develop and maintain automated deployment pipelines
- Enable consistent build, test, and release workflows
- Support container image build and deployment automation
Required Skills & Experience
- 8+ years of experience in Cloud Infrastructure, DevOps, or SRE roles
- 5+ years working with AWS cloud infrastructure
- 4+ years operating Kubernetes in production environments
- Experience managing large-scale cloud platforms and distributed systems
Technical Skills
Cloud Platforms
- Amazon Web Services (AWS)
- Cloud networking (VPC, subnets, routing, load balancing)
- Cloud security and IAM
Containers & Orchestration
- Kubernetes
Infrastructure as Code
- Terraform (preferred)
- CloudFormation (optional)
Observability
- Prometheus
- Grafana
- ELK / OpenSearch
- Datadog / Dynatrace (optional)
Automation & Scripting
- Python
- Bash / Shell scripting
CI/CD
- Jenkins
- GitHub Actions
- GitLab CI
What We Value
- Strong systems thinking and problem-solving ability
- Ability to design reliable infrastructure platforms
- Ownership of production reliability and operational excellence
- Collaboration with engineering teams to improve platform capabilities