Azure Cloud DevOps Engineer - Terraform & Kubernetes

UST

Location: Pune Division, Maharashtra, India
Job type: Full-time

Required skills

Azure
cloud infrastructure
communication skills
compliance
DevOps
firewall
Kubernetes
Node
Root Cause Analysis
Terraform

About the role

Website: ust.com
Job details:
Role Description

We are seeking an experienced Azure Platform Engineer with strong expertise in infrastructure platform management, AKS operations, and Infrastructure as Code (IaC). The ideal candidate will lead platform reliability, modernization, and incident management initiatives while mentoring junior engineers and collaborating with global stakeholders.

This role requires deep hands-on technical capability combined with operational ownership and leadership skills.

Key Responsibilities

Infrastructure & Platform Management

Manage and maintain cloud infrastructure platforms, including:

OS and platform patching
Service upgrades and lifecycle management
Certificate lifecycle management

Ensure platform stability, security compliance, and operational excellence.

Azure Cloud & AKS Operations

Design, deploy, and manage Azure cloud environments using Infrastructure as Code (IaC) tools such as Terraform and ARM templates.
Operate and optimize Azure Kubernetes Service (AKS), including:

Cluster upgrades (N-1 strategy)
Node pool management and scaling
Network policies and security enforcement
Azure Firewall integrations
Istio service mesh troubleshooting
Certificate management within Kubernetes

Drive automation and continuous improvement of platform operations.

CI/CD & Automation

Build and maintain CI/CD pipelines for infrastructure and application deployments.
Manage YAML-based pipelines and agent pool governance (legacy and modern setups).
Support image updates, scaling strategies, and pipeline optimization.

Observability & Reliability Engineering

Implement and enhance observability practices using:

Dynatrace monitoring
Prometheus & Grafana
SLO dashboards and performance metrics

Enable routing, service discovery, and automation for high-availability systems.
Ensure proactive monitoring and reliability improvements across environments.

Incident Management & Operational Leadership

Lead high-severity (P1/P2) incident management, including:

Triage and impact analysis
Break-fix resolution
Root Cause Analysis (RCA) documentation
Preventive action planning

Drive operational maturity and continuous service improvement.

Stakeholder Collaboration & Leadership

Collaborate effectively with customers and stakeholders based in US time zones.
Provide clear communication during incidents and change activities.
Lead and mentor junior engineers, fostering technical growth and accountability.

Required Skills & Experience

Strong experience in Azure Cloud services and AKS operations.
Hands-on expertise with Terraform, ARM templates, and Infrastructure as Code practices.
Deep understanding of Kubernetes networking, scaling, and service mesh (Istio).
Experience managing CI/CD pipelines for both infrastructure and applications.
Strong knowledge of observability and monitoring tools (Dynatrace, Prometheus, Grafana).
Proven experience leading high-severity incidents and managing RCAs.
Excellent communication skills and ability to work across global teams.
Prior experience leading or mentoring engineering teams.

Skills

Azure DevOps, cluster upgrades, Terraform, Infrastructure as Code, Azure cloud services, AKS operations, node pools, CI/CD pipeline, agent pool governance, YAML pipelines, ARM, Azure Firewall, Dynatrace monitoring, Prometheus