Flag job

Report

Senior Site Reliability Engineer - Kubernetes

Min Experience

3 years

Location

Bengaluru, India

JobType

full-time

About the job

Info This job is sourced from a job board

About the role

The Site Reliability Engineer (SRE) will play a key role in building and managing Kubernetes platforms that support cloud-native applications and services. This position focuses on architecting and managing reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and automation. The ideal candidate will have hands-on experience with AWS infrastructure, Kubernetes platform creation, Helm charts, Karpenter scaling, and Istio service mesh. Key Responsibilities: Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimized for production workloads, providing high resilience and operational efficiency. AWS Infrastructure Management: Build, manage, and optimize AWS cloud infrastructure, including EKS,ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS. Helm Management: Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments. Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands. Istio Service Mesh Management: Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters. Enable fine-grained traffic management, service discovery, and policy enforcement. Platform Automation & Scaling: Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime. Incident Management & Troubleshooting: Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner. Security & Compliance: Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks. Documentation & Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams.

About the company

Okta is The World's Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth. At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we're looking for lifelong learners and people who can make us better with their unique experiences. Join our team! We're building a world where Identity belongs to you.

Skills

kubernetes
helm
karpenter
istio
terraform
ansible
aws
python
bash
go