Site Reliability Engineer

Logile

full-time

Required skills

Python
Agile
AWS
Ansible
Azure
Bash
capacity planning
CloudWatch
compliance
DevOps
forecasting
GCP
Helm
incident response
Kubernetes
Linux
Root Cause Analysis
SRE
Terraform

About the role

Logile

Website: logile.com
Job details:
Company Overview

Logile is the leading retail labor planning, workforce management, inventory management and store execution provider deployed in thousands of retail locations across North America, Europe, Australia, and Oceania.

Our proven AI, machine-learning technology and industrial engineering accelerate ROI and enable operational excellence with improved performance and empowered employees. Retailers worldwide rely on Logile solutions to boost profitability and competitive advantage by delivering the best service and products at optimal cost.

From labor standards development and modeling to unified forecasting, storewide scheduling, and time and attendance, to inventory management, task management, food safety, and employee self-service, we transform retail operations with a unified store-level solution. Gain the Advantage with The Logic of Retail. One Platform for store planning, scheduling and execution.

For more information, visit www.logile.com

Job Summary

We are seeking a motivated and experienced Site Reliability Engineer (SRE) to join our dynamic engineering team. The ideal candidate will have a strong background in ensuring the reliability, scalability, and performance of infrastructure and applications. The SRE will focus on building robust monitoring systems, automating operations, and bridging the gap between development and operations to achieve high service availability.

Key Responsibilities

Design, implement, and manage observability systems (Prometheus, Grafana, ELK/EFK, Jaeger, OpenTelemetry).

Define and maintain SLAs, SLOs, and SLIs for services, ensuring reliability goals are met.

Build automation for infrastructure, monitoring, scaling, and incident response using Terraform, Ansible, and scripting (Python/Bash).

Collaborate with developers to design resilient and scalable systems following SRE best practices.

Lead incident management: monitoring alerts, root cause analysis, postmortems, and continuous improvement.

Implement chaos engineering and fault-tolerance testing to validate system resilience.

Drive capacity planning, performance tuning, and cost optimization across environments.

Ensure security, compliance, and governance in infrastructure monitoring.

Job Location & Schedule:

This is an onsite role at the Logile Bhubaneswar office.

The selected candidate is expected to work hours that partially overlap with US working hours.

Required Skills & Experience

2-5 years of experience, including strong experience with monitoring, logging, and tracing tools (Prometheus, Grafana, ELK, EFK, Jaeger, OpenTelemetry, Loki).

Cloud expertise: AWS, Azure, or GCP monitoring and reliability practices (CloudWatch, Azure Monitor).

Proficiency in Linux system administration and networking fundamentals.

Solid skills in infrastructure automation (Terraform, Ansible, Helm).

Programming/scripting skills: Python, Go, Bash.

Experience with Kubernetes and containerized workloads.

Proven track record in CI/CD and DevOps practices.

Preferred Skills

Experience with chaos engineering tools (Gremlin, Litmus).

Strong collaboration skills to drive SRE culture across Dev & Ops teams.

Experience with Agile/Scrum environments.

Knowledge of security best practices (DevSecOps).

Compensation And Benefits

The compensation and benefits for this role are benchmarked against the best in the industry and the job location.

Standard shift: 1 PM to 10 PM (shift allowance applicable for non-standard shifts and as per role).

Shifts starting after 4 PM: Eligible for food allowance/subsidized meals and cab drop.

Shifts starting after 8 PM: Eligible for cab pickup as well.
