Lead Platform Engineer (DevOps)

RoboMQ

full-time

Required skills

Python
penetration testing
AWS
cloud infrastructure
CloudWatch
communication skills
compliance
DevOps
Docker
end-to-end
Git
Helm
incident response
Java
Jenkins
Kafka
Kibana
Kubernetes
RabbitMQ
Root Cause Analysis
SaaS
Shell Scripting
SRE
SSL
Terraform
version control
VPC

About the role

Website: robomq.io
Job details:

About the Company:

RoboMQ is a fast-growing SaaS company delivering disruptive Identity Governance and Administration (IGA) solutions to mid-market enterprise customers. Our flagship product, Hire2Retire, automates the employee identity lifecycle by integrating HR systems with Identity Management and other applications, helping organizations achieve seamless onboarding, offboarding, compliance, and security with a zero-trust, least-privilege security posture.

Before you apply, make sure you have:

6+ years of experience in a DevOps, Platform Engineer, or Site Reliability Engineer role.
B.Tech degree with relevant technical experience.
Demonstrated ability to lead technical squads, manage incident response, and oversee a robust on-call support framework to handle critical infrastructure issues.
Ability to drive architectural resilience, fast-track cloud transformations, and scale infrastructure alongside our rapidly evolving product and business.
Exceptional verbal and written communication skills with strong documentation and mentoring capabilities.
Extensive experience architecting and managing production-grade distributed systems.

Responsibilities

Architect, scale, and maintain multiple production-grade, multi-node Kubernetes clusters for high availability, optimum performance, and cost optimization.
Standardize, design, and optimize enterprise-wide logging, monitoring, and alerting using tools like Prometheus, Grafana, EFK, or CloudWatch.
Design, implement, manage, and secure production-level CI/CD pipelines for seamless, automated deployments.
Own and architect the cloud infrastructure hosted on AWS to keep it secure, scalable, and highly optimized.
Standardize and automate infrastructure provisioning, scaling, and security compliance across all environments on AWS through advanced, modular Terraform templates.
Strengthen enterprise cloud security through advanced IAM policies, end-to-end encryption, and automated vulnerability scans.
Formulate post-mortem frameworks, lead root cause analysis (RCA), troubleshoot systemic issues, and continuously drive infrastructure improvements.
Work with penetration testing tools such as Nmap and OWASP ZAP to analyze, mitigate, and improve network and application security.
Strengthen overall security strategy including infrastructure security, webapp security, network topologies, and IAM security.
Guide, code-review, and mentor junior DevOps engineers to align with industry best practices and internal performance standards.

Key Skills [Must have]

Strong hands-on experience with production cluster administration, Docker, and Kubernetes.
Strong understanding of Git, branching strategies, and enterprise version control.
CI/CD: Jenkins, GitHub, GitHub Actions, and automated code quality gating via SonarQube.
Infrastructure as Code: Advanced, modular experience with Terraform and application packaging using Helm charts (kOps).
Deep experience architecting, deploying, and managing cloud-based applications, preferably on AWS.
Cloud Networking & Security fundamentals (IAM, firewalls, VPC, SSL, encryption).
End-to-end production setup of Monitoring & Observability tools (Prometheus, Grafana, Alertmanager).
Production-grade management of Logging Architectures (Elasticsearch, Fluentd, Kibana - EFK Stack).
Infrastructure support for high-throughput distributed systems and message queues (Kafka, RabbitMQ, or AWS SQS).
Excellent knowledge of shell scripting along with proficiency in an engineering programming language (Python, Go, or Java).
Cyber Security: OWASP Top 10, Nmap, OWASP ZAP.

Additional Skills [Good to have]

Service Mesh & API Gateways: Istio, Kong.
Deep familiarity and structural implementation background with SRE (Site Reliability Engineering) practices.
Exposure to automated infrastructure compliance, zero-trust policies, and corporate data security standards (such as SOC 2 or ISO 27001).

Click on Apply to know more.
