Site Reliability Engineer, Fleet

Salary

$0.1466k - $0.2031k

Min Experience

0 years

Location

remote

Job Type

full-time

About the job

This job is sourced from a job board.

About the role

Cisco Meraki, a division of Cisco Networking, is a cloud-managed IT company and leader in cloud-controlled Wi-Fi, routing, and security. Our intuitive platform enables organizations of all sizes to deliver customer and employee experiences at scale. To provide best-in-class technologies to our customers, we've created an unrivaled company culture for our employees. One where diverse backgrounds, perspectives, and experiences shape our work and fuel our evolution. One that is collaborative, flexible, and inclusive and provides employees with the autonomy to develop technology that's accessible and secure for everyone.

We are seeking a Site Reliability Engineer (SRE) to join our dynamic SRE Fleet team, which is responsible for ensuring the stability, scalability, and efficiency of our infrastructure. You will play a critical role in maintaining and improving a fleet of more than 2,000 machines across a global cloud environment. This role is highly collaborative, involving close interaction with engineering and SRE teams in the UK and San Francisco to scale and optimize our infrastructure.

RESPONSIBILITIES

Develop and maintain automation code for cloud maintenance processes using Ansible and Ruby.
Efficiently coordinate and execute large-scale maintenance operations, acting as a central point of contact between multiple teams.
Debug and resolve complex failure scenarios across large-scale systems, ensuring high availability and reliability.
Design, implement, and optimize GitLab CI pipelines to streamline deployment and testing workflows.
Collaborate with engineering teams to identify and address performance bottlenecks and scaling challenges.
Proactively troubleshoot issues across the fleet, using a deep understanding of Linux systems and networking.
Contribute to the creation of robust unit tests and infrastructure testing suites with RSpec.
Participate in collaborative projects to improve infrastructure efficiency, scalability, and observability.
Work cross-functionally with teams in different time zones, fostering a culture of shared ownership and reliability.
Develop and maintain automated tools for collecting infrastructure data to support compliance requirements.
Streamline compliance processes by reducing manual overhead through automation.
Be part of an on-call SRE team responding in real time to production incidents.

YOU ARE AN IDEAL CANDIDATE IF YOU HAVE:

Experience working in Linux environments across multiple machines, and comfort with Bash scripting.
Experience with scripting or programming languages, specifically for automation, ideally Ruby.
Experience with CI/CD pipelines, particularly GitLab CI.
Experience with infrastructure automation, ideally Ansible.
Experience with cloud infrastructure providers, ideally AWS.
Demonstrated experience troubleshooting and debugging complex distributed systems.
Experience with monitoring and alerting tools such as Prometheus and Grafana.
Experience managing and optimizing fleets of thousands of machines.
Excellent collaboration skills and the ability to work effectively across teams in multiple time zones.
A passion for automation, scalability, and infrastructure as code.

Skills

linux
bash
ruby
ansible
gitlab-ci
aws
prometheus
grafana