Website:
aviatrix.ai
Job details:
WHO WE ARE:
For enterprises struggling to secure cloud workloads, Aviatrix® offers a single solution for pervasive cloud security. Where current cybersecurity approaches focus on securing entry points to a trusted space, Aviatrix Cloud Native Security Fabric (CNSF) delivers runtime security and enforcement within the cloud application infrastructure itself – closing gaps between existing solutions and helping organizations regain visibility and control. Aviatrix ensures security, cloud, and networking teams are empowering developer velocity, AI, serverless, and what’s next. For more information, visit www.aviatrix.com.
ABOUT THE ROLE: Senior MTS, Site Reliability Engineering
The Aviatrix SRE team is a small but highly skilled global group of Systems Engineers/SREs dedicated to ensuring the reliability, availability, and performance of Aviatrix’s critical systems and services. Our mission is to build and maintain a robust, resilient infrastructure that enables Aviatrix to deliver high-quality services with agility through automation, best practices, and a culture of operational excellence.
As a Senior Member of Technical Staff (Sr MTS) Site Reliability Engineer, you’re a proven mid-level engineer who can work independently with some supervision. You’ll take on more complex technical challenges while building your leadership and mentoring skills.
KEY RESPONSIBILITIES
· Kubernetes: Manage application lifecycles, perform troubleshooting, and implement basic monitoring solutions
· Infrastructure as Code: Design and implement IaC solutions for infrastructure provisioning and configuration management
· Automation & Development: Build automation tools and enhance existing frameworks in Golang and Python
· Reliability Engineering: Design reliability improvements for individual services; implement basic SLI/SLO frameworks
· Automation Excellence: Build automation tools for routine operational tasks; enhance existing automation frameworks
· Observability: Design and implement monitoring for services; create and maintain alerting rules and basic dashboards
· Incident Management: Lead response for moderate severity incidents; conduct basic post-incident reviews
· Performance Engineering: Analyze performance bottlenecks; implement optimization solutions with measurable impact
· Independent Problem-Solving: Solve technically difficult but well-defined problems with minimal guidance
· Collaboration: Represent the SRE perspective in cross-team technical discussions; mentor junior team members
· Mentoring: Provide guidance to junior team members
QUALIFICATIONS
· Experience: 3+ years with a BS in a designated engineering field, or 0-3+ years with an advanced degree
· Technical Skills: Proficiency in Golang and Python with demonstrated problem-solving ability
· Cloud Expertise: Solid experience with cloud platforms and cloud-native technologies
· Infrastructure as Code: Working knowledge of Terraform for infrastructure management
· Kubernetes: Good understanding of Kubernetes concepts and operations
· Monitoring: Experience with monitoring tools (Prometheus, Grafana) and logging solutions
· System Administration: Solid Linux system administration experience
· Communication: Excellent communication skills for cross-team collaboration