Mumba Technologies, Inc.
Website: mumbatech.com
Job details:
Job Title: Technical Lead - SRE
Job Type: Full Time
Location: Remote
About the Job:
What You Will Do
Technical Leadership & Architecture
- Own and drive the technical direction for your team's infrastructure systems, making architectural decisions that balance reliability, scalability, and cost.
- Design systems of moderate to high complexity using distributed systems best practices; anticipate future use cases and minimize technical debt.
- Conduct architectural reviews and advance design patterns across the organization.
- Identify and implement improvements to existing software architecture; define and expand design patterns to solve common platform problems.
- Define and enforce security best practices across team-owned systems; proactively surface gaps to senior leadership.
Reliability & Operational Excellence
- Own the reliability posture of team-owned services — establish SLOs, monitor SLAs, and hold the team accountable to them.
- Lead incident response for complex, multi-service issues; systematically debug, identify root causes, and ensure issues do not recur.
- Establish standards for logging, monitoring, and operationalization across all team-owned systems.
- Foresee potential operational issues and implement preventative measures to safeguard the customer experience.
- Participate in and help lead the on-call rotation; ensure production systems are appropriately instrumented.
Project & Delivery Ownership
- Act as DRI (Directly Responsible Individual) for medium-to-large SRE projects spanning months and involving cross-team collaboration.
- Partner with Engineering Managers and Product Managers to scope roadmap initiatives, break down work into actionable increments, and commit to delivery plans.
- Negotiate scope effectively when required, ensuring adjustments align with customer needs and project goals.
- Proactively identify and resolve project risks — dependencies, architectural drift, and staffing blockers — before they impact delivery.
AI-Augmented Engineering
- Demonstrate mastery of AI-driven development practices and integrate them into end-to-end feature and infrastructure delivery.
- Contribute improvements to internal AI prompt libraries, coding workflows, and AI usage best practices for the team.
- Use AI tools to accelerate creation of technical documents, design proposals, runbooks, and exploration of alternative solutions.
- Stay current with emerging AI development patterns and bring relevant innovations back to the team.
- Coach teammates on responsible, efficient, and effective use of AI tools (e.g., Cursor, Augment) across the software development lifecycle.
What We Are Looking For
Required Experience
- 7+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering in a production cloud environment.
- 5+ years of hands-on experience with AWS cloud services across compute, networking, storage, and security.
- 5+ years managing Linux-based production environments at scale.
- 5+ years using Infrastructure-as-Code (Terraform, CDK, CloudFormation) and/or GitOps best practices.
- 3+ years operating and troubleshooting production Kubernetes environments.
- 3+ years applying AWS Well-Architected Framework principles across reliability, security, performance, and cost pillars.
- 3+ years applying cloud security best practices, including IAM, secrets management, network security, and compliance.
- 3+ years working with PostgreSQL in production: performance tuning, replication, backup, and recovery.
- Demonstrated track record of leading multi-person technical projects from scoping through delivery.
Technical Skills
- Strong general programming skills; comfort writing automation scripts and tooling in Python, Go, or similar.
- Deep knowledge of observability tooling — metrics, logging, distributed tracing — and how to use them to drive reliability.
- Solid understanding of data retention, backup, and recovery processes across cloud-native systems.
- Experience with CI/CD pipelines, release management, and deployment automation.
- Familiarity with service mesh, API gateway patterns, and microservices architectures.
AI Fluency
- Proficient with agentic coding assistants (e.g., Cursor, Augment, GitHub Copilot) for day-to-day engineering tasks.
- Able to use AI to break down complex infrastructure tasks, accelerate design documentation, and improve code review quality.
- Able to critically evaluate AI-generated outputs and identify when they are suboptimal or unsafe.
Leadership & Collaboration
- Proven ability to lead technical discussions, drive alignment across engineering and product, and communicate decisions clearly to stakeholders.
- Experience mentoring junior and mid-level engineers in both technical skills and professional development.
- Able to operate independently with minimal supervision; comfortable making final technical decisions as DRI.
- Strong communication skills in English — written and verbal — with experience influencing cross-functional partners.