About the role
Salesforce hosts web services and applications written by thousands of internal developers and tens of thousands of customers to provide the largest business automation cloud on the planet. The underlying infrastructure that enables this innovation and value is evolving to fully embrace lights-out operations, single-click deploys to tens of thousands of nodes, and services that self-heal and self-optimize. The platform is a multi-substrate Kubernetes and microservices platform, including k8s, service mesh, and ingress, that powers Core CRM and a growing set of applications across Salesforce.
We are seeking a Site Reliability Engineer / DevOps Engineer to join our team and help build and operate the next-generation Microservices Platform, which leverages Service Mesh and Ingress Gateway load balancing. Our goal is to transform our software stack by adopting more cloud-native and AI-driven operational practices to build a highly reliable, self-healing, and scalable service mesh.
In this role:
You will be responsible for the high availability of the microservices supporting service mesh and ingress gateway on a large fleet of 1000+ clusters running technologies such as Kubernetes, Docker, network load balancers, service mesh, and Istio. You'll gain valuable experience troubleshooting real production issues, which will expand your knowledge of the architecture.
You will contribute code to drive availability improvements for services.
You will help improve the platform's visibility by implementing necessary monitoring and metrics with Prometheus, Grafana and other monitoring frameworks.
You will drive automation efforts in Python/Golang/Puppet/Jenkins to eliminate manual work in day-to-day operations.
You will drive improvements to CI/CD pipelines built on Terraform, Spinnaker, and Argo.
You'll implement AIOps automation, monitoring, and self-healing mechanisms that proactively fix issues and reduce MTTR and operational toil.
You will get a chance to improve your communication and collaboration skills by working with other infrastructure teams across Salesforce.
You will interact with a highly innovative and creative team of developers and architects.
You will evaluate new technologies to solve problems as needed.
About the company
Salesforce is the global leader in customer relationship management (CRM), bringing companies and customers together in the digital age. Founded in 1999, Salesforce enables companies of every size and industry to digitally transform and create a 360° view of their customers.