About the role
At SmartBear, we deliver the complete visibility developers need to make each release better than the last. Our award-winning, industry-favourite tools, including TestComplete, Swagger, Cucumber, ReadyAPI and Zephyr, are trusted by over 16 million developers, testers, and software engineers at 32,000+ organizations, including world-renowned innovators such as Adobe, JetBlue, FedEx, and Microsoft.
You will build and maintain key infrastructure that is observable, stable, and performant.
You will have the opportunity to work with the latest industry-leading technologies, architectures, languages, data storage, and messaging frameworks, and you will be empowered to explore and contribute your own ideas.
You will work with a modern, highly scalable microservices architecture that delivers application error monitoring, Real User Monitoring (RUM), and observability solutions.
We're looking for a Site Reliability Engineer to join the BugSnag Infrastructure team. You will be working alongside a small, talented team based predominantly in Bath, UK. Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. We use our engineering skills and knowledge to build tools and processes to keep the BugSnag product online 24/7, making our systems self-heal wherever possible to limit the impact of being on-call.
Ensure that all our microservices, databases and other key infrastructure are observable, introducing monitoring in the right places to give us visibility into what is happening in our production systems.
Influence the design of new and existing microservices to ensure they are observable, stable and performant.
Maintain our databases and keep them performant by tuning their configuration and deployment and by analysing how our systems use them.
Automate tasks to keep the BugSnag systems resilient with minimal manual intervention.
Validate our infrastructure to ensure it is highly available and recoverable.
Work with our security team to implement best-practice security measures that protect our infrastructure and customer data.
Make sure that our On-Premise product is as reliable as our SaaS product.
Review your teammates' work to a high standard to reduce the risk of production issues.
Regularly be on-call to deal with production issues and keep BugSnag running 24/7.
We are looking for you if you have:
Substantial engineering experience.
Professional experience developing software.
Experience working with and maintaining Linux or other *nix systems.
Hands-on experience with one of the major cloud platforms: GCP, AWS or Azure.
Professional experience querying, deploying, and maintaining databases/datastores (we use MongoDB, Redis, Elasticsearch, ClickHouse and Gluster).
You are:
Quick to learn new skills and able to apply existing skills and knowledge to solve new, complex problems.
Able to take ownership of all stages of a project from architecture/design through implementation to delivery.
Able to effectively balance idealism with pragmatism when assessing the direction a project should take.
Willing to go the extra mile to make other developers more efficient.
You may also have:
Experience with Docker/Kubernetes.
Experience with Terraform/Packer/Vagrant.
Experience with Chef/Puppet/Ansible.
Experience with MongoDB/Redis/Elasticsearch/ClickHouse.
Experience with Gluster.
Experience with RabbitMQ/Kafka.
Experience with microservices in production.
About the company
SmartBear is committed to ethical corporate practices and social responsibility, promoting good in all the communities we serve. SmartBear is headquartered in Somerville, MA, with offices around the world, including Galway, Ireland; Bath, UK; Wroclaw, Poland; and Bangalore, India.