About the role
As a Site Reliability Engineer at SingleStore, you will be responsible for building and maintaining the infrastructure that powers our mission-critical database and analytics platform. You will work closely with our product and engineering teams to ensure high availability, scalability, and performance of our cloud-based services. Your role will involve automating infrastructure provisioning and deployment, monitoring and troubleshooting production systems, and continuously improving our platform's reliability and efficiency.
Required Skills:
- Strong background in Linux system administration and cloud infrastructure
- Experience with infrastructure-as-code and configuration management tools such as Terraform, Ansible, or Puppet
- Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash)
- Understanding of containerization and container orchestration (Docker, Kubernetes)
- Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, ELK)
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration abilities