Site Reliability Engineer

Location

Remote

About the role

The Site Reliability Engineers (SREs) at Chatbooks design, implement, and maintain distributed, fault-tolerant systems. They learn quickly and apply sound engineering principles to existing and new problems. Much of the work focuses on improving and optimizing existing systems, building infrastructure, and eliminating manual work through automation.

What You’ll Do

Implement and improve observability, incident response, and monitoring practices that lead to greater operational excellence.
Take ownership of the uptime, reliability, and performance of core systems.
Collaborate across departments to respond to and resolve issues.
Leverage a combination of commercial software and services, open-source products, and cloud offerings to manage applications, infrastructure, and services with a customer-focused mindset.

Skills and experience you’ll need to succeed

Clear written and verbal communication skills.
Experience working in one of the large public cloud environments (AWS, Azure, or GCP).
Knowledge of distributed systems, including load balancing and data replication.
Proficiency with logging and monitoring tools, such as Prometheus, Grafana, and Elastic APM.
Familiarity with containerization technologies, such as Docker and Kubernetes.
Background in managing web applications, backend services, and system architecture.
Experience using infrastructure automation tools, such as Terraform or CloudFormation, for deployments.
A growth mindset: you keep advancing your skills and learn new concepts and technologies quickly.
Strong understanding of continuous integration and continuous deployment (CI/CD) pipelines.
Prior hands-on software development experience.

About the company

At Chatbooks, we think photos deserve to be seen, not trapped on your phone, which is why we created our user-friendly photo book app! Users make the memories and we make the photo books. By focusing on four core values (beyond easy, super affordable, great quality, and amazing service), we continue to keep Chatbookers everywhere excited about printing their everyday magic.

Skills

AWS
Azure
GCP
distributed systems
load balancing
data replication
Prometheus
Grafana
Elastic APM
Docker
Kubernetes
Terraform
CloudFormation
CI/CD