NovBliss
Website: novbliss.in
Job details:
Mission: To bridge the gap between development and operations by engineering systems that are self-healing, observable, and secure by design.
Key Responsibilities
Performance Engineering: Analyze bottlenecks in VLT/Evo systems. The goal is not just to measure speed but to improve it by tuning the kernel, network, or application stack.
Service Level Management: Define, implement, and defend SLIs and SLOs. You will be the guardian of the Error Budget, helping the team decide when to ship features and when to prioritize stability.
Observability Architecture: Design a holistic monitoring strategy using Prometheus/Loki/ELK to move from reactive alerting to predictive signals.
Toil Reduction: Identify repetitive manual tasks and eliminate them through code. If you have to do it twice, automate it.
Technical Requirements
Observability Stack: Deep expertise in Prometheus, Grafana, and ELK/Loki.
Automation: Professional-grade Python or Go (Go is increasingly the SRE standard) and robust Bash scripting.
Infrastructure: Experience with Kubernetes or specialized data center orchestration.
Cultural Fit: A commitment to blameless post-mortems. You treat every outage as a lesson in system architecture.