About the role
Experience
Minimum 5 years of relevant work experience with observability, particularly setting up Grafana and Prometheus in critical production environments.
Writing custom exporters and integrations.
Experience working with OpenShift private cloud infrastructure and hosted applications.
Experience with Managed Grafana on public cloud environments is beneficial.
Multi-tenancy setup and data segregation on the observability and AIOps stack.
Defining SLIs and setting up SLOs for multi-tenant solutions.
Core Capabilities
Experience implementing container and network monitoring, APM, RUM, log analytics, end-to-end tracing, and custom alerts with Grafana, Prometheus, and Grafana Loki (alternatively Logstash or Fluent Bit).
OpenShift proficiency with containers and multi-tenancy setup for the observability solution is critical.
Ability to configure custom alerts and monitors, and to build AIOps workflows based on telemetry.
Good understanding of setting up integrations with other systems via APIs, consuming external APIs for IAM, and ingesting metric-based telemetry via collectors.
Ability to build custom Grafana dashboards.
Setting up synthetic monitoring and test automation, and integrating their telemetry into the observability stack.
Tenant and data segregation.
Ability to code is mandatory; Python or Java and Ansible scripting preferred.