TGS The Global Skills
Website: theglobalskills.com
Job details:
Job Title: Grafana & Prometheus Specialist / Observability Engineer
Location: Any MP Office
Joining: Immediate Joiners Only (Project starts June 1st)
Experience: 7+ Years
Role Overview:
We are specifically looking for a Grafana & Prometheus expert — not a traditional DevOps Engineer.
This role is focused on Observability Engineering, Reliability Monitoring, and Deep Metrics Intelligence. The ideal candidate should have strong expertise in PromQL, advanced Grafana dashboarding, monitoring architecture, alerting strategies, and transforming raw logs/metrics into actionable operational insights.
Candidates whose experience is primarily in CI/CD, Terraform setup, infrastructure provisioning, or standard cloud DevOps operations are not a fit for this role.
Key Responsibilities:
• Develop, optimize, and troubleshoot complex PromQL queries to extract actionable metrics from Prometheus
• Design advanced Grafana dashboards with dynamic variables, transformations, drill-downs, and multi-source integrations
• Configure Prometheus Service Discovery for auto-scaling and dynamic target discovery
• Build monitoring solutions for large-scale distributed systems and microservices
• Implement proactive alerting strategies using Alertmanager and anomaly detection techniques
• Integrate multiple observability data sources including Prometheus, SQL, Elasticsearch, logs, and cloud metrics
• Create correlated dashboards combining metrics, logs, and traces into a unified operational view
• Support monitoring, incident response, root cause analysis, and reliability engineering initiatives
• Build observability workflows that support self-healing systems and automated triggers
• Collaborate directly with client stakeholders and offshore teams
Must Have Skills:
✅ Expert-level Prometheus & PromQL experience (Mandatory)
✅ Advanced Grafana Dashboard Engineering
✅ Strong understanding of Prometheus architecture, exporters, scraping configs, and recording rules
✅ Experience with Alertmanager, Exporters, and monitoring ecosystems
✅ Service Discovery configuration experience
✅ Multi-source dashboard integration
✅ AWS Cloud knowledge
✅ Large-scale data & log management experience
✅ Strong understanding of Monitoring & Incident Response workflows
✅ Excellent English communication skills
Strongly Preferred:
• Experience with Loki, Tempo, Flux, Elasticsearch
• Experience creating scalable dashboards for hundreds of microservices
• Ability to design dashboards that tell operational stories through data visualization
• Experience with anomaly detection and self-healing monitoring systems
• Observability-first mindset focused on Reliability & Visibility
Important Note:
This is NOT a generic DevOps role.
We are specifically looking for candidates whose core expertise is:
• Prometheus
• PromQL
• Grafana
• Monitoring Architecture
• Observability Engineering
• Reliability Engineering
Do NOT submit candidates focused mainly on:
❌ CI/CD pipelines only
❌ Terraform-heavy infrastructure roles
❌ General cloud administration
❌ Standard DevOps/SRE profiles without deep PromQL & Grafana expertise
Ideal Resume Indicators:
• Strong PromQL projects
• Advanced Grafana dashboards
• Monitoring automation
• Alerting frameworks
• Service Discovery implementations
• Multi-source observability platforms
• Incident response ownership
• Reliability engineering contributions