Website:
ltm.com
Job details:
Role description
Job Title: Infrastructure Operations & Observability Engineer
The Role: We are looking for a proactive Engineer to manage our hybrid global infrastructure (GCP, VMware, Proxmox) and, crucially, to build and maintain our observability layer. You will ensure that every VM, container, and Vault instance is not just running but is visible, metered, and alerting correctly.
You will be the person who ensures that when a service in Tokyo lags, the team in London sees it on a dashboard before a customer reports it.
Expanded Responsibilities: The Observability Mission
Monitoring Setup: Deploy and maintain agents like Prometheus Node Exporter, Google Cloud Ops Agent, or Telegraf across Linux VMs in GCP and Proxmox.
The Integration Glue: Use Python to sync metadata between GCP/Proxmox and our observability tools (Uptime.com, PagerDuty, and Grafana/GCP Monitoring).
Dashboard Crafting: Create clear, actionable dashboards that show the health of applications such as HashiCorp Vault/OpenBao, system resources, and application uptime.
Alert Hygiene: Tune alerting thresholds to ensure PagerDuty only fires for real issues, reducing noise for the global team.
Log Aggregation: Ensure logs from global servers are flowing correctly into a central location, like GCP Cloud Logging or the corporate SIEM solution, for troubleshooting.
Proxy Management: Deploy and tune Nginx as both a reverse proxy (handling incoming traffic to Vault/OpenBao) and a forward proxy (controlling egress from our private nodes).
Infrastructure as Code: Fluent in the use of IaC technologies and GitLab repositories to drive change and operations.
Web Server Hardening: Manage SSL/TLS certificates (Let's Encrypt/ACME) and ensure proxy headers are configured for security and performance.
Technical Skills Checklist
1. Observability & Operational Monitoring (The Priority)
Synthetic Monitoring: Hands-on experience with creating tools/scripts for global health checks.
Incident Management: Proficiency in PagerDuty (setting up services, integrations, and grouping).
Metrics Visualisation: Familiarity with Prometheus/Grafana or GCP Monitoring (Stackdriver).
Understanding the difference between a Metric ("how much") and a Log ("what happened").
Health Checks: Ability to write custom health-check endpoints or scripts to verify service integrity.
2. Hybrid Infrastructure (GCP & Proxmox)
Proxmox VE: Managing VM/LXC lifecycles, snapshots, and basic cluster health.
GCP: Compute Engine, GKE, and VPC networking.
Linux: Advanced CLI skills for performance debugging (htop, iostat, netstat, journalctl).
3. Automation & Security
Python: Essential for "observability-as-code", that is, writing scripts to automate the creation of monitors or alerts via API.
Vault/OpenBao: Maintaining the "Observer" role, ensuring the monitoring tools have the correct limited permissions to check Vault health.