• Own the operational reliability, performance and resilience of the Global Fabric NaaS platform.
• Support and troubleshoot microservices, APIs and integrations across the NaaS ecosystem.
• Diagnose and resolve production issues across Kubernetes-hosted applications, Linux systems, networking, Kafka, APIs and service integrations.
• Support the safe, automated delivery of change into production using CI/CD, GitOps and automated testing.
• Improve observability, monitoring and traceability across the platform using Dynatrace, Prometheus, Grafana, Elasticsearch and Kafka.
• Support BT’s move towards end-to-end tracing and service traceability by helping to implement and improve synthetic monitoring, tracing and service flow visibility.
• Participate in major incident resolution, root cause analysis and post-incident improvement activities.
• Manage incidents, problems and changes through ServiceNow and track defects and improvements in Jira.
• Drive automation through Ansible, Python, Bash or similar tooling to reduce manual effort and operational risk.
• Mentor and support L2 engineers by improving troubleshooting practices, runbooks and operational readiness.
• Build strong knowledge of the end-to-end customer journey and ensure operational decisions take customer impact into account.