About this opportunity
BNEW RCE Lab Operations provides lab infrastructure, networks, and hardware services that enable R&D teams to configure and deploy design and test environments globally.
Hardware Management (HWM) provides hardware asset management for R&D, covering the full asset lifecycle to support data-driven decisions and maximize value.
Join Ericsson as a DevSecOps Engineer and help build, secure, and operate COS (Central Observability System), the platform behind our global observability services used by thousands of developers and testers worldwide.
In this role, you’ll drive secure automation, reliable operations, and scalable delivery end-to-end (CI/CD, infrastructure/configuration as code, operational readiness, and continuous improvement).
You’ll work in a mid-sized team with the product owner, architects, operations manager, and end users. We use Agile/Scrum and are scaling toward SAFe to keep delivery fast, secure, and predictable.
Tech stack: cloud-native microservices (e.g., Go, Svelte, Kubernetes), telemetry backends (e.g., Cortex), and data systems (e.g., Postgres/Cassandra/Kafka).
You enjoy automation, clear guardrails, and measurable outcomes, and you turn telemetry into action without compromising security or reliability.
What you will do
- Build and run COS as a secure, reliable cloud-native platform (containers, Kubernetes, OpenStack where applicable), using infrastructure/configuration as code.
- Harden and improve delivery pipelines with security-by-design: automate testing and scanning, enforce policy checks, and support controlled releases.
- Execute deployments, upgrades, and configuration changes; troubleshoot by reproducing issues, restoring service, and running performance/load tests where needed.
- Improve secure CI/CD for platform and services: testing, scanning, policy checks, controlled releases, and compliance activities (risk assessments, audits, secure configuration baselines).
- Apply SRE practices to improve availability, scalability, and performance, and drive proactive monitoring and reliability improvements.
- Take operational ownership with the team: runbooks, on-call readiness, incident/problem management, SLA follow-up, access management, and service performance reporting.