PwC
Website: pwc.com
Job details:
At PwC, our people in infrastructure focus on designing and implementing robust, secure IT systems that support business operations. They enable the smooth functioning of networks, servers, and data centres to optimise performance and minimise downtime. Those in cloud operations at PwC will focus on managing and optimising cloud infrastructure and services to enable seamless operations and high availability for clients. You will be responsible for monitoring, troubleshooting, and implementing industry-leading practices for cloud-based systems.
Driven by curiosity, you are a reliable, contributing member of a team. In our fast-paced environment, you are expected to adapt to working with a variety of clients and team members, each presenting varying challenges and scope. Every experience is an opportunity to learn and grow. You are expected to take ownership and consistently deliver quality work that drives value for our clients and success as a team. As you navigate through the Firm, you build a brand for yourself, opening doors to more opportunities.
Skills
Examples of the skills, knowledge, and experiences you need to lead and deliver value at this level include but are not limited to:
- Apply a learning mindset and take ownership for your own development.
- Appreciate diverse perspectives, needs, and feelings of others.
- Adopt habits to sustain high performance and develop your potential.
- Actively listen, ask questions to check understanding, and clearly express ideas.
- Seek, reflect, act on, and give feedback.
- Gather information from a range of sources to analyse facts and discern patterns.
- Commit to understanding how the business works and building commercial awareness.
- Learn and apply professional and technical standards (e.g. refer to specific PwC tax and audit guidance), and uphold the Firm's code of conduct and independence requirements.
Job Profile Name
Child Name
Global LoS
Global Network
Global Competency Network
Go-To-Market
Managed Services
Sector: Not Applicable
Programme Type: Experienced
Additional Responsibilities:
Leads reliability improvements across applications, platforms, and cloud systems. Drives automation, enhances observability, optimizes performance, and conducts root-cause analysis. Partners with engineering teams to reduce toil, improve operational maturity, and strengthen service resilience.
Minimum Degree Required: Bachelor's
Degree Preferred: Bachelor's or Master's in Science, Computer Science, or Engineering
Minimum Years of Experience: 5-7 years
Certifications Required: None
Certifications Preferred: AWS Solutions Architect Associate; Azure Administrator; Kubernetes CKA; Terraform Associate; ITIL Foundation; observability certifications; scripting and coding certifications.
Required / Mandatory Knowledge/Skills:
- Strong understanding of SRE practices including SLIs/SLOs, error budgets, service health, and operational KPIs;
- Ability to automate operational tasks using Python, Shell, PowerShell, Go, or similar languages;
- Experience improving alerting systems, reducing noise, and refining observability instrumentation;
- Proficiency with cloud platforms and core services (compute, storage, networking, serverless);
- Experience executing root-cause analysis and problem management;
- Ability to lead incident response and coordinate cross-team troubleshooting;
- Experience identifying systemic reliability gaps and proposing engineering solutions;
- Ability to design performance tests, validate reliability risks, and assess scalability;
- Strong communication skills for partnering with development, operations, and leadership.
Preferred Knowledge/Skills:
- Leads tuning of monitoring rules, dashboards, and reliability metrics;
- Leads development of automation to reduce operational toil and manual interventions;
- Leads incident response actions and service stabilization procedures;
- Leads post-incident reviews and contributes to long-term fixes;
- Leads resilience initiatives such as chaos testing and failover drills;
- Leads capacity forecasting and risk identification;
- Leads refinement of operational standards, documentation, and runbooks;
- Leads collaboration with product and engineering teams to embed reliability requirements.