PwC
Website: pwc.com
Job details:
At PwC, our people in infrastructure focus on designing and implementing robust, secure IT systems that support business operations. They enable the smooth functioning of networks, servers, and data centres to optimise performance and minimise downtime. Those in cloud operations at PwC will focus on managing and optimising cloud infrastructure and services to enable seamless operations and high availability for clients. You will be responsible for monitoring, troubleshooting, and implementing industry-leading practices for cloud-based systems.
Focused on relationships, you are building meaningful client connections, and learning how to manage and inspire others. Navigating increasingly complex situations, you are growing your personal brand, deepening technical expertise and awareness of your strengths. You are expected to anticipate the needs of your teams and clients, and to deliver quality. Embracing increased ambiguity, you are comfortable when the path forward isn’t clear, you ask questions, and you use these moments as opportunities to grow.
Skills
Examples of the skills, knowledge, and experiences you need to lead and deliver value at this level include but are not limited to:
- Respond effectively to the diverse perspectives, needs, and feelings of others.
- Use a broad range of tools, methodologies and techniques to generate new ideas and solve problems.
- Use critical thinking to break down complex concepts.
- Understand the broader objectives of your project or role and how your work fits into the overall strategy.
- Develop a deeper understanding of the business context and how it is changing.
- Use reflection to develop self-awareness, enhance strengths and address development areas.
- Interpret data to inform insights and recommendations.
- Uphold and reinforce professional and technical standards (e.g. refer to specific PwC tax and audit guidance), the Firm's code of conduct, and independence requirements.
Site Reliability Engineer (SRE)
We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in Windows, Linux, Databases, Networking, and Voice/Unified Communications to ensure the reliability, availability, and performance of enterprise infrastructure and services. This role bridges the gap between operations and engineering, applying software engineering principles to infrastructure problems and driving a culture of automation, observability, and continuous improvement across heterogeneous environments.
Responsibilities
Windows Infrastructure Administration
- Design and plan upgrades for Windows Server environments (2016/2019/2022), ensuring high availability, reliability, and performance
- Automate routine administration tasks using PowerShell scripting and configuration management tools (Ansible, SCCM)
- Ensure OS-level security hardening, compliance, and access control enforcement
Linux Infrastructure Administration
- Design and plan upgrades for Linux environments (RHEL, CentOS, Ubuntu) across on-premises and cloud platforms
- Automate infrastructure tasks using Bash, Python scripting, and tools such as Ansible or Puppet
- Enforce security baselines, SELinux/AppArmor policies, and patch compliance
Database Administration & Reliability
- Monitor database performance, optimize queries, and manage indexing strategies to meet SLOs
- Collaborate with DBAs and application teams to resolve database-related incidents and capacity concerns
- Ensure database security, access controls, and audit compliance across all platforms
Network Operations & Reliability
- Monitor and maintain network infrastructure including routers, switches, firewalls, and load balancers
- Troubleshoot network incidents affecting availability, latency, and throughput across LAN/WAN/SD-WAN environments
- Collaborate with network engineering teams on capacity planning, topology changes, and cloud network integration
Voice & Unified Communications Administration
- Manage and maintain Voice and Unified Communications platforms including Cisco CUCM, CUBE, Unity Connection, or equivalent
- Monitor call quality, diagnose VoIP issues, and ensure SLA compliance for voice services
- Perform patching, upgrades, and configuration management of voice infrastructure components
- Coordinate with telecom vendors and carriers for circuit management and issue resolution
Cross-Functional Responsibilities
- Define and track Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets across infrastructure domains
- Lead incident response, root cause analysis, and post-mortem reviews to drive systemic improvements
- Build and maintain monitoring, alerting, and observability frameworks using tools such as Grafana, Prometheus, Datadog, or Splunk
- Collaborate with application, cloud, and security teams to support platform reliability and modernization programs
- Drive automation initiatives to eliminate toil and improve operational efficiency across all managed domains
- Provide Tier 2/3 support for critical infrastructure incidents
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or related technology field preferred
- Minimum of 5 years of hands-on experience across infrastructure domains (Windows, Linux, Network, Database, or Voice)
- Proven experience with SRE principles including SLOs, error budgets, incident management, and blameless post-mortems
- Strong scripting and automation skills in Python, Bash, or PowerShell
- Experience with monitoring and observability platforms (Grafana, Prometheus, Datadog, Splunk, or equivalent)
- Strong working knowledge of ITIL principles and ITSM practices
- Familiarity with cloud platforms (Azure) and hybrid infrastructure environments
- Current understanding of industry trends, reliability engineering methodologies, and DevOps practices
- Outstanding verbal and written communication skills
- Excellent attention to detail with strong analytical and problem-solving capabilities
- Strong interpersonal skills and ability to collaborate across technical and non-technical teams