Website:
dsu.edu.in
Job details:
Position Title: Cybersecurity Operations Lead
Department: Information Technology
Location: DSU – Main Campus
Type: Full‑time
Role Overview
The Cybersecurity Operations Lead is responsible for hardening the organization’s AI/HPC cluster against external and internal threats, managing incident response workflows, and ensuring strong integration with the Cyberbit Cyber Range & SOC platform. This role will design and enforce robust security, monitoring, and isolation strategies across compute, storage, and fabric layers.
The priority focus is implementing multi‑tenant isolation for GaaS (GPU‑as‑a‑Service) commercial clients, ensuring that multiple customers can securely run workloads without risk of cross‑access, data leakage, or privilege escalation.
Key Responsibilities
1. Cluster Hardening & Security Architecture
- Lead the security architecture for the HPC/AI environment, covering compute nodes, GPUs, storage, and InfiniBand (IB)/Ethernet fabric.
- Implement hardening policies for OS, containers, GPU runtimes, Slurm/K8s schedulers, and cluster access interfaces.
- Enforce zero‑trust principles: least privilege, segmentation, MFA, privileged access management (PAM), and secure credential workflows.
- Conduct regular vulnerability scans, patch cycles, and compliance checks.
2. Multi‑Tenant Isolation (Priority Task)
- Design and implement secure multi‑tenant architecture for GaaS commercial clients, including:
  - Network‑level isolation (VPC‑like segmentation, VLANs, fabric policy groups, PKeys)
  - Scheduler isolation (Slurm accounts/QoS, cgroup boundaries, GPU partitioning, MIG profiles)
  - Storage isolation (namespaces, mount policies, NFS/S3 sub‑path enforcement, RBAC)
  - Container/image isolation (per‑tenant registries, runtime restrictions, sandboxing)
- Validate that no cross‑tenant visibility exists across logs, processes, metrics, or filesystem paths.
- Collaborate with compliance teams to ensure that contractual security guarantees to clients are met.
3. Cyberbit Platform Integration
- Integrate cluster logs, telemetry, and events with the Cyberbit platform for SOC visibility.
- Manage alerting pipelines, incident playbooks, threat emulation, and response workflows.
- Develop realistic HPC/AI‑specific attack simulations for Cyberbit training scenarios.
- Ensure full integration with SIEM, SOAR, endpoint agents, and behavioral analytics.
4. Incident Detection & Response
- Lead investigation of security incidents involving compute nodes, GPUs, containers, or networks.
- Coordinate with Cyberbit SOC to analyze alerts and deploy automated response actions.
- Maintain forensics workflows, log retention policies, and secure evidence handling procedures.
- Produce root‑cause analyses (RCAs) and implement long‑term fixes.
5. Security Monitoring & Observability
- Deploy and maintain monitoring for:
  - Access control events
  - GPU activity and node processes
  - Slurm/K8s job behaviors
  - Storage access patterns
  - InfiniBand/RDMA anomalies
- Create dashboards for threat visibility, tenant separation health, and security SLAs.
6. Policy, Compliance & Documentation
- Define cluster security policies, data isolation rules, and operational controls.
- Maintain up‑to‑date SOPs, runbooks, escalation guides, and tenant‑onboarding processes.
- Ensure adherence to relevant standards (ISO/IEC 27001, SOC 2, and internal policies, as applicable).
Required Skills & Experience
Technical Skills
- Strong experience securing HPC/AI clusters, cloud environments, or large-scale distributed systems.
- Deep knowledge of:
  - Linux hardening (RHEL/Ubuntu)
  - Slurm or Kubernetes multi‑tenancy
  - GPU isolation (MIG profiles, cgroups, namespaces)
  - Storage access control (Lustre/WEKA/NFS/S3 policies)
  - Networking isolation (InfiniBand PKeys, VLANs, ACLs, micro‑segmentation)
- Familiarity with container security: Docker, Singularity/Apptainer, registries, SBOMs, and image scanning.
- Experience with SIEM/SOAR integrations (Cyberbit experience preferred).
- Strong understanding of cryptography, IAM, MFA, PAM, and secret management.
Soft Skills
- Strong communication and coordination across SOC, DevOps, and Infrastructure teams.
- Excellent incident response discipline and documentation quality.
- Ability to create structured security processes for complex multi‑tenant systems.
Qualifications
- Bachelor’s/Master’s degree in Cybersecurity, Computer Science, Engineering, or a related field.
- 5–12 years of experience in cybersecurity operations or infrastructure security roles.
- Preferred certifications:
  - CISSP / CISM / CEH
  - Linux security certifications
  - Cloud or Kubernetes security certifications (e.g., CKS)
  - Cyber Range or SOC certifications
Key Performance Indicators (KPIs)
- Successful implementation of multi‑tenant isolation for GaaS clients.
- Zero cross‑tenant data leakage incidents.
- Reduction in critical vulnerabilities across the cluster.
- Reduction in Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR) for security incidents.
- Quality and completeness of Cyberbit‑driven incident workflows.
- Compliance with internal and external audit requirements.
Date: 20-03-2026
Dr. D. Premachandra Sagar
Pro Chancellor, DSU