NielsenIQ
Website: niq.com
Job details:
Qualifications
L2 Operations (Run)
* Monitor platform health (warehouses, query queues, credits, storage, load/unload jobs) and respond to alerts within SLAs.
* Triage, resolve, and document incidents including warehouse queueing, failed tasks/replications, user access issues, timeouts, locking/contention, and connector failures.
* Execute standard runbooks: warehouse scaling, resume/suspend, failover/replication checks, credential/token rotation, and service user maintenance.
* Provision and maintain users, roles, warehouses, databases, and schemas; manage future grants; perform regular storage hygiene and cleanup.
* Conduct operational reviews such as daily checks, weekly capacity and credit reports, cost/usage anomaly investigation, and storage growth tracking.
L3 Operations (Engineering & Problem Management)
* Performance engineering, including deep‑dive query profiling, partitioning/pruning strategies, micro‑partition optimization, result cache usage, and warehouse right‑sizing.
* Queue management and automation by analyzing FE warehouse patterns and implementing autoscaling, auto‑suspend/resume policies, and scheduled tasks to reduce load spikes.
* Security and compliance activities including AD/SSO configuration, token/PAT implementations, network and authentication policies, SCIM provisioning, key rotation, RBAC reviews, and segregation‑of‑duties controls.
* Replication and disaster recovery setup including cross‑account and cross‑region replication, weekly validation, documenting RTO/RPO, and performing DR drills.
* Integration management including Security, Storage, and SCIM integrations, coordination with monitoring platforms such as Datadog, and evaluation of tools like Trust Center, WIZ, Cyera, and Coralogix.
* SRE practices such as maintaining runbooks, completing post‑incident reviews, managing problem records, defining SLIs/SLOs (availability, queue wait times, P95 latency, failed jobs), and driving remediation.
* Environment management including POC setups, sandbox environments, new account creation within the Org hierarchy, baseline hardening, and implementing cost and resource governance.
* Automation and IaC including scripts and templates (Terraform/Tofu, Snowflake CLI/SnowSQL, Python) for provisioning, grants management, and monitoring automation.