People Prime Worldwide
Website: people-prime.com
Job details:
About Client:
Our Client is a global IT services company headquartered in Southborough, Massachusetts, USA. Founded in 1996, it has revenue of $1.8B and 35,000+ associates worldwide. It specializes in digital engineering and IT services, helping clients modernize their technology infrastructure, adopt cloud and AI solutions, and accelerate innovation. It partners with major firms in banking, healthcare, telecom, and media.
Our Client is known for combining deep industry expertise with agile development practices, enabling scalable and cost-effective digital transformation. The company operates in over 50 locations across more than 25 countries, has delivery centers in Asia, Europe, and North America, and is backed by Baring Private Equity Asia.
Job description
Job Title: AIOps Engineer
Experience: 6-9 Years
Job Type: Contract to Hire
Notice Period: Immediate Joiners
Job Description:
5–8 years in Production Engineering/SRE/DevOps/Platform Ops, including at least 3 years in AIOps/Observability at scale.
Hands‑on with one or more observability platforms (Datadog/Dynatrace/New Relic/AppDynamics/Splunk/ELK, Prometheus/Grafana).
Strong incident management: event correlation, alert tuning, RCA, on‑call, postmortems.
Proficiency in scripting/automation: Python and Bash (PowerShell optional).
Infra‑as‑Code and config management: Terraform, Ansible (or equivalents).
Experience with containers/Kubernetes (e.g., GKE/EKS/AKS), Linux, reverse proxies, and network fundamentals.
CI/CD exposure (Jenkins, GitHub Actions, GitLab CI), artifact management, and deployment strategies (blue/green, canary).
Practical knowledge of SLO/SLI, error budgets, and resiliency patterns.
Excellent stakeholder communication and ability to influence dev and ops teams.
Preferred (Good to Have)
GCP operations: Cloud Monitoring/Logging/Trace, GKE, Pub/Sub, BigQuery for ops analytics; familiarity with Ops Agent and Managed Prometheus.
Experience with ServiceNow (CMDB, Incident/Problem/Change), PagerDuty/Opsgenie, Runbook Automation tools.
Kafka (or Pub/Sub) for event streaming; Fluentd/Vector/Logstash for log pipelines.
Exposure to AIOps algorithms (anomaly detection, clustering, time‑series forecasting) using libraries like scikit‑learn/Prophet, or vendor-native features.
Chaos engineering (Gremlin/Litmus), load testing (k6/JMeter) for performance and resiliency validation.
Security alignment: SIEM integration (Splunk/Chronicle/Sentinel), secrets management (Vault), and compliance controls.
Certifications: Google Professional Cloud DevOps Engineer, Dynatrace/Datadog/New Relic certs, CKA/CKAD, ITIL Foundation.