DevSecOps Developer | Pan India

DigiHelic Solutions Pvt. Ltd.

Location: India
Job type: Full-time

Required skills

LangChain
Python
penetration testing
PCI-DSS
Airflow
AWS
API
Azure
Bash
capacity planning
cloud infrastructure
CloudFormation
compliance
CUDA
data science
Datadog
DevOps
Docker
GCP
GitHub
GPU
Helm
Hive
incident response
Jenkins
Kubeflow
Kubernetes
microservices
Ray
SRE
TensorFlow
Terraform
Vault
PyTorch
Vertex AI

About the role

DigiHelic Solutions Pvt. Ltd.

Website: digihelic.com
Job details:

DevSecOps Developer

Experience: 10+ Years

Location: Pan India

Description:

We are seeking an experienced Senior SRE & DevSecOps Engineer with 10+ years of hands-on experience to design, implement, and maintain secure, scalable, and highly available infrastructure, with a strong focus on Cloud and AI/ML platforms. The engineer will be a key contributor in bridging development, security, and operations, ensuring that our Cloud/AI systems are resilient, performant, secure, and production-ready.

Key Responsibilities

Site Reliability Engineering

Design, build, and maintain highly available, scalable, and fault-tolerant distributed systems

Define and track SLIs, SLOs, and SLAs; drive reliability improvements based on error budgets

Lead incident response, conduct blameless post-mortems, and implement preventive measures

Build and improve observability stack (monitoring, logging, tracing, alerting)

Reduce toil through automation, tooling, and self-healing infrastructure

Perform capacity planning and optimize system performance and cost efficiency

Implement chaos engineering practices to proactively identify system weaknesses

AI Security & Governance

Implement AI/ML security best practices including model access controls and API security

Secure model artifacts, training data, and inference endpoints

Set up prompt injection protection and input/output validation for LLM applications

Implement data privacy controls for AI training pipelines (PII detection, data anonymization)

Ensure AI compliance with regulations (EU AI Act, GDPR for AI, industry-specific requirements)

Monitor for adversarial attacks and implement model robustness testing

Implement AI audit trails and model lineage tracking for governance

Manage secrets and API keys for third-party AI services (OpenAI, Anthropic, etc.)

DevSecOps & Security

Embed security into CI/CD pipelines (SAST, DAST, SCA, container scanning, secrets management)

Design and implement infrastructure security controls and hardening standards

Manage vulnerability assessments, penetration testing coordination, and remediation tracking

Implement and maintain IAM policies, RBAC, and zero-trust architecture principles

Ensure compliance with security frameworks (SOC2, ISO 27001, GDPR, HIPAA, PCI-DSS as applicable)

Conduct security audits and threat modeling for infrastructure and applications

Manage secrets, certificates, and encryption (at rest and in transit)

Infrastructure & Automation

Design and manage cloud infrastructure (AWS/GCP/Azure) using IaC (Terraform, Pulumi, CloudFormation)

Build and maintain container orchestration platforms (Kubernetes, EKS/GKE/AKS)

Develop and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD)

Implement GitOps practices for infrastructure and application deployment

Automate operational tasks using Python, Go, Bash, or similar languages

AI/ML Infrastructure & MLOps (preferred)

Design, deploy, and maintain scalable AI inference infrastructure (HIVE)

Build and manage AI/ML pipelines

Implement model serving infrastructure

Manage clusters and optimize resource allocation for training and inference workloads

Implement model versioning, A/B testing, and canary deployments

Set up feature stores and manage data pipelines

Monitor model performance, implement drift detection, and build automated retraining pipelines

Optimize inference latency, throughput, and cost for production AI services

Manage LLM infrastructure including API gateways, rate limiting, and token management

Deploy and scale vector databases (Pinecone, Milvus, Weaviate, pgvector) for RAG applications

Implement LLMOps practices for prompt versioning, evaluation, and deployment

Leadership & Collaboration

Mentor junior engineers and promote SRE/DevSecOps/MLOps best practices across teams

Collaborate with data science, ML engineering, security, and platform teams

Participate in architecture reviews and provide guidance on reliability, security, and AI infrastructure

Document runbooks, architecture decisions, and operational procedures

Drive cultural change toward shared ownership of reliability and security

Evangelize MLOps and AI platform best practices across the organization

Required Qualifications

Experience

10+ years of experience in SRE, DevOps, Platform Engineering, or related roles

6+ years with cloud platforms (AWS, GCP, or Azure) in production environments

5+ years with container orchestration at scale

4+ years integrating security practices into DevOps workflows

2+ years of experience with AI/ML infrastructure and MLOps in production

Technical Skills

Cloud & Infrastructure

Cloud Platforms: AWS (preferred), GCP, or Azure, with expertise in core services

AI/ML Cloud Services: SageMaker, Vertex AI, Azure ML, Bedrock (preferred), or similar

IaC: Terraform (preferred), Pulumi, or CloudFormation

Containers & Orchestration: Docker, Kubernetes, Helm, service mesh (Istio/Linkerd)

AI/ML Platform

MLOps Tools: Kubeflow, MLflow, Airflow, DVC, Weights & Biases

Model Serving: Triton Inference Server, TensorFlow Serving, KServe, Seldon Core, BentoML

GPU Management: NVIDIA GPU Operator, CUDA, multi-GPU training orchestration

Vector Databases: Pinecone, Milvus, Weaviate, Qdrant, pgvector

Feature Stores: Feast, Tecton, or similar

LLM Platforms: OpenAI API, Anthropic, HuggingFace, LangChain, LlamaIndex

CI/CD & Observability

CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD, Spinnaker

Observability: Prometheus, Grafana, Datadog, ELK/OpenSearch, Jaeger, PagerDuty

ML Monitoring: Evidently AI, Arize, WhyLabs, or custom drift detection solutions

Security

Security Tools: Vault, Trivy, Snyk, SonarQube, Falco, OPA/Gatekeeper, AWS Security Hub

AI Security: Guardrails, prompt injection protection, model security scanning

Programming

Scripting/Programming: Python (required), Go, Bash

Familiarity with ML frameworks: PyTorch, TensorFlow (operational knowledge)

Knowledge Areas

Distributed systems design and microservices architecture

AI/ML system design and production ML best practices

Security frameworks and compliance standards

Incident management and on-call best practices

Cost optimization and FinOps principles (including GPU cost optimization)

Preferred Qualifications

Experience with multi-cloud or hybrid AI infrastructure

Hands-on experience with LLM fine-tuning and deployment at scale

Experience with real-time ML inference and low-latency systems

Contributions to open-source projects in the SRE/DevSecOps/MLOps space

Certifications: AWS Solutions Architect/Security, CKA/CKS, AWS ML Specialty, GCP ML Engineer

Experience with distributed training (Horovod, DeepSpeed, Ray)

Familiarity with edge AI deployment and model optimization (quantization, pruning)

Experience with responsible AI practices and bias detection/mitigation

Mandatory skills:

Cloud & AI/ML platforms, SAST, DAST, SCA, container scanning, secrets management, SOC2, ISO 27001, GDPR, HIPAA, PCI-DSS (as applicable), designing and managing cloud infrastructure (AWS/GCP/Azure) using IaC (Terraform, Pulumi, CloudFormation), Kubernetes, EKS/GKE/AKS, Python

Domain (Industry): Banking