Kubeflow Developer

Location

Pune District, Maharashtra, India

Job type

Full-time

About the job

This job is sourced from a job board.

About the role

Deloitte

Website: deloitte.com
Job details:

Kubeflow Developer, 5+ years

Job Description:

We are looking for a Senior Kubeflow Developer with 8+ years of experience in software engineering or platform engineering, substantial Kubernetes experience, and 4+ years working directly with Kubeflow or related MLOps tooling, who will design, build, and maintain Kubeflow-based AI/ML platforms and applications.

This role focuses on customizing Kubeflow components (Jupyter integrations, Knative/KServe), managing Kubeflow install/upgrade/lifecycle, and implementing secure Kubernetes authentication and authorization. The ideal candidate partners closely with data scientists and platform engineers to deliver production-grade MLOps pipelines and scalable, secure AI services.

Overview

We are seeking a senior-level engineer with deep hands-on experience in Kubeflow, Kubernetes, and cloud-native MLOps to lead customization, deployment, and lifecycle management of Kubeflow deployments. You will be responsible for integrating Jupyter notebook services, extending Knative/KServe for model serving, implementing robust Kubernetes authN/authZ patterns, and ensuring reliable install/upgrade processes across environments (development, staging, production, private cloud). This is both a developer and platform-owner role: you will build AI/ML applications and operate the underlying Kubeflow platform.

Key responsibilities

Development and customization

  • Customize and extend Kubeflow applications and components (Kubeflow Pipelines (KFP), Katib, Profiles, Metadata).
  • Integrate and harden Jupyter Notebook / JupyterHub environments for interactive data science workflows.
  • Implement and extend Knative and KServe components to support custom model-serving runtimes and autoscaling patterns.
  • Create reusable manifests, Kustomize overlays, Helm charts, or Kubernetes operators for repeatable deployments.

Deployment and lifecycle management

  • Design and own the install, upgrade and rollback processes for Kubeflow across clusters and environments.
  • Manage manifests and configuration (versioning, parameterization) to enable repeatable, auditable deployments.
  • Automate bootstrap and cluster lifecycle tasks, including preflight checks, dependency validation, and post-deploy verification.
  • Troubleshoot and resolve complex deployment/install issues across control plane and data plane components.

Security (authN/authZ)

  • Implement Kubernetes authentication (OIDC, ServiceAccounts, Vault integration, short-lived credentials) and authorization policies (RBAC) for secure multi-tenant Kubeflow deployments.
  • Design and enforce least-privilege access models for data scientists, pipelines, and model-serving endpoints.
  • Integrate cluster security controls (namespace isolation, PSP/PSA or equivalent, network policies, admission controllers) with Kubeflow components.

CI/CD and automation

  • Build CI/CD pipelines to validate, test, and release Kubeflow manifests, application code, and model-serving images.
  • Integrate test automation for functional, security, and smoke tests as part of deployment pipelines.
  • Create Git-driven workflows (GitOps) for manifests and environment promotion.

Operations, observability, and reliability

  • Instrument and monitor Kubeflow and Kubernetes control/data planes (logs, metrics, tracing).
  • Implement alerting and runbook documentation for common failure modes and operational tasks.
  • Lead post-mortems and continuous improvement of platform reliability and deployment practices.

Collaboration and enablement

  • Work closely with data scientists to translate model training and serving requirements into platform capabilities.
  • Collaborate with platform, security, and cross-functional teams to align on architecture, policy, and operational standards.

Required skills and experience

  • Strong experience with Kubeflow: customization, components, Pipelines, Profiles, Notebook integration, and operational management.
  • Familiarity with AI tooling on Kubernetes, including one or more of: LangChain, LangFlow, Spark, Airflow, Kubeflow, MLflow, KServe, Ray.
  • Open-source contributions are good to have, particularly in the Kubeflow and Knative communities.
  • Deep Kubernetes expertise: cluster architecture, resource management, controllers, CRDs, operators, networking, and storage.
  • Proven experience implementing Kubernetes authentication (OIDC, webhook token auth, service accounts) and authorization (RBAC, ABAC, policy enforcement).
  • Practical experience with Knative and KServe: custom predictors, scaling behavior, revisions, and annotations for serving models.
  • MLOps knowledge: model training, reproducible pipelines, model versioning, deployment patterns, inference scaling and A/B testing.
  • CI/CD tooling: building pipelines for build/test/deploy of manifests and container images (Jenkins, GitHub Actions, GitLab CI, Tekton, ArgoCD, etc.).
  • Strong troubleshooting and debugging skills for distributed systems and Kubernetes-native apps.
  • Excellent communication and collaboration skills for cross-functional teams.

Preferred qualifications

  • Experience designing cloud-native architectures and microservices patterns.
  • Familiarity with GitOps workflows and tools (ArgoCD, Flux).
  • Experience with Helm, Kustomize, and Kubernetes operators for managing manifests at scale.
  • Knowledge of container registries, image promotion, and secure image supply chains.
  • Experience with monitoring, logging, and tracing stacks (Prometheus, Grafana, etc.).
  • Familiarity with secrets management solutions (Vault, K8s SAs, ExternalSecret).
  • Prior experience maintaining or contributing to open-source Kubeflow manifests or distributions.
  • Experience with our repositories is desirable.

Additional attributes

  • Senior-level mindset: proactive, ownership-oriented, and driven to improve platform reliability and developer productivity.
  • Comfortable working in ambiguous environments and balancing short-term fixes and long-term platform investments.
  • Willingness to mentor and grow the team’s Kubeflow and Kubernetes capabilities.

Skills

LangChain
Airflow
Bootstrap
cross-functional
data science
Flux
GitHub
Helm
Jenkins
Jupyter Notebook
K8s
Kubeflow
Kubernetes
microservices
multi-tenant
Ray
Spark
test automation
Vault