Kuvaka Tech
Website: kuvaka.io
Job details:
Location: On-site, Vijaynagar, Indore, MP, India (Preferred) / Remote
Availability: Immediate joiners, or candidates who can join us within 20–30 days
Budget: Up to 25 LPA CTC (depending on skills and experience)
About Us & This Role:
We're an early-stage company building a product around deep learning for audio and signal processing. We're explicitly looking for a technical co-founder who owns the engineering side of the company.
There is no existing platform. No deployment pipeline. No SDK. No infrastructure. You will build all of it from scratch.
What You'll Be Doing:
- Building the Product (Hands-On, Greenfield)
- Take models from our research team — typically Jupyter notebooks, PyTorch checkpoints, and unoptimized code — and turn them into clean, production-grade software.
- Optimise models for inference using pruning, quantisation, distillation, and graph optimisation with ONNX, TensorRT, and Triton Inference Server.
- Design and build our deployment pipeline from scratch — there is none today. You decide how it works.
- Build our Python SDK — handle packaging, versioning, publishing to PyPI, and design the developer experience for whoever consumes it.
- Stand up our cloud infrastructure using Infrastructure as Code (Terraform, Pulumi, or similar). You'll choose the cloud provider and the patterns we use.
- Build local-deployment paths so smaller models run efficiently on CPU on Windows and macOS, with platform-specific optimisations.
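To give a flavour of that research-to-production handoff — this is a hypothetical sketch, not an existing codebase; the `AudioModel` and `InferenceResult` names and the `predict()` signature are illustrative assumptions — the "clean, production-grade software" meant above is typed, versioned, and testable:

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class InferenceResult:
    """Container for one inference call (hypothetical shape)."""
    labels: list[str]
    scores: list[float]

class AudioModel:
    """Thin, versioned wrapper around a research checkpoint.

    The label set and scoring logic here are placeholders; in practice
    predict() would call an optimised runtime (ONNX Runtime, TensorRT, ...).
    """
    __version__ = "0.1.0"

    def __init__(self, labels: Sequence[str]):
        self._labels = list(labels)

    def predict(self, samples: Sequence[float]) -> InferenceResult:
        # Placeholder "model": score each label by a trivial energy statistic.
        energy = sum(s * s for s in samples) / max(len(samples), 1)
        scores = [energy / (i + 1) for i in range(len(self._labels))]
        return InferenceResult(labels=self._labels, scores=scores)

model = AudioModel(labels=["speech", "music"])
result = model.predict([0.1, -0.2, 0.3])
```

The point is the shape, not the maths: a stable public API in front of whatever the notebook did.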
- Helping Build the Company
Beyond writing code, you'll be helping us figure out:
- What our engineering culture looks like — code review standards, testing philosophy, and how we ship.
- What to build vs. what to buy — every infrastructure choice has cost, complexity, and lock-in trade-offs we'll make together.
- Who to hire next — you'll have a real voice in growing the engineering team, and eventually you may lead it.
- How research and engineering collaborate — defining the handoff process between research and production.
- The technical roadmap — the founders are honest about not knowing the production side. We need your judgment, not just your hands.
- You will not be a hired pair of hands executing a plan. You'll be a partner in figuring out what the plan should be.
Technical Requirements:
Read each item honestly — if most of these describe work you've actually done (not just heard of), you're a strong fit.
Python & Software Engineering (Must-Have):
- 5+ years writing production Python, with a deep understanding of the language: typing (mypy, pydantic), async/await, context managers, decorators, generators, and the data model.
- Strong grasp of modern Python tooling: pyproject.toml, setuptools/hatchling/poetry-core as build backends, virtual environments, and at least one of poetry, uv, hatch, or pdm.
- Published Python packages to PyPI or a private index — you understand wheels vs. sdists, platform tags (manylinux, macosx, win_amd64), entry points, and dependency resolution.
- Comfortable writing C/C++/Rust extensions or bindings when needed (via pybind11, nanobind, cffi, or PyO3) — or willing to learn quickly.
- Strong testing discipline: pytest, fixtures, parameterisation, mocking, and writing tests that catch real bugs without becoming a maintenance burden.
- Familiarity with code quality tooling: ruff, black, mypy, pre-commit hooks.
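As a rough calibration of the Python fluency meant here — a small, purely illustrative snippet combining typing, a context manager, and a generator (the function names are made up for this example):

```python
from contextlib import contextmanager
from typing import Iterator

@contextmanager
def audio_buffer(size: int) -> Iterator[list[float]]:
    """Context manager yielding a reusable sample buffer (illustrative)."""
    buf: list[float] = [0.0] * size
    try:
        yield buf
    finally:
        buf.clear()  # release references deterministically on exit

def frames(samples: list[float], frame_len: int) -> Iterator[list[float]]:
    """Generator producing fixed-length frames, dropping any remainder."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]

with audio_buffer(4) as buf:
    chunks = list(frames([0.1, 0.2, 0.3, 0.4, 0.5], 2))
```

If everything in that snippet (including why `frames` drops the trailing sample) reads as obvious, you're in the right range.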
Deep Learning & Model Optimisation (Must-Have):
- 3–4+ years of hands-on deep learning experience, including production deployment — not just training.
- Fluent in PyTorch (and ideally familiar with TensorFlow or JAX); able to read research code and understand custom layers, autograd, and training loops.
- Model compression and optimisation techniques, with real production experience in at least three of: post-training quantisation (INT8, FP16, BF16), quantisation-aware training, structured and unstructured pruning, knowledge distillation, and operator fusion / graph optimisation.
- ONNX: exporting models from PyTorch, debugging unsupported ops, simplifying graphs (onnx-simplifier), and inspecting models with Netron.
- ONNX Runtime: execution providers (CPU, CUDA, CoreML, OpenVINO, DirectML), session options, and performance tuning.
- TensorRT: building engines, working with plugins, calibration for INT8, and debugging precision issues.
- Triton Inference Server: model repository structure, ensemble models, dynamic batching, configuration of multiple backends (PyTorch, ONNX, TensorRT, Python).
- Understanding of inference performance fundamentals: latency vs. throughput trade-offs, batch size effects, memory bandwidth limits, and how to profile a model end-to-end.
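To make one of the items above concrete: symmetric post-training INT8 quantisation reduces to a scale computation plus rounding and clamping. A minimal pure-Python sketch with toy values (no framework — real work would use the quantisation tooling in ONNX Runtime, TensorRT, or PyTorch):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric post-training quantisation: map floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Quantisation error is bounded by half a quantisation step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The interesting production questions start after this: per-channel vs. per-tensor scales, calibration data selection, and which layers to leave in higher precision.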
Local/On-Device Deployment (Must-Have):
- Deploying small DL models on CPU for Windows and macOS, with experience in at least two of:
- ONNX Runtime with platform-specific execution providers
- OpenVINO for x86 CPU optimisation
- Core ML / coremltools for macOS and Apple Silicon
- Apple's Accelerate framework / Metal Performance Shaders
- ggml-style runtimes or other quantised inference libraries
- Understanding of CPU-specific optimisation: SIMD (AVX2, AVX-512, NEON), threading models, and memory layout.
- Experience packaging models for distribution in desktop applications — handling model versioning, downloading, caching, and integrity checks.
- Awareness of cross-platform build challenges: code signing on macOS, MSVC vs. MinGW on Windows, universal binaries for Apple Silicon.
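The model versioning / caching / integrity bullet above can be sketched with nothing but the standard library. The cache layout (`<cache>/<name>/<version>/model.bin`) and the artifact names are hypothetical choices for this example:

```python
import hashlib
import tempfile
from pathlib import Path

def cache_model(cache_dir: Path, name: str, version: str,
                data: bytes, expected_sha256: str) -> Path:
    """Write a model artifact into a versioned cache, verifying its digest."""
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"integrity check failed for {name}=={version}")
    target = cache_dir / name / version / "model.bin"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file first, then rename: readers never see partial files.
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target

blob = b"fake-model-weights"
sha = hashlib.sha256(blob).hexdigest()
with tempfile.TemporaryDirectory() as d:
    path = cache_model(Path(d), "denoiser", "1.2.0", blob, sha)
    ok = path.read_bytes() == blob
    try:
        cache_model(Path(d), "denoiser", "1.2.0", blob, "0" * 64)
        tampered_rejected = False
    except ValueError:
        tampered_rejected = True
```

In a real desktop app the same idea extends to resumable downloads and signature verification, not just a SHA-256 check.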
Audio & Signal Processing (Must-Have):
- Solid DSP fundamentals: sampling theory, FFT, STFT, filtering, windowing, and aliasing.
- Hands-on experience with audio data: working with WAV/FLAC/MP3, resampling, normalisation, and common audio feature extraction (MFCCs, mel-spectrograms, log-mel).
- Familiar with audio Python libraries: librosa, torchaudio, soundfile, scipy.signal.
- Awareness of real-time audio constraints: streaming inference, chunking strategies, lookahead vs. causal models, latency budgets.
Bonus: experience with audio-specific model architectures (Conformer, Wav2Vec2, Whisper, RNN-T, diffusion-based audio models, etc.).
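A toy illustration of the DSP fundamentals listed above — a Hann window and a naive DFT, written without numpy purely for exposition (real code would use an FFT via scipy, librosa, or torchaudio):

```python
import cmath
import math

def hann(n: int) -> list[float]:
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitude(frame: list[float]) -> list[float]:
    """Naive O(n^2) DFT magnitude spectrum (first half of the bins only)."""
    n = len(frame)
    out = []
    for k in range(n // 2):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        out.append(abs(acc))
    return out

n = 64
k_true = 8  # the sine completes exactly 8 cycles within the frame
frame = [math.sin(2 * math.pi * k_true * i / n) for i in range(n)]
windowed = [s * w for s, w in zip(frame, hann(n))]
spectrum = dft_magnitude(windowed)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

If you can explain why the Hann window spreads energy into bins 7 and 9 here, and what would happen without exact bin alignment, the "windowing and aliasing" bullet describes you.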
Cloud & Infrastructure (Must-Have):
- Production experience with at least one major cloud: AWS, GCP, or Azure.
- Infrastructure as Code with hands-on experience in at least one of:
- Terraform (preferred) — modules, state management, workspaces
- Pulumi — preferably with the Python SDK
- AWS CDK or equivalent
- Containers: writing efficient Dockerfiles, multi-stage builds, GPU-enabled images (nvidia/cuda base images), and understanding image size optimisation.
- Kubernetes: at least basic working knowledge — deployments, services, configmaps, secrets, and ideally GPU scheduling.
- CI/CD with GitHub Actions, GitLab CI, or similar — you've built pipelines, not just used them.
- Familiar with GPU instance types, costs, and trade-offs across cloud providers.
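For the container bullet above, "efficient Dockerfiles" means things like multi-stage builds. An illustrative (untested) sketch — the `myapp` package name and file layout are hypothetical, and a GPU image would start from an nvidia/cuda base instead:

```dockerfile
# Stage 1: build wheels in a throwaway image.
FROM python:3.11-slim AS builder
WORKDIR /build
COPY pyproject.toml ./
COPY src/ src/
RUN pip wheel --no-deps --wheel-dir /wheels .

# Stage 2: ship only the runtime, keeping the final image small.
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*.whl && rm -rf /wheels
ENTRYPOINT ["python", "-m", "myapp"]
```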
Personal Traits (Must-Have):
- Comfort owning ambiguity — you can make good decisions without complete information.
- Pragmatism over perfectionism — you ship things that work, then improve them.
- Strong written communication — you can document decisions and explain technical trade-offs to non-technical founders.
- Builder's instinct — the absence of existing infrastructure is a feature for you, not a bug.
Nice-to-Have:
- Founding or early-stage startup experience (employees 1–10).
- Experience designing and shipping SDKs that external developers actually use — including documentation, examples, and versioning policies.
- Real-time/streaming inference experience — WebSockets, gRPC bidirectional streaming, or audio streaming protocols.
- Experience with model registries (MLflow, Weights & Biases Artifacts, DVC) and reproducible model versioning.
- Familiarity with WebAssembly/WASM for in-browser model inference.
- Mobile deployment experience (iOS Core ML, Android NNAPI, TensorFlow Lite).
- Open-source contributions to ML, audio, or inference frameworks.
- Experience mentoring or leading engineers — you'll likely be doing this within a year.
Self-Assessment:
- Strong fit: You match nearly all the must-haves and several nice-to-haves. You should apply.
- Reasonable fit: You match most of the must-haves but have gaps in 1–2 areas (e.g., strong on deployment but light on audio, or vice versa). Apply and tell us in your cover note where you'd grow into the role.
- Not yet a fit: You match the deep learning side but haven't shipped models to production, or you're strong on infrastructure but haven't worked with deep learning models. This role will be frustrating for you — wait until you have more deployment experience under your belt.
What You Should Know Before Applying:
- This is not a comfortable senior IC role at an established company. There are no existing systems, no senior engineers above you, and no established playbook. You will figure things out.
- The founders are not engineers. We will lean on your judgment heavily for technical decisions. We're good collaborators, but we won't be debugging your Kubernetes config.
- The work will be broad. Some weeks you'll be deep in TensorRT optimisation. Other weeks you'll be writing docs, choosing a cloud provider, or interviewing candidates.
- The trade-off is real ownership. You'll have a level of impact, autonomy, and equity that's not available at a larger company. The founding decisions will be yours.
- If this kind of scope sounds like a chance to do the most meaningful work of your career, you're who we're looking for. If it sounds stressful and unstructured, this role will not make you happy, and we'd rather know that now.
Interview Process:
- Coding Round (Pair Programming): We'll solve a coding problem together. We care more about how you think and communicate than whether you reach the perfect answer.
- Deep Learning Deep-Dive: A technical conversation on deep learning concepts, model optimisation techniques, and deployment trade-offs. Expect discussion on pruning, quantisation, runtime selection, and real production scenarios.
- Founding Fit & Problem Discussion: A two-way conversation about the actual problems you'll be solving, the company we're building, and whether this role fits your goals. We'll be honest about where we are; we expect you to be honest about what you want.
Click Apply to learn more.