Sr. Full Stack Engineer (Platform)

PrimaLabs

full-time

Required skills

Python
API
backend
database
Docker
FastAPI
full stack
GitHub
Helm
JS
Node
PostgreSQL
regression
Redis
SaaS
TypeScript

About the role

Website: primalabs.ai
Job details:

PrimaLabs · PrimaServe Platform

Full-Time · Gurugram

About the Role

PrimaLabs is building PrimaServe — a multitenant platform where companies purchase and manage dedicated, optimized LLM API deployments on NVIDIA H100/H200 and AMD MI300X GPUs. We serve inference with vLLM, SGLang, and TRT-LLM. You will own the customer-facing platform end-to-end: the multitenant dashboard, API gateway, tenant provisioning, billing flows, and the self-service optimization portal.

What You'll Build

Customer Platform

•Multitenant dashboard: tenant onboarding, API key management, usage metrics, and billing

•Self-service portal for browsing model profiles, performance cards, and requesting custom optimizations

•Profile comparison UI — side-by-side throughput, latency (P50/P95/P99), power, and cost across models

•Public-facing benchmark portal (no login required) for model × hardware × profile discovery

•Day-One onboarding: customer provisions a dedicated endpoint in under 10 minutes

Backend & API Layer

•Tenant provisioning service: allocates dedicated vLLM containers per customer, manages GPU assignment

•OpenAI-compatible API gateway with per-tenant routing, rate limiting, and authentication

•Billing integration: metered usage tracking (tokens, GPU-hours) with Stripe or equivalent

•Production telemetry agent collecting TTFT, ITL, throughput, and power from running containers

•REST endpoints for profile switching, profile recommendations, and constraint-based selection

Observability & Infrastructure

•Prometheus + Grafana dashboards for per-tenant and per-model metrics

•Alerting on drift, latency degradation, and SLA breaches

•Multi-cluster support with canary rollout and auto-rollback for fleet-wide profile updates

•CI/CD pipelines: automated container builds, performance regression gates on every PR

Requirements

•4+ years of full stack experience with at least one production multitenant SaaS platform

•Strong TypeScript and React; comfortable building complex, data-dense dashboards

•Solid backend fundamentals: API design, auth, job queues, database schema for multitenant systems

•Kubernetes comfort: writing Helm charts, understanding pod scheduling, working with CRDs

•Experience with observability tooling: Prometheus metrics, Grafana dashboards, alerting rules

•Familiarity with LLM inference concepts (throughput, TTFT, batch size, KV cache) or willingness to learn fast

•Ability to work autonomously, own decisions end-to-end, and ship without handholding

Nice to Have

•Experience with OpenAI-compatible API proxies or gateway layers (e.g., LiteLLM)

•Exposure to vLLM, SGLang, or other LLM serving frameworks

•Experience building usage-based billing and metering systems

•Background building platforms that provision infrastructure on behalf of tenants

•Familiarity with NVIDIA NGC / NIM container formats

Tech Stack

React · TypeScript · Node.js or Python (FastAPI) · PostgreSQL · Redis · Docker · Kubernetes · Helm · Prometheus · Grafana · GitHub Actions · vLLM · OpenAI-compatible APIs
