Website: primalabs.ai
Job details:
Sr. Full Stack Engineer (Platform)
PrimaLabs · PrimaServe Platform
Full-Time · Gurugram
About the Role
PrimaLabs is building PrimaServe — a multitenant platform where companies purchase and manage dedicated, optimized LLM API deployments on NVIDIA H100/H200 and AMD MI300X GPUs. We serve inference with vLLM, SGLang, and TRT-LLM. You will own the customer-facing platform end-to-end: the multitenant dashboard, API gateway, tenant provisioning, billing flows, and the self-service optimization portal.
What You'll Build
Customer Platform
•Multitenant dashboard: tenant onboarding, API key management, usage metrics, and billing
•Self-service portal for browsing model profiles, performance cards, and requesting custom optimizations
•Profile comparison UI: side-by-side throughput, latency (P50/P95/P99), power, and cost across models
•Public-facing benchmark portal (no login required) for model × hardware × profile discovery
•Day-One onboarding: customer provisions a dedicated endpoint in under 10 minutes
Backend & API Layer
•Tenant provisioning service: allocates dedicated vLLM containers per customer and manages GPU assignment
•OpenAI-compatible API gateway with per-tenant routing, rate limiting, and authentication
•Billing integration: metered usage tracking (tokens, GPU-hours) with Stripe or equivalent
•Production telemetry agent collecting TTFT, ITL, throughput, and power from running containers
•REST endpoints for profile switching, profile recommendations, and constraint-based selection
Observability & Infrastructure
•Prometheus + Grafana dashboards for per-tenant and per-model metrics
•Alerting on drift, latency degradation, and SLA breaches
•Multi-cluster support with canary rollout and auto-rollback for fleet-wide profile updates
•CI/CD pipelines: automated container builds and performance regression gates on every PR
Requirements
•4+ years of full-stack experience, including at least one production multitenant SaaS platform
•Strong TypeScript and React; comfortable building complex, data-dense dashboards
•Solid backend fundamentals: API design, auth, job queues, database schema for multitenant systems
•Kubernetes comfort: writing Helm charts, understanding pod scheduling, working with CRDs
•Experience with observability tooling: Prometheus metrics, Grafana dashboards, alerting rules
•Familiarity with LLM inference concepts (throughput, TTFT, batch size, KV cache) or willingness to learn fast
•Ability to work autonomously, own decisions end-to-end, and ship without handholding
Nice to Have
•Experience with OpenAI-compatible API proxies or gateway layers (e.g., LiteLLM)
•Exposure to vLLM, SGLang, or other LLM serving frameworks
•Experience building usage-based billing and metering systems
•Background building platforms that provision infrastructure on behalf of tenants
•Familiarity with NVIDIA NGC / NIM container formats
Tech Stack
React · TypeScript · Node.js or Python (FastAPI) · PostgreSQL · Redis · Docker · Kubernetes · Helm · Prometheus · Grafana · GitHub Actions · vLLM · OpenAI-compatible APIs
Click Apply to learn more.