Full Stack Engineer

Credflow AI

Location: Gurugram, Haryana, India
Job type: Full-time

Required skills

Python
API
backend
database
Docker
FastAPI
full stack
GitHub
Helm
JS
Kubernetes
Node
PostgreSQL
React
regression
Redis
SaaS
TypeScript

About the role

Website: credflow.ai
Job details:


PrimaLabs · PrimaServe Platform

Full-Time · Remote-Friendly · Early Stage

About the Role

PrimaLabs is building PrimaServe — a multitenant platform where companies can purchase and manage dedicated, optimized LLM API deployments on NVIDIA H100/H200 and AMD MI300X GPUs.

We serve inference with vLLM, SGLang, and TRT-LLM.

As a Full Stack Engineer, you will own the customer-facing platform end-to-end: the multitenant dashboard, API gateway, tenant provisioning, billing flows, and the self-service optimization portal.

What You'll Build

Customer Platform

Multitenant dashboard for tenant onboarding, API key management, usage metrics, and billing
Self-service portal for browsing model profiles, performance cards, and requesting custom optimizations
Profile comparison UI with side-by-side throughput, latency (P50/P95/P99), power, and cost across models
Public-facing benchmark portal (no login required) for model × hardware × profile discovery
Day-one onboarding experience where customers can provision a dedicated endpoint in under 10 minutes
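
The P50/P95/P99 latency columns on a performance card are percentiles over per-request latency samples. Below is a minimal sketch that assumes the nearest-rank definition. Interpolating definitions, such as NumPy's default or Prometheus `histogram_quantile` over buckets, give slightly different numbers. The names `percentile` and `latency_card` are hypothetical and are not part of PrimaServe.

```python
import math
from typing import Dict, Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank; multiply before dividing so integer p and n stay exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_card(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: the three latency columns shown on a profile card."""
    return {f"P{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

For example, `latency_card([10.0, 20.0, 30.0, 40.0])` gives P50 = 20.0 and P95 = P99 = 40.0. With only a handful of samples, the tail percentiles collapse onto the maximum.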

Backend & API Layer

Tenant provisioning service to allocate dedicated vLLM containers per customer and manage GPU assignment
OpenAI-compatible API gateway with per-tenant routing, rate limiting, and authentication
Billing integration for metered usage tracking (tokens, GPU-hours) with Stripe or equivalent
Production telemetry agent collecting TTFT (time to first token), ITL (inter-token latency), throughput, and power metrics from running containers
REST endpoints for profile switching, profile recommendations, and constraint-based selection
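
A common way to implement per-tenant rate limiting in an API gateway is a token bucket for each tenant or API key. The sketch below is a minimal single-process illustration and should not be read as PrimaServe's implementation. The class name `TenantRateLimiter`, the tenant IDs, and the capacity and refill parameters are all hypothetical. A gateway running several replicas would have to store bucket state in shared storage, such as Redis from the stack above, rather than in a Python dict.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Hypothetical per-tenant token bucket kept in process memory."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained rate per tenant
        self.clock = clock                    # injectable for tests
        # tenant_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

A gateway would typically answer with HTTP 429 when `allow` returns False. If you pass the request's LLM token count as `cost`, the same bucket limits tokens per second instead of requests per second.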

Observability & Infrastructure

Prometheus + Grafana dashboards for per-tenant and per-model metrics
Alerting on drift, latency degradation, and SLA breaches
Multi-cluster support with canary rollouts and auto-rollback for fleet-wide profile updates
CI/CD pipelines with automated container builds and performance regression gates on every PR
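
Canary rollouts with auto-rollback and per-PR performance regression gates both come down to comparing a candidate's metrics against a baseline. Below is a minimal sketch that assumes the P95 latency and error rate for the baseline and the candidate have already been queried, for example from Prometheus. The names `CanaryStats` and `canary_verdict` are hypothetical, and so are the thresholds: 10% allowed P95 regression and one percentage point of additional errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryStats:
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def canary_verdict(baseline: CanaryStats, canary: CanaryStats,
                   max_latency_regression: float = 0.10,
                   max_error_rate_delta: float = 0.01) -> str:
    """Return "promote" or "rollback" for a candidate profile (illustrative thresholds)."""
    latency_limit = baseline.p95_latency_ms * (1 + max_latency_regression)
    if canary.p95_latency_ms > latency_limit:
        return "rollback"  # latency regressed beyond the allowed margin
    if canary.error_rate > baseline.error_rate + max_error_rate_delta:
        return "rollback"  # noticeably more failures than the baseline
    return "promote"
```

A CI gate could apply the same function to benchmark results and fail the PR on "rollback". A fleet rollout controller could call it after each canary step.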

Requirements

4+ years of full stack experience, including at least one production multitenant SaaS platform
Strong TypeScript and React experience, with comfort building complex, data-dense dashboards
Solid backend fundamentals including API design, authentication, job queues, and multitenant database schemas
Kubernetes experience, including writing Helm charts, understanding pod scheduling, and working with CRDs
Experience with observability tooling such as Prometheus metrics, Grafana dashboards, and alerting rules
Familiarity with LLM inference concepts such as throughput, TTFT, batch size, and KV cache, or the ability to learn quickly
Ability to work autonomously, make end-to-end decisions, and ship without handholding
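
Here is a quick illustration of the inference metrics named in this posting. TTFT is the delay from sending a request to receiving the first generated token. ITL is the gap between consecutive tokens while the response streams. The sketch below assumes one timestamp per generated token; streamed chunks can carry several tokens, and a real telemetry agent would need to account for that. It reports mean ITL, although ITL is often reported as a distribution instead. `stream_metrics` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StreamMetrics:
    ttft_s: float               # time to first token
    mean_itl_s: float           # mean inter-token latency
    output_tokens_per_s: float  # tokens / whole request duration (includes TTFT)


def stream_metrics(request_start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Derive per-request metrics from a start time and per-token arrival times (seconds)."""
    if not token_times:
        raise ValueError("no tokens")
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    total = token_times[-1] - request_start
    throughput = len(token_times) / total if total > 0 else float("inf")
    return StreamMetrics(ttft, mean_itl, throughput)
```

For a request started at t = 0.0 s with tokens arriving at 0.5, 0.75, 1.0, and 1.25 s, this gives a TTFT of 0.5 s, a mean ITL of 0.25 s, and about 3.2 output tokens per second.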

Nice to Have

Experience with OpenAI-compatible API proxies or gateway layers such as LiteLLM
Exposure to vLLM, SGLang, or other LLM serving frameworks
Experience building usage-based billing and metering systems
Background building platforms that provision infrastructure on behalf of tenants
Familiarity with NVIDIA NGC or NIM container formats

Tech Stack

React · TypeScript · Node.js or Python (FastAPI) · PostgreSQL · Redis · Docker · Kubernetes · Helm · Prometheus · Grafana · GitHub Actions · vLLM · OpenAI-compatible APIs
