Website: credflow.ai
Job details:
Full Stack Engineer
PrimaLabs · PrimaServe Platform
Full-Time · Remote-Friendly · Early Stage
About the Role
PrimaLabs is building PrimaServe — a multitenant platform where companies can purchase and manage dedicated, optimized LLM API deployments on NVIDIA H100/H200 and AMD MI300X GPUs.
We serve inference with vLLM, SGLang, and TRT-LLM.
As a Full Stack Engineer, you will own the customer-facing platform end-to-end: the multitenant dashboard, API gateway, tenant provisioning, billing flows, and the self-service optimization portal.
What You'll Build
Customer Platform
- Multitenant dashboard for tenant onboarding, API key management, usage metrics, and billing
- Self-service portal for browsing model profiles, performance cards, and requesting custom optimizations
- Profile comparison UI with side-by-side throughput, latency (P50/P95/P99), power, and cost across models
- Public-facing benchmark portal (no login required) for model × hardware × profile discovery
- Day-one onboarding experience where customers can provision a dedicated endpoint in under 10 minutes
Backend & API Layer
- Tenant provisioning service to allocate dedicated vLLM containers per customer and manage GPU assignment
- OpenAI-compatible API gateway with per-tenant routing, rate limiting, and authentication
- Billing integration for metered usage tracking (tokens, GPU-hours) with Stripe or equivalent
- Production telemetry agent collecting time to first token (TTFT), inter-token latency (ITL), throughput, and power metrics from running containers
- REST endpoints for profile switching, profile recommendations, and constraint-based selection
Observability & Infrastructure
- Prometheus + Grafana dashboards for per-tenant and per-model metrics
- Alerting on drift, latency degradation, and SLA breaches
- Multi-cluster support with canary rollouts and auto-rollback for fleet-wide profile updates
- CI/CD pipelines with automated container builds and performance regression gates on every PR
Requirements
- 4+ years of full-stack experience, including at least one production multitenant SaaS platform
- Strong TypeScript and React experience, with comfort building complex, data-dense dashboards
- Solid backend fundamentals including API design, authentication, job queues, and multitenant database schemas
- Kubernetes experience, including writing Helm charts, understanding pod scheduling, and working with CRDs
- Experience with observability tooling such as Prometheus metrics, Grafana dashboards, and alerting rules
- Familiarity with LLM inference concepts such as throughput, TTFT, batch size, and KV cache, or the ability to learn them quickly
- Ability to work autonomously, make end-to-end decisions, and ship without handholding
Nice to Have
- Experience with OpenAI-compatible API proxies or gateway layers such as LiteLLM
- Exposure to vLLM, SGLang, or other LLM serving frameworks
- Experience building usage-based billing and metering systems
- Background building platforms that provision infrastructure on behalf of tenants
- Familiarity with NVIDIA NGC or NIM container formats
Tech Stack
React · TypeScript · Node.js or Python (FastAPI) · PostgreSQL · Redis · Docker · Kubernetes · Helm · Prometheus · Grafana · GitHub Actions · vLLM · OpenAI-compatible APIs