AI Platform Engineer

VXNeo Labs

full-time

Required skills

Python
Open Source
Android
backend
Bash
CMS
compliance
CUDA
Dart
data pipeline
Docker
FastAPI
Firebase
Flutter
founder
GitHub
GPU
IoT
JavaScript
Jira
multi-tenant
proxy
reverse proxy
Redis
SSH
SSL
TypeScript
uptime

About the role

In this role you will own the infrastructure layer: the model routing engine, the open-source model deployment stack, the EU server infrastructure, and Pi fleet management. You will also ship features across the FastAPI backend, Flutter mobile apps, and coordinator dashboard. You report directly to the founder. No middle management. No ticket queues. Decisions in hours, not sprints. If you want to write code that matters and ships the same week, this is for you.

What Neo Does Today (Your Starting Point)

| Feature | Technology |
| --- | --- |
| Voice wake word | Porcupine: custom "Hey Neo" model |
| Speech-to-text | Whisper STT: local on Pi, zero cloud |
| Text-to-speech | Piper TTS: natural human voice |
| AI model routing | Claude · Mistral · DeepSeek V4 · Qwen 3 · Llama 3.3 |
| On-device inference | Ollama: Mistral 7B / Phi-3 Mini running on Pi |
| Graph memory | Neo4j: 9-dimensional, persistent across reboots |
| Vector search | Qdrant: semantic memory retrieval |
| Real-time state | Redis: alert queue and emotion state |
| Fall detection | Sony AITRIOS IMX500 + PoseNet: on-chip, zero CPU |
| Health check-ins | APScheduler: daily pain and mood tracking |
| Voice email | Gmail API + Whisper: reads and dictates replies |
| API layer | FastAPI (Python 3.13): 19 endpoints |
| Mobile apps | Flutter: Android (Play Store) + iOS (App Store) |
| Infrastructure | Hetzner VPS · OVHcloud France · Docker Compose · systemd |
| Deployment | 6 auto-restarting systemd services, fully autonomous |

What You Will Own and Build

Model Router & Open-Source AI Stack (highest priority)

- Implement and maintain neo_model_router.py, the 3-tier routing engine (Tier 1: Ollama on-device → Tier 2: Mistral/Qwen/DeepSeek EU cloud → Tier 3: Claude safety net)

- Set up and optimise Ollama on Pi 5 fleet — model quantisation (Q4_K_M), performance tuning

- Integrate Mistral API (EU-hosted, GDPR-native) as primary Tier 2 provider

- Evaluate and onboard new open-source models: Qwen 3, DeepSeek V4, Phi-3, Gemma 2

- Build the vLLM self-hosting stack for future GPU data center deployment

- Safety classifier logic — routing critical care queries (pain 8+, falls, crisis) to appropriate models
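To give candidates a feel for the routing problem described in this list, here is a minimal sketch of a 3-tier decision function. It is not the contents of neo_model_router.py. Every name in it (`Tier`, `Route`, `route`, `is_critical`), the keyword list, the pain regex, and the word-count cutoff are assumptions made for illustration. It also assumes that critical care queries go straight to Tier 3, which the posting does not state.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = 1   # Tier 1: Ollama on the Pi
    EU_CLOUD = 2    # Tier 2: EU-hosted Mistral / Qwen / DeepSeek
    SAFETY_NET = 3  # Tier 3: Claude


# Hypothetical keyword set; a production safety classifier needs far more care.
CRITICAL_WORDS = {"fall", "falls", "fallen", "fell", "crisis", "emergency"}
# Matches a number that follows the word "pain", e.g. "pain is 9" or "pain 8/10".
PAIN_PATTERN = re.compile(r"pain\D{0,20}(\d{1,2})")
# Hypothetical cutoff for what a small on-device model should handle.
MAX_ON_DEVICE_WORDS = 40


@dataclass
class Route:
    tier: Tier
    reason: str


def is_critical(query: str) -> bool:
    """Return True for falls, crisis language, or a reported pain level of 8+."""
    text = query.lower()
    tokens = set(re.findall(r"[a-z]+", text))
    if tokens & CRITICAL_WORDS:
        return True
    match = PAIN_PATTERN.search(text)
    return bool(match and int(match.group(1)) >= 8)


def route(query: str) -> Route:
    """Pick a tier: safety check first, then a simple length heuristic."""
    if is_critical(query):
        return Route(Tier.SAFETY_NET, "critical care query")
    if len(query.split()) <= MAX_ON_DEVICE_WORDS:
        return Route(Tier.ON_DEVICE, "short query, handled locally")
    return Route(Tier.EU_CLOUD, "long query, escalated to EU cloud")


if __name__ == "__main__":
    for q in ("My pain is 9 today", "What time is my appointment?"):
        print(q, "->", route(q).tier.name)
```

The safety check runs before any cost-based decision, so a cheaper tier can never absorb a critical query by accident. A real router would also need fallback when a tier is unavailable, which this sketch leaves out.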

EU Infrastructure & Data Residency

- Migrate and manage EU workloads on OVHcloud Strasbourg (France-resident data)

- Docker Compose orchestration across Hetzner (Germany) and OVHcloud (France)

- GDPR-compliant data pipeline — ensure zero personal data leaves EU servers

- Monitoring, alerting, and uptime for 24/7 care AI (clients depend on this at night)

- SSL, TLS, reverse proxy (Nginx/Caddy), secrets management

Pi 5 Fleet Engineering

- Remote management of Pi 5 devices at client homes (Raspberry Pi Connect)

- Automated deployment scripts — new client Pi setup in under 10 minutes

- Systemd service monitoring, auto-recovery, over-the-air config updates

- Whisper STT optimisation for Austrian German dialect accuracy

FastAPI Backend

- Extend from 19 to 30+ endpoints as new features ship

- /ask/v2 — the new model-router-backed endpoint (already designed, needs wiring)

- Health analytics pipeline — pain trend analysis, coordinator alerts, weekly reports

- Multi-tenant architecture — clean data isolation across Prime Program clients

Mobile & Dashboard

- Flutter app features (Android + iOS) — client-facing and coordinator-facing

- chatvx.com coordinator dashboard (Next.js) — real-time client health overview

- Telegram bot improvements — coordinator alert formatting, rich notifications

Data Center Roadmap (6–18 months)

- Phase 1: Optimise on-device Ollama across Pi fleet

- Phase 2: Stand up shared GPU inference server (RTX 4090 or A10G) — self-host Qwen/Mistral

- Phase 3: Full vLLM cluster — all Tier 2 models self-hosted, zero external API dependency

- This is the engineering track that removes €500+/month in external API costs at scale

Your Tech Stack

Languages

Python 3.13 · Dart (Flutter) · JavaScript/TypeScript (Next.js) · Bash

AI & Models (open-source focus)

Ollama · vLLM · Mistral 7B / Small / Medium · Qwen 3 72B · DeepSeek V4 · Llama 3.3 70B · Phi-3 Mini · Whisper STT · Piper TTS · Porcupine · OpenWakeWord · PoseNet

Model APIs

Mistral API · Together AI · Cohere · Anthropic Claude (safety fallback only)

Databases

Neo4j (graph) · Qdrant (vector) · Redis (state)

Infrastructure

Raspberry Pi 5 (8GB) · Hetzner VPS · OVHcloud (France) · Docker Compose · systemd · Nginx · Raspberry Pi Connect · GitHub Actions

APIs & Integrations

Gmail API · Google Calendar · Telegram Bot · Firebase · Brevo SMTP · Ghost CMS · Sony AITRIOS IMX500

Compliance Environment

GDPR Art.28 · Austrian DSG · IRAP SR&ED (Canada) · EU AI Act awareness

You Are Exactly Right For This If

- You have *3+ years of Python backend* — FastAPI, async, real production systems

- You have deployed *open-source LLMs* in any form — Ollama, vLLM, llama.cpp, HuggingFace

- You are comfortable with *Linux, SSH, Docker, systemd* — you fix things without being told where

- You have worked on *edge or IoT systems* — or are deeply curious about them

- You think about *cost efficiency* — you understand why routing matters at scale

- You are genuinely excited about *privacy-first, sovereign AI* — not just cloud wrappers

- You *ship things* — GitHub activity, side projects, something real you built and deployed

- You can work *across IST / CEST (Vienna) / EST (Ontario)* time zones independently

- You communicate proactively — *no chasing for updates*, no daily standups needed

- You care about the mission — real disabled clients in Vienna depend on this system

Strong Bonus If You Have

- Experience with *vLLM* or GPU inference server setup (CUDA, memory management)

- Quantisation experience — GGUF, Q4_K_M, AWQ formats

- Flutter / Dart — even basic experience with the mobile apps

- Neo4j — graph queries, Cypher, schema design

- Austrian German or European language context — helps with client empathy

- Experience with GDPR technical implementation — DPA compliance, data minimisation

- Knowledge of European care tech, disability tech, or clinical AI

Salary & Compensation

- Compensation: based on experience

- Equity: discussed after a successful 6-month review; early-stage, real upside

- Work: Fully remote from India · async-first · flexible hours

- Hardware: We ship you what you need for testing (Pi 5 dev kit discussed on onboarding)

- Growth: You are employee #1 in India — as the team grows, you grow with it

What We Are Not

- We are not a body shop. You own your work.

- We are not a corporate AI team. No Jira ceremonies, no sprint reviews.

- We are not a startup with a demo and no users. We have live clients and a signed contract with a care organisation.

- We do not use AI to replace care workers. We use AI to give disabled people more autonomy.

Our Values

Sovereignty: Client data stays on-device. Privacy is not a feature, it is the architecture.

Open Source First: We build with open models so we never depend on one vendor's pricing. Our data center roadmap exists precisely to eliminate that dependency entirely.

Impact Over Optics: Every line of code we ship touches a real person's daily life. Amrit in Vienna hears Neo's voice every morning. That is who we build for.

Craft: We write code that is readable, documented, and built to last. Fast is good. Fast and clean is what we ship.

How to Apply

Send an email to contact@vxneolabs.com with the subject line:

AI Platform Engineer — [Your Name] — India

Include:

1. *Something you built and deployed*: one paragraph, no templates. What was it, what stack, what did you learn, what broke and how did you fix it.

2. *Your GitHub profile*, or any code you can share publicly

3. *Your experience with open-source LLMs*: even a paragraph. Ollama, vLLM, Hugging Face, llama.cpp, anything real.

4. *Your resume or LinkedIn*

5. *Expected monthly compensation in INR*

6. *Your availability to start*

Applications without a GitHub link or a specific "something I built" paragraph will not be reviewed. We are not looking for the best resume. We are looking for the best builder.

Direct applications only. No recruiters. No agencies.

*VXNeo Labs Inc. is an equal opportunity employer. We especially welcome applications from engineers with lived experience of disability or caregiving: you will understand our mission better than anyone.*
