CloudNuro.AI
Website: cloudnuro.ai
Job details:
Program structure
Track: Engineering
Reports to: Senior staff engineer, EOS Control Plane team
Duration: 12–20 weeks, full-time preferred
Primary languages: Go or Rust for the gateway, Python for tooling; familiarity with gRPC / protobuf
Outcome: Production-grade contribution to the metering or routing layer with measurable performance and accuracy improvements
Compensation: stipend per internal scale; conversion to full-time considered for strong performers.
Mentorship: each intern is paired with a senior engineer or researcher who is the technical owner of the area.
How to apply: Send the following:
• Resume / CV (PDF).
• A link to a GitHub profile, portfolio, or representative project.
• The role number(s) you are applying for. You can apply for up to two.
• The application-prompt response for the role you are most interested in (300–500 words).
Applications without the prompt response will be deprioritized; it is the single most useful signal we have.
About the role
The data and control planes of the EOS are where the abstractions hit the wire. Every LLM request, every tool call, every cache lookup passes through our gateway, gets metered, gets routed, and gets recorded. The mathematics is only as good as the infrastructure that gives it accurate, fresh, complete data, and only as fast as the gateway that executes its decisions in under a few hundred microseconds.
This is a systems engineering role with a strong focus on correctness and latency. You will write code that runs in production, on the hot path, where every microsecond of overhead is a microsecond of latency the customer pays for. You will work on the metering pipeline (counting tokens accurately, attributing them to the right tenant, persisting them durably), on the gateway's routing decisions (translating policy decisions from the intelligence plane into per-request actions), and on the observability that makes the whole thing operable.
What you will work on
Core deliverables
• Build one production-grade module in the metering or routing layer. Concrete examples: a streaming token counter with model-specific tokenizer integration; a request-tagging middleware enforcing per-request lineage; a circuit-breaker for runaway agent loops with sub-millisecond decision latency.
• Instrument the module with OpenTelemetry traces, Prometheus metrics, and structured logs; ensure operators have visibility into both correctness and performance.
• Write integration tests that exercise the module against realistic synthetic workloads from our generator; document the throughput-latency curve.
• Author a runbook and a design document so the on-call rotation can support the module without paging you.
Stretch goals
• Contribute to the model-provider adapter framework: add a new provider, with proper cost accounting, retry semantics, and quota awareness.
• Optimize a hot path identified by production profiling: reduce P99 latency by a concrete percentage, with proof from before/after benchmarks.
• Help design the next iteration of the unified state object: schema evolution, versioned snapshots, rollback paths.
What we are looking for
Required
• Comfortable writing Go or Rust at a level appropriate to production code: interfaces, error handling, concurrency, profiling.
• Have built and operated at least one networked service end-to-end; know what HTTP status codes are for, what backpressure means, why retries multiply load.
• Familiar with the basics of distributed systems: leader election, consistent hashing, idempotency keys, exactly-once vs at-least-once semantics.
• Treat correctness as a first-class concern; write tests, run them, and notice when they would not have caught the bug you just shipped.
Nice to have
• Experience with gRPC, protobuf, OpenTelemetry, Prometheus, or Kubernetes.
• Have written or contributed to an LLM gateway, an API proxy, a streaming pipeline, or a high-throughput message broker.
• Comfortable generating and reading flame graphs, perf reports, or eBPF traces.
• Familiar with at least one tokenizer library (tiktoken, sentencepiece, transformers) and the gotchas around streaming token counting.
Success criteria
• By week 3: development environment up, first PR landed (small but real), pager rotation shadowed.
• By week 8: the primary module in production, with monitoring, runbook, and at least one incident handled.
• By the end of the internship: a documented performance improvement (latency, throughput, accuracy, or cost) with measurements that survive code review.
Application prompt for this role
Describe an incident, bug, or performance issue you debugged in a networked or distributed system. In 300 to 500 words, walk through how you identified the root cause, what you measured, what you ruled out, and how you fixed it. The system can be small (a personal project counts) but the methodology should be honest about what you knew, when, and how.
===================================================
Internship Opportunities — Economic Operating System for AI Infrastructure
We are building an Economic Operating System (EOS) for enterprise AI infrastructure: a closed-loop control plane that meters every token, forecasts demand probabilistically, routes requests under multi-objective constraints, learns from outcomes, and rebalances provider portfolios. The platform composes mathematics from stochastic optimization, queueing theory, control engineering, reinforcement learning, mechanism design, and portfolio theory into a coherent runtime that operates across five timescales, from millisecond routing decisions to quarterly provider negotiations.
We are hiring interns to work alongside the core team on distinct, well-scoped streams of this project. Each role contributes to a specific layer of the system; each intern will own a measurable outcome by the end of the internship; and each role is designed so that the work products (code, simulations, papers, or operational dashboards) are portfolio pieces that demonstrate serious technical depth.
Selection process
· Initial screen — resume + a short written response (300–500 words) to a prompt specific to the role.
· Technical interview — one 60-minute session focused on the role's primary domain (math, code review, or systems design, depending on role).
· Take-home — a small, scoped exercise (4–6 hours) using realistic data from our synthetic generator.
· Final conversation — meet the mentor, discuss scope, align on deliverables and timeline.
Shared expectations across all roles
· Write and review code in Python (primary) or one of Go, Rust, TypeScript, depending on role.
· Work in the open — pull requests with descriptive messages, design docs before non-trivial work, weekly written updates.
· Read papers critically and translate them into running code; benchmark specific claims against our synthetic and production traces.
· Communicate clearly in writing and in standups; intellectual honesty about what works and what does not.
· Care about correctness, reproducibility, and the user-facing impact of the platform on real bills and real latencies.
A note on what we mean by "intern"
We are looking for people who are still in formal education or within 12 months of graduation. We hire interns who are undergraduate students with strong technical foundations, master's students with deeper specialization, and doctoral students who want a focused industrial experience. The role descriptions are written to scale: an undergraduate intern on Role 1 might focus on the core deliverables and one stretch goal; a doctoral intern might own a stretch goal as their primary track and publish a paper from it. We will calibrate scope to your background during the final conversation.
A final note on culture
This project sits at an unusual intersection of applied probability, distributed systems, reinforcement learning, and economics, and we hire people who are genuinely curious about all of it, not just the part of it that touches their primary specialization. The four roles depend on each other. The simulator is the RL intern's training environment. The detector is the systems intern's last line of defense. The trained policy is the operations intern's hardest-to-debug failure mode. We expect interns to talk to each other early and often, to read each other's design docs, and to push back on each other's assumptions. Pleasant disagreement, in our experience, is the second-most reliable predictor of good outcomes after technical depth itself.