CloudNuro.AI
Website: cloudnuro.ai
Job details:
Program structure
Track: Research
Reports to: Lead research scientist, EOS Stochastic Models team
Duration: 16–24 weeks, full-time preferred
Primary languages: Python (NumPy, SciPy, JAX or PyTorch), some Cython for hot paths
Outcome: A reproducible workload generator and a published technical note on tail-risk estimators for AI workloads
Compensation: stipend per internal scale; conversion to full-time considered for strong performers.
Mentorship: each intern is paired with a senior engineer or researcher who is the technical owner of the area.
How to apply: send the following:
• Resume / CV (PDF).
• A link to a GitHub profile, portfolio, or representative project.
• The role number(s) you are applying for. You can apply for up to two.
• The application-prompt response for the role you are most interested in (300–500 words).
Applications without the prompt response will be deprioritized; it is the single most useful signal we have.
About the role
The EOS sits on top of a probabilistic model of what AI workloads look like. Every routing decision, every budget pacing update, every capacity forecast traces back to a distribution we have fit on production telemetry. This role is about two things: getting those distributions and their compositions right, and building the simulation harness that lets the rest of the platform be evaluated honestly before it is deployed.
You will spend a substantial fraction of your time on what looks like classical applied probability: fitting log-normal, negative binomial, and Weibull distributions to large token-level traces, validating fits with posterior predictive checks, building Monte Carlo estimators with proper variance reduction, and computing tail-risk measures (VaR and CVaR) with confidence intervals. Where you find that the standard parametric families do not fit, you will propose, document, and validate alternatives — Pareto tails, mixture models, or copula-coupled joint distributions.
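As a rough illustration of the fit-then-measure-the-tail workflow described above, the sketch below fits a log-normal by maximum likelihood (the closed form on the log scale, a frequentist stand-in rather than the Bayesian posterior predictive checks the role calls for), computes empirical VaR and CVaR through the Rockafellar–Uryasev representation, and attaches a percentile-bootstrap confidence interval to the CVaR. The synthetic data, parameter values, and function names are illustrative assumptions, not part of eos_synth.py.

```python
import numpy as np


def fit_lognormal(x):
    """Closed-form log-normal MLE: mean and (ddof=0) std of log(x)."""
    log_x = np.log(np.asarray(x, dtype=float))
    return float(log_x.mean()), float(log_x.std())


def var_cvar(samples, alpha):
    """Empirical VaR and CVaR via the Rockafellar-Uryasev representation
    CVaR = VaR + E[(X - VaR)^+] / (1 - alpha)."""
    s = np.asarray(samples, dtype=float)
    var = float(np.quantile(s, alpha))
    cvar = var + float(np.mean(np.maximum(s - var, 0.0))) / (1.0 - alpha)
    return var, cvar


def bootstrap_cvar_ci(samples, alpha, n_boot=200, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the empirical CVaR."""
    rng = np.random.default_rng(seed)
    s = np.asarray(samples, dtype=float)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(s, size=s.size, replace=True)  # resample with replacement
        boot[b] = var_cvar(resample, alpha)[1]
    lo, hi = np.quantile(boot, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)


if __name__ == "__main__":
    # Illustrative stand-in for prompt-token counts; parameters are made up.
    rng = np.random.default_rng(42)
    tokens = rng.lognormal(mean=6.0, sigma=0.8, size=50_000)
    mu_hat, sigma_hat = fit_lognormal(tokens)
    var95, cvar95 = var_cvar(tokens, 0.95)
    lo, hi = bootstrap_cvar_ci(tokens, 0.95)
    print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f}")
    print(f"VaR95={var95:.0f} CVaR95={cvar95:.0f} 95% CI=({lo:.0f}, {hi:.0f})")
```

The bootstrap here is deliberately simple; on real traces with temporal dependence, a block bootstrap would likely be the safer choice.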
What you will work on
Core deliverables
• Extend the open synthetic workload generator (eos_synth.py). Add new distributions, a richer agent-trace generator with parameterizable Markov chains, and a multi-tenant arrival simulator that mixes Poisson and Hawkes components.
• Fit our four canonical token-level distributions to fresh production traces; report goodness-of-fit, posterior predictive checks, and a comparison against the previous quarter's parameters.
• Build a CVaR estimator with explicit confidence intervals and importance-sampling variance reduction; benchmark it against the empirical estimator at α = 0.95 and 0.99.
• Validate that the simulator reproduces the headline percentiles (P50, P95, P99) of a real workload from our anonymized historical archive.
• Write a 6–10 page technical note: "Tail-risk estimation for agentic AI workloads," with code that reproduces every figure.
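The CVaR deliverable can be sketched in the simplest setting where an exact answer exists to benchmark against: a log-normal loss X = exp(μ + σZ). This is a hedged illustration, not the team's estimator. It assumes a mean-shifted Gaussian proposal for Z centred on the α-quantile (one common variance-reduction choice among several), reweights each draw by the likelihood ratio, plugs a weighted VaR into the Rockafellar–Uryasev formula, and forms an approximate normal confidence interval from the sample variance of the weighted excess terms. Passing shift=0.0 recovers the plain empirical estimator. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def cvar_lognormal_exact(mu, sigma, alpha):
    """Closed-form CVaR of LogNormal(mu, sigma), used only as a benchmark."""
    z_alpha = _STD_NORMAL.inv_cdf(alpha)
    return float(np.exp(mu + sigma**2 / 2) * _STD_NORMAL.cdf(sigma - z_alpha) / (1 - alpha))


def cvar_lognormal_is(mu, sigma, alpha, n=100_000, seed=0, shift=None, level=0.95):
    """Importance-sampled VaR, CVaR and an approximate CI for X = exp(mu + sigma*Z).

    Z is drawn from N(shift, 1) instead of N(0, 1) and each draw is reweighted by
    the likelihood ratio phi(z) / phi(z - shift). shift=0.0 gives plain Monte Carlo.
    """
    if shift is None:
        shift = _STD_NORMAL.inv_cdf(alpha)  # centre the proposal on the alpha-quantile of Z
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=shift, scale=1.0, size=n)
    w = np.exp(-shift * z + 0.5 * shift**2)  # N(0,1) density over N(shift,1) density
    x = np.exp(mu + sigma * z)

    # Weighted VaR: walk down from the largest draw until the estimated
    # upper-tail probability P(X >= x) first reaches 1 - alpha.
    order = np.argsort(x)[::-1]
    tail_mass = np.cumsum(w[order]) / n
    k = min(int(np.searchsorted(tail_mass, 1.0 - alpha)), n - 1)
    var = float(x[order[k]])

    # Rockafellar-Uryasev: CVaR = VaR + E[(X - VaR)^+] / (1 - alpha), with the
    # expectation estimated by the weighted sample mean.
    excess = w * np.maximum(x - var, 0.0)
    cvar = var + float(excess.mean()) / (1.0 - alpha)

    # Normal-approximation CI that treats the estimated VaR as fixed.
    se = float(excess.std(ddof=1)) / (np.sqrt(n) * (1.0 - alpha))
    half = _STD_NORMAL.inv_cdf(0.5 + level / 2) * se
    return var, cvar, (cvar - half, cvar + half)


if __name__ == "__main__":
    mu, sigma = 6.0, 0.8  # illustrative parameters, not fitted values
    for alpha in (0.95, 0.99):
        exact = cvar_lognormal_exact(mu, sigma, alpha)
        _, cvar_mc, ci_mc = cvar_lognormal_is(mu, sigma, alpha, seed=1, shift=0.0)
        _, cvar_is, ci_is = cvar_lognormal_is(mu, sigma, alpha, seed=1)
        print(f"alpha={alpha}: exact={exact:.1f} "
              f"MC={cvar_mc:.1f} width={ci_mc[1] - ci_mc[0]:.1f} "
              f"IS={cvar_is:.1f} width={ci_is[1] - ci_is[0]:.1f}")
```

With the same sample size, the shifted proposal should give a visibly narrower interval at α = 0.99, because most proposal draws land in the tail that CVaR depends on. For fitted mixtures or empirical traces, the proposal would have to be designed differently.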
Stretch goals
• Compound session cost as a random sum in which the step count N and the per-step cost are dependent; closed-form approximations and their accuracy versus full simulation.
• Heavy-tail diagnostics on output token counts; Pareto mixture model versus negative binomial under model misspecification.
• Joint distribution of (prompt length, output length, tool calls) via a Gaussian copula on the marginals; sensitivity analysis.
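For the compound-cost stretch goal, a useful baseline is the independent case. The sketch below assumes, purely for illustration, N ~ NegBin(r, p) counted as failures before r successes (NumPy's convention) and i.i.d. log-normal step costs independent of N. Under that assumption E[S] = E[N]E[X] and, by the law of total variance, Var[S] = E[N]Var[X] + Var[N]E[X]^2; a moment-matched log-normal then gives one simple closed-form tail approximation to compare against simulation. These identities rely on independence and need not hold once N depends on the per-step costs, which is the case the stretch goal targets. Parameter values and function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_session_cost(m, r, p, mu, sigma, seed=0):
    """Simulate m session costs S = X_1 + ... + X_N with N ~ NegBin(r, p) and
    i.i.d. X_i ~ LogNormal(mu, sigma), N independent of the X_i."""
    rng = np.random.default_rng(seed)
    n_steps = rng.negative_binomial(r, p, size=m)          # failures before r successes
    step_cost = rng.lognormal(mu, sigma, size=n_steps.sum())
    session_id = np.repeat(np.arange(m), n_steps)           # session index of each step
    return np.bincount(session_id, weights=step_cost, minlength=m)  # N = 0 gives S = 0


def compound_moments(r, p, mu, sigma):
    """Closed-form E[S] and Var[S] for the independent compound sum."""
    mean_n = r * (1 - p) / p
    var_n = r * (1 - p) / p**2
    mean_x = np.exp(mu + sigma**2 / 2)
    var_x = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
    return mean_n * mean_x, mean_n * var_x + var_n * mean_x**2


def lognormal_quantile_approx(mean_s, var_s, q):
    """Moment-matched log-normal approximation to the q-quantile of S."""
    s2 = np.log1p(var_s / mean_s**2)
    m = np.log(mean_s) - s2 / 2
    return float(np.exp(m + np.sqrt(s2) * NormalDist().inv_cdf(q)))


if __name__ == "__main__":
    r, p, mu, sigma = 3, 0.3, 5.0, 1.0  # illustrative values, not fitted parameters
    s = simulate_session_cost(200_000, r, p, mu, sigma)
    mean_s, var_s = compound_moments(r, p, mu, sigma)
    p99_sim = float(np.quantile(s, 0.99))
    p99_approx = lognormal_quantile_approx(mean_s, var_s, 0.99)
    print(f"mean: sim={s.mean():.1f} closed form={mean_s:.1f}")
    print(f"P99:  sim={p99_sim:.1f} log-normal approx={p99_approx:.1f}")
```

Comparing `p99_sim` with `p99_approx` gives one concrete accuracy number; the same harness could be rerun with a dependent (N, X) generator to see how far the independent closed forms drift.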
What we are looking for
Required
• Solid grounding in probability and statistics — distributions, conditional expectation, large-sample theory.
• Strong Python; comfortable with NumPy, SciPy, pandas; opinionated about reproducibility (seeds, tests, deterministic builds).
• Have implemented Monte Carlo simulations at a non-trivial scale (≥ 10⁵ samples) for a course project or independent work.
• Have read at least one of: Glasserman's Monte Carlo Methods in Financial Engineering, Rockafellar–Uryasev on CVaR, McNeil/Frey/Embrechts on quantitative risk.
Nice to have
• Course or research experience in extreme value theory, stochastic processes, or applied Bayesian statistics.
• Familiarity with Hawkes processes or self-exciting point processes.
• Open-source contributions to a statistical library (statsmodels, PyMC, NumPyro, scikit-learn) or a Monte Carlo / simulation framework.
• A favorite plot type and an opinion about why it tells the truth better than alternatives.
Success criteria
• By week 4: a working PR extending the generator with one new distribution and full test coverage.
• By week 10: the importance-sampled CVaR estimator running on production traces with documented confidence intervals.
• By the end of the internship: the technical note shared internally, with all numerics reproducible from a single command on a laptop.
Application prompt for this role
Pick one of the four token-level distributions (log-normal prompts, negative-binomial outputs, Poisson-mixture tool calls, Weibull latency). In 300 to 500 words, describe one weakness of that parametric family for AI workloads, propose an alternative, and explain how you would empirically validate that the alternative is better. Use math notation freely; assume a technically literate reader.
===================================================
Internship Opportunities — Economic Operating System for AI Infrastructure
We are building an Economic Operating System (EOS) for enterprise AI infrastructure: a closed-loop control plane that meters every token, forecasts demand probabilistically, routes requests under multi-objective constraints, learns from outcomes, and rebalances provider portfolios. The platform composes mathematics from stochastic optimization, queueing theory, control engineering, reinforcement learning, mechanism design, and portfolio theory into a coherent runtime that operates across five timescales, from millisecond routing decisions to quarterly provider negotiations.
We are hiring interns to work alongside the core team on distinct, well-scoped streams of this project. Each role contributes to a specific layer of the system, each intern will own a measurable outcome by the end of the internship, and each role is designed so that its work products (code, simulations, papers, or operational dashboards) are portfolio pieces that demonstrate serious technical depth.
Selection process
· Initial screen — resume + a short written response (300–500 words) to a prompt specific to the role.
· Technical interview — one 60-minute session focused on the role's primary domain (math, code review, or systems design, depending on role).
· Take-home — a small, scoped exercise (4–6 hours) using realistic data from our synthetic generator.
· Final conversation — meet the mentor, discuss scope, align on deliverables and timeline.
Shared expectations across all roles
· Write and review code in Python (primary) or one of Go, Rust, TypeScript, depending on role.
· Work in the open — pull requests with descriptive messages, design docs before non-trivial work, weekly written updates.
· Read papers critically and translate them into running code; benchmark specific claims against our synthetic and production traces.
· Communicate clearly in writing and in standups; intellectual honesty about what works and what does not.
· Care about correctness, reproducibility, and the user-facing impact of the platform on real bills and real latencies.
A note on what we mean by "intern"
We are looking for people who are still in formal education or within 12 months of graduation. We hire interns who are undergraduate students with strong technical foundations, master's students with deeper specialization, and doctoral students who want a focused industrial experience. The role descriptions are written to scale: an undergraduate intern on Role 1 might focus on the core deliverables and one stretch goal; a doctoral intern might own a stretch goal as their primary track and publish a paper from it. We will calibrate scope to your background during the final conversation.
A final note on culture
This project sits at an unusual intersection of applied probability, distributed systems, reinforcement learning, and economics, and we hire people who are genuinely curious about all of it, not just the part that touches their primary specialization. The four roles depend on each other. The simulator is the RL intern's training environment. The detector is the systems intern's last line of defense. The trained policy is the operations intern's hardest-to-debug failure mode. We expect interns to talk to each other early and often, to read each other's design docs, and to push back on each other's assumptions. Pleasant disagreement, in our experience, is the second-most reliable predictor of good outcomes after technical depth itself.