Junior Python Engineer – AI Agent Task Designer

Evolvable.ai

Location: India
Job type: Full-time

Required skills

Python
DevOps
Docker
Linux
macOS
Shell Scripting

About the role

Website: evolvable.ai
The Opportunity:

We are seeking talented and imaginative Junior Python Engineers – AI Agent Task Designers to join our team. The ideal candidate is passionate about designing creative, challenging terminal-based tasks for AI agents, and combines strong Python, DevOps, and scripting skills with a flair for storytelling, problem design, and systems thinking.

This role sits at the intersection of software engineering and AI research. You will write real code, design real environments, and directly influence how state-of-the-art AI systems learn to solve complex tasks.

Key Responsibilities:

Design and implement creative, terminal-based tasks that AI agents can solve through command-line interfaces.
Author clear, imaginative challenges such as scripting workflows, data processing pipelines, environment setup tasks, and service orchestration problems.
Build tasks that require agents to reason, plan, debug, and iterate using public tools (web search, Python libraries, CLIs, system utilities, etc.).
Leverage the open-source Terminal-Bench benchmark (tbench.ai) for task structure, inspiration, and evaluation tooling.
Write Python scripts and shell logic to define task instructions, environment setup, orchestration logic, and automated success checks.
Tune task difficulty, ambiguity, and feedback signals to balance solvability with meaningful challenge.
Collaborate closely with AI researchers and engineers to iterate on task design based on agent performance and training outcomes.

Impact of Your Work:

The tasks you create will be used by frontier AI labs to train large language models using reinforcement learning.
Your environments and evaluations will directly shape how advanced LLMs learn to:
  Solve long-horizon, multi-step problems
  Interact reliably with real tools and operating systems
  Recover from errors, debug failures, and adapt strategies
Your work will contribute to making next-generation AI systems more capable, robust, and skillful at difficult real-world tasks.
This role offers hands-on exposure to the full loop: task design → agent interaction → evaluation → learning signal.

Qualifications:

Currently pursuing or recently completed a Bachelor's or Master's degree in Computer Science, Engineering, or a related scientific field.
Strong hands-on experience with Python, including writing scripts, utilities, and small systems.
Comfort working in terminal environments (Linux/macOS shells, CLI tools).
Familiarity with large language models (e.g., GPT, Claude) and prompting techniques.
Interest in AI agents, reinforcement learning, evaluation benchmarks, or emergent model behavior.
Ability to think both like an engineer (systems, edge cases, correctness) and like a researcher (difficulty curves, generalization, signal quality).

Nice-to-Haves:

Experience with shell scripting, Docker, or lightweight infrastructure tooling.
Prior exposure to agent benchmarks, evaluation frameworks, or reinforcement learning concepts.
A creative mindset that enjoys designing puzzles, challenges, and environments that are fun, engaging, and meaningful.
Curiosity about how intelligent systems fail, and how to design tasks that push them to improve.

