Introduction
We propose Reptile, a terminal agent that operates under an extended REPL (Read-Eval-Print-Learn Loop) protocol, where human feedback is seamlessly integrated into the agent’s execution loop.
Unlike traditional REPL (Read-Eval-Print Loop) environments that focus solely on code evaluation, our protocol emphasizes the iterative cycle of human-agent collaboration, transforming the terminal from a passive command executor into an interactive learning environment.
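To make the extended loop concrete, here is a minimal sketch of one agent session under this protocol; the callables (`llm_propose`, `run_in_terminal`, `ask_user`, `log_step`) and the `Verdict` structure are illustrative placeholders, not Reptile’s actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    done: bool = False              # the user declares the task finished
    feedback: Optional[str] = None  # feedback given under the USER role
    edited: Optional[str] = None    # a rewrite of the command under the ASSISTANT role

def repl_loop(task: str,
              llm_propose: Callable[[list], str],
              run_in_terminal: Callable[[str], str],
              ask_user: Callable[[str], Verdict],
              log_step: Callable[[list, Verdict], None]) -> list:
    """One session under the extended Read-Eval-Print-Learn loop."""
    history = [("user", task)]
    while True:
        command = llm_propose(history)              # Read: the model proposes the next command
        history.append(("assistant", command))
        verdict = ask_user(command)                 # the human inspects the step
        if verdict.feedback is not None:            # feedback under the USER role
            history.append(("user", verdict.feedback))
            log_step(history, verdict)              # Learn: keep the corrected branch
            continue                                # let the model revise its proposal
        if verdict.edited is not None:              # edit under the ASSISTANT role
            history[-1] = ("assistant", verdict.edited)
            command = verdict.edited
        output = run_in_terminal(command)           # Eval: run it in the stateful terminal
        print(output)                               # Print: surface the result
        log_step(history, verdict)                  # Learn: log the step for training
        history.append(("user", output))
        if verdict.done:
            return history
```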

What Makes Reptile Special?
Compared with other CLI agents (e.g., Claude Code and Mini SWE-Agent), Reptile stands out for the following reasons:
Terminal-only, beyond bash-only: Execution is simple and stateful, which is more efficient than bash-only agents because you don’t need to re-specify the environment in every command (a minimal contrast is sketched after this list). It also doesn’t require the complicated MCP protocol: just a plain terminal tool under the REPL protocol.
Human-in-the-Loop Learning: Users can inspect every step and provide immediate feedback, i.e., give feedback under the USER role or edit the LLM generation under the ASSISTANT role.
As noted in the post from the Mini SWE-Agent team, implementing stateful shell sessions presents significant challenges. We address this by detecting when the TTY enters non-canonical mode.
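As a quick illustration of why statefulness matters, the sketch below contrasts one-shot shell calls (where the working directory and environment are lost between commands) with a long-lived shell session. It is a generic standard-library Python illustration, not Reptile’s own backend.

```python
import subprocess

# Bash-only tools typically spawn a fresh shell per command, so state such as
# the working directory or exported variables does not survive between calls.
print(subprocess.run("cd /tmp && pwd", shell=True, capture_output=True, text=True).stdout.strip())
print(subprocess.run("pwd", shell=True, capture_output=True, text=True).stdout.strip())  # back to the original cwd

# A stateful session (sketched here as a single long-lived bash process) keeps
# that state across turns, so later commands see the earlier cd and export.
shell = subprocess.Popen(["bash"], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)
out, _ = shell.communicate("cd /tmp\nexport FOO=bar\npwd\necho $FOO\n")
print(out)  # the cd and export persist within the session
```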
Screenshots: Terminal UI (`autopilot run`), Web UI (`autopilot gradio`), Batch Evaluation (`autopilot evaluate`), and Trajectory Viewer.
This blog focuses on the workflow and benchmarking.
See the TTY-use blog for technical details on how the terminal backend works.
See the on-policy annotation blog for annotation details on SWE tasks.
Our Insights in Building General Agents
Workflow: Build a universal action space for the LLM, reserving specialized workflows only for high-risk operations.
Evaluation: Beyond end-to-end benchmarks, measure learning efficiency on meta-actions such as inspecting a file the right way, which makes optimization easier to track.
Annotation: Correct the agent’s behaviour with clever annotation (e.g., using PDB debugging for coding tasks), which benefits from stateful re-runs and on-policy prediction.
First Target Milestone

“Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.” (METR, https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/)
Why Terminal Agent?
Promising testbed for Agent Learning
- Wide Applicability: Spans everyday tasks to professional workflows (software engineering, DevOps, containerization) through a single universal interface.
- Native LLM Compatibility: Terminal protocols are inherently understood through pretraining—no prompt engineering needed, unlike heavyweight protocols like MCP.
- Core Research Challenges: Naturally encompasses long-horizon reasoning, context management, error recovery, and compositional tool use.
Native Universal Protocol
The Unix terminal has always been the universal text interface between human and machine. We believe it can serve the same role for AI agents.
At its core, the terminal is a text-based REPL protocol with half a century of history and refinement:
- Interpreters: `bash`, `python`, `node`, `perl`, `ruby`, and countless others
- Debuggers: `gdb`, `pdb`, `lldb` for interactive debugging
- Development tools: `git`, `make`, `docker` for workflows
- System utilities: thousands of battle-tested Unix tools
This mature ecosystem means agents can leverage decades of tooling without reinventing the wheel.
TTY Implementation Details
We build our LLM agent on top of https://github.com/sail-sg/tty-use, which implements the REPL protocol for using the terminal interactively. A key challenge we solve is detecting that the foreground process has finished its current job and is waiting for the next interaction.
For technical details on tty-use, please see our posts at https://terminal-agent.github.io/blog/tool/ and https://x.com/mavenlin/status/1977758827366817929.
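The core idea can be summarized in a short sketch. This is an illustration of the heuristic rather than the tty-use code itself, and it assumes the foreground program is readline-based (bash, python, gdb, pdb, ...) and therefore clears the terminal’s ICANON flag while it waits at a prompt.

```python
import os
import pty
import select
import termios

def read_until_ready(master_fd: int, quiet: float = 0.2) -> str:
    """Read child output until it has been quiet for a moment AND the pty is
    in non-canonical mode, i.e. the foreground process is prompting for input."""
    chunks = []
    while True:
        readable, _, _ = select.select([master_fd], [], [], quiet)
        if readable:
            try:
                data = os.read(master_fd, 4096)
            except OSError:                      # pty closed: the child exited
                break
            if not data:
                break
            chunks.append(data)
            continue
        lflag = termios.tcgetattr(master_fd)[3]  # local-mode flags of the pty
        if not (lflag & termios.ICANON):         # non-canonical => waiting at a prompt
            break
    return b"".join(chunks).decode(errors="replace")

pid, master_fd = pty.fork()
if pid == 0:                                     # child: run an interactive REPL
    os.execvp("python3", ["python3", "-q"])

print(read_until_ready(master_fd))               # the ">>> " prompt
os.write(master_fd, b"21 * 2\n")
print(read_until_ready(master_fd))               # echoed input, "42", next prompt
os.write(master_fd, b"exit()\n")
```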
How Does the REPL Enable Human-in-the-Loop Learning?
Human-in-the-loop isn’t just for runtime—it’s also central to our data collection strategy for further model training.

Please refer to https://reptile.github.io/blog/annotation/ for more annotation details and cases.
Data Collection Workflow

Our data pipeline:
- All branches are automatically logged with a checkpointing hook.
- User approval / disapproval is itself a meaningful signal.
- LLM output after feedback > LLM output before feedback.
- LLM output after an edit > LLM output before the edit.
- Feedback and edits are natural for the user.
- The more you use it, the more data you generate to make the model behave more like you.
Usage of the data
- Supervised finetuning.
- Preference optimization (pairing the output before feedback or an edit against the output after it; see the sketch below).
- RLHF (use the data to train a reward model, then run RL).
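As a concrete (and simplified) illustration of how such logs could feed preference optimization, the sketch below pairs the assistant turn recorded before a user’s feedback or edit with the turn after it, following the ordering stated above; the `Checkpoint` schema is hypothetical, not Reptile’s actual log format.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Hypothetical record for one logged branch from the checkpointing hook."""
    context: list    # chat history up to the annotated turn
    before: str      # assistant output before feedback / before the edit
    after: str       # assistant output after feedback, or the user's edited turn

def to_preference_pairs(checkpoints: list) -> list:
    """Build DPO-style (chosen, rejected) pairs using
    'LLM after feedback > LLM before feedback' and 'after edit > before edit'."""
    pairs = []
    for ckpt in checkpoints:
        if ckpt.after.strip() == ckpt.before.strip():
            continue                     # plain approval without change: no pair
        pairs.append({
            "prompt": ckpt.context,
            "chosen": ckpt.after,        # post-feedback / post-edit turn
            "rejected": ckpt.before,     # the original model turn
        })
    return pairs
```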
Benchmarking

We annotate tasks on SWE-Gym. After training with 200 interactions, Devstral-2505-22B improves:
- Terminal-Bench: 11.3% -> 18.9%
- SWE-Bench-Verified: 18.6% -> 32.8%
Looking Forward
We are actively working on several exciting directions:
1. Terminal Gym for RL Training
We aim to build a Terminal Gym that provides a structured environment for reinforcement learning. This includes (1) precise reward modeling, (2) robust and scalable dockerized environments, and (3) easy-to-hard task sets.
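To give a flavor of what such an environment could expose, here is a hypothetical Gym-style interface sketch; the class name, fields, and reward placeholder are illustrative assumptions, not an existing Reptile API.

```python
from dataclasses import dataclass, field

@dataclass
class TerminalGymEnv:
    """Hypothetical sketch of a Gym-style terminal environment."""
    task_id: str
    history: list = field(default_factory=list)

    def reset(self) -> str:
        """Start (or restart) the dockerized task container and return the first screen."""
        self.history.clear()
        return f"[task {self.task_id}] $ "           # placeholder observation

    def step(self, keystrokes: str) -> tuple:
        """Send keystrokes to the terminal; return (observation, reward, done)."""
        self.history.append(keystrokes)
        observation = f"$ {keystrokes}\n..."         # placeholder terminal output
        reward, done = 0.0, False                    # a precise reward model plugs in here
        return observation, reward, done
```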
2. Advanced Learning Algorithms
We are exploring offline RL, imitation learning, and other techniques to improve sample efficiency for extra-long agent trajectories (>30K tokens) and ultimately reduce the need for human supervision.
Open Source & Community
Reptile is open source and we welcome contributions, whether you’re interested in:
- Adding new benchmarks and evaluation tasks
- Improving the hook system
- Contributing training data
- Building integrations with other tools like training/inference backends
- Research discussion and resource collaboration
Visit our GitHub repository: https://github.com/terminal-agent/reptile
We are inspired by excellent community work such as terminal-bench and mini-SWE-agent. We thank the community for their efforts and valuable insights!
Conclusion
The terminal has been humanity’s interface to computers for 50 years. With Reptile, it becomes the interface between humans and AI agents. Reptile represents a new paradigm for terminal agents: one that embraces human collaboration rather than trying to eliminate it.
By extending the familiar REPL protocol with a learning layer, we create a system that:
- Leverages the mature Unix ecosystem without reinvention
- Provides transparency and control through human-in-the-loop interaction
- Scales naturally to complex, multi-step tasks
Citation
If you find Reptile useful in your research or applications, please cite:
@misc{reptile2025,
  title={Reptile: Terminal-Agent with Human-in-the-loop Learning},
  author={Dou, Longxu and Du, Cunxiao and Li, Shenggui and Wang, Tianduo and Zhang, Tianjie and Liu, Tianyu and Chen, Xianwei and Tang, Chenxia and Zhao, Yuanheng and Lin, Min},
  year={2025},
  url={https://github.com/terminal-agent/reptile},
  note={GitHub repository}
}
Fun fact: The name “Reptile” has a dual meaning: it refers to the REPL (Read-Eval-Print-Learn Loop) workflow in terminal interactions, and it also pays homage to OpenAI’s Reptile meta-learning algorithm (2018), which pioneered few-shot adaptation. Like its namesake, our Reptile learns to quickly adapt to new tasks, but through human-in-the-loop collaboration rather than pure algorithmic optimization. Both share the same philosophy: learning efficiently from minimal examples to master diverse tasks.
Reference: Nichol, A., Achiam, J., and Schulman, J. On First-Order Meta-Learning Algorithms. arXiv:1803.02999, 2018.



