mini-SWE-agent is a lightweight open source CLI coding agent developed by the SWE-agent team at Princeton University's Natural Language Processing Group. In approximately 100 lines of Python, it achieves a 74%+ verified score on the SWE-bench benchmark — the industry-standard evaluation for autonomous software engineering agents — making it one of the highest-performing minimal agents in existence. Despite its small codebase, mini-SWE-agent resolves real GitHub issues autonomously, supports any LLM with an API, starts faster than comparable tools like Claude Code, and has been adopted in production by teams at Meta, NVIDIA, IBM, and Stanford. It is MIT-licensed, free to use with BYOK (bring your own key), and supports local model execution for offline and privacy-sensitive deployments. For developers who want autonomous issue resolution without the overhead of a large commercial platform, mini-SWE-agent demonstrates that architectural elegance and benchmark performance are not mutually exclusive.
| Feature | mini-SWE-agent | Cursor |
|---|---|---|
| Type | CLI coding agent (autonomous) | AI-powered IDE (VS Code fork) |
| Open source | Yes (MIT) | No |
| Offline / local models | Yes (any local LLM with API) | No |
| BYOK | Yes (any LLM API) | No (managed models) |
| SWE-bench score | 74%+ verified | Not benchmarked on SWE-bench |
| Codebase size | ~100 lines Python | Large proprietary codebase |
| GitHub issue resolution | Yes (autonomous) | Partial (via Agent mode) |
| Multi-file editing | Yes | Yes |
| IDE integration | No (CLI only) | Yes (built-in IDE) |
| Inline autocomplete | No | Yes |
| Startup speed | Faster than Claude Code | Moderate (full IDE load) |
| Pricing | Free (MIT, BYOK) | $20/mo (Pro) |
mini-SWE-agent is best suited for developers, researchers, and engineering teams who want a fast, lightweight, and empirically validated autonomous agent for resolving GitHub issues, running automated code fixes in CI/CD pipelines, or building custom agent workflows on top of a minimal and fully auditable codebase. It's particularly valuable for AI/ML researchers who want a reference implementation of a high-performance SWE agent, for teams building internal automation tools that need an embeddable agent component, and for developers who want maximum model flexibility with a tool that imposes no vendor lock-in and requires no subscription.
Cursor and mini-SWE-agent represent two fundamentally different philosophies in AI-assisted development. Cursor is a full-featured IDE that augments human developers at every keystroke (inline completions, chat, diff review) and is designed for interactive, real-time collaboration between developer and AI. mini-SWE-agent is a focused autonomous tool: you describe a GitHub issue or coding task, and the agent resolves it end to end without a human in the loop. Cursor is better for developers who want to remain in the driver's seat with AI as a co-pilot; mini-SWE-agent is better for teams who want to hand off well-specified tasks to an autonomous agent and receive completed pull requests. On raw issue-resolution performance, mini-SWE-agent's 74%+ SWE-bench score is a concrete, independently verifiable data point; most commercial tools, including Cursor's agent mode, publish no equivalent benchmark.
mini-SWE-agent is a technically impressive demonstration that high-performance autonomous software engineering does not require complex infrastructure or commercial subscriptions. With a 74%+ SWE-bench verified score, MIT licensing, universal LLM compatibility, and a codebase small enough to read in minutes, it occupies a unique position in the ecosystem: simultaneously a practical automation tool and a research reference implementation. For teams and individuals who prioritize auditability, model flexibility, and empirically verified performance over polished UI, mini-SWE-agent is one of the most compelling options available.
SWE-bench is an evaluation benchmark that tests AI agents on real GitHub issues from 12 popular open source Python repositories (including Django, scikit-learn, and sympy). Each issue has a corresponding test suite; the agent "solves" the issue if its patch makes the failing tests pass. The "verified" subset is a curated, human-validated set of 500 issues where the test quality is confirmed to be reliable. A 74%+ verified score means mini-SWE-agent autonomously resolved more than 74% of these real-world issues correctly — a high bar that represents genuine software engineering capability, not just code generation quality.
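To put the score in absolute terms, a quick back-of-the-envelope check against the 500-issue verified subset:

```python
# What a 74%+ score on the 500-issue SWE-bench Verified subset
# means in absolute numbers of resolved issues.

TOTAL_VERIFIED_ISSUES = 500
SCORE = 0.74  # 74%, the lower bound of the reported score

resolved = int(TOTAL_VERIFIED_ISSUES * SCORE)
print(f"Issues resolved autonomously: at least {resolved} of {TOTAL_VERIFIED_ISSUES}")
# → Issues resolved autonomously: at least 370 of 500
```

In other words, the agent autonomously produced correct, test-passing patches for at least 370 real GitHub issues.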
mini-SWE-agent is installed via pip (`pip install mini-swe-agent`) and invoked from the command line. You point it at a local Git repository directory, provide an issue description or task prompt, and specify your LLM API key (or local Ollama endpoint). The agent reads the codebase, plans edits, applies them, runs tests if configured, and outputs the result as a Git diff. Full documentation and examples are available at the GitHub repository.
Yes. Because mini-SWE-agent is a pure CLI tool with standard input/output and a minimal Python footprint, it integrates naturally into GitHub Actions, GitLab CI, Jenkins, or any CI/CD system that can run Python. Teams use it to automatically attempt fixes for failing tests, triage and resolve open issues on a schedule, or run as part of a PR review pipeline where the agent is invoked to suggest or apply fixes based on review comments.
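As a sketch of what such a pipeline hook could look like, here is a hypothetical GitHub Actions workflow. The agent invocation command, its flags, the task prompt, and the secret name are illustrative assumptions, not the project's documented interface; consult the repository's documentation for the actual CLI.

```yaml
# Hypothetical workflow sketch -- the agent command and flags below are
# illustrative assumptions; check the mini-SWE-agent docs for the real CLI.
name: nightly-auto-fix
on:
  schedule:
    - cron: "0 6 * * *"   # attempt fixes once a day
jobs:
  auto-fix:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install mini-swe-agent
      - name: Run agent on a task (placeholder prompt and flags)
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          # Placeholder invocation; capture the resulting diff as an artifact
          mini-swe-agent --task "Fix the failing unit tests" > patch.diff || true
      - uses: actions/upload-artifact@v4
        with:
          name: agent-patch
          path: patch.diff
```

The key design point is that the agent is just a CLI process with standard input/output, so any step in any CI system that can run Python can invoke it and collect its diff.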
Any model with an OpenAI-compatible API endpoint works with mini-SWE-agent. For local execution via Ollama, models with strong code understanding perform best: Llama 3.1 70B (or larger), Mistral Large, DeepSeek Coder V2, and Code Llama 70B are commonly used. For best SWE-bench-level performance with local models, larger parameter counts (34B+) are recommended. Smaller models (7B-13B) can handle simpler, well-scoped tasks but may struggle with complex multi-file refactors.