AutoResearch
AutoResearch is an agent loop in which an LLM autonomously edits a single training file, runs a fixed 5-minute experiment, checks whether a chosen metric improved, and keeps or reverts the change — repeating overnight. It is a minimal "modify → verify → keep/discard → repeat" harness, not a heavyweight framework.
The name crystallized when Andrej Karpathy open-sourced karpathy/autoresearch on March 7, 2026, pointing it at his own nanochat GPT-2 codebase. Two days, ~700 experiments, ~20 real improvements, and an 11% end-to-end speedup later, the repo had 74k+ stars and the term had been adopted as a generic category for autonomous experiment loops.
Karpathy's reference setup uses three files with strict ownership: `prepare.py` is immutable and handles data plus the `val_bpb` evaluator, `train.py` is the agent's sandbox, and `program.md` is the human-written research brief. The agent proposes a change (e.g. a QK-norm scaler, a banded-attention tweak, an AdamW beta), trains for exactly 5 minutes, and git-reverts anything that doesn't lower `val_bpb`.
It is like leaving a junior researcher alone overnight with a stopwatch and one dial: they try things, and only the wins survive.
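The keep-or-revert cycle described above can be sketched in a few lines. This is a minimal illustration, not the code in karpathy/autoresearch: the function `keep_or_revert_loop`, the dictionary config, and the `toy_val_bpb` metric are all hypothetical stand-ins. In a real run the candidate edit would be an LLM-written change to `train.py`, the evaluation would be the fixed 5-minute training run, and keep/revert would be git operations.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, float]


def keep_or_revert_loop(
    config: Config,
    proposals: Iterable[Tuple[str, float]],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, List[str]]:
    """Try each proposed edit; keep it only if the metric strictly drops."""
    best = evaluate(config)  # baseline score before any edits
    log: List[str] = []
    for key, value in proposals:
        candidate = dict(config)     # stand-in for editing train.py
        candidate[key] = value
        score = evaluate(candidate)  # stand-in for the fixed 5-minute run
        if score < best:
            # Keep the change (a git commit in a real harness).
            config, best = candidate, score
            log.append(f"keep {key}={value} -> {score:.3f}")
        else:
            # Discard the change (a git revert in a real harness).
            log.append(f"revert {key}={value} -> {score:.3f}")
    return config, best, log


def toy_val_bpb(cfg: Config) -> float:
    # Hypothetical metric, lowest at lr=0.3 and beta=0.9. Not a real evaluator.
    return 1.0 + (cfg["lr"] - 0.3) ** 2 + (cfg["beta"] - 0.9) ** 2


start = {"lr": 0.1, "beta": 0.5}
edits = [("lr", 0.3), ("beta", 0.2), ("beta", 0.9), ("lr", 0.6)]
final_cfg, final_score, history = keep_or_revert_loop(start, edits, toy_val_bpb)
for line in history:
    print(line)
```

Each candidate is compared against the best score so far rather than the original baseline, so kept changes accumulate and a later edit has to beat everything that came before it.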
Search Interest
- Nascent: 0–7 days
- Emergent: 8–30 days
- Validating: 31–90 days
- Rising (now): 91–180 days
- Established: 180+ days
Why is it emerging now?
Karpathy open-sourced AutoResearch on March 7, 2026 — 630 lines of Python that let an AI agent run ~100 training experiments a night on a single GPU. Two days of autorun shaved 11% off his already-tuned nanochat GPT-2 pipeline, and Fortune ran the "loopy era" thesis ten days later.
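The ~100-experiments-a-night figure follows from the fixed 5-minute budget. The back-of-envelope sketch below assumes an 8-hour night and adds a hypothetical per-run overhead parameter; neither number comes from the source.

```python
def runs_per_session(hours: float, minutes_per_run: float = 5.0,
                     overhead_minutes: float = 0.0) -> int:
    """Whole experiments that fit in a session of the given length."""
    total_minutes = hours * 60
    # overhead_minutes is a hypothetical allowance for startup and evaluation time.
    return int(total_minutes // (minutes_per_run + overhead_minutes))


ideal = runs_per_session(hours=8)                         # assumed 8-hour night
padded = runs_per_session(hours=8, overhead_minutes=1.0)  # assumed 1 minute of overhead per run
print(ideal, padded)
```

Under these assumptions an 8-hour night fits 96 back-to-back runs, which is in line with the ~100 figure; any per-run overhead lowers the count.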
Outlook
6-month signal projection and commercial timeline.
Karpathy's mindshare and the minimal spec make the loop easy to copy, so clones and vertical ports (kernels, SAT, RL) should keep driving searches for the term for 6+ months.
Risk · Skeptics frame it as fancy hyperparameter search; if that critique sticks, the term could fragment into vendor-specific "agent labs" branding.
Analogs · nanoGPT · AutoML · agent loop
- Now: OSS traction, zero SaaS. Karpathy's repo plus copycats; no paid product owns the category yet.
- 3-6 months: Managed loops land. Expect hosted AutoResearch-as-a-service on SkyPilot, Modal, Together, RunPod.
- 6-12 months: Verticals fragment term. AutoKernel, auto-SAT, auto-RL siphon searches into domain-specific brands.
Ideas for term “AutoResearch”
Buildable pitches — turn this term into an article, site, product, post, newsletter, video, or course. Steal any card and run with it.
The top HN critique is "this is just hyperparameter search." A clear, code-level explainer of the three differences (arbitrary code edits, sequential memory, full automation) ranks for the skeptics' queries.
Tutorials for running nanochat on consumer GPUs are thin, and "autoresearch karpathy" is already a live Google autocomplete. First clean walkthrough owns the tutorial query.
AutoKernel and agent-SAT already exist. A category piece that collects the non-ML ports and names the pattern wins cross-domain backlinks.
yibie/awesome-autoresearch is a GitHub list, not a browsable site. A search-friendly directory indexed on "autoresearch <domain>" queries has clear SEO headroom.
Upload `train.py` + `program.md`, pick a budget, get a morning report. SkyPilot already has a scaling post; a productized wrapper with cost caps is a one-weekend SaaS.
100 experiments a night means log overload. A web UI that shows kept diffs, reverted diffs, and metric curves is the natural complement to the CLI loop.
First-person experiment posts with real diffs and real regressions travel on HN and X. Few exist outside Karpathy's own nanochat run.
Time-lapse of the agent editing `train.py`, watching val_bpb drop, narrating the wins. Visual-first demo is under-served; Karpathy's own launch tweet is text.
In 630 lines of Python, Karpathy fired the starting gun on autonomous ML research — and the frontier labs are already copying the spec.
The best HN comment under the launch thread wasn't "wow" — it was "isn't this just Bayesian optimization with extra steps?"
AutoKernel for GPU Triton. agent-SAT for solvers. pi-autoresearch for Raspberry Pi. The pattern is eating every optimization problem with a numeric verifier.
FAQ
When did AutoResearch emerge?
Publicly emerged around 2026-03-07 (about 101 days ago as of 2026-06-16). EarlyTerms first recorded a pipeline signal on 2026-04-20.
Related Terms
Other terms in the same space — aliases, subtypes, competitors, and neighbors to explore next.
- Part of: agent loop. An agent loop is the control-flow pattern at the center of every autonomous LLM agent: the model observes its context, reasons about…
- Related: agent harness. An agent harness is the middleware between a large language model and the real world: code that runs the agent loop, calls tools,…
- Related: parallel agents. Parallel Agents is the pattern of running multiple AI coding sessions at the same time against isolated copies of a codebase, with a…
- Related: coding agents. Coding Agents is the category name for AI developer tools that act on code autonomously: reading a repo, planning a change, editing…
- Related: Claude Agent SDK. Claude Agent SDK is Anthropic's programmatic toolkit for building AI agents on Claude.
- Related: managed agents. Managed Agents is an infrastructure paradigm where cloud platforms host, orchestrate, and operate AI agents as a service.
- Related: agentic coding. Agentic coding is the software-development pattern where an autonomous AI agent plans, writes, tests, and iterates on code against a…
Sources
Primary URLs this report cites — open any to verify the claim yourself.
- 01 karpathy/autoresearch: canonical repo (github.com)
- 02 VentureBeat: Karpathy's open-source autoresearch (venturebeat.com)
- 03 Fortune: Why everyone is talking about Karpathy's autonomous AI research agent (fortune.com)
- 04 DataCamp: Guide to AutoResearch (datacamp.com)
- 05 Hacker News: launch thread, 208 pts, Mar 7 (news.ycombinator.com)
- 06 Hacker News: "Autoresearch on an old research idea", 428 pts, Mar 23 (news.ycombinator.com)
- 07 SkyPilot blog: Scaling Karpathy's Autoresearch to a GPU cluster (blog.skypilot.co)
- 08 awesome-autoresearch: ecosystem list (github.com)