How I Let an AI Agent Run 100 ML Experiments Overnight on a $500 GPU
I built an autonomous research agent that proposes neural network changes, trains models, evaluates results, and iterates — all while I sleep. Here's what happened.
Last week I let an AI agent run 100 machine learning experiments overnight on my RTX 3070. I woke up to a 25% model improvement. Here's exactly how it works.
The Setup
The agent is built on Karpathy's autoresearch concept, powered by Claude Sonnet. It runs in a loop:
- Propose — The agent analyzes current model performance and proposes a specific code change
- Implement — It writes the actual Python code to modify the neural network
- Train — The modified model trains on PubMed medical text data
- Evaluate — Loss metrics are compared against the baseline
- Decide — If improvement > threshold, keep the change. Otherwise, revert.
- Repeat — Go back to step 1 with updated context
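The loop above can be sketched in a few lines. This is a minimal illustration, not my actual code: `propose_change` and `train_and_eval` are stand-ins for what are really a Claude API call and a full PyTorch training run, and the 0.005 threshold is an assumed value.

```python
import random

THRESHOLD = 0.005  # assumed: minimum loss reduction required to keep a change

def propose_change(history):
    # Stand-in for the LLM call that reads past experiments and proposes a patch.
    return {"id": len(history), "description": "tweak a hyperparameter"}

def train_and_eval(change, current_loss):
    # Stand-in for a real training run; here, just a noisy random walk on the loss.
    return current_loss + random.uniform(-0.02, 0.02)

def research_loop(baseline_loss, n_experiments=100, seed=0):
    random.seed(seed)
    best_loss, history = baseline_loss, []
    for _ in range(n_experiments):
        change = propose_change(history)              # 1. propose
        loss = train_and_eval(change, best_loss)      # 2-4. implement, train, evaluate
        kept = loss < best_loss - THRESHOLD           # 5. decide: keep only clear wins
        if kept:
            best_loss = loss
        # otherwise the change is reverted (in the real agent, via git)
        history.append({"change": change, "loss": loss, "kept": kept})
    return best_loss, history                         # 6. repeat with updated context
```

The key design choice is that `best_loss` only ever goes down: a failed experiment costs nothing but time, which is exactly why a 93% failure rate is acceptable.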
The Results
Out of 100 experiments:
- 93 failed — proposed changes made the model worse or had no effect
- 7 succeeded — measurable improvements that the agent kept
- Net result — 25% improvement in model performance
The 7% hit rate sounds low, but that's the point. Research is mostly failure. The agent runs experiments I'd never have time to try manually.
What the Agent Discovered
The 7 successful experiments included:
- Learning rate scheduling changes I wouldn't have tried
- A specific attention head configuration that improved convergence
- Batch size adjustments that were counterintuitive but worked
- Layer normalization placement that contradicted my assumptions
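To give a flavor of the first category: a cosine learning-rate schedule is the kind of change the agent can propose and test cheaply. This is a generic textbook formula for illustration, not the specific schedule the agent found, and the `max_lr`/`min_lr` defaults are assumptions.

```python
import math

def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5):
    """Decay the learning rate from max_lr to min_lr along a cosine curve."""
    progress = step / total_steps               # 0.0 at the start, 1.0 at the end
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Because the schedule is one pure function, the agent can swap it out, train, and compare losses without touching the rest of the training code.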
The Hardware
This runs on consumer hardware:
- GPU: NVIDIA RTX 3070 (8GB VRAM) — ~$500
- CPU: Standard desktop AMD Ryzen
- RAM: 32GB
- Storage: 1TB NVMe SSD
Total cost for the overnight run: about $0.50 in electricity, plus Claude API costs.
Why This Matters
The traditional ML research loop is: human thinks of experiment → human implements it → human waits for training → human evaluates → human thinks of next experiment.
Each cycle takes hours or days of human attention. My agent does it in minutes and runs 24/7.
The Code
The agent is ~300 lines of Python orchestrating:
- Claude Sonnet for reasoning and code generation
- PyTorch for training
- A simple SQLite database tracking all experiments
- Git for version control of each experiment
It's not magic. It's a loop with good prompts and clear evaluation criteria.
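The experiment-tracking piece is the least glamorous part but the one that makes the loop auditable. Here's a hypothetical sketch of what that SQLite log could look like; the table and column names are illustrative, not lifted from my code.

```python
import sqlite3

# Illustrative schema: one row per experiment, linked to a git commit so any
# run can be reproduced or reverted. Use a file path instead of :memory: to persist.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS experiments (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        description   TEXT NOT NULL,   -- the proposed change, in plain English
        git_commit    TEXT,            -- hash of the commit holding this experiment's code
        baseline_loss REAL,
        final_loss    REAL,
        kept          INTEGER DEFAULT 0  -- 1 if the change beat the threshold
    )
""")
conn.execute(
    "INSERT INTO experiments (description, git_commit, baseline_loss, final_loss, kept) "
    "VALUES (?, ?, ?, ?, ?)",
    ("cosine LR schedule", "abc1234", 3.12, 2.98, 1),
)
conn.commit()
rows = conn.execute("SELECT description, kept FROM experiments").fetchall()
```

With every experiment pinned to a commit hash, "revert" is just a `git checkout`, and the morning review is a single SQL query over the kept rows.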
What I Learned
- Autonomy requires clear metrics — the agent needs an unambiguous way to measure success
- Failure is the feature — 93% failure rate is fine when experiments are cheap
- Consumer hardware is enough — you don't need cloud GPUs for meaningful research
- Overnight is the killer use case — run experiments while you sleep, review results over coffee
Try It Yourself
You need:
- A GPU (even a 3060 works)
- An API key for Claude or GPT
- A clear metric to optimize
- Patience to debug the loop
The hardest part isn't the code — it's defining what "better" means for your specific model.
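One concrete way to pin down "better" is relative improvement in validation loss, with a minimum margin so run-to-run noise doesn't count as a win. The function name and the 0.5% default margin here are my illustrative choices, not a prescribed standard.

```python
def is_better(baseline_loss: float, new_loss: float, min_rel_gain: float = 0.005) -> bool:
    """Return True only if new_loss beats baseline_loss by at least min_rel_gain (relative)."""
    return (baseline_loss - new_loss) / baseline_loss >= min_rel_gain
```

However you define it, the test must be unambiguous and automated: if a human has to eyeball the result, the agent can't run 100 experiments while you sleep.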