[bmdpat]

SEC 001 / CONTROLLED COMPUTE

Local-first AI systems for builders running open models on controlled compute.

I publish benchmark rows, failure logs, and build notes for open models running on controlled compute. The 5090 Reports start from an RTX 5090 workstation, but the lessons apply to local rigs, rented accelerators, private cloud, and GPU-backed clusters.

AgentGuard stays in the stack when a run needs budget, loop, timeout, or rate limits before it touches real work.

RTX 5090 | benchmark artifacts | failure logs | VRAM fit tools | local-agent notes

SEC 002 / BENCHMARK EVIDENCE

The lab notebook is the proof surface.

The first public run confirmed RTX 5090 hardware detection, then hit an Ollama runner timeout before it produced a valid tokens/sec row. That failure stays in the notebook.

Every public claim needs a concrete artifact: benchmark CSV, failure report, architecture diagram, repo note, or cost curve. If a run fails, it stays in the record.

Tokens/sec
Model x quant

Measured on real local-agent prompts, not placeholder demos.
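
A minimal sketch of how one tokens/sec value could be derived from a single non-streaming Ollama response. It assumes the `eval_count` and `eval_duration` fields that Ollama's generate endpoint reports (duration in nanoseconds). The function name is hypothetical, and this is not necessarily how the reports compute their rows.

```python
from typing import Mapping, Optional

NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_sec(response: Mapping[str, object]) -> Optional[float]:
    """Return generation speed in tokens/sec, or None if no valid row can be made.

    Assumes `response` is the parsed JSON of a non-streaming Ollama generate
    call, where `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response.get("eval_count")
    eval_duration = response.get("eval_duration")
    if not isinstance(eval_count, int) or not isinstance(eval_duration, int):
        return None  # fields missing or malformed: record a failure, not a number
    if eval_count <= 0 or eval_duration <= 0:
        return None
    return eval_count * NANOSECONDS_PER_SECOND / eval_duration
```

Returning `None` instead of a guessed value keeps a failed or partial run from turning into a performance claim.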

VRAM pressure
Context + cache

What fits, what spills, and what changes after quantization.
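
A rough sketch of the fit arithmetic behind this card, under stated assumptions: a standard transformer KV cache (one key and one value tensor per layer), weights sized by average bits per weight, and a fixed headroom fraction for runtime buffers. All function names and the default headroom are hypothetical. Real runners add overhead that this estimate ignores, and architectures with sliding-window or shared caches will differ.

```python
def weights_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate weight size; ignores per-block scale overhead in quant formats."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_element: int = 2,  # 2 bytes assumes an fp16 cache
) -> int:
    """Keys plus values (factor of 2) for every layer and every cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_element


def fits_in_vram(
    weight_bytes: float,
    cache_bytes: float,
    vram_bytes: float,
    headroom_fraction: float = 0.1,  # assumed reserve for runtime buffers
) -> bool:
    """True when weights plus cache stay under VRAM minus the reserved headroom."""
    usable = vram_bytes * (1 - headroom_fraction)
    return weight_bytes + cache_bytes <= usable
```

The cache term grows linearly with context length, which is why a model that fits at a short context can spill at a longer one even though the weights have not changed.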

Cost curve
Local vs API

Per-workload math for agents that run often enough to matter.
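
One way the local vs API math could be sketched, assuming electricity is the only marginal local cost and hardware is a one-time purchase. Every input is a placeholder to be filled from measured rows and real price sheets. The function names are hypothetical, and nothing here is a measured result.

```python
from typing import Optional


def local_energy_cost_per_million_tokens(
    tokens_per_sec: float,
    power_watts: float,
    price_per_kwh: float,
) -> float:
    """Electricity cost to generate one million tokens at a steady rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    hours = 1_000_000 / tokens_per_sec / 3600
    return power_watts / 1000 * hours * price_per_kwh


def breakeven_tokens(
    hardware_price: float,
    api_cost_per_million: float,
    local_cost_per_million: float,
) -> Optional[float]:
    """Tokens needed before local savings repay the hardware, or None if never."""
    saving_per_million = api_cost_per_million - local_cost_per_million
    if saving_per_million <= 0:
        return None  # local is not cheaper per token, so there is no break-even
    return hardware_price / saving_per_million * 1_000_000
```

The break-even figure only matters for agents that actually reach that token volume, which is the "run often enough to matter" condition.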

Failure log
Timeouts included

Runner crashes, bad configs, and dead ends stay in the record.

2026-06-12 / benchmark-failed

The 5090 Reports - 2026-06-12

Hardware capture is live. The first bounded Ollama benchmark failed before producing a valid tokens/sec row, so the public artifact reports the miss instead of inventing a performance claim.

Source: nvidia-smi + Reports/5090/benchmarks

2026-06-12 / failed

5090 Benchmark Failure - gemma4:26b

The Ollama request timed out after 5 seconds with gemma4:26b at num_ctx 1024 and num_predict 16.

Source: Reports/5090/failures/2026-06-12-gemma4-26b.md
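
A bounded request like the one in this failure could look roughly like the sketch below. It assumes Ollama's default local endpoint and the `num_ctx` and `num_predict` options from the failure report. The function names are hypothetical and the prompt is left to the caller, since the report summary does not include it. This is not the lab's actual harness.

```python
import json
import socket
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(model: str, prompt: str, num_ctx: int, num_predict: int) -> dict:
    """Non-streaming generate request with a capped context and output length."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


def run_bounded(payload: dict, timeout_s: float, url: str = OLLAMA_URL) -> dict:
    """Send one request and report ok, timeout, or error instead of raising."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return {"status": "ok", "response": json.loads(response.read())}
    except urllib.error.URLError as exc:
        # Connect-phase timeouts arrive wrapped in URLError.
        timed_out = isinstance(exc.reason, (TimeoutError, socket.timeout))
        return {"status": "timeout" if timed_out else "error", "error": str(exc)}
    except (TimeoutError, socket.timeout) as exc:
        # Read-phase timeouts are raised directly.
        return {"status": "timeout", "error": str(exc)}
```

A call matching the reported settings would be `run_bounded(build_payload("gemma4:26b", prompt, 1024, 16), timeout_s=5)`. The `timeout` passed to `urlopen` applies to each blocking socket operation, not to total wall-clock time, so a stricter deadline would need an outer timer.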

SEC 003 / PRODUCT PATH

Local hardware is the lab. Repeated measurements become tools.

The destination is self-serve local-agent infrastructure: observability, memory, MCP, runtime limits, and controlled compute fit.

Phase 0

Instrument the lab

Weekly reports from hardware snapshots, benchmark CSVs, and failure logs.

Phase 1

Distribute artifacts

Three posts per week across LinkedIn, X, and r/LocalLLaMA, all pointing here.

Phase 2

Capped deployments

Inbound-only, async paid R&D for regulated teams that need local AI.

Phase 3

Extract product

Local agent observability, memory, or MCP tooling rebuilt from repeated deployment work.

Operating rules

Publish the model, quant, prompt, hardware, and result.

Run one local experiment per week, even when the result is a failure.

No fake benchmark numbers. A failed run is a valid artifact.

Prefer measured notes over broad claims.

OPERATING LOOP

One person. Small tools. Agent-assisted ops.

01

Run

Execute the open-model path on controlled compute.

02

Measure

Record tokens, latency, VRAM, cost, and failure mode.

03

Publish

Turn the result into a report, tool, or guarded SDK path.
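
The Measure step in the loop above could write one row per run, with failed runs recorded in the same table as successful ones. The sketch below is an assumed schema, not the published CSV layout: the `BenchmarkRow` fields and the `rows_to_csv` helper are hypothetical names chosen to mirror the items listed in the operating rules.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional


@dataclass
class BenchmarkRow:
    date: str
    model: str
    quant: str
    prompt_id: str
    hardware: str
    status: str  # "ok" or "failed"
    tokens_per_sec: Optional[float] = None  # stays empty when the run failed
    latency_s: Optional[float] = None
    vram_mib: Optional[int] = None
    failure_mode: str = ""


def rows_to_csv(rows: Iterable[BenchmarkRow]) -> str:
    """Serialize rows with a header; None values are written as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(BenchmarkRow)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()
```

A failed run then needs only the identifying columns plus `status="failed"` and a `failure_mode` such as `"timeout"`, leaving the measurement columns blank rather than filled with a placeholder number.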

SEC 007 / LAB NOTES

Get the local AI build notes.

Weekly notes from controlled compute: 5090 Reports, failure logs, private-agent deployment notes, and the tools that fall out of repeated local AI work.

Get the local AI lab notes

Benchmark rows, VRAM fit checks, quant choices, and what actually runs on consumer GPUs. Sent Monday through Friday, only when there is something worth sending.