Run a Test Case
Drive a single test case through an agent harness and write a run record. For the full walkthrough, prerequisites, and platform notes see First Time Setup.
Prerequisites
A working setup: a container runtime on PATH (Podman or Docker), the harness
image built, the Chromium browser installed for the
browser driver, and the harness API key exported
or in a .env. See First Time Setup if any of those
are missing.
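A quick pre-flight check for the first prerequisite might look like the sketch below. The helper name `find_runtime` is hypothetical and not part of tcab; it only looks for the named executables on PATH and does not verify the harness image, the Chromium install, or the API key.

```shell
# Hypothetical helper, not part of tcab: print the first candidate found on PATH.
find_runtime() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Try Podman first, then Docker; report if neither is installed.
if runtime=$(find_runtime podman docker); then
  echo "container runtime: $runtime"
else
  echo "no container runtime found on PATH" >&2
fi
```

The order of the candidates here is an arbitrary choice for the sketch, not a documented preference of the tool.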
Run it
From the repository root (so the test-cases/ catalog and browser driver
resolve):
    tcab run \
      --test-case pong --version v1.0.0 --variant base \
      --harness claude --model claude-opus-4-8 \
      --out-dir runs

From a source checkout, substitute cargo run -p test-cabinet-cli -- run … for
tcab run ….
- --variant is required: a run targets exactly one variant.
- --model is passed to the harness unchanged; it is opaque to The Test Cabinet.
- --max-runtime <seconds> overrides the case’s max_runtime_seconds for this invocation only.
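As an illustration of the per-invocation override, the sketch below wraps the same pong run with a runtime cap. The function name `run_pong_capped` and the default of 600 seconds are assumptions made for the example; only flags named on this page are used, and where `--max-runtime` sits among them is a guess.

```shell
# Hypothetical wrapper: run the pong case with a runtime cap in seconds.
# The 600-second default is an arbitrary example value, not a recommendation.
run_pong_capped() {
  max_seconds=${1:-600}
  tcab run \
    --test-case pong --version v1.0.0 --variant base \
    --harness claude --model claude-opus-4-8 \
    --max-runtime "$max_seconds" \
    --out-dir runs
}
```

Calling `run_pong_capped 900` would request a 900-second cap for that run only; the case's own max_runtime_seconds stays unchanged.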
The run renders the references, seeds a fresh repository, drives the harness in a
container while printing the live event stream, then
validates and writes
runs/<id>/run-record.json alongside a copy of the implementation.
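To find the records that earlier runs produced, a small shell helper can walk the output directory. This is a sketch that assumes each run lands in its own directory directly under the --out-dir, as the runs/<id>/run-record.json path suggests; `list_run_records` is a hypothetical name, and the helper does not read or validate the JSON itself.

```shell
# Hypothetical helper, not part of tcab: list run records under an output directory.
list_run_records() {
  out_dir=${1:-runs}
  # Assumed layout: <out_dir>/<id>/run-record.json
  for record in "$out_dir"/*/run-record.json; do
    # An unmatched glob stays literal, so keep only paths that really exist.
    if [ -f "$record" ]; then
      printf '%s\n' "$record"
    fi
  done
}

# Prints nothing if no run has finished yet.
list_run_records runs
```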
Inspect inputs without a run
    tcab prompt --test-case pong --version v1.0.0 --variant base   # the rendered prompt
    tcab seed --test-case pong --version v1.0.0 --variant base     # the seeded repo, on disk
    tcab harnesses                                                 # harness availability

See the CLI overview for every subcommand.
Next steps
- Review a Run once it finishes.
- Reviewing Test Run Results for the full review workflow.