Running

This page covers running The Test Cabinet locally — on your own machine, for development or to exercise the whole flow end to end. Two shapes of “running” are worth separating, because they need very different amounts of setup:

  • A single run, driven by the CLI (tcab) or the Tauri desktop app. Both embed the core runner directly, so they need no backend or worker process — just a container runtime and a harness API key. This is the fastest way to launch one run; the quickstarts walk through it and Building covers producing the binaries.
  • The full service-driven flow — the backend, a worker, and the web console running as their own processes, exactly as a deployed environment runs them, just all on localhost. This is the environment to reach for when developing or debugging the services themselves, and it is what the rest of this page sets up.

Running the services on one machine is the local mirror of a real deployment: the same binaries and the same configuration, only bound to localhost. When you are ready to put them on real hosts — staging and prod — see Deployment.

To run the full flow you need:

  • A container runtime (Docker or Podman) on the host; the worker needs it to execute runs. See Execution and first-time setup.
  • The harness container images built or pullable for whichever harness you intend to run.
  • The two service binaries, built per Building: cargo build -p test-cabinet-backend and cargo build -p test-cabinet-worker (or the build-portable-* aliases for a static binary). The web console is a Vite app under apps/web.
  • A harness API key for the harness you will run (for example ANTHROPIC_API_KEY for claude).
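
The first prerequisite can be checked before anything else is started. The following is a minimal sketch, not part of the repository: `detect_runtime` is a hypothetical helper name, and it only checks that a `docker` or `podman` executable is on `PATH`, not that a daemon is running.

```shell
# Print the first container runtime found on PATH.
# Returns non-zero if neither docker nor podman is available.
detect_runtime() {
  for rt in docker podman; do
    if command -v "$rt" >/dev/null 2>&1; then
      printf '%s\n' "$rt"
      return 0
    fi
  done
  echo "no container runtime found (need docker or podman)" >&2
  return 1
}
```

Calling `detect_runtime` at the top of a local start script fails early with a readable message instead of letting the first run fail inside the worker.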

A natural instinct is to put everything in one docker compose stack. The backend is happy in a container, but the worker starts a container per run. Containerizing the worker therefore means giving it access to the host’s container runtime (bind-mounting the Docker socket) and making sure the run’s work directory is a path the host shares. The nested run containers are started by the host’s daemon, so TCAB_WORK_DIR must resolve to the same path on the host, not just inside the worker container. That is the same caveat the .env.worker.example flags for macOS/Windows.
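
To make the caveat concrete, here is a sketch of the two bind mounts a containerized worker would need. It is an illustration only: `worker_mount_flags` is a hypothetical helper, `/var/run/docker.sock` is Docker's default socket path (Podman uses a different one), and networking and the worker image are deliberately left out.

```shell
# Print the bind-mount flags a containerized worker would need.
# The work directory is mounted at the SAME path inside the container,
# because run containers are started by the host's daemon and resolve
# TCAB_WORK_DIR against the host filesystem.
worker_mount_flags() {
  work_dir=$1
  case $work_dir in
    /*) ;;  # absolute path: usable on both sides of the mount
    *)
      echo "TCAB_WORK_DIR must be an absolute host path: $work_dir" >&2
      return 1
      ;;
  esac
  printf '%s\n' \
    "-v /var/run/docker.sock:/var/run/docker.sock" \
    "-v $work_dir:$work_dir"
}
```

The key detail is the second flag: source and target are identical, so a path the worker hands to the daemon means the same thing on the host.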

To keep the moving parts obvious, run the worker directly on the host and containerize the backend only if you want to. The deployments/local/compose.yml template brings the backend up in a container with a local volume for its state; the worker stays a host process throughout.

Copy the repo-root example env files and fill them in. These remain the authoritative list of every variable each service reads.

cp .env.backend.example .env.backend
cp .env.worker.example .env.worker

In .env.backend, the only required value is the checkout the backend ingests definitions from — point it at this repository:

TCAB_BACKEND_CHECKOUT=/absolute/path/to/the-test-cabinet
# Leave TCAB_BACKEND_BIND at its default 127.0.0.1:8787 for local use.
# Leave TCAB_BACKEND_DATABASE_URL unset to use the default local SQLite file.
# R2 + deploy-hook variables can stay blank: with them unset the backend still
# records to its database and regenerates the snapshot on disk (a dev-only mode).
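
A quick sanity check on the one required value can catch a typo before the backend starts. This is a sketch under assumptions: `env_value` and `check_backend_env` are hypothetical helpers, and the file is assumed to use plain unquoted `KEY=value` lines as in the example above.

```shell
# Read one KEY=value entry from an env file without sourcing it.
# If the key appears more than once, the last entry wins.
env_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Check that TCAB_BACKEND_CHECKOUT points at an absolute, existing directory.
check_backend_env() {
  file=${1:-.env.backend}
  checkout=$(env_value TCAB_BACKEND_CHECKOUT "$file")
  case $checkout in
    /*) ;;
    *)
      echo "TCAB_BACKEND_CHECKOUT must be an absolute path (got '$checkout')" >&2
      return 1
      ;;
  esac
  [ -d "$checkout" ] || { echo "no such directory: $checkout" >&2; return 1; }
}
```

Run `check_backend_env .env.backend` from the repository root. Commented-out lines are ignored, so an unset variable is reported the same way as a relative path.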

In .env.worker, point the worker at the local backend and provide the harness key for whatever you will run:

TCAB_BACKEND_URL=http://127.0.0.1:8787
# Leave TCAB_WORKER_BIND at its default 127.0.0.1:8788.
ANTHROPIC_API_KEY=sk-ant-...

Start the backend first. Either run the binary directly from a directory containing .env.backend:

./target/debug/tcab-backend

or bring it up with the compose template, which mounts a local volume for the default SQLite database and the definition store so they survive a restart:

docker compose -f deployments/local/compose.yml up backend

Once it is up, ingest the repository so the catalog is populated:

curl -X POST http://127.0.0.1:8787/ingest

Confirm it is serving with curl http://127.0.0.1:8787/healthz and curl http://127.0.0.1:8787/test-cases.
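
When the start is scripted, the backend may not be listening yet by the time the next command runs. A small polling loop around the health endpoint covers that gap. This is a sketch: `wait_for_http` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
# Poll a URL until it answers without an HTTP error or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP status >= 400; -s silences progress.
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Example: wait for the backend, then trigger the ingest.
# wait_for_http http://127.0.0.1:8787/healthz && curl -X POST http://127.0.0.1:8787/ingest
```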

Start the worker from a directory containing .env.worker, on the host:

./target/debug/tcab-worker

It reads TCAB_BACKEND_URL, resolves definitions from the backend you just started, and binds 127.0.0.1:8788. Check curl http://127.0.0.1:8788/healthz — the response reports the worker’s identity and the backend it is bound to, which is a quick way to confirm the two agree.
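
That agreement check can be scripted. The sketch below assumes only that the backend URL appears verbatim somewhere in the health response; the actual response format is not documented here, and `mentions_backend` is a hypothetical helper.

```shell
# Succeed if the health response read from stdin contains the expected
# backend URL as a literal substring. Nothing else about the response
# format is assumed.
mentions_backend() {
  grep -qF -- "$1"
}

# Example:
# curl -fsS http://127.0.0.1:8788/healthz | mentions_backend "http://127.0.0.1:8787" \
#   || echo "worker is not bound to the expected backend" >&2
```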

Run the console’s dev server and open it in a browser:

npm run dev -w @test-cabinet/web

In the UI, set the backend to http://127.0.0.1:8787 and add the worker at http://127.0.0.1:8788. The console verifies the worker is bound to the same backend before it will launch runs on it. From there you can launch a run on the local worker, watch its event stream live, and review the result exactly as you would against a remote environment.

To watch traces across tcab-backend and tcab-worker locally, enable the bundled Grafana LGTM stack and point each process at it. That is fully described under Observability. Note the endpoint-duality rule: a backend running inside the devcontainer uses http://lgtm:4318, while a worker on the host uses http://localhost:4318. Leaving OTEL_EXPORTER_OTLP_ENDPOINT unset keeps both on plain stdout logging.
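
The endpoint-duality rule is easy to get backwards, so it can help to encode it once. This is a sketch: `otlp_endpoint_for` and its `devcontainer` / `host` labels are hypothetical names, while the two URLs are the ones given above.

```shell
# Pick the OTLP endpoint for a process based on where it runs.
otlp_endpoint_for() {
  case $1 in
    devcontainer) echo "http://lgtm:4318" ;;       # reaches the stack by service name
    host)         echo "http://localhost:4318" ;;  # reaches the stack via the published port
    *)
      echo "unknown location: $1 (expected devcontainer or host)" >&2
      return 1
      ;;
  esac
}

# Example for a worker started on the host:
# export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint_for host)"
```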

When this works end to end, the same two binaries deploy unchanged to staging and prod on Azure — what changes is where they bind and how they are supervised, not how they are configured. See Deployment for the remote build.