{"kind":"Skill","metadata":{"namespace":"community","name":"openclaw-testing","version":"0.1.0"},"spec":{"description":"Choose, run, rerun, or debug OpenClaw tests, CI checks, Docker E2E lanes, release validation, and the cheapest safe verification path.","files":{"SKILL.md":"---\nname: openclaw-testing\ndescription: Choose, run, rerun, or debug OpenClaw tests, CI checks, Docker E2E lanes, release validation, and the cheapest safe verification path.\n---\n\n# OpenClaw Testing\n\nUse this skill when deciding what to test, debugging failures, rerunning CI,\nor validating a change without wasting hours.\n\n## Read First\n\n- `docs/reference/test.md` for local test commands.\n- `docs/ci.md` for CI scope, release checks, Docker chunks, and runner behavior.\n- Scoped `AGENTS.md` files before editing code under a subtree.\n\n## Default Rule\n\nProve the touched surface first. Do not reflexively run the whole suite.\n\n1. Inspect the diff and classify the touched surface:\n   - normal source checkout, source change: `pnpm changed:lanes --json`, then `pnpm check:changed`\n   - normal source checkout, tests only: `pnpm test:changed`\n   - normal source checkout, one failing file: `pnpm test \u003cpath-or-filter\u003e -- --reporter=verbose`\n   - Codex worktree or linked/sparse checkout, one/few explicit files: `node scripts/run-vitest.mjs \u003cpath-or-filter\u003e`\n   - Codex worktree or linked/sparse checkout, changed gates or anything broad:\n     use the Crabbox wrapper with the provider that matches the proof surface.\n     For heavy maintainer `pnpm` gates, that is usually delegated Blacksmith\n     Testbox through Crabbox, e.g. `node scripts/crabbox-wrapper.mjs run\n     --provider blacksmith-testbox ... -- pnpm check:changed`. 
For direct AWS\n     Crabbox proof, omit `--provider` and let `.crabbox.yaml` choose AWS.\n   - workflow-only: `git diff --check`, workflow syntax/lint (`actionlint` when available)\n   - docs-only: `pnpm docs:list`, docs formatter/lint only if docs tooling changed or requested\n2. Reproduce narrowly before fixing.\n3. Fix the root cause.\n4. Rerun the same narrow proof.\n5. Broaden only when the touched contract demands it.\n\n## Guardrails\n\n- Do not kill unrelated processes or tests. If something is running elsewhere, treat it as owned by the user or another agent.\n- Do not run expensive local Docker, full release checks, full `pnpm test`, or full `pnpm check` unless the user asks or the change genuinely requires it.\n- Prefer GitHub Actions for release/Docker proof when the workflow already has the prepared image and secrets.\n- Use `scripts/committer \"\u003cmsg\u003e\" \u003cpaths...\u003e` when committing; stage only your files.\n- If deps are missing, run `pnpm install`, retry once, then report the first actionable error.\n- In a Codex worktree or linked/sparse checkout, do not run direct local\n  `pnpm test*`, `pnpm check*`, `pnpm crabbox:run`, or `scripts/committer` until\n  you have verified pnpm will not reconcile or reinstall dependencies. Use\n  `node scripts/run-vitest.mjs` for tiny local proof, `node\n  scripts/crabbox-wrapper.mjs` for Testbox, and `git commit --no-verify` only\n  after the relevant remote or node-wrapper proof is already clean.\n- For remote proof, use the Crabbox wrapper first, but name the actual backend.\n  Direct AWS Crabbox uses `provider=aws` and `cbx_...` ids. Delegated\n  Blacksmith Testbox through Crabbox uses `provider=blacksmith-testbox`,\n  `syncDelegated=true`, and `tbx_...` ids. 
Both satisfy \"remote proof\" when the\n  requested proof surface allows either.\n- Do not infer \"no Testbox is running\" from plain `blacksmith testbox list`.\n  Use `blacksmith testbox list --all` or `blacksmith testbox status \u003ctbx_id\u003e`\n  before reporting cloud state.\n- Reuse only an id/slug created in this operator session unless explicitly\n  coordinating with another lane. If Testbox queues, fails capacity, or cannot\n  allocate, report the blocker or switch to direct AWS Crabbox only when that\n  still proves the requested surface.\n\n## Local Test Shortcuts\n\n```bash\npnpm changed:lanes --json\npnpm check:changed       # changed typecheck/lint/guards; no Vitest\npnpm test:changed        # cheap smart changed Vitest targets\nOPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed\npnpm test \u003cpath-or-filter\u003e -- --reporter=verbose\nOPENCLAW_VITEST_MAX_WORKERS=1 pnpm test \u003cpath-or-filter\u003e\n```\n\nUse targeted file paths whenever possible. Avoid raw `vitest`; use the repo\n`pnpm test` wrapper so project routing, workers, and setup stay correct.\nWhen the checkout is a Codex worktree, prefer the direct node harness instead:\n\n```bash\nnode scripts/run-vitest.mjs \u003cpath-or-filter\u003e\n```\n\nThat keeps the test scoped without giving pnpm a chance to run dependency\nstatus checks or install reconciliation in a linked worktree.\n\n## Command Semantics\n\n- `pnpm check` and `pnpm check:changed` do not run Vitest tests. They are for\n  typecheck, lint, and guard proof.\n- `pnpm test` and `pnpm test:changed` run Vitest tests.\n- `pnpm test:changed` is intentionally cheap by default: direct test edits,\n  sibling tests, explicit source mappings, and import-graph dependents.\n- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` is the explicit broad\n  fallback for harness/config/package edits that genuinely need it.\n- Do not run extension sweeps just because core changed. 
If a core edit is for a\n  specific plugin bug, run that plugin's tests explicitly. If a public SDK or\n  contract change needs consumer proof, choose the smallest representative\n  plugin/contract tests first, then broaden only when the risk justifies it.\n- The test wrapper prints a short `[test] passed|failed|skipped ... in ...`\n  line. Vitest's own duration is still the per-shard detail.\n\n## Routing Model\n\n- `pnpm changed:lanes --json` answers \"which check lanes does this diff touch?\"\n  It is used by `pnpm check:changed` for typecheck/lint/guard selection.\n- `pnpm test:changed` answers \"which Vitest targets are worth running now?\" It\n  uses the same changed path list, but applies a cheaper test-target resolver.\n- Direct test edits run themselves. Source edits prefer explicit mappings,\n  sibling `*.test.ts`, then import-graph dependents. Shared harness/config/root\n  edits are skipped by default unless they have precise mapped tests.\n- Shared group-room delivery config and source-reply prompt edits are precise\n  mapped tests: they run the core auto-reply regressions plus Discord and Slack\n  delivery tests so cross-channel default changes fail before a PR push.\n- Public SDK or contract edits do not automatically run every plugin test.\n  `check:changed` proves extension type contracts; the agent chooses the\n  smallest plugin/contract Vitest proof that matches the actual risk.\n- Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when a harness,\n  config, package, or unknown-root edit really needs the broad Vitest fallback.\n\n## CI Debugging\n\nStart with current run state, not logs for everything:\n\n```bash\ngh run list --branch main --limit 10\ngh run view \u003crun-id\u003e --json status,conclusion,headSha,url,jobs\ngh run view \u003crun-id\u003e --job \u003cjob-id\u003e --log\n```\n\n- Check exact SHA. 
Ignore newer, unrelated `main` runs unless asked.\n- For cancelled same-branch runs, confirm whether a newer run superseded it.\n- Fetch full logs only for failed or relevant jobs.\n- Prefer `gh run view \u003crun-id\u003e --json jobs` over the PR rollup while debugging; the rollup can be stale or noisy.\n- For `prompt:snapshots:check` failures, treat Linux Node 24 as CI truth. If macOS passes but CI drifts, reproduce in a Linux Node 24 container or Testbox, commit the Linux-generated output, then rerun.\n\n## GitHub Release Workflows\n\nUse the smallest workflow that proves the current risk. The full umbrella is\navailable, but it is usually the last step after narrower proof, not the first\nrerun after a focused patch.\n\n### Full Release Validation\n\n`Full Release Validation` (`.github/workflows/full-release-validation.yml`) is\nthe manual \"everything before release\" umbrella. It resolves a target ref, then\ndispatches:\n\n- manual `CI` for the full normal CI graph, with Android enabled via\n  `include_android=true`\n- `Plugin Prerelease` for release-only plugin static checks, extension shards,\n  the release-only `agentic-plugins` shard, and plugin product Docker lanes\n- `OpenClaw Release Checks` for install smoke, cross-OS release checks, live and\n  E2E checks, Docker release-path suites, OpenWebUI, QA Lab, fast Matrix, and\n  Telegram release lanes\n- optional post-publish Telegram E2E when a package spec is supplied\n\nRun it only when validating an actual release candidate, after broad shared CI\nor release orchestration changes, or when explicitly asked:\n\n```bash\ngh workflow run full-release-validation.yml \\\n  --repo openclaw/openclaw \\\n  --ref main \\\n  -f ref=\u003cbranch-or-sha\u003e \\\n  -f provider=openai \\\n  -f mode=both \\\n  -f release_profile=stable\n```\n\nRun the workflow itself from the trusted current ref, normally `--ref main`;\nchild workflows are dispatched from that same ref even when `ref` points at an\nolder release branch or tag. 
Full Release Validation has no separate child\nworkflow ref input; choose the trusted harness by choosing the workflow run ref.\nUse `release_profile=minimum|stable|full` to control live/provider breadth:\n`minimum` keeps the fastest OpenAI/core release-critical set, `stable` adds the\nstable provider/backend set, and `full` adds the broad advisory provider/media\nmatrix. Do not make `full` faster by silently dropping suites; optimize setup,\nartifact reuse, and sharding instead. The parent verifier job appends a child\noverview plus slowest-job tables for child runs; rerun only that verifier after\na child rerun turns green.\n\nStandalone manual `CI` dispatches do not run the plugin prerelease suite, the\nextension batch sweep, or the release-only `agentic-plugins` Vitest shard. Those\nlanes are intentionally reserved for the separate `Plugin Prerelease` child so\nPRs, main pushes, and ad hoc broad CI checks do not spend Docker/package time or\nall-plugin runtime time on release-only product coverage.\n\nIf a full run is already active on a newer `origin/main`, prefer watching that\nrun over dispatching a duplicate. Do not cancel release, release-check, or child\nworkflow runs unless Peter explicitly asks for cancellation.\n\nThe child-dispatch jobs record the child run ids. The final\n`Verify full validation` job re-queries those child runs and is the canonical\nparent gate. If a child workflow failed but was later rerun successfully, rerun\nonly the failed parent verifier job; do not dispatch a new full umbrella unless\nthe release evidence is stale.\n\nFor bounded recovery after a focused fix, pass `-f rerun_group=\u003cgroup\u003e`.\nSupported umbrella groups are `all`, `ci`, `plugin-prerelease`,\n`release-checks`, `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`,\n`qa-parity`, `qa-live`, and `npm-telegram`. Use the narrowest group that covers\nthe failed box. 
After a targeted release-check fix, do not restart the full\numbrella by habit: dispatch the matching `rerun_group` and rerun only the parent\nverifier/evidence step after the child is green unless the release evidence is\nstale. For a single failed live/E2E shard, use\n`-f rerun_group=live-e2e -f live_suite_filter=\u003csuite_id\u003e` so the Blacksmith\nworkflow only spends setup and queue time on that suite.\n\n### Release Evidence\n\nAfter release-candidate validation or before a release decision, record the\nimportant run ids in the private `openclaw/releases-private` evidence ledger.\nUse the manual `OpenClaw Release Evidence`\n(`openclaw-release-evidence.yml`) workflow there. It writes durable summaries\nunder `evidence/\u003crelease-id\u003e/` and commits:\n\n- `release-evidence.md`\n- `release-evidence.json`\n- `index.json`\n- `runs/\u003clabel\u003e.json`\n\nUse one run per line:\n\n```text\nfull-release-validation openclaw/openclaw \u003crun-id\u003e blocking\npackage-acceptance openclaw/openclaw \u003crun-id\u003e blocking\nrelease-checks openclaw/openclaw \u003crun-id\u003e blocking\n```\n\nStore summaries, run URLs, artifact metadata, timings, pass/fail state, and\nshort release-manager notes there. Do not store raw logs, provider\nprompts/responses, channel transcripts, signing material, or secret-bearing\nconfig in git; raw logs stay in Actions artifacts.\n\nWhen `Full Release Validation` completes and\n`OPENCLAW_RELEASES_PRIVATE_DISPATCH_TOKEN` is configured in the public repo, it\nrequests the private `OpenClaw Release Evidence From Full Validation` workflow.\nThat private workflow reads the parent full-validation run, extracts the child\nCI/release-checks/Telegram run ids from the parent logs, and opens the evidence\nPR automatically. 
If the token is absent or the run predates this wiring, trigger\nthat private workflow manually with the full-validation run id.\n\n### Release Checks\n\n`OpenClaw Release Checks` (`openclaw-release-checks.yml`) is the release child\nworkflow. It is broader than normal CI but narrower than the umbrella because it\ndoes not dispatch the separate full normal CI child. It runs Package Acceptance\nwith artifact-native delta lanes and `telegram_mode=mock-openai`, so the release\npackage tarball also goes through offline plugin proof, bundled-channel compat,\nand Telegram package QA. The Docker release-path chunks cover the overlapping\npackage/update/plugin lanes. Use it when release-path validation is needed\nwithout rerunning the entire umbrella.\n\n```bash\ngh workflow run openclaw-release-checks.yml \\\n  --repo openclaw/openclaw \\\n  --ref main \\\n  -f ref=\u003cbranch-or-sha\u003e \\\n  -f provider=openai \\\n  -f mode=both \\\n  -f release_profile=stable \\\n  -f rerun_group=all\n```\n\nRelease-check rerun groups are `all`, `install-smoke`, `cross-os`, `live-e2e`,\n`package`, `qa`, `qa-parity`, and `qa-live`.\n`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected\nref once as `release-package-under-test` and passes that artifact into cross-OS\nrelease checks, release-path Docker live/E2E checks, and Package Acceptance.\nWhen `Full Release Validation` dispatches release checks, it passes the requested\nbranch/tag plus an `expected_sha` so branch/tag refs resolve through the fast\nremote-ref path while the package and QA jobs still validate the exact SHA.\n\nThe full install-smoke child is split on purpose: one job prepares or reuses the\ntarget-SHA GHCR root Dockerfile smoke image, QR package install runs in its own\njob, root Dockerfile/gateway smokes pull the prepared image, and installer/Bun\nsmokes pull the same image while building only their small installer images.\nIf install-smoke gets slow again, first check whether the root 
image was reused\nor rebuilt before adding/removing coverage.\n\nThe full-profile native live media shards use the prebuilt\n`ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04` container so\n`ffmpeg`/`ffprobe` are already present. If those jobs suddenly spend minutes in\ndependency setup again, first check the `Live Media Runner Image` workflow and\nthe `Verify preinstalled live media dependencies` step before assuming the media\ntests themselves slowed down.\n\nThe release Docker path intentionally shards the plugin/runtime tail. The\nworkflow uses `plugins-runtime-plugins`, `plugins-runtime-services`, and\n`plugins-runtime-install-a` through `plugins-runtime-install-d`; aggregate\naliases such as `plugins-runtime-core`, `plugins-runtime`, and\n`plugins-integrations` remain for manual reruns.\n\nThe release QA parity box is internally split into candidate and baseline lane\njobs, followed by a report job that downloads both artifacts and runs\n`pnpm openclaw qa parity-report`. For parity failures, inspect the failed lane\nfirst; inspect the report job when both lane summaries exist but the comparison\nfails.\n\n### QA Lab Matrix Profiles\n\n`pnpm openclaw qa matrix` defaults to `--profile all`. Do not assume the CLI\ndefault is the fast release path. Use explicit profiles:\n\n- `--profile fast`: release-critical Matrix transport contract; add\n  `--fail-fast` only when the target CLI supports it\n- `--profile transport|media|e2ee-smoke|e2ee-deep|e2ee-cli`: sharded full\n  Matrix proof\n- `OPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000`: CI-friendly no-reply quiet\n  window when paired with fast or sharded gates\n\n`QA-Lab - All Lanes` uses explicit fast Matrix on scheduled runs; manual\ndispatch keeps `matrix_profile=all` as the default and always shards that full\nMatrix selection. 
`OpenClaw Release Checks` uses explicit fast Matrix; run the\nall-lanes workflow when release investigation needs full Matrix media/E2EE\ninventory.\n\n### Reusable Live/E2E Checks\n\n`OpenClaw Live And E2E Checks (Reusable)`\n(`openclaw-live-and-e2e-checks-reusable.yml`) is the preferred entry point for\ntargeted live, Docker, model, and E2E proof. Inputs let you turn off unrelated\nlanes:\n\n```bash\ngh workflow run openclaw-live-and-e2e-checks-reusable.yml \\\n  --repo openclaw/openclaw \\\n  --ref main \\\n  -f ref=\u003csha\u003e \\\n  -f include_repo_e2e=false \\\n  -f include_release_path_suites=false \\\n  -f include_openwebui=false \\\n  -f include_live_suites=true \\\n  -f live_models_only=true \\\n  -f live_model_providers=fireworks\n```\n\nUseful knobs:\n\n- `docker_lanes='\u003clane[,lane]\u003e'`: run selected Docker scheduler lanes against\n  prepared artifacts instead of the release chunk matrix. Multiple selected\n  lanes fan out as parallel targeted Docker jobs after one shared package/image\n  preparation step.\n- `include_live_suites=false`: skip live/provider suites when testing Docker\n  scheduler or release packaging only.\n- `live_models_only=true`: run only Docker live model coverage.\n- `live_model_providers=fireworks` (or comma/space separated providers): run one\n  targeted Docker live model job instead of the full provider matrix.\n- blank `live_model_providers`: run the full live-model provider matrix.\n\nRelease-path Docker chunks are currently `core`, `package-update-openai`,\n`package-update-anthropic`, `package-update-core`,\n`plugins-runtime-plugins`, `plugins-runtime-services`,\n`plugins-runtime-install-a`, `plugins-runtime-install-b`,\n`plugins-runtime-install-c`, `plugins-runtime-install-d`,\n`bundled-channels-core`, `bundled-channels-update-a`,\n`bundled-channels-update-b`, and `bundled-channels-contracts`. 
The aggregate\n`bundled-channels`, `plugins-runtime-core`, `plugins-runtime`, and\n`plugins-integrations` chunks remain valid for manual one-shot reruns, but\nrelease checks use the split chunks.\n\nWhen live suites are enabled, the workflow shards broad native `pnpm test:live`\ncoverage through `scripts/test-live-shard.mjs` instead of one serial `live-all`\njob:\n\n- `native-live-src-agents`\n- `native-live-src-gateway-core`\n- `native-live-src-gateway-profiles` (release CI runs this with provider\n  filters such as `OPENCLAW_LIVE_GATEWAY_PROVIDERS=anthropic`)\n- `native-live-src-gateway-backends`\n- `native-live-test`\n- `native-live-extensions-a-k`\n- `native-live-extensions-l-n`\n- `native-live-extensions-openai`\n- `native-live-extensions-o-z`\n- `native-live-extensions-o-z-other`\n- `native-live-extensions-xai`\n- `native-live-extensions-media`\n- `native-live-extensions-media-audio`\n- `native-live-extensions-media-music`\n- `native-live-extensions-media-music-google`\n- `native-live-extensions-media-music-minimax`\n- `native-live-extensions-media-video`\n\nUse `node scripts/test-live-shard.mjs \u003cshard\u003e --list` to see the exact files\nbefore rerunning a failed native live shard. The aggregate `o-z` and `media`\nshards remain useful locally; release CI uses the smaller provider/media shards\nso one live-provider flake does not force a broad native live rerun.\n\nFor model-list or provider-selection fixes, use `live_models_only=true` plus the\nspecific `live_model_providers` allowlist. Confirm logs show the expected\n`OPENCLAW_LIVE_PROVIDERS` and selected model ids before declaring proof.\n\n## Docker\n\nDocker is expensive. 
First inspect the scheduler without running Docker:\n\n```bash\nOPENCLAW_DOCKER_ALL_DRY_RUN=1 pnpm test:docker:all\nOPENCLAW_DOCKER_ALL_DRY_RUN=1 OPENCLAW_DOCKER_ALL_LANES=install-e2e pnpm test:docker:all\nOPENCLAW_DOCKER_ALL_LANES=install-e2e node scripts/test-docker-all.mjs --plan-json\n```\n\nRun one failed lane locally only when explicitly asked or when GitHub is not\nusable:\n\n```bash\nOPENCLAW_DOCKER_ALL_LANES=\u003clane\u003e \\\nOPENCLAW_DOCKER_ALL_BUILD=0 \\\nOPENCLAW_DOCKER_ALL_PREFLIGHT=0 \\\nOPENCLAW_SKIP_DOCKER_BUILD=1 \\\nOPENCLAW_DOCKER_E2E_BARE_IMAGE='\u003cprepared-bare-image\u003e' \\\nOPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE='\u003cprepared-functional-image\u003e' \\\npnpm test:docker:all\n```\n\nFor release validation, prefer the reusable GitHub workflow input:\n\n```yaml\ndocker_lanes: install-e2e\n```\n\nMultiple lanes are allowed:\n\n```yaml\ndocker_lanes: install-e2e bundled-channel-update-acpx\n```\n\nThat skips the release chunk matrix and runs one targeted Docker job against the\nprepared GHCR images and the selected package artifact. Rerun commands\ngenerated inside GitHub artifacts include `package_artifact_run_id`,\n`package_artifact_name`, `docker_e2e_bare_image`, and\n`docker_e2e_functional_image` when available, so failed lanes can reuse the\nexact tarball and prepared images from the failed run. When the fix changes\npackage contents, omit those reuse inputs so the workflow packs a new tarball.\nLive-only targeted reruns skip the E2E images and build only the live-test\nimage. 
Release-path normal mode fans out into smaller Docker chunk jobs:\n\n- `core`\n- `package-update-openai`\n- `package-update-anthropic`\n- `package-update-core`\n- `plugins-runtime-plugins`\n- `plugins-runtime-services`\n- `plugins-runtime-install-a`\n- `plugins-runtime-install-b`\n- `plugins-runtime-install-c`\n- `plugins-runtime-install-d`\n- `bundled-channels`\n\nOpenWebUI is folded into `plugins-runtime-services` for full release-path\ncoverage and keeps a standalone `openwebui` chunk only for OpenWebUI-only\ndispatches. The legacy `package-update`, `plugins-runtime-core`,\n`plugins-runtime`, and `plugins-integrations` chunks still work as aggregate\naliases for manual reruns, but the release workflow uses the split chunks so\nprovider installer checks, plugin runtime checks, bundled plugin\ninstall/uninstall shards, and bundled-channel checks can run on separate\nmachines. The bundled-channel runtime-dependency coverage\ninside `bundled-channels`\nuses the split `bundled-channel-*` and `bundled-channel-update-*` lanes rather\nthan the serial `bundled-channel-deps` lane, so failures produce cheap targeted\nreruns for the exact channel/update scenario. The bundled plugin\ninstall/uninstall sweep is also split into\n`bundled-plugin-install-uninstall-0` through\n`bundled-plugin-install-uninstall-7`; selecting the legacy\n`bundled-plugin-install-uninstall` lane expands to all eight shards.\n\n## Package Acceptance\n\nUse the manual `Package Acceptance` workflow when the question is \"does this\ninstallable package work as a product?\" rather than \"does this source diff pass\nVitest?\"\n\nIn release validation, treat Package Acceptance as the package-candidate shard\ninside the larger release umbrella, not as a competing full-test path. 
Full\nRelease Validation and private release gauntlets should call Package Acceptance\nfor tarball resolution, Docker product/package proof, and optional Telegram QA\nagainst the same resolved `package-under-test` artifact; keep orchestration,\nsecret policy, blocking/advisory status, and evidence rollup in the caller.\n\nGood defaults:\n\n```bash\ngh workflow run package-acceptance.yml --ref main \\\n  -f source=npm \\\n  -f workflow_ref=main \\\n  -f package_spec=openclaw@beta \\\n  -f suite_profile=product \\\n  -f telegram_mode=mock-openai\n```\n\nNpm candidate selection:\n\n- Resolve the registry immediately before dispatch:\n  `npm view openclaw dist-tags --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`\n  and `npm view openclaw@beta version dist.tarball dist.integrity --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`.\n- If Peter asks for \"latest beta\", use `source=npm` with\n  `package_spec=openclaw@beta`, then record the resolved version from `npm view`\n  or the workflow summary.\n- For reruns, release proof, or comparing one known package, prefer the exact\n  immutable spec: `package_spec=openclaw@YYYY.M.D-beta.N` or\n  `package_spec=openclaw@YYYY.M.D`.\n- For stable package proof, use `package_spec=openclaw@latest` only when the\n  question is explicitly the current stable dist-tag; otherwise pin the exact\n  version.\n- `source=npm` only accepts registry specs for `openclaw@beta`,\n  `openclaw@latest`, or exact OpenClaw release versions. Do not pass semver\n  ranges, git refs, file paths, tarball URLs, or plugin package names there.\n- If the candidate is a tarball URL, use `source=url` with `package_sha256`. If\n  it is an Actions tarball artifact, use `source=artifact`. If it is an\n  unpublished source candidate, use `source=ref` with a trusted ref or SHA.\n- Package acceptance tests exactly the selected package candidate. 
Do not apply\n  `openclaw update --channel beta` fallback semantics here; if `beta` is absent,\n  stale, older than `latest`, or points at a broken tarball, report that tag\n  state instead of silently testing `latest`.\n\nProfiles:\n\n- `smoke`: quick confidence that the tarball installs, can onboard a channel,\n  can run an agent turn, and basic gateway/config lanes work.\n- `package`: release-package contract. Adds installer/update, doctor install\n  switching, bundled plugin runtime deps, plugin install/update, and package\n  repair lanes. This is the default native replacement for most Parallels\n  package/update coverage.\n- `product`: package profile plus broader product surfaces: MCP channels,\n  cron/subagent cleanup, OpenAI web search, and OpenWebUI.\n- `full`: split Docker release-path chunks with OpenWebUI.\n- `custom`: exact `docker_lanes` list for a focused rerun.\n\nCandidate sources:\n\n- `source=npm`: `openclaw@beta`, `openclaw@latest`, or an exact release version.\n- `source=ref`: pack `package_ref` using the trusted `workflow_ref` harness.\n  This intentionally separates old package commits from new workflow/test code.\n- `source=url`: HTTPS `.tgz` plus required `package_sha256`.\n- `source=artifact`: download one `.tgz` from `artifact_run_id`/`artifact_name`.\n\nRef model:\n\n- `gh workflow run ... --ref \u003cworkflow-ref\u003e` selects the workflow file revision\n  GitHub executes.\n- `workflow_ref` is the trusted harness/script ref passed to reusable Docker\n  E2E.\n- `package_ref` is the source ref to build when `source=ref`. 
It can be an\n  older branch/tag/SHA as long as it is reachable from an OpenClaw branch or\n  release tag.\n\nExample: run the latest package acceptance harness against an older trusted commit:\n\n```bash\ngh workflow run package-acceptance.yml --ref main \\\n  -f workflow_ref=main \\\n  -f source=ref \\\n  -f package_ref=\u003cbranch-or-sha\u003e \\\n  -f suite_profile=package \\\n  -f telegram_mode=mock-openai\n```\n\nUse `telegram_mode=mock-openai` or `telegram_mode=live-frontier` when the same\nresolved `package-under-test` tarball should also run through the Telegram QA\nworkflow in the `qa-live-shared` environment. The standalone Telegram workflow\nstill accepts a published npm spec for post-publish checks, but Package\nAcceptance passes the resolved artifact for `source=npm`, `ref`, `url`, and\n`artifact`. Use `telegram_mode=none` only when intentionally skipping Telegram\ncredentialed package proof for a focused rerun.\n\nDocker E2E images never copy repo sources as the app under test: the bare image\nis a Node/Git runner, and the functional image installs the same prebuilt npm\ntarball that bare lanes mount. `scripts/package-openclaw-for-docker.mjs` is the\nsingle packer for local scripts and CI and validates the tarball inventory\nbefore Docker consumes it. `scripts/test-docker-all.mjs --plan-json` produces the\nscheduler-owned CI plan for image kind, package, live image, lane, and\ncredential needs. Docker lane definitions live in the single scenario catalog\n`scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in\n`scripts/lib/docker-e2e-plan.mjs`. `scripts/docker-e2e.mjs` converts plan and\nsummary JSON into GitHub outputs and step summaries. Every scheduler run writes\n`.artifacts/docker-tests/**/summary.json` plus `failures.json`. Read those\nbefore rerunning. Lane entries include `command`, `rerunCommand`, status,\ntiming, timeout state, image kind, and log file path. 
The summary also includes\ntop-level phase timings for preflight, image build, package prep, lane pools,\nand cleanup. Use `pnpm test:docker:timings \u003csummary.json\u003e` to rank slow lanes\nand phases before deciding whether a broader rerun is justified.\n\nSkill install proof: use `pnpm test:docker:skill-install` or targeted\n`docker_lanes=skill-install` for live ClawHub skill-install validation. The\nlane installs the package tarball in a bare runner, keeps\n`skills.install.allowUploadedArchives=false`, resolves the current live slug\nfrom `openclaw skills search`, installs it, and verifies `.clawhub` origin/lock\nmetadata. Prefer this checked-in script over inline heredoc Testbox recipes.\n\n## Cheap Docker Reruns\n\nFirst derive the smallest rerun command from artifacts:\n\n```bash\npnpm test:docker:rerun \u003cgithub-run-id\u003e\npnpm test:docker:rerun .artifacts/docker-tests/\u003crun\u003e/failures.json\n```\n\nThe script downloads Docker E2E artifacts for a GitHub run, reads\n`summary.json`/`failures.json`, and prints a combined targeted workflow command\nplus per-lane commands. Prefer the combined targeted command when several lanes\nfailed for the same patch:\n\n```bash\ngh workflow run openclaw-live-and-e2e-checks-reusable.yml \\\n  -f ref=\u003csha\u003e \\\n  -f include_repo_e2e=false \\\n  -f include_release_path_suites=false \\\n  -f include_openwebui=false \\\n  -f docker_lanes='install-e2e bundled-channel-update-acpx' \\\n  -f include_live_suites=false \\\n  -f live_models_only=false\n```\n\nThat path still runs the prepare job, so it creates a new tarball for `\u003csha\u003e`.\nIf the SHA-tagged GHCR bare/functional image already exists, CI skips rebuilding\nthat image and only uploads the fresh package artifact before the targeted lane\njob. Do not rerun the full release path unless the failed lane list\nor touched surface really requires it.\n\n## Docker Expected Timings\n\nTreat these as ballpark. 
Blacksmith queue time, GHCR pull speed, provider\nlatency, npm cache state, and Docker daemon health can dominate.\n\nThe current local timing artifact (`.artifacts/docker-tests/lane-timings.json`) shows\nthese rough bands:\n\n- Tiny lanes, seconds to under 1 minute:\n  `agents-delete-shared-workspace` ~3s, `plugin-update` ~7s,\n  `config-reload` ~14s, `pi-bundle-mcp-tools` ~15s, `onboard` ~18s,\n  `session-runtime-context` ~20s, `gateway-network` ~34s, `qr` ~44s.\n- Medium deterministic lanes, ~1-5 minutes:\n  `npm-onboard-channel-agent` ~96s, `openai-image-auth` ~99s,\n  bundled channel/update lanes usually ~90-300s when split, `openwebui` ~225s,\n  `mcp-channels` ~274s.\n- Heavy deterministic lanes, ~6-10 minutes:\n  `bundled-channel-root-owned` ~429s,\n  `bundled-channel-setup-entry` ~420s,\n  `bundled-channel-load-failure` ~383s,\n  `cron-mcp-cleanup` ~567s.\n- Live provider lanes, often ~15-20 minutes:\n  `live-gateway` ~958s, `live-models` ~1054s.\n- Installer/release lanes:\n  `install-e2e` and package-update paths can vary widely with npm, provider,\n  and package registry behavior. Budget tens of minutes; prefer GitHub targeted\n  reruns over local repeats.\n\nThe default fallback lane timeout is 120 minutes. A timeout usually means you should debug the\nlane log/artifacts first, not “run the whole thing again.”\n\n## Failure Workflow\n\n1. Identify the exact failing job, SHA, lane, and artifact path.\n2. Read `failures.json`, `summary.json`, and the failed lane log tail.\n3. Use `pnpm test:docker:rerun \u003crun-id|failures.json\u003e` to generate targeted\n   GitHub rerun commands.\n4. If the lane has `rerunCommand`, use that only as a local starting point.\n5. For Docker release failures, dispatch targeted `docker_lanes=\u003cfailed-lane\u003e`\n   on GitHub before considering local Docker.\n6. Patch narrowly, then rerun the failed file/lane only.\n7. 
Broaden to `pnpm check:changed` or CI only after the isolated proof passes.\n\n## When To Escalate\n\n- Public SDK/plugin contract changes: run changed gate plus relevant extension\n  validation.\n- Build output, lazy imports, package boundaries, or published surfaces:\n  include `pnpm build`.\n- Workflow edits: run `pnpm check:workflows`.\n- Release branch or tag validation: use release docs and GitHub workflows; avoid\n  local Docker unless Peter explicitly asks.\n","agents/openai.yaml":"interface:\n  display_name: \"OpenClaw Testing\"\n  short_description: \"Choose cheap, targeted OpenClaw validation\"\n  default_prompt: \"Use $openclaw-testing to choose the cheapest safe test or CI verification path, inspect failures, and rerun only the relevant OpenClaw lane.\"\n"},"import":{"commit_sha":"424c6d0a5f4665b803ad6768d08b0be7659deaf4","imported_at":"2026-05-18T20:13:36Z","license_text":"MIT License\n\nCopyright (c) 2025 Peter Steinberger\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","owner":"openclaw","repo":"openclaw/openclaw","source_url":"https://github.com/openclaw/openclaw/tree/424c6d0a5f4665b803ad6768d08b0be7659deaf4/.agents/skills/openclaw-testing"}},"content_hash":[11,244,160,10,232,242,100,102,53,201,159,202,117,61,170,11,223,147,9,245,179,173,7,66,91,19,43,242,29,147,97,92],"trust_level":"unsigned","yanked":false}