{"kind":"Skill","metadata":{"namespace":"community","name":"openclaw-test-performance","version":"0.1.0"},"spec":{"description":"Benchmark, diagnose, and optimize OpenClaw test and plugin-suite runtime, import hotspots, CPU/RSS, heap growth, and slow coverage paths.","files":{"SKILL.md":"---\nname: openclaw-test-performance\ndescription: Benchmark, diagnose, and optimize OpenClaw test and plugin-suite runtime, import hotspots, CPU/RSS, heap growth, and slow coverage paths.\n---\n\n# OpenClaw Test Performance\n\nUse evidence first. The goal is real `pnpm test`, plugin-suite, and\nplugin-inspector speed/RSS improvement with coverage intact, not runner tuning by\nguesswork.\n\n## Workflow\n\n1. Read the relevant local `AGENTS.md` files before editing:\n   - `src/agents/AGENTS.md` for agent/import hotspots.\n   - `src/channels/AGENTS.md` and `src/plugins/AGENTS.md` for plugin/channel\n     laziness.\n   - `src/gateway/AGENTS.md` for server lifecycle tests.\n   - `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` for shared\n     contract helpers.\n   - `src/infra/outbound/AGENTS.md` for outbound/media/action tests.\n2. Establish a baseline before changing code:\n   - Prefer `pnpm test:perf:groups --full-suite --allow-failures --output \u003cfile\u003e`\n     for full-suite ranking.\n   - For bundled plugin breadth, run the smallest relevant `pnpm\ntest:extensions:batch \u003cplugin[,plugin...]\u003e` or plugin-inspector command\n     before jumping to the full extension sweep.\n   - For a scoped hotspot use:\n     `/usr/bin/time -l pnpm test \u003cfile-or-files\u003e --maxWorkers=1 --reporter=verbose`\n   - For import-heavy suspicion add:\n     `OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`.\n3. Separate wall/runner noise from real file cost:\n   - Compare Vitest duration, test body timing, import breakdown, wall time, and\n     max RSS.\n   - Re-run single files when grouped/full-suite numbers look stale or noisy.\n   - If a full-suite grouped run reports a lane failure but JSON says tests\n     passed, capture that as harness/noise and verify the suspect file directly.\n4. Pick the next attack by return and risk:\n   - High return: one file/test dominates seconds or RSS and has a clear root.\n   - High leverage: one plugin or SDK barrel causes every plugin-inspector or\n     extension-batch run to load broad runtime.\n   - Lower risk: static descriptors, target parsing, routing, auth bypass,\n     setup hints, registry fixtures, or test server lifecycle.\n   - Higher risk: real memory/runtime behavior, live providers, protocol\n     contracts, or broad production refactors.\n5. Fix the root cause, not the symptom:\n   - Move static metadata/parsing into narrow helpers or lightweight artifacts\n     reused by full runtime and fast paths.\n   - Prefer dependency injection, loaded-plugin-only lookup, explicit fixtures,\n     and pure helpers over broad mocks.\n   - Reuse suite-level servers/clients when a fresh handshake is irrelevant.\n   - Keep schedulers/background loops off unless the test proves scheduling.\n   - In plugin paths, move static metadata into manifest/lightweight artifacts\n     and keep runtime plugin loads behind explicit execution boundaries.\n6. Preserve coverage shape:\n   - Do not delete a slow integration proof unless the exact production\n     composition is extracted into a named helper and tested.\n   - Keep one cheap integration smoke when cross-component wiring matters.\n   - State explicitly what incidental coverage was removed, if any.\n7. Re-benchmark the same command after the change and compute seconds plus\n   percent gain.\n8. Update the running report when requested or when this thread is tracking one.\n   Include before/after commands, artifacts, coverage notes, verification, and\n   next attack order.\n9. Commit with `scripts/committer \"\u003cmessage\u003e\" \u003cpaths...\u003e` and push when the\n   user asked for commits/pushes. Stage only files touched for this attack.\n\n## Plugin-Suite Workflow\n\nUse this section when perf work involves bundled plugins, plugin-inspector, SDK\nbarrels, package-boundary tests, or extension suites.\n\n1. Map the suite shape first:\n   - source tests: `pnpm test extensions/\u003cid\u003e` or `pnpm test:extensions:batch \u003cid\u003e`\n   - package boundaries: `pnpm run test:extensions:package-boundary:canary` and\n     `pnpm run test:extensions:package-boundary:compile`\n   - all bundled source tests: `pnpm test:extensions`\n   - plugin import memory: `pnpm test:extensions:memory -- --json .artifacts/test-perf/extensions-memory.json`\n   - plugin-inspector/report work: keep report primitives in `plugin-inspector`;\n     keep wrappers thin and collect peak RSS when the command supports it.\n2. Start narrow, then widen:\n   - one plugin changed: run that plugin's tests and plugin-inspector slice.\n   - SDK/public barrel changed: add representative provider, channel, memory,\n     and feature plugins.\n   - loader/runtime mirror changed: add package-boundary checks and build/package\n     proof as needed.\n   - unknown shared plugin behavior: run `test:extensions:batch` groups before\n     `pnpm test:extensions`.\n3. Treat plugin-inspector failures as product signals:\n   - JSON must parse.\n   - warnings/errors must be classified, not hidden.\n   - runtime capture should be quiet and config-tolerant.\n   - command output should include wall time, exit code, and peak RSS when\n     available.\n4. For broad or package-heavy plugin proof, use Crabbox-backed Blacksmith\n   Testbox by default on maintainer machines:\n   - `pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- OPENCLAW_TESTBOX=1 pnpm test:extensions:batch \u003cids\u003e`\n   - add `--keep`/`--id \u003cid-or-slug\u003e` only when several commands must share one\n     warmed box; stop it with `pnpm crabbox:stop -- \u003cid-or-slug\u003e`.\n5. If plugin performance is package-artifact sensitive, switch to\n   `openclaw-pre-release-plugin-testing` and Package Acceptance rather than\n   trusting source-only timing.\n\n## Metric Collection\n\nCollect at least one stable metric before and after. Prefer the same machine and\nsame command. For Testbox comparisons, use the same `tbx_...` id when possible.\n\n| Metric          | Use for                            | Preferred source                                                            |\n| --------------- | ---------------------------------- | --------------------------------------------------------------------------- |\n| wall time       | user-visible suite cost            | `/usr/bin/time -l`, test wrapper duration, Testbox run time                 |\n| Vitest duration | test body/import cost              | Vitest output per file/shard                                                |\n| import duration | broad barrel/runtime loads         | `OPENCLAW_VITEST_IMPORT_DURATIONS=1`                                        |\n| max RSS         | memory pressure and OOM risk       | `/usr/bin/time -l`, `pnpm test:extensions:memory`, wrapper memory summaries |\n| CPU/user/sys    | CPU-bound vs wait-bound split      | `/usr/bin/time -l` locally, Testbox job timing when local CPU is noisy      |\n| heap snapshots  | real leak vs retained module graph | `openclaw-test-heap-leaks` workflow                                         |\n\nLocal scoped command with CPU/RSS:\n\n```bash\ntimeout 240 /usr/bin/time -l pnpm test \u003cfile\u003e --maxWorkers=1 --reporter=verbose\n```\n\nPlugin import memory profile:\n\n```bash\npnpm build\npnpm test:extensions:memory -- --top 20 --json .artifacts/test-perf/extensions-memory.json\n```\n\nTargeted plugin import memory:\n\n```bash\npnpm test:extensions:memory -- --extension discord --extension telegram --skip-combined\n```\n\nHeap/RSS escalation:\n\n```bash\nOPENCLAW_TEST_MEMORY_TRACE=1 \\\nOPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 \\\nOPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap \\\nOPENCLAW_TEST_WORKERS=2 \\\nOPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 \\\npnpm test\n```\n\nUse `openclaw-test-heap-leaks` when RSS keeps growing across intervals, workers\nOOM, or the suspect command has app-object retention. Do not call RSS growth a\nleak until snapshots or retainers support it.\n\n## Common Root Causes\n\n- Full bundled channel/plugin runtime loaded for static data.\n- `getChannelPlugin()` fallback used when an already-loaded fixture or pure\n  parser would suffice.\n- Broad `api.ts`, `runtime-api.ts`, `test-api.ts`, or plugin-sdk barrels pulled\n  into hot tests.\n- SDK root aliases or package barrels pulling focused subpaths back into a broad\n  plugin graph.\n- Plugin-inspector loading runtime code just to render metadata, reports, or CI\n  policy scores.\n- Bundled plugin capture reusing real config/home state instead of synthetic,\n  redacted, isolated state.\n- Partial-real mocks using `importActual()` around broad modules.\n- `vi.resetModules()` plus fresh imports in per-test loops.\n- Test plugin registry seeded in `beforeAll` while runtime state resets in\n  `afterEach`.\n- Per-test gateway/server/client startup when state reset would suffice.\n- Runtime/default model/auth selection paid by idle snapshots or fixtures.\n- Plugin-owned media/action discovery triggered before checking whether args\n  contain plugin-owned fields.\n- Timings missing from `test/fixtures/test-timings.unit.json`, causing hotspot\n  files to stay in shared workers.\n- Parallel Vitest runs sharing `node_modules/.experimental-vitest-cache` without\n  distinct `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` values.\n\n## Benchmark Commands\n\nScoped file:\n\n```bash\ntimeout 240 /usr/bin/time -l pnpm test \u003cfile\u003e --maxWorkers=1 --reporter=verbose\n```\n\nScoped file with import breakdown:\n\n```bash\ntimeout 240 /usr/bin/time -l env \\\n  OPENCLAW_VITEST_IMPORT_DURATIONS=1 \\\n  OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1 \\\n  pnpm test \u003cfile\u003e --maxWorkers=1 --reporter=verbose\n```\n\nGrouped suite:\n\n```bash\npnpm test:perf:groups --full-suite --allow-failures \\\n  --output .artifacts/test-perf/\u003cname\u003e.json\n```\n\nExtension batch:\n\n```bash\npnpm test:extensions:batch \u003cplugin[,plugin...]\u003e -- --reporter=verbose\n```\n\nAll extension tests:\n\n```bash\npnpm test:extensions\n```\n\nPackage-boundary plugin checks:\n\n```bash\npnpm run test:extensions:package-boundary:canary\npnpm run test:extensions:package-boundary:compile\n```\n\nReuse an existing Vitest JSON report:\n\n```bash\npnpm test:perf:groups --report \u003cvitest-json\u003e \\\n  --output .artifacts/test-perf/\u003cname\u003e.json\n```\n\n## Verification\n\n- Always run the targeted test surface that proves the change.\n- For source changes, run `pnpm check:changed` before push; in maintainer\n  Testbox mode run it in the warmed Testbox.\n- For test-only changes, run `pnpm test:changed` or the exact edited tests.\n- Run `pnpm build` when touching lazy-loading, bundled artifacts, package\n  boundaries, dynamic imports, build output, or public surfaces.\n- For plugin SDK/barrel/runtime changes, add `pnpm plugin-sdk:api:check` or\n  `pnpm plugin-sdk:api:gen` when the API surface may drift.\n- For plugin-suite perf fixes, verify at least one representative plugin batch\n  plus the changed gate; use Package Acceptance if the bug only exists in a\n  packed artifact.\n- If deps are missing/stale, run `pnpm install` and retry the exact failed\n  command once.\n- Use the report format:\n\n```markdown\n| Metric         | Before |  After |          Gain |\n| -------------- | -----: | -----: | ------------: |\n| File wall time |   `Xs` |   `Ys` |  `-Zs` (`P%`) |\n| Max RSS        |  `XMB` |  `YMB` | `-ZMB` (`P%`) |\n| CPU user/sys   | `X/Ys` | `A/Bs` |       explain |\n```\n\n## Handoff\n\nKeep the final concise:\n\n- Root cause.\n- Suite/plugin scope.\n- Files changed.\n- Before/after wall, Vitest/import, CPU, and RSS numbers where available.\n- Leak classification if memory was involved: real leak, retained module graph,\n  or inconclusive.\n- Coverage retained.\n- Verification commands.\n- Testbox ID or workflow URL for remote proof.\n- Commit hash and push status.\n","agents/openai.yaml":"interface:\n  display_name: \"OpenClaw Test Performance\"\n  short_description: \"Benchmark tests, plugin suites, CPU, RSS, and heap growth\"\n  default_prompt: \"Use $openclaw-test-performance to reassess OpenClaw test and plugin-suite performance, collect wall/import/CPU/RSS metrics, investigate memory growth when needed, fix the next real hotspot without losing coverage, update the report, and commit scoped changes.\"\npolicy:\n  allow_implicit_invocation: false\n"},"import":{"commit_sha":"424c6d0a5f4665b803ad6768d08b0be7659deaf4","imported_at":"2026-05-18T20:13:36Z","license_text":"MIT License\n\nCopyright (c) 2025 Peter Steinberger\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","owner":"openclaw","repo":"openclaw/openclaw","source_url":"https://github.com/openclaw/openclaw/tree/424c6d0a5f4665b803ad6768d08b0be7659deaf4/.agents/skills/openclaw-test-performance"}},"content_hash":[144,233,154,32,229,1,22,150,178,14,74,205,224,247,145,29,180,71,227,110,197,38,229,234,42,155,235,33,161,111,64,204],"trust_level":"unsigned","yanked":false}
