{"kind":"Skill","metadata":{"namespace":"community","name":"openclaw-qa-testing","version":"0.1.0"},"spec":{"description":"Run, watch, debug, extend, or explain OpenClaw qa-lab and qa-channel scenarios, artifacts, and live lanes.","files":{"SKILL.md":"---\nname: openclaw-qa-testing\ndescription: Run, watch, debug, extend, or explain OpenClaw qa-lab and qa-channel scenarios, artifacts, and live lanes.\n---\n\n# OpenClaw QA Testing\n\nUse this skill for `qa-lab` / `qa-channel` work. Repo-local QA only.\n\n## Read first\n\n- `docs/concepts/qa-e2e-automation.md`\n- `docs/help/testing.md`\n- `docs/channels/qa-channel.md`\n- `qa/README.md`\n- `qa/scenarios/index.md`\n- `extensions/qa-lab/src/suite.ts`\n- `extensions/qa-lab/src/character-eval.ts`\n\n## Model policy\n\n- Live OpenAI lane: `openai/gpt-5.4`\n- Fast mode: on\n- Do not use:\n  - `openai/gpt-5.4-pro`\n  - `openai/gpt-5.4-mini`\n- Only change model policy if the user explicitly asks.\n\n## Default workflow\n\n1. Read the scenario pack and current suite implementation.\n2. Decide lane:\n   - mock/dev: `mock-openai`\n   - real validation: `live-frontier`\n3. For live OpenAI, use:\n\n```bash\nOPENCLAW_LIVE_OPENAI_KEY=\"${OPENAI_API_KEY}\" \\\npnpm openclaw qa suite \\\n  --provider-mode live-frontier \\\n  --model openai/gpt-5.4 \\\n  --alt-model openai/gpt-5.4 \\\n  --output-dir .artifacts/qa-e2e/run-all-live-frontier-\u003ctag\u003e\n```\n\n4. Watch outputs:\n   - summary: `.artifacts/qa-e2e/run-all-live-frontier-\u003ctag\u003e/qa-suite-summary.json`\n   - report: `.artifacts/qa-e2e/run-all-live-frontier-\u003ctag\u003e/qa-suite-report.md`\n5. If the user wants to watch the live UI, find the current `openclaw-qa` listen port and report `http://127.0.0.1:\u003cport\u003e`.\n6. If a scenario fails, fix the product or harness root cause, then rerun the full lane.\n\n## OTEL smoke\n\nFor local QA-lab OpenTelemetry validation, use:\n\n```bash\npnpm qa:otel:smoke\n```\n\nThis starts a local OTLP/HTTP trace receiver, runs the `otel-trace-smoke`\nscenario through qa-channel, decodes the emitted protobuf spans, and verifies\nthe exported trace names and privacy contract. It does not require Opik,\nLangfuse, or external collector credentials.\n\n## Matrix live profiles\n\n`pnpm openclaw qa matrix` defaults to the full `all` profile. Use explicit\nprofiles for faster CI/release proof:\n\n```bash\nOPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000 \\\npnpm openclaw qa matrix --profile fast --fail-fast\n```\n\n- `fast`: release-critical transport contract, excluding generated image and\n  deep E2EE recovery inventory.\n- `transport`, `media`, `e2ee-smoke`, `e2ee-deep`, `e2ee-cli`: sharded full\n  Matrix coverage.\n- `QA-Lab - All Lanes` uses explicit `fast` Matrix on scheduled runs. Manual\n  dispatch keeps `matrix_profile=all` as the default and always shards that full\n  Matrix selection.\n\n## QA credentials and 1Password\n\n- Use `op` only inside `tmux` for QA secret lookup in this repo.\n- Quick auth check inside tmux:\n\n```bash\nop account list\n```\n\n- Direct Telegram npm live test secrets currently live in 1Password item:\n  - vault: `OpenClaw`\n  - item: `Telegram E2E`\n- That item is the first place to look for:\n  - `OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN`\n  - `OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN`\n  - `OPENCLAW_QA_PROVIDER_MODE`\n  - `OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC`\n- Convex QA secrets currently live in 1Password items:\n  - vault: `OpenClaw`\n  - item: `OPENCLAW_QA_CONVEX_SITE_URL`\n  - item: `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER`\n  - item: `OPENCLAW_QA_CONVEX_SECRET_CI`\n- Additional related notes/login items seen during QA credential work:\n  - vault: `Private`\n  - items: `OPENCLAW QA`, `Convex`, `Telegram`\n- If a required value is missing from those notes:\n  - do not guess\n  - ask the maintainer/operator for the current value or the current 1Password item name\n  - for Telegram direct runs, `OPENCLAW_QA_TELEGRAM_GROUP_ID` may be stored separately from `Telegram E2E`\n  - for Convex runs, the leased Telegram credential should provide the Telegram group id and bot tokens together; do not require a separate `OPENCLAW_QA_TELEGRAM_GROUP_ID`\n  - for Convex runs, prefer `OpenClaw/OPENCLAW_QA_CONVEX_SITE_URL`; if that is stale or unclear, ask for the active pool URL before running\n- Prefer direct Telegram envs for the npm Telegram Docker lane when available:\n\n```bash\nOPENCLAW_QA_TELEGRAM_GROUP_ID=\"...\" \\\nOPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN=\"...\" \\\nOPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN=\"...\" \\\nOPENCLAW_QA_PROVIDER_MODE=\"mock-openai\" \\\nOPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC=\"openclaw@beta\" \\\npnpm test:docker:npm-telegram-live\n```\n\n- Prefer Convex mode when the goal is stable shared QA infra:\n  - round-robin credential leasing\n  - thinner wrapper for channel-specific setup\n  - CLI/admin flows around the pooled credentials\n- Live npm Telegram Docker lane note:\n  - `scripts/e2e/npm-telegram-live-runner.ts` reads `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE`\n  - do not assume `OPENCLAW_QA_PROVIDER_MODE` is consumed by that wrapper\n  - if a 1Password note only gives `OPENCLAW_QA_PROVIDER_MODE`, map it explicitly to `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE` before running the Docker lane\n- Verified live shape:\n  - Convex mode can pass the real Docker lane without direct Telegram env vars\n  - leased Telegram payload includes the group id coupled to the driver/SUT tokens\n  - a real run of `pnpm test:docker:npm-telegram-live` passed with:\n    - `OPENCLAW_QA_CREDENTIAL_SOURCE=convex`\n    - `OPENCLAW_QA_CREDENTIAL_ROLE=maintainer`\n    - `OPENCLAW_QA_CONVEX_SITE_URL`\n    - `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER`\n    - `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE=mock-openai`\n- If direct Telegram env is missing locally and `op signin` blocks, prefer dispatching the manual GitHub lane because the `qa-live-shared` environment already has Convex CI credentials:\n\n```bash\ngh workflow run \"NPM Telegram Beta E2E\" --repo openclaw/openclaw --ref main \\\n  -f package_spec=openclaw@YYYY.M.D-beta.N \\\n  -f package_label=openclaw@YYYY.M.D-beta.N \\\n  -f provider_mode=mock-openai\n```\n\n- Poll the exact run id from the dispatch URL. `gh run view --json artifacts` is not supported; list artifacts with:\n\n```bash\ngh api repos/openclaw/openclaw/actions/runs/\u003crun-id\u003e/artifacts\n```\n\n## WhatsApp live credentials\n\nUse this when setting up or replacing Convex `kind=whatsapp` credentials.\n\n- Treat WhatsApp QA credentials as operator-owned live accounts, not generated fixtures.\n- Use two dedicated WhatsApp-capable test numbers: one driver account and one SUT account. Do not use personal numbers or personal OpenClaw WhatsApp accounts in the shared pool.\n- Register and link each account manually with WhatsApp or WhatsApp Business, storing Web auth only in isolated local auth dirs outside the repo.\n- For group coverage, create a dedicated test group that includes both QA accounts and store its JID as `groupJid`; otherwise the group mention-gating scenario should be skipped by default and fail when explicitly requested.\n- Package the two Baileys auth dirs into base64 `.tgz` payload fields and add a new active Convex credential row. Prefer adding a fresh row and disabling stale/broken rows over overwriting credentials in place.\n- Expected payload fields: `driverPhoneE164`, `sutPhoneE164`, `driverAuthArchiveBase64`, `sutAuthArchiveBase64`, and optional `groupJid`.\n- Keep credential material out of the repo, logs, PRs, and screenshots. Redact phone numbers unless the operator explicitly asks for local debugging.\n- Validate with `pnpm openclaw qa whatsapp --credential-source convex --credential-role maintainer --provider-mode mock-openai` and preserve artifact paths plus redacted pass/fail summaries.\n- If WhatsApp expires or invalidates a linked Web session, relink locally, package fresh auth archives, add a new Convex row, then disable the stale row.\n\n## Character evals\n\nUse `qa character-eval` for style/persona/vibe checks across multiple live models.\n\n```bash\npnpm openclaw qa character-eval \\\n  --model openai/gpt-5.4,thinking=xhigh \\\n  --model openai/gpt-5.2,thinking=xhigh \\\n  --model openai/gpt-5,thinking=xhigh \\\n  --model anthropic/claude-opus-4-6,thinking=high \\\n  --model anthropic/claude-sonnet-4-6,thinking=high \\\n  --model zai/glm-5.1,thinking=high \\\n  --model moonshot/kimi-k2.5,thinking=high \\\n  --model google/gemini-3.1-pro-preview,thinking=high \\\n  --judge-model openai/gpt-5.4,thinking=xhigh,fast \\\n  --judge-model anthropic/claude-opus-4-6,thinking=high \\\n  --concurrency 16 \\\n  --judge-concurrency 16 \\\n  --output-dir .artifacts/qa-e2e/character-eval-\u003ctag\u003e\n```\n\n- Runs local QA gateway child processes, not Docker.\n- Preferred model spec syntax is `provider/model,thinking=\u003clevel\u003e[,fast|,no-fast|,fast=\u003cbool\u003e]` for both `--model` and `--judge-model`.\n- Do not add new examples with separate `--model-thinking`; keep that flag as legacy compatibility only.\n- Defaults to candidate models `openai/gpt-5.4`, `openai/gpt-5.2`, `openai/gpt-5`, `anthropic/claude-opus-4-6`, `anthropic/claude-sonnet-4-6`, `zai/glm-5.1`, `moonshot/kimi-k2.5`, and `google/gemini-3.1-pro-preview` when no `--model` is passed.\n- Candidate thinking defaults to `high`, with `xhigh` for OpenAI models that support it. Prefer inline `--model provider/model,thinking=\u003clevel\u003e`; `--thinking \u003clevel\u003e` and `--model-thinking \u003cprovider/model=level\u003e` remain compatibility shims.\n- OpenAI candidate refs default to fast mode so priority processing is used where supported. Use inline `,fast`, `,no-fast`, or `,fast=false` for one model; use `--fast` only to force fast mode for every candidate.\n- Judges default to `openai/gpt-5.4,thinking=xhigh,fast` and `anthropic/claude-opus-4-6,thinking=high`.\n- Report includes judge ranking, run stats, durations, and full transcripts; do not include raw judge replies. Duration is benchmark context, not a grading signal.\n- Candidate and judge concurrency default to 16. Use `--concurrency \u003cn\u003e` and `--judge-concurrency \u003cn\u003e` to override when local gateways or provider limits need a gentler lane.\n- Scenario source should stay markdown-driven under `qa/scenarios/`.\n- For isolated character/persona evals, write the persona into `SOUL.md` and blank `IDENTITY.md` in the scenario flow. Use `SOUL.md + IDENTITY.md` only when intentionally testing how the normal OpenClaw identity combines with the character.\n- Keep prompts natural and task-shaped. The candidate model should receive character setup through `SOUL.md`, then normal user turns such as chat, workspace help, and small file tasks; do not ask \"how would you react?\" or tell the model it is in an eval.\n- Prefer at least one real task, such as creating or editing a tiny workspace artifact, so the transcript captures character under normal tool use instead of pure roleplay.\n\n## Codex CLI model lane\n\nUse model refs shaped like `codex-cli/\u003ccodex-model\u003e` whenever QA should exercise Codex as a model backend.\n\nExamples:\n\n```bash\npnpm openclaw qa suite \\\n  --provider-mode live-frontier \\\n  --model codex-cli/\u003ccodex-model\u003e \\\n  --alt-model codex-cli/\u003ccodex-model\u003e \\\n  --scenario \u003cscenario-id\u003e \\\n  --output-dir .artifacts/qa-e2e/codex-\u003ctag\u003e\n```\n\n```bash\npnpm openclaw qa manual \\\n  --model codex-cli/\u003ccodex-model\u003e \\\n  --message \"Reply exactly: CODEX_OK\"\n```\n\n- Treat the concrete Codex model name as user/config input; do not hardcode it in source, docs examples, or scenarios.\n- Live QA preserves `CODEX_HOME` so Codex CLI auth/config works while keeping `HOME` and `OPENCLAW_HOME` sandboxed.\n- Mock QA should scrub `CODEX_HOME`.\n- If Codex returns fallback/auth text every turn, first check `CODEX_HOME`,\n  relevant secret-backed auth, and gateway child logs before changing\n  scenario assertions.\n- For model comparison, include `codex-cli/\u003ccodex-model\u003e` as another candidate in `qa character-eval`; the report should label it as an opaque model name.\n\n## Repo facts\n\n- Seed scenarios live in `qa/`.\n- Main live runner: `extensions/qa-lab/src/suite.ts`\n- QA lab server: `extensions/qa-lab/src/lab-server.ts`\n- Child gateway harness: `extensions/qa-lab/src/gateway-child.ts`\n- Synthetic channel: `extensions/qa-channel/`\n\n## What “done” looks like\n\n- Full suite green for the requested lane.\n- User gets:\n  - watch URL if applicable\n  - pass/fail counts\n  - artifact paths\n  - concise note on what was fixed\n\n## Common failure patterns\n\n- Live timeout too short:\n  - widen live waits in `extensions/qa-lab/src/suite.ts`\n- Discovery cannot find repo files:\n  - point prompts at `repo/...` inside seeded workspace\n- Subagent proof too brittle:\n  - prefer stable final reply evidence over transient child-session listing\n- Harness “rebuild” delay:\n  - dirty tree can trigger a pre-run build; expect that before ports appear\n\n## When adding scenarios\n\n- Add or update scenario markdown under `qa/scenarios/`\n- Keep kickoff expectations in `qa/scenarios/index.md` aligned\n- Add executable coverage in `extensions/qa-lab/src/suite.ts`\n- Prefer end-to-end assertions over mock-only checks\n- Save outputs under `.artifacts/qa-e2e/`\n","agents/openai.yaml":"interface:\n  display_name: \"QA Test OpenClaw\"\n  short_description: \"Run and debug qa-lab and qa-channel scenarios\"\n  default_prompt: \"Use $openclaw-qa-testing to run or extend the OpenClaw QA suite with qa-lab and qa-channel, using regular openai/gpt-5.4 in fast mode for live OpenAI runs.\"\n"},"import":{"commit_sha":"424c6d0a5f4665b803ad6768d08b0be7659deaf4","imported_at":"2026-05-18T20:13:36Z","license_text":"MIT License\n\nCopyright (c) 2025 Peter Steinberger\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","owner":"openclaw","repo":"openclaw/openclaw","source_url":"https://github.com/openclaw/openclaw/tree/424c6d0a5f4665b803ad6768d08b0be7659deaf4/.agents/skills/openclaw-qa-testing"}},"content_hash":[239,218,20,77,138,221,5,46,239,164,0,174,139,54,41,255,210,176,192,97,113,248,107,7,118,167,245,181,218,51,64,235],"trust_level":"unsigned","yanked":false}
