{"kind":"Skill","metadata":{"namespace":"community","name":"phoenix-cli","version":"0.1.0"},"spec":{"description":"Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Use whenever the user is analyzing traces or spans, investigating LLM/agent failures, deciding what to do after instrumenting an app, building failure taxonomies, choosing what evals to write, or asking \"what's going wrong\", \"what kinds of mistakes\", or \"where do I focus\" — even without naming a technique.","files":{"SKILL.md":"---\nname: phoenix-cli\ndescription: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Use whenever the user is analyzing traces or spans, investigating LLM/agent failures, deciding what to do after instrumenting an app, building failure taxonomies, choosing what evals to write, or asking \"what's going wrong\", \"what kinds of mistakes\", or \"where do I focus\" — even without naming a technique.\nlicense: Apache-2.0\ncompatibility: Requires Node.js (for npx) or global install of @arizeai/phoenix-cli. 
Optionally requires jq for JSON processing.\nmetadata:\n  author: arize-ai\n  version: \"3.3.0\"\n---\n\n# Phoenix CLI\n\n## Invocation\n\n```bash\npx \u003cresource\u003e \u003caction\u003e                          # if installed globally\nnpx @arizeai/phoenix-cli \u003cresource\u003e \u003caction\u003e    # no install required\n```\n\nThe CLI uses singular resource commands with subcommands like `list` and `get`:\n\n```bash\npx trace list\npx trace get \u003ctrace-id\u003e\npx trace annotate \u003ctrace-id\u003e\npx trace add-note \u003ctrace-id\u003e\npx trace-annotations delete\npx span list\npx span annotate \u003cspan-id\u003e\npx span add-note \u003cspan-id\u003e\npx span-annotations delete\npx session list\npx session get \u003csession-id\u003e\npx session annotate \u003csession-id\u003e\npx session add-note \u003csession-id\u003e\npx session-annotations delete\npx dataset list\npx dataset get \u003cname\u003e\npx project list\npx project get \u003cname\u003e\npx annotation-config list\npx auth status\npx profile list\npx profile show [name]\npx profile create \u003cname\u003e\npx profile use \u003cname\u003e\npx profile edit \u003cname\u003e\npx profile delete \u003cname\u003e\n```\n\n## Setup\n\n```bash\nexport PHOENIX_HOST=http://localhost:6006\nexport PHOENIX_PROJECT=my-project\nexport PHOENIX_API_KEY=your-api-key  # if auth is enabled\n```\n\nAlways use `--format raw --no-progress` when piping to `jq`.\n\n## Quick Reference\n\n| Task | Files |\n| ---- | ----- |\n| Look at sampled traces, spans, or sessions and write specific notes about what went wrong (no taxonomy yet) | [references/open-coding](references/open-coding.md) |\n| Group those notes into a structured failure taxonomy and quantify what matters | [references/axial-coding](references/axial-coding.md) |\n\nBoth stages tag every artifact with one shared **coding annotation identifier** (descriptive shape, e.g. 
`coding-run:chatbot-context-loss-2026-05-06`) so the run is queryable, reversible, and viewable as a unit. Pass `--identifier \u003cvalue\u003e` explicitly on every `px` call — shell inheritance is unreliable across agent harnesses. Open coding writes notes via `px ... add-note` and records a small local JSONL sidecar at `.px/coding/\u003csanitized-identifier\u003e.jsonl`; axial coding reads that sidecar as the deterministic handoff and records labels in `.px/coding/\u003csanitized-identifier\u003e-axial.jsonl`. Pick the identifier once per run (see [references/open-coding.md](references/open-coding.md#coding-annotation-identifier-pick-this-first)), then share the Phoenix UI link from the wrap-up section. Revert is opt-in and runs three identifier-bound DELETEs only after explicit user confirmation.\n\n\u003e **Workflow term vs. server annotation name.** The skill prose calls this value the **coding annotation identifier** (shell-variable hint: `CODING_ANNOTATION_IDENTIFIER`). The server-side annotation NAME used for the UI filter is unchanged — `coding_session_id` — for data compatibility with rows already written by previous runs. Don't try to rename the server-side annotation; treat the asymmetry as load-bearing.\n\n## Workflows\n\n**\"What do I do after instrumenting?\" / \"Where do I focus?\" / \"What's going wrong?\"**\n[open-coding](references/open-coding.md) → [axial-coding](references/axial-coding.md) → build evals for the top categories.\n\n## Reference Categories\n\n| Prefix | Description |\n| ------ | ----------- |\n| `references/open-coding` | Free-form notes against sampled traces, spans, or sessions — reach for it whenever the user wants to make sense of LLM traffic but has no failure categories yet. Includes a unit-of-analysis diagnostic so the workflow runs at the level the failure modes actually live at (trace for stateless single-shot calls, session for multi-turn agents, span for mechanical/in-isolation failures). 
|\n| `references/axial-coding` | Inductive grouping of notes into a MECE taxonomy with counts — reach for it whenever the user has observations and needs categories or eval targets |\n\n## Auth\n\n```bash\npx auth status                                # check connection and authentication\npx auth status --endpoint http://other:6006   # check a specific endpoint\npx auth status --profile staging              # check a named profile's connection\n```\n\n## Profiles\n\nNamed profiles let you switch between multiple Phoenix instances (local, staging, cloud) without juggling environment variables. Profiles are stored in `~/.px/settings.json` (or `$XDG_CONFIG_HOME/px/settings.json`).\n\nConfiguration priority (highest to lowest): CLI flags \u003e env vars \u003e active profile \u003e built-in defaults.\n\n```bash\npx profile list                              # list all profiles (shows active profile)\npx profile show                              # show the active profile's settings\npx profile show staging                      # show a named profile's settings\npx profile create prod --endpoint https://app.phoenix.arize.com --api-key \u003ckey\u003e --activate\npx profile create local --endpoint http://localhost:6006 --project my-app\npx profile use prod                          # switch the active profile\npx profile edit prod                         # open profile JSON in $EDITOR (validates on save)\npx profile delete prod --yes                 # delete a profile (--yes skips confirmation)\n```\n\nUse `--profile \u003cname\u003e` on any command to target a specific profile without changing the active one:\n\n```bash\npx trace list --profile staging --limit 10 --format raw --no-progress | jq .\npx auth status --profile prod\n```\n\n`px profile create` options: `--endpoint \u003curl\u003e`, `--project \u003cname\u003e`, `--api-key \u003ckey\u003e`, `--header \u003ckey=value\u003e` (repeatable), `--activate`.\n\n## Projects\n\n```bash\npx project list                     
                       # list all projects (table view)\npx project list --format raw --no-progress | jq '.[].name' # project names as JSON\npx project get my-project --format raw --no-progress       # single record by exact name\npx project get my-project --format raw --no-progress | jq -r '.id'  # extract project id\n```\n\n`project get` exits with `ExitCode.FAILURE` (1) on a name miss and writes a `StructuredError` `{error, code: \"FAILURE\", hint}` to stderr in `--format json|raw`.\n\n## Traces\n\n```bash\npx trace list --limit 20 --format raw --no-progress | jq .\npx trace list --last-n-minutes 60 --limit 20 --format raw --no-progress | jq '.[] | select(.status == \"ERROR\")'\npx trace list --since 2025-01-15T00:00:00Z --limit 50 --format raw --no-progress | jq .\npx trace list --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'\npx trace list --include-notes --format raw --no-progress | jq '.[].notes'\npx trace get \u003ctrace-id\u003e --format raw | jq .\npx trace get \u003ctrace-id\u003e --format raw | jq '.spans[] | select(.status_code != \"OK\")'\npx trace get \u003ctrace-id\u003e --include-notes --format raw | jq '.notes'\npx trace annotate \u003ctrace-id\u003e --name reviewer --label pass\npx trace annotate \u003ctrace-id\u003e --name reviewer --score 0.9 --format raw --no-progress\npx trace annotate \u003ctrace-id\u003e --name reviewer --label pass --identifier \"\u003ccoding-annotation-id\u003e\"  # tag with a coding annotation identifier\npx trace add-note \u003ctrace-id\u003e --text \"needs follow-up\"\npx trace add-note \u003ctrace-id\u003e --text \"needs follow-up\" --identifier \"\u003ccoding-annotation-id\u003e\"  # tag + upsert on identifier\npx trace-annotations delete --identifier \"\u003ccoding-annotation-id\u003e\" --all -y            # nuke every annotation tied to this coding annotation identifier\n```\n\n`px \u003centity\u003e-annotations delete` requires `--all` or both `--start-time` and `--end-time` and emits `{deleted: 
true, target, filter}` on success.\n\n### Trace JSON shape\n\n```\nTrace\n  traceId, status (\"OK\"|\"ERROR\"), duration (ms), startTime, endTime\n  annotations[] (with --include-annotations, excludes note)\n    name, result { score, label, explanation }\n  notes[] (with --include-notes)\n    name=\"note\", result { explanation }\n  rootSpan  — top-level span (parent_id: null)\n  spans[]\n    name, span_kind (\"LLM\"|\"CHAIN\"|\"TOOL\"|\"RETRIEVER\"|\"EMBEDDING\"|\"AGENT\"|\"RERANKER\"|\"GUARDRAIL\"|\"EVALUATOR\"|\"UNKNOWN\")\n    status_code (\"OK\"|\"ERROR\"|\"UNSET\"), parent_id, context.span_id\n    notes[] (with --include-notes)\n      name=\"note\", result { explanation }\n    attributes\n      input.value, output.value          — raw input/output\n      llm.model_name, llm.provider\n      llm.token_count.prompt/completion/total\n      llm.token_count.prompt_details.cache_read\n      llm.token_count.completion_details.reasoning\n      llm.input_messages.{N}.message.role/content\n      llm.output_messages.{N}.message.role/content\n      llm.invocation_parameters          — JSON string (temperature, etc.)\n      exception.message                  — set if span errored\n```\n\n## Spans\n\n```bash\npx span list --limit 20                                    # recent spans (table view)\npx span list --last-n-minutes 60 --limit 50                # spans from last hour\npx span list --since 2025-01-15T00:00:00Z --limit 50       # spans since a timestamp\npx span list --span-kind LLM --limit 10                    # only LLM spans\npx span list --status-code ERROR --limit 20                # only errored spans\npx span list --name chat_completion --limit 10             # filter by span name\npx span list --trace-id \u003cid\u003e --format raw --no-progress | jq .   
# all spans for a trace\npx span list --parent-id null --limit 10                   # only root spans\npx span list --parent-id \u003cspan-id\u003e --limit 10              # only children of a span\npx span list --include-annotations --limit 10              # include annotation scores\npx span list --include-notes --limit 10                    # include span notes\npx span list --attribute llm.model_name:gpt-4 --limit 10  # filter by string attribute\npx span list --attribute llm.token_count.total:500 --limit 10  # filter by numeric attribute\npx span list --attribute 'user.id:\"12345\"' --limit 10     # force string match for numeric-looking value\npx span list --attribute session.id:sess:abc:123 --limit 20  # colon in value OK (split on first colon only)\npx span list --attribute llm.model_name:gpt-4 --attribute session.id:abc --limit 10  # AND multiple filters\npx span list output.json --limit 100                       # save to JSON file\npx span list --format raw --no-progress | jq '.[] | select(.status_code == \"ERROR\")'\npx span annotate \u003cspan-id\u003e --name reviewer --label pass\npx span annotate \u003cspan-id\u003e --name checker --score 1 --annotator-kind CODE\npx span annotate \u003cspan-id\u003e --name reviewer --label pass --identifier \"\u003ccoding-annotation-id\u003e\"  # tag with a coding annotation identifier\npx span add-note \u003cspan-id\u003e --text \"verified by agent\"\npx span add-note \u003cspan-id\u003e --text \"verified by agent\" --identifier \"\u003ccoding-annotation-id\u003e\"  # tag + upsert on identifier\npx span-annotations delete --identifier \"\u003ccoding-annotation-id\u003e\" --all -y           # nuke every annotation tied to this coding annotation identifier\n```\n\n### Span JSON shape\n\n```\nSpan\n  name, span_kind (\"LLM\"|\"CHAIN\"|\"TOOL\"|\"RETRIEVER\"|\"EMBEDDING\"|\"AGENT\"|\"RERANKER\"|\"GUARDRAIL\"|\"EVALUATOR\"|\"UNKNOWN\")\n  status_code (\"OK\"|\"ERROR\"|\"UNSET\"), status_message\n  context.span_id, 
context.trace_id, parent_id\n  start_time, end_time\n  attributes\n    input.value, output.value          — raw input/output\n    llm.model_name, llm.provider\n    llm.token_count.prompt/completion/total\n    llm.input_messages.{N}.message.role/content\n    llm.output_messages.{N}.message.role/content\n    llm.invocation_parameters          — JSON string (temperature, etc.)\n    exception.message                  — set if span errored\n  annotations[] (with --include-annotations, excludes note)\n    name, result { score, label, explanation }\n  notes[] (with --include-notes)\n    name=\"note\", result { explanation }\n```\n\n## Sessions\n\n```bash\npx session list --limit 10 --format raw --no-progress | jq .\npx session list --order asc --format raw --no-progress | jq '.[].session_id'\npx session list --include-annotations --include-notes --format raw --no-progress | jq '.[].notes'\npx session get \u003csession-id\u003e --format raw | jq .\npx session get \u003csession-id\u003e --include-annotations --format raw | jq '.session.annotations'\npx session get \u003csession-id\u003e --include-notes --format raw | jq '.session.notes'\npx session annotate \u003csession-id\u003e --name reviewer --label pass\npx session annotate \u003csession-id\u003e --name reviewer --score 0.9 --format raw --no-progress\npx session annotate \u003csession-id\u003e --name reviewer --label pass --identifier \"\u003ccoding-annotation-id\u003e\"  # tag with a coding annotation identifier\npx session add-note \u003csession-id\u003e --text \"verified by agent\"\npx session add-note \u003csession-id\u003e --text \"verified by agent\" --identifier \"\u003ccoding-annotation-id\u003e\"  # tag + upsert on identifier\npx session-annotations delete --identifier \"\u003ccoding-annotation-id\u003e\" --all -y              # nuke every annotation tied to this coding annotation identifier\n```\n\n### Session JSON shape\n\n```\nSessionData\n  id, session_id, project_id\n  start_time, end_time\n  
token_count_prompt, token_count_completion, token_count_total  — cumulative across all LLM spans in the session (int, default 0)\n  annotations[] (with --include-annotations, excludes note)\n    name, result { score, label, explanation }\n  notes[] (with --include-notes)\n    name=\"note\", result { explanation }\n  traces[]\n    id, trace_id, start_time, end_time\n```\n\n## Datasets / Experiments / Prompts\n\n```bash\npx dataset list --format raw --no-progress | jq '.[].name'\npx dataset get \u003cname\u003e --format raw | jq '.examples[] | {input, output: .expected_output}'\npx dataset get \u003cname\u003e --split train --format raw | jq .    # filter by split\npx dataset get \u003cname\u003e --version \u003cversion-id\u003e --format raw | jq .\npx experiment list --dataset \u003cname\u003e --format raw --no-progress | jq '.[] | {id, name, failed_run_count}'\npx experiment get \u003cid\u003e --format raw --no-progress | jq '.[] | select(.error != null) | {input, error}'\npx prompt list --format raw --no-progress | jq '.[].name'\npx prompt get \u003cname\u003e --format text --no-progress   # plain text, ideal for piping to AI\n```\n\n## Annotation Configs\n\n```bash\npx annotation-config list                                           # list all configs (table view)\npx annotation-config list --format raw --no-progress | jq '.[].name' # config names as JSON\n```\n\n## GraphQL\n\nFor ad-hoc queries not covered by the commands above. 
Output is `{\"data\": {...}}`.\n\n```bash\npx api graphql '{ projectCount datasetCount promptCount evaluatorCount }'\npx api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | jq '.data.projects.edges[].node'\npx api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | jq '.data.datasets.edges[].node'\npx api graphql '{ evaluators { edges { node { name kind } } } }' | jq '.data.evaluators.edges[].node'\n\n# Introspect any type\npx api graphql '{ __type(name: \"Project\") { fields { name type { name } } } }' | jq '.data.__type.fields[]'\n```\n\nKey root fields: `projects`, `datasets`, `prompts`, `evaluators`, `projectCount`, `datasetCount`, `promptCount`, `evaluatorCount`, `viewer`.\n\n## Docs\n\nDownload Phoenix documentation markdown for local use by coding agents.\n\n```bash\npx docs fetch                                # fetch default workflow docs to .px/docs\npx docs fetch --workflow tracing             # fetch only tracing docs\npx docs fetch --workflow tracing --workflow evaluation\npx docs fetch --dry-run                      # preview what would be downloaded\npx docs fetch --refresh                      # clear .px/docs and re-download\npx docs fetch --output-dir ./my-docs         # custom output directory\n```\n\nKey options: `--workflow` (repeatable, values: `tracing`, `evaluation`, `datasets`, `prompts`, `integrations`, `sdk`, `self-hosting`, `all`), `--dry-run`, `--refresh`, `--output-dir` (default `.px/docs`), `--workers` (default 10).\n","references/axial-coding.md":"# Axial Coding\n\nGroup open-ended observations into structured failure taxonomies. Axial coding turns notes, trace observations, or open-coding output into named categories with counts, supporting downstream work like eval design and fix prioritization. 
It works well after [open coding](open-coding.md), but can start from any set of open-ended observations.\n\n**Reach for this whenever** the user has observations and needs structure — e.g., \"what categories of failures do we have\", \"what should I build evals for\", \"how do I prioritize fixes\", \"group these notes\", \"MECE breakdown\", or any framing that asks for categories or counts grounded in real traces rather than invented top-down.\n\n## Coding annotation identifier (reuse the open-coding value)\n\nReuse the **coding annotation identifier** chosen in open coding — every `annotate` call below passes `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"` explicitly. In a fresh shell or fresh agent invocation, set `CODING_ANNOTATION_IDENTIFIER` to the same value (recoverable from the wrap-up UI URL or from the `identifier` field of any line in `.px/coding/*.jsonl`; the filenames carry only the sanitized slug); don't mint a new id. See [open-coding.md#coding-annotation-identifier-pick-this-first](open-coding.md#coding-annotation-identifier-pick-this-first) for the rationale and the sanitization rule.\n\n\u003e **Workflow term vs. server annotation name.** The skill calls this value the **coding annotation identifier**; the server annotation NAME used for the UI filter stays `coding_session_id` for data compatibility. Don't try to rename the server-side key.\n\n```bash\nCODING_ANNOTATION_IDENTIFIER=\"coding-run:chatbot-context-loss-2026-05-06\"\nSLUG=$(echo -n \"$CODING_ANNOTATION_IDENTIFIER\" | sed 's/[^a-zA-Z0-9_-]/-/g')\nNOTES_SIDECAR=\".px/coding/${SLUG}.jsonl\"\nAXIAL_SIDECAR=\".px/coding/${SLUG}-axial.jsonl\"\n```\n\n## Choosing the unit\n\nOpen coding's diagnostic in [open-coding.md#choosing-the-unit-of-analysis](open-coding.md#choosing-the-unit-of-analysis) commits to a unit (trace, span, or session). 
Axial coding inherits that unit by default — if open coding ran at the session level, axial labels will too; same for trace and span.\n\n**An axial label can live at a different level than the note that informed it** — that's a feature, and it works in every direction:\n\n- *Trace → span*: a trace-level note \"answered shipping when asked about returns\" can produce a span-level annotation on the retrieval span once a pattern reveals retrieval as the consistent culprit.\n- *Trace → session*: a batch of trace-level notes describing single-turn confusion can produce a session-level annotation once you see the pattern is \"the agent doesn't track the user's stated context across turns.\"\n- *Session → trace*: a session-level note about cross-turn drift may, on closer reading, trace back to one specific turn where the agent dropped the thread; a trace-level annotation can name that turn.\n\nWhichever level you write the axial label on, write the matching `coding_session_id` UI-filter annotation on the same entity (see [UI-filter annotation](#ui-filter-annotation) below) so the UI link picks it up.\n\n## Process\n\n1. **Set the coding annotation identifier** — set `CODING_ANNOTATION_IDENTIFIER` to the value used in open coding and re-derive `SLUG`, `NOTES_SIDECAR`, `AXIAL_SIDECAR` (see [Coding annotation identifier](#coding-annotation-identifier-reuse-the-open-coding-value))\n2. **Gather** — read open-coding notes from `$NOTES_SIDECAR` (at the unit committed in open coding); no server round-trip\n3. **Pattern** — group notes with common themes\n4. **Name** — create actionable category names\n5. **Attribute** — decide what level each category lives at; an axial label can move up (trace → session) or down (trace → span) from the source note's level to the level the pattern actually implicates\n6. **Record** — `px {trace,span,session} annotate ... 
--name axial_coding_category --label \u003ccat\u003e --identifier \"$CODING_ANNOTATION_IDENTIFIER\"`, add/update one JSONL sidecar row for the label, then write the matching `coding_session_id` UI-filter annotation\n7. **Quantify** — count failures per category from `$AXIAL_SIDECAR`\n\n## Example Taxonomy\n\n```yaml\nfailure_taxonomy:\n  content_quality:\n    hallucination: [invented_facts, fictional_citations]\n    incompleteness: [partial_answer, missing_key_info]\n    inaccuracy: [wrong_numbers, wrong_dates]\n\n  communication:\n    tone_mismatch: [too_casual, too_formal]\n    clarity: [ambiguous, jargon_heavy]\n\n  context:\n    user_context: [ignored_preferences, misunderstood_intent]\n    retrieved_context: [ignored_documents, wrong_context]\n\n  safety:\n    missing_disclaimers: [legal, medical, financial]\n```\n\n## Reading\n\n### 1. Gather — read this run's open-coding notes from the sidecar\n\nOpen coding wrote one JSONL line per note to `$NOTES_SIDECAR` (`.px/coding/${SLUG}.jsonl`). Read it directly; no server round-trip is needed. Each line has `entity_kind`, `entity_id`, `note`, `identifier`, and `ts`. If the same `(entity_kind, entity_id)` appears more than once, use the newest `ts` as the current note.\n\n**Missing-file behavior.** An absent `$NOTES_SIDECAR` means open coding hasn't run for this coding annotation identifier in this CWD. Stop and run open coding first; do not silently treat it as zero notes.\n\n**Malformed lines.** Each line is independently parseable JSON. If `jq` reports a parse error, fix or drop that line manually; do not edit other lines.\n\n**Notes outside this run.** The sidecar only carries notes written from this CWD. To pull notes written by another reviewer or an earlier run, fetch them via `px {trace,span,session} list --include-notes` (embeds notes into row output) — the workflow's sidecar is intentionally per-CWD-per-coding-identifier.\n\n### 2. Group — synthesize categories\n\nReview the note text collected above. 
Manually identify recurring themes and draft candidate category names. Aim for MECE coverage: each note should fit exactly one category.\n\n### 3. Record — write axial-coding labels\n\nWrite one annotation per entity using `px {trace,span,session} annotate`, passing `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"` explicitly on every call, and record one JSONL row in `$AXIAL_SIDECAR` so [Quantify](#4-quantify--count-per-category-from-the-axial-sidecar) below can count without a server round-trip. The level can differ from where the source note lives — see [Recording](#recording) below.\n\n### 4. Quantify — count per category from the axial sidecar\n\nCounts come from `$AXIAL_SIDECAR` (populated by [Record](#3-record--write-axial-coding-labels)). No server query, no project-wide history mixed in — the sidecar holds exactly the labels this run wrote. Count the current rows by `axial_label`; if an entity appears more than once, use the newest `ts`.\n\nSame missing-file and malformed-line rules as `$NOTES_SIDECAR`: a missing axial sidecar means no labels have been written yet (run [Record](#3-record--write-axial-coding-labels)); malformed lines are line-local — fix or drop, don't edit neighbors.\n\n## Recording\n\nUse the matching annotate command for the level the **label** belongs at — which may differ from where the source note lives (see [Choosing the unit](#choosing-the-unit)). 
Every call carries `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"` and `--format raw --no-progress`, and is paired with a JSONL row in `$AXIAL_SIDECAR`.\n\n**Axial sidecar JSONL line shape (one per `annotate`):**\n\n```json\n{\"entity_kind\":\"trace\",\"entity_id\":\"\u003ctrace-id\u003e\",\"annotation_name\":\"axial_coding_category\",\"axial_label\":\"\u003clabel\u003e\",\"explanation\":\"\u003coptional explanation\u003e\",\"identifier\":\"\u003coriginal identifier value, unsanitized\u003e\",\"ts\":\"\u003cISO-8601 UTC\u003e\"}\n```\n\nFields:\n- `entity_kind` — `\"trace\"`, `\"span\"`, or `\"session\"` (matches the `annotate` subcommand)\n- `entity_id` — the entity argument passed to `annotate`\n- `annotation_name` — always `\"axial_coding_category\"` for axial labels (the workflow's reserved annotation name)\n- `axial_label` — the `--label` value, verbatim; this is what [Quantify](#4-quantify--count-per-category-from-the-axial-sidecar) groups on\n- `explanation` — optional, but include it when the `annotate` call used `--explanation`\n- `identifier` — the **original** `$CODING_ANNOTATION_IDENTIFIER` value, unsanitized; the sanitized form lives only in the filename\n- `ts` — ISO-8601 UTC timestamp of the local append\n\nIf you revise a label for the same entity under the same coding annotation identifier, either replace that row or append a newer row. When duplicate `(entity_kind, entity_id, annotation_name)` rows exist, the newest `ts` is the current label. This matches the server upsert behavior of `annotate --identifier`.\n\nMinimal trace example:\n\n```bash\npx trace annotate \u003ctrace-id\u003e \\\n  --name axial_coding_category \\\n  --label answered_off_topic \\\n  --explanation \"asked about returns; answer covered shipping\" \\\n  --annotator-kind HUMAN \\\n  --identifier \"$CODING_ANNOTATION_IDENTIFIER\" \\\n  --format raw --no-progress\n```\n\nThen add a matching JSONL row to `$AXIAL_SIDECAR` using the line shape above. 
For span or session labels, change `entity_kind`, `entity_id`, and the `px` subcommand accordingly.\n\nAccepted flags: `--name`, `--label`, `--score`, `--explanation`, `--annotator-kind` (`HUMAN`, `LLM`, `CODE`), `--identifier`. There is no `--sync` flag — the CLI passes `sync=true` itself.\n\n### UI-filter annotation\n\nWrite a `coding_session_id` annotation at the same level as the axial label — see [open-coding.md#ui-filter-annotation](open-coding.md#ui-filter-annotation) for why the Phoenix UI filter requires a name-based annotation rather than the bare `--identifier`. If open coding already wrote `coding_session_id` on the same entity, this call upserts (idempotent). The annotation NAME `coding_session_id` is unchanged; only the workflow's spoken term is \"coding annotation identifier\".\n\n```bash\n# Same level as the axial label above\npx trace annotate \u003ctrace-id\u003e \\\n  --name coding_session_id \\\n  --label \"$CODING_ANNOTATION_IDENTIFIER\" \\\n  --identifier \"$CODING_ANNOTATION_IDENTIFIER\" \\\n  --format raw --no-progress\n# or px span annotate / px session annotate at matching levels\n```\n\n### Recording discipline\n\nAxial coding categorizes the entities you took notes on during open coding. Use `$NOTES_SIDECAR` as the source of candidate entities and write labels only after reading the note text and surrounding trace/span/session context. Do **not** filter by `--status-code ERROR` — that captures only spans explicitly marked as errors (typically a raised exception), which excludes most failure modes (hallucination, wrong tone, retrieval miss). See [open-coding.md](open-coding.md#inspection) for the full reasoning.\n\n**Fallback paths:** REST `POST /v1/{trace,span,session}_annotations` and `@arizeai/phoenix-client`'s `addSpanAnnotation` / `addSessionAnnotation` (no `addTraceAnnotation` is exported today — use REST or `px trace annotate`). The GraphQL endpoint rejects mutations.\n\n## Wrapping up\n\nAfter axial coding finishes, share the Phoenix UI link with the user. 
The link points to the project's traces table filtered by the `coding_session_id` annotation — `annotations['coding_session_id'].label == '\u003ccoding-annotation-id\u003e'`. The UI route `/projects/:projectId` expects an encoded GraphQL node ID, not a project name — resolve it via `px project get`:\n\n```bash\nproject_id=$(px project get \"$PHOENIX_PROJECT\" --format raw --no-progress | jq -r '.id')\nencoded=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' \\\n  \"annotations['coding_session_id'].label == '$CODING_ANNOTATION_IDENTIFIER'\")\necho \"Phoenix UI: $PHOENIX_HOST/projects/$project_id/traces?filterCondition=$encoded\"\n```\n\nIf the user wants to discard everything this run produced (open-coding notes, axial-coding labels, and `coding_session_id` annotations on the server, plus the local sidecars), three identifier-bound deletes handle the server side and one `rm` handles the local sidecars. **Confirm before running** — destructive. Each `px \u003centity\u003e-annotations delete` call requires `--all` to authorize the unbounded sweep; `--identifier` only narrows. 
Set `PHOENIX_CLI_DANGEROUSLY_ENABLE_DELETES=true` first if not already exported:\n\n```bash\nfor kind in trace span session; do\n  px \"$kind-annotations\" delete \\\n    --identifier \"$CODING_ANNOTATION_IDENTIFIER\" \\\n    --all -y \\\n    --format raw --no-progress\ndone\nrm -f \"$NOTES_SIDECAR\" \"$AXIAL_SIDECAR\"\n```\n\nEach `px \u003centity\u003e-annotations delete` call removes notes, axial-coding labels, and `coding_session_id` annotations together because they were all written with the same `--identifier` and share the underlying annotation table; the `rm` clears the local sidecars.\n\n## Agent Failure Taxonomy\n\n```yaml\nagent_failures:\n  planning: [wrong_plan, incomplete_plan]\n  tool_selection: [wrong_tool, missed_tool, unnecessary_call]\n  tool_execution: [wrong_parameters, type_error]\n  state_management: [lost_context, stuck_in_loop]\n  error_recovery: [no_fallback, wrong_fallback]\n```\n\n### Transition Matrix — jq sketch\n\nTo find where failures occur between agent states, identify the last non-error span before each first-error span within a trace. 
Note: OTel leaves most spans at `status_code == \"UNSET\"` and only sets `\"OK\"` when code explicitly does so — match `!= \"ERROR\"` rather than `== \"OK\"` so the matrix works on typical OTel data.\n\n```bash\npx span list --format raw --no-progress | jq '\n  group_by(.context.trace_id)\n  | map(\n      sort_by(.start_time)\n      # index of the first errored span; traces without errors are dropped\n      | (map(.status_code) | index(\"ERROR\")) as $i\n      | select($i != null)\n      | { trace_id: .[0].context.trace_id,\n          # every span before $i is non-error, so the last one is the predecessor\n          last_non_error: (.[:$i] | last | .name),\n          first_err:      .[$i].name }\n    )\n  | group_by([.last_non_error, .first_err])\n  | map({ transition: \"\\(.[0].last_non_error) → \\(.[0].first_err)\", count: length })\n  | sort_by(-.count)\n'\n```\n\nUse the output to tally which state-to-state transitions are most failure-prone and add them to your taxonomy.\n\n## What Makes a Good Category\n\nA useful category is:\n- **Named for the cause**, not the symptom (\"wrong_tool_selected\", not \"bad_output\")\n- **Tied to a fix** — if you can't name a remediation, the category is too vague\n- **Grounded in data** — emerged from actual note text, not assumed upfront\n\n## Principles\n\n- **One coding annotation identifier per run** — every `annotate` call and every sidecar line carries `$CODING_ANNOTATION_IDENTIFIER`, the same value open coding used; never mint a new id mid-run.\n- **Pass `--identifier` explicitly** — every `annotate`, `add-note`, and `*-annotations delete` call gets `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"`; do not rely on inherited env vars.\n- **Sidecar reads, server writes** — Gather and Quantify read `$NOTES_SIDECAR` and `$AXIAL_SIDECAR` locally; Record writes to the server and updates the sidecar. 
If an entity appears more than once, the newest `ts` wins.\n- **MECE** — Each failure fits ONE category.\n- **Actionable** — Categories suggest fixes.\n- **Bottom-up** — Let categories emerge from data.\n- **UI-filter annotation always paired** — never write `axial_coding_category` without writing the matching `coding_session_id` annotation; the UI link depends on it.\n","references/open-coding.md":"# Open Coding\n\nFree-form note-writing against sampled traces, spans, or sessions, before any taxonomy exists. After you pick a sample at the right unit (see [Choosing the unit of analysis](#choosing-the-unit-of-analysis)), read each one and write a short, specific observation of what went wrong. These raw notes feed [axial coding](axial-coding.md), where they get grouped into named failure categories — and ultimately into eval targets or fix priorities.\n\n**Reach for this whenever** the user wants to look at LLM traffic without a fixed taxonomy yet — e.g., \"what's going wrong with this agent\", \"I just instrumented my app, where do I start\", \"review these traces\", \"the chatbot keeps losing context\", \"what kinds of mistakes is the model making\", \"help me make sense of these conversations\", or any framing that needs grounded observations before categories.\n\n## Choosing the unit of analysis\n\nThe right unit — **trace, span, or session** — depends on the question and the system. Pick deliberately before recording; the choice determines whether you call `px trace`, `px span`, or `px session` throughout, and a wrong default is expensive to undo mid-run.\n\nThe unit is about **where the failure modes you're investigating actually live**:\n\n- **Trace** — one input → one call graph → one output. Right for classifiers, single-shot summarizers, stateless tool-using agents, single-query RAG. Failure modes that live here: wrong answer, malformed output, missed retrieval, bad tool selection within one request.\n- **Span** — one operation inside a trace. 
Right for in-isolation mechanical failures (an exception fired, a tool returned an error response, an output is malformed) or for failures you can attribute on sight to a specific component. Reach for span when the trace as a whole is fine but one piece inside it is the unit of interest.\n- **Session** — a sequence of traces sharing a `session.id`. Right for multi-turn conversational agents, agents with episodic memory, anything where the failure mode is a *trajectory*: context loss across turns, drift from the user's stated goal, the agent forgetting a stated preference, repeated user clarifications. These failures don't exist on any single trace; they only exist *across* traces.\n\n### Diagnostic — three signals to read\n\n1. **User framing.** *Tilts session*: \"conversation\", \"agent forgot\", \"drift\", \"memory\", \"across turns\", \"user had to repeat themselves\". *Tilts trace*: \"this trace\", \"this call\", \"the response was wrong\", \"wrong output\". *Tilts span*: \"exception\", \"error response\", \"malformed\", \"the retrieval failed\".\n\n2. **Data shape.** Probe before the loop. The session id lives at `rootSpan.attributes[\"session.id\"]` (it is *not* a top-level field on the trace JSON), and is `\"\"` for traces that aren't session-wired, so filter out both cases:\n\n   ```bash\n   px trace list --limit 200 --format raw --no-progress \\\n     | jq '\n       [ .[] | .rootSpan.attributes[\"session.id\"] // empty | select(. != \"\") ]\n       | { with_session: length,\n           distinct_sessions: (group_by(.) | length),\n           median_traces_per_session:\n             (group_by(.) | map(length) | sort | .[length/2|floor] // 0) }\n     '\n   ```\n\n   `with_session: 0` → sessions not wired; trace is the grain. `median_traces_per_session: 1` → single-trace sessions; still trace. `median_traces_per_session: 5+` → sessions are meaningful; session is plausibly right.\n\n3. **System type.** Open one recent trace and inspect the root span's input. 
A single user message → one turn or one shot. A message *array* (`[{role: user}, {role: assistant}, ...]`) → that's a turn within a longer dialogue; the dialogue lives at the session level.\n\n   ```bash\n   px trace get \u003ctrace-id\u003e --format raw --no-progress \\\n     | jq '.rootSpan.attributes[\"input.value\"] | (fromjson? // .) | (type, length?)'\n   ```\n\n### Commit out loud, then proceed\n\nState the unit explicitly before recording any note:\n\n\u003e \"Question: 'the chatbot keeps losing context'. Data: median 7 traces per session, message-array inputs. Recording at the **session** level; will drop to **trace** for single-turn observations, **span** for mechanical failures.\"\n\nThe unit can shift if data demands it — a trace-level investigation that surfaces \"the agent never remembers earlier turns\" should pivot to session. Record the observation, then refocus the next batch. The unit is a starting hypothesis, not a contract.\n\n## Coding annotation identifier (pick this first)\n\nEvery artifact this workflow produces — open-coding notes, axial-coding labels, the local sidecar files, and the UI-filter annotation — is tagged with one **coding annotation identifier** so the run is queryable, revertible, and viewable as a unit. Pick a **descriptive, unique** identifier before recording any notes. Format suggestion:\n\n    coding-run:\u003cshort-topic\u003e-\u003cYYYY-MM-DD\u003e\n\nExamples: `coding-run:chatbot-context-loss-2026-05-06`, `coding-run:agent-tool-misuse-q2`. Descriptive ids carry meaning for whoever opens the data later — better than an opaque uuid. The `coding-run:` prefix is a visual convention; the value is the workflow's coding annotation identifier, not a `px session` id.\n\n\u003e **Workflow term vs. server annotation name.** The skill calls this value the **coding annotation identifier**. The server-side annotation NAME used for the UI filter stays `coding_session_id` for data compatibility with rows already written. 
Don't try to rename it.\n\nPass the identifier explicitly on every `px` call. A shell variable for readability is fine, but **do not rely on shell inheritance** — many agent harnesses spawn each command in a fresh subshell, so `CODING_ANNOTATION_IDENTIFIER` may not propagate.\n\n```bash\nCODING_ANNOTATION_IDENTIFIER=\"coding-run:chatbot-context-loss-2026-05-06\"\n```\n\nThe local sidecar lives at `.px/coding/\u003csanitized-identifier\u003e.jsonl` (CWD-relative, matching the `.px/docs` precedent). Sanitization rule: replace any character not matching `[a-zA-Z0-9_-]` with `-` before using the value in the filename — colons, slashes, and other shell-fragile characters get normalized. For `CODING_ANNOTATION_IDENTIFIER=\"coding-run:chatbot-context-loss-2026-05-06\"` the sidecar path is `.px/coding/coding-run-chatbot-context-loss-2026-05-06.jsonl`.\n\nVerify this run hasn't already started — uniqueness is a **local file check**, not a server query:\n\n```bash\nSLUG=$(echo -n \"$CODING_ANNOTATION_IDENTIFIER\" | sed 's/[^a-zA-Z0-9_-]/-/g')\nSIDECAR=\".px/coding/${SLUG}.jsonl\"\ntest ! -f \"$SIDECAR\" || { echo \"Sidecar already exists at $SIDECAR — pick a new identifier or delete the file\"; exit 1; }\nmkdir -p .px/coding\n```\n\nIf `$SIDECAR` already exists, append a disambiguator (`-v2`, `-dustin`, etc.) to `CODING_ANNOTATION_IDENTIFIER`, re-derive `SLUG`, and re-check. The agent harness can run open coding and axial coding in independent invocations: each step re-derives `SLUG` from `CODING_ANNOTATION_IDENTIFIER` and reads/writes the same file.\n\n## Process\n\n1. **Pick a coding annotation identifier** — choose a descriptive value and verify the sidecar file does not yet exist (see [Coding annotation identifier](#coding-annotation-identifier-pick-this-first))\n2. **Pick the unit** — work through [Choosing the unit of analysis](#choosing-the-unit-of-analysis) and commit to trace, span, or session\n3. 
**Inspect** — fetch one entity at the chosen unit (trace / span / session)\n4. **Read** — input, output, exceptions, tool calls, retrieved context, and (at session level) the trajectory across child traces\n5. **Note** — write one specific sentence describing what went wrong (or skip if correct)\n6. **Record** — `px {trace,span,session} add-note \u003cid\u003e --text \"...\" --identifier \"$CODING_ANNOTATION_IDENTIFIER\" --format raw --no-progress`, add/update one JSONL sidecar row for the note, then write the matching [UI-filter annotation](#ui-filter-annotation)\n7. **Iterate** — move to the next entity; repeat until the sample is exhausted or saturation hits\n8. **Hand off** — axial coding reads the sidecar directly (no shared shell required); see [Wrapping up](#wrapping-up) for the UI link\n\n## Inspection\n\nUse `px` to read context at the unit committed in [Choosing the unit](#choosing-the-unit-of-analysis):\n\n- **Trace unit** — read one trace's input → tool calls → retrieved context → output as one story.\n- **Span unit** — read one operation's input/output and surrounding spans for context.\n- **Session unit** — read the sequence of traces in order; the trajectory (turns, retrievals, tool-call patterns *across* traces) is the data, not any single trace's inputs and outputs.\n\n\u003e **Don't filter the sample by `--status-code ERROR`.** OTel's `status_code` only flips to `ERROR` when an instrumentor catches a raised Python exception (network failure, 5xx, parse error). Hallucinations, wrong tone, retrieval misses, and bad tool selection all complete cleanly and arrive as `OK` or `UNSET`. 
Sampling for open coding by `--status-code ERROR` excludes the population this workflow exists to surface.\n\n```bash\n# Sample recent traces — the unit of inspection in open coding\npx trace list --limit 100 --format raw --no-progress | jq '\n  .[] | {trace_id: .traceId, root: .rootSpan.name, status,\n         input: .rootSpan.attributes[\"input.value\"],\n         output: .rootSpan.attributes[\"output.value\"]}\n'\n\n# Trace-level context — all spans in one trace, ordered by start_time\npx trace get \u003ctrace-id\u003e --format raw --no-progress | jq '\n  .spans | sort_by(.start_time) | map({span_id: .context.span_id, name, status_code,\n    input: .attributes[\"input.value\"],\n    output: .attributes[\"output.value\"]})\n'\n\n# Drill to one span (px span get does not exist; filter via span list)\npx span list --trace-id \u003ctrace-id\u003e --format raw --no-progress \\\n  | jq '.[] | select(.context.span_id == \"\u003cspan-id\u003e\")'\n\n# Check existing notes on traces (default) or spans you are about to review\n# Notes are stored as annotations with name=\"note\"; use --include-notes (not --include-annotations)\npx trace list --include-notes --limit 10 --format raw --no-progress | jq '\n  .[] | select((.notes // []) | length \u003e 0)\n  | {trace_id: .traceId, notes: [.notes[] | .result.explanation]}\n'\n# Same shape on spans — swap px trace for px span and use .context.span_id\n```\n\nAlways pipe through `jq` with `--format raw --no-progress` when scripting.\n\n## Recording Notes\n\nUse the `add-note` command matching the unit committed in [Choosing the unit](#choosing-the-unit-of-analysis): `px trace add-note`, `px span add-note`, or `px session add-note`. 
Every call carries an explicit `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"` and `--format raw --no-progress`.\n\nPassing `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"` does two things:\n- Tags the note row with the coding annotation identifier on the server, so the cleanup `px \u003centity\u003e-annotations delete --identifier \"$CODING_ANNOTATION_IDENTIFIER\" --all` sweep removes every artifact this run produced.\n- Makes the call **upsert** on `(entity_id, name='note', identifier)` — re-running open coding on the same entity within the same coding annotation identifier overwrites the prior note instead of appending a second row. (Without `--identifier`, the server stamps a unique `px-{kind}-note:\u003cuuid\u003e` and each call appends.)\n\nAfter every successful `add-note`, record one JSONL line in `$SIDECAR`. The sidecar is what axial coding reads — no server round-trip. It is a content handoff, not code: keep it readable, inspect it directly, and use whatever simple tooling is convenient.\n\n**Sidecar JSONL line shape (one per `add-note`):**\n\n```json\n{\"entity_kind\":\"trace\",\"entity_id\":\"\u003ctrace-id\u003e\",\"note\":\"\u003ctext\u003e\",\"identifier\":\"\u003coriginal identifier value, unsanitized\u003e\",\"ts\":\"\u003cISO-8601 UTC\u003e\"}\n```\n\nFields:\n- `entity_kind` — `\"trace\"`, `\"span\"`, or `\"session\"` (matches the `add-note` subcommand used)\n- `entity_id` — the entity argument passed to `add-note` (trace id, span id, or session id)\n- `note` — the `--text` value, verbatim\n- `identifier` — the **original** `$CODING_ANNOTATION_IDENTIFIER` value, unsanitized; the sanitized form lives only in the filename\n- `ts` — ISO-8601 UTC timestamp (e.g. `2026-05-08T17:14:09Z`) of the local append\n\nIf you revise a note for the same entity under the same coding annotation identifier, either replace that row or append a newer row. When duplicate `(entity_kind, entity_id)` rows exist, the newest `ts` is the current note. 
This matches the server upsert behavior of `add-note --identifier`.\n\nMinimal trace example:\n\n```bash\npx trace add-note \u003ctrace-id\u003e \\\n  --text \"Asked about returns; final answer covered shipping policy instead\" \\\n  --identifier \"$CODING_ANNOTATION_IDENTIFIER\" \\\n  --format raw --no-progress\n```\n\nThen add a matching JSONL row to `$SIDECAR` using the line shape above. For span or session notes, change `entity_kind`, `entity_id`, and the `px` subcommand accordingly.\n\nBulk auto-tagging by status code (e.g. `px span list --status-code ERROR | xargs ... add-note \"error\"`) is **not open coding** — open coding is manual, observation-grounded, and ranges over all failure modes, not just spans where Python raised. Skip the bulk-by-status-code shortcut; it yields uniform, low-information notes instead of the specific observations that walking traces produces.\n\n### UI-filter annotation\n\nEvery entity that receives an open-coding note (or an axial-coding label later) also needs a UI-filter annotation so the Phoenix UI can filter by coding annotation identifier. Phoenix's UI filter language is name-based, not identifier-based — there is no UI primitive for filtering by `identifier`, so an annotation whose **name** is the constant `coding_session_id` and whose **label** is the coding annotation identifier value is what the wrap-up UI link actually filters on.\n\nThe annotation NAME `coding_session_id` is the load-bearing data key on the server and must not be renamed. 
The skill's workflow term is \"coding annotation identifier\"; the server key stays `coding_session_id` for compatibility with rows already written.\n\nRun this once per touched entity, alongside the `add-note` (and again later when axial coding labels a different entity):\n\n```bash\npx trace annotate \u003ctrace-id\u003e \\\n  --name coding_session_id \\\n  --label \"$CODING_ANNOTATION_IDENTIFIER\" \\\n  --identifier \"$CODING_ANNOTATION_IDENTIFIER\"\n# or px span annotate / px session annotate at matching levels\n```\n\nThe annotation's `--identifier` matches `$CODING_ANNOTATION_IDENTIFIER`, so the [wrap-up DELETE](#wrapping-up) cleans it up in the same call as the notes and the axial-coding labels.\n\n**Fallback write paths (one-line asides):**\n\n- `POST /v1/trace_notes`, `POST /v1/span_notes`, and `POST /v1/session_notes` — each accepts one `{data: {trace_id|span_id|session_id, note, identifier}}` per request; when the optional `identifier` field is non-empty, the write upserts on `(entity_id, name='note', identifier)`.\n- `@arizeai/phoenix-client` `addTraceNote`, `addSpanNote`, and `addSessionNote` wrap the same endpoints and accept an optional `identifier` field on the note object.\n- The GraphQL endpoint rejects mutations with `\"Only queries are permitted.\"` — write through `px {trace,span,session} add-note` or the REST endpoints above.\n\n## What Makes a Good Note\n\n| Weak note            | Why it's weak             | Good note                                                                  | Why it's strong                             |\n| -------------------- | ------------------------- | -------------------------------------------------------------------------- | ------------------------------------------- |\n| \"Wrong answer\"       | No observable detail      | \"Said the store closes at 6pm but policy is 9pm\"                           | Quotes observed vs. 
correct value           |\n| \"Bad tone\"           | Vague judgment            | \"Used first-name greeting for an enterprise support ticket\"                | Specifies the context mismatch              |\n| \"Hallucination\"      | Labels before observing   | \"Cited a product feature ('auto-renew') that does not exist in the schema\" | Describes what was fabricated               |\n| \"Retrieval issue\"    | Category, not observation | \"Retrieved docs about shipping when the question was about returns\"        | States what was retrieved vs. needed        |\n| \"Model confused\"     | Opaque                    | \"Answered in Spanish when the user wrote in English\"                       | Observable and reproducible                 |\n\nWrite what you saw, not the category you think it belongs to — categorization happens in [axial coding](axial-coding.md). Short prefixes like `TONE:` or `FACTUAL:` are a personal shorthand, not a repo convention.\n\n## Saturation\n\nStop writing notes when observations stop being new. Signals:\n\n- **Repeats** — the last 10–15 traces produced notes that describe failures you've already seen.\n- **Paraphrase convergence** — you catch yourself writing minor variations of earlier notes.\n- **Skips outnumber notes** — most recent traces are correct and need no note.\n\nAt saturation, move on to [axial coding](axial-coding.md) to group what you have. Continuing past saturation adds traces but not insight. You do not need to annotate every trace — annotating correct ones dilutes signal.\n\n## Listing what this run produced\n\nThe local sidecar is the handoff record for notes written this run. Inspect it directly. Each line is one note record; if the same entity appears more than once, use the newest `ts` as the current note. Missing-file behavior: an absent sidecar means open coding has not yet started for this coding annotation identifier; treat that as zero notes, not an error. 
Malformed lines are line-local: fix or drop the bad line without editing neighbors.\n\n## Wrapping up\n\nWhen the run is done, share the Phoenix UI link with the user. The link filters the project's traces page by the `coding_session_id` annotation written alongside each note. The UI route `/projects/:projectId` expects an encoded GraphQL node ID, not a project name — resolve it via `px project get`:\n\n```bash\nproject_id=$(px project get \"$PHOENIX_PROJECT\" --format raw --no-progress | jq -r '.id')\nencoded=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' \\\n  \"annotations['coding_session_id'].label == '$CODING_ANNOTATION_IDENTIFIER'\")\necho \"Phoenix UI: $PHOENIX_HOST/projects/$project_id/traces?filterCondition=$encoded\"\n```\n\nIf the user wants to discard everything this run produced, three identifier-bound deletes handle the server side and one `rm` handles the local sidecars. **Confirm with the user before running** — this is destructive. Each call requires `--all` (or both `--start-time` and `--end-time`) to authorize the sweep; `--identifier` filters further but never authorizes on its own. 
Set `PHOENIX_CLI_DANGEROUSLY_ENABLE_DELETES=true` first if not already exported:\n\n```bash\nfor kind in trace span session; do\n  px \"$kind-annotations\" delete \\\n    --identifier \"$CODING_ANNOTATION_IDENTIFIER\" \\\n    --all -y \\\n    --format raw --no-progress\ndone\n# re-derive SLUG: harness subshells may not inherit it\nSLUG=$(echo -n \"$CODING_ANNOTATION_IDENTIFIER\" | sed 's/[^a-zA-Z0-9_-]/-/g')\nrm -f \".px/coding/${SLUG}.jsonl\" \".px/coding/${SLUG}-axial.jsonl\"\n```\n\nEach `px \u003centity\u003e-annotations delete` call covers notes, structured annotations, and the `coding_session_id` annotation in one shot because they share the underlying annotation table.\n\n## Principles\n\n- **One coding annotation identifier per run** — every server artifact and every sidecar line carries the same `$CODING_ANNOTATION_IDENTIFIER`; never mint a per-stage id.\n- **Pass `--identifier` explicitly** — every `px` call gets `--identifier \"$CODING_ANNOTATION_IDENTIFIER\"`; do not rely on inherited env vars across harness-spawned subshells.\n- **Sidecar is the handoff record for notes** — axial coding reads from the local sidecar, not from the server; if an entity appears more than once, the newest `ts` wins.\n- **Free-form over structured** — do not pre-commit to a taxonomy during open coding; categories emerge in axial coding.\n- **Specific over general** — quote or paraphrase the observed failure; vague labels (\"bad response\") carry no signal.\n- **Context before labeling** — inspect input, output, and retrieved context before writing any note.\n- **Iterate before categorizing** — work through the full sample first; resist grouping while still collecting.\n- **Skip is valid** — a correct trace, span, or session needs no note; annotating everything dilutes signal.\n- **Revert is opt-in** — the wrap-up DELETE only runs after explicit user confirmation; the default path prints the UI link and stops.\n"},"import":{"commit_sha":"541b7819d8c3545c6df122491af4fa1eae415779","imported_at":"2026-05-18T20:05:35Z","license_text":"MIT License\n\nCopyright GitHub, Inc.\n\nPermission is hereby granted, free of charge, to any person obtaining a 
copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.","owner":"github","repo":"github/awesome-copilot","source_url":"https://github.com/github/awesome-copilot/tree/541b7819d8c3545c6df122491af4fa1eae415779/plugins/phoenix/skills/phoenix-cli"}},"content_hash":[32,139,90,207,153,157,187,153,104,75,82,225,70,37,182,195,154,38,36,79,158,34,3,8,219,255,222,51,194,122,98,104],"trust_level":"unsigned","yanked":false}