{"kind":"AgentDefinition","metadata":{"namespace":"community","name":"agent-safety","version":"0.1.0"},"spec":{"agents_md":"---\ndescription: 'Guidelines for building safe, governed AI agent systems. Apply when writing code that uses agent frameworks, tool-calling LLMs, or multi-agent orchestration to ensure proper safety boundaries, policy enforcement, and auditability.'\napplyTo: '**'\n---\n\n# Agent Safety \u0026 Governance\n\n## Core Principles\n\n- **Fail closed**: If a governance check errors or is ambiguous, deny the action rather than allowing it\n- **Policy as configuration**: Define governance rules in YAML/JSON files, not hardcoded in application logic\n- **Least privilege**: Agents should have the minimum tool access needed for their task\n- **Append-only audit**: Never modify or delete audit trail entries — immutability enables compliance\n\n## Tool Access Controls\n\n- Always define an explicit allowlist of tools an agent can use — never give unrestricted tool access\n- Separate tool registration from tool authorization — the framework knows what tools exist, the policy controls which are allowed\n- Use blocklists for known-dangerous operations (shell execution, file deletion, database DDL)\n- Require human-in-the-loop approval for high-impact tools (send email, deploy, delete records)\n- Enforce rate limits on tool calls per request to prevent infinite loops and resource exhaustion\n\n## Content Safety\n\n- Scan all user inputs for threat signals before passing to the agent (data exfiltration, prompt injection, privilege escalation)\n- Filter agent arguments for sensitive patterns: API keys, credentials, PII, SQL injection\n- Use regex pattern lists that can be updated without code changes\n- Check both the user's original prompt AND the agent's generated tool arguments\n\n## Multi-Agent Safety\n\n- Each agent in a multi-agent system should have its own governance policy\n- When agents delegate to other agents, apply the most restrictive policy from either\n- Track trust scores for agent delegates — degrade trust on failures, require ongoing good behavior\n- Never allow an inner agent to have broader permissions than the outer agent that called it\n\n## Audit \u0026 Observability\n\n- Log every tool call with: timestamp, agent ID, tool name, allow/deny decision, policy name\n- Log every governance violation with the matched rule and evidence\n- Export audit trails in JSON Lines format for integration with log aggregation systems\n- Include session boundaries (start/end) in audit logs for correlation\n\n## Code Patterns\n\nWhen writing agent tool functions:\n```python\n# Good: Governed tool with explicit policy\n@govern(policy)\nasync def search(query: str) -\u003e str:\n    ...\n\n# Bad: Unprotected tool with no governance\nasync def search(query: str) -\u003e str:\n    ...\n```\n\nWhen defining policies:\n```yaml\n# Good: Explicit allowlist, content filters, rate limit\nname: my-agent\nallowed_tools: [search, summarize]\nblocked_patterns: [\"(?i)(api_key|password)\\\\s*[:=]\"]\nmax_calls_per_request: 25\n\n# Bad: No restrictions\nname: my-agent\nallowed_tools: [\"*\"]\n```\n\nWhen composing multi-agent policies:\n```python\n# Good: Most-restrictive-wins composition\nfinal_policy = compose_policies(org_policy, team_policy, agent_policy)\n\n# Bad: Only using agent-level policy, ignoring org constraints\nfinal_policy = agent_policy\n```\n\n## Framework-Specific Notes\n\n- **PydanticAI**: Use `@agent.tool` with a governance decorator wrapper. PydanticAI's upcoming Traits feature is designed for this pattern.\n- **CrewAI**: Apply governance at the Crew level to cover all agents. Use `before_kickoff` callbacks for policy validation.\n- **OpenAI Agents SDK**: Wrap `@function_tool` with governance. Use handoff guards for multi-agent trust.\n- **LangChain/LangGraph**: Use `RunnableBinding` or tool wrappers for governance. Apply at the graph edge level for flow control.\n- **AutoGen**: Implement governance in the `ConversableAgent.register_for_execution` hook.\n\n## Common Mistakes\n\n- Relying only on output guardrails (post-generation) instead of pre-execution governance\n- Hardcoding policy rules instead of loading from configuration\n- Allowing agents to self-modify their own governance policies\n- Forgetting to governance-check tool *arguments*, not just tool *names*\n- Not decaying trust scores over time — stale trust is dangerous\n- Logging prompts in audit trails — log decisions and metadata, not user content\n","description":"Guidelines for building safe, governed AI agent systems. Apply when writing code that uses agent frameworks, tool-calling LLMs, or multi-agent orchestration to ensure proper safety boundaries, policy enforcement, and auditability.","import":{"commit_sha":"541b7819d8c3545c6df122491af4fa1eae415779","imported_at":"2026-05-18T20:05:35Z","license_text":"MIT License\n\nCopyright GitHub, Inc.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.","owner":"github","repo":"github/awesome-copilot","source_url":"https://github.com/github/awesome-copilot/blob/541b7819d8c3545c6df122491af4fa1eae415779/instructions/agent-safety.instructions.md"},"manifest":{}},"content_hash":[125,140,46,104,129,9,210,138,74,175,39,46,149,12,251,189,253,119,37,52,88,0,95,5,36,40,117,239,15,122,27,211],"trust_level":"unsigned","yanked":false}
