Security Documentation
Agent Safety
AI agents can read files, run commands, access the network, and interact with external services. Before trusting any agent definition, you need to understand exactly what it can do — and what could go wrong.
Why Agent Security Matters
Research from Cisco found that 12% of shared AI agent definitions on OpenClaw contain malicious content, with over 40,000 agents lacking basic security measures. Prompt injection attacks embedded in tool descriptions can hijack agent behavior, exfiltrate data, or execute arbitrary commands on your machine.
The OWASP Top 10 for Agentic Applications (December 2025) identifies critical risks including agent goal hijacking, tool misuse, supply chain attacks, and excessive permissions — all of which apply to shared agent definitions.
KillerSkills scans every agent definition with both static pattern analysis and AI-powered semantic review before making it available to users.
Risk Levels
Every agent definition receives a risk level based on its declared capabilities, permission scopes, and detected behaviors.
Low — No tool access, informational only. The agent provides guidance, templates, or instructions but cannot execute any actions.
Example: A prompt template that structures how to ask an AI for code reviews.
Medium — Read-only tool access, no network. The agent can read local files and project structure but cannot modify anything or communicate externally.
Example: A code analysis agent that reads your codebase and provides suggestions.
High — Write access, network access, or shell commands. The agent can modify files, access the internet, or execute commands on your system.
Example: A deployment agent that pushes code and runs build commands.
Critical — Autonomous execution, credential handling, or multi-tool chaining. The agent operates with minimal human oversight and can access sensitive resources.
Example: An autonomous DevOps agent with SSH access, database credentials, and shell execution.
How KillerSkills Scans Agent Definitions
Static Pattern Analysis
Every agent definition is scanned against 60+ regex patterns covering XSS, command injection, data exfiltration, credential harvesting, prompt injection, and agent-specific threats like tool poisoning and supply chain risks.
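A pattern scan of this kind can be sketched in a few lines of Python. The pattern names and regexes below are illustrative examples only, not KillerSkills' actual rule set:

```python
import re

# Illustrative detection patterns — not the real 60+ rule set.
PATTERNS = {
    "pipe_to_shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(bash|sh|python|node)"),
    "sensitive_path": re.compile(r"~/(\.ssh|\.aws)|/etc/passwd|\.env\b"),
    "raw_ip_endpoint": re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}"),
}

def scan(definition_text: str) -> list[str]:
    """Return the names of all patterns that match the agent definition."""
    return [name for name, rx in PATTERNS.items() if rx.search(definition_text)]
```

Static patterns like these are fast and deterministic, which is why they run first; anything they miss falls through to the semantic pass described next.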
AI Semantic Analysis
An AI model performs a thorough security review, analyzing the full content for hidden instructions, misleading descriptions, obfuscated payloads, and social engineering attempts that static patterns might miss.
Risk Profile Computation
A risk profile is computed across five dimensions: tool risk, network risk, data risk, autonomy risk, and supply chain risk. Each produces a 0-100 score with human-readable risk factors.
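One way to structure such a profile is a per-dimension score plus a weighted overall value. The weights below are hypothetical; the actual scoring model is not specified here:

```python
from dataclasses import dataclass, field

# Hypothetical dimension weights — the real model is not published.
WEIGHTS = {"tool": 0.25, "network": 0.20, "data": 0.25,
           "autonomy": 0.20, "supply_chain": 0.10}

@dataclass
class RiskProfile:
    scores: dict[str, int]                            # each dimension scored 0-100
    factors: list[str] = field(default_factory=list)  # human-readable risk factors

    @property
    def overall(self) -> float:
        """Weighted combination of the five dimension scores."""
        return sum(WEIGHTS[dim] * score for dim, score in self.scores.items())
```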
Classification
Based on the combined analysis, the agent receives a safety status (safe, warning, or dangerous), a safety score (0-100), and a risk level (low, medium, high, or critical).
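The mapping from safety score to status and risk level might look like the following; the thresholds are assumptions for illustration, not documented cutoffs:

```python
def classify(safety_score: int) -> tuple[str, str]:
    """Map a 0-100 safety score to a (safety status, risk level) pair.

    Thresholds here are illustrative assumptions, not KillerSkills'
    documented cutoffs.
    """
    if safety_score >= 80:
        return "safe", "low"
    if safety_score >= 60:
        return "warning", "medium"
    if safety_score >= 40:
        return "warning", "high"
    return "dangerous", "critical"
```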
OWASP Top 10 for Agentic Applications
Published in December 2025, the OWASP Top 10 for Agentic Applications lists the most critical security risks for AI agent systems. KillerSkills scans for all of them.
- 01 Prompt Injection — Malicious inputs that manipulate agent behavior, including indirect injection via tool outputs and cross-agent contamination.
- 02 Tool Misuse — Agents invoking tools in unintended ways, such as executing destructive commands or accessing files outside scope.
- 03 Agent Goal Hijacking — Attacks that redirect an agent from its intended task to perform actions beneficial to the attacker.
- 04 Excessive Permissions — Agents granted more access than needed, violating the principle of least privilege.
- 05 Insecure Tool Design — Tools that accept unsanitized inputs, enabling command injection or path traversal attacks.
- 06 Supply Chain Vulnerabilities — Compromised dependencies, MCP servers, or packages that introduce malicious code.
- 07 Data Exfiltration — Agents leaking sensitive data through network requests, tool outputs, or side channels.
- 08 Inadequate Sandboxing — Agents running without proper isolation, allowing filesystem and network access beyond intent.
- 09 Missing Human-in-the-Loop — Autonomous agents making consequential decisions without human approval checkpoints.
- 10 Insufficient Logging — Lack of audit trails for agent actions, making incident response and forensics difficult.
Red Flags Checklist
When evaluating any third-party agent definition, watch for these warning signs:
- ✗ Autonomous execution combined with shell access and network write
- ✗ Pipe-to-shell patterns: curl or wget piped to bash, sh, python, or node
- ✗ Raw IP addresses in external endpoints instead of domain names
- ✗ MCP servers from unknown or unofficial registries
- ✗ Environment variables named SECRET, TOKEN, or KEY not marked as sensitive
- ✗ npx invocation of packages you don't recognize
- ✗ Access to sensitive paths: ~/.ssh, ~/.aws, .env, /etc/passwd
- ✗ Hidden instructions in tool descriptions (tool poisoning)
- ✗ Auto-update or dynamic fetch configurations that load code at runtime
- ✗ More than 10 tools or permission scopes with no clear justification
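Several of these checks are easy to automate before you ever run an agent. A sketch, assuming a hypothetical agent-definition dict with `env` and `tools` keys (this schema is an example, not a standard format):

```python
SENSITIVE_NAMES = ("SECRET", "TOKEN", "KEY")

def red_flags(definition: dict) -> list[str]:
    """Check a few items from the checklist above.

    The `definition` schema (env/tools keys) is a hypothetical example,
    not a standard agent-definition format.
    """
    flags = []
    for var in definition.get("env", []):
        name = var.get("name", "")
        # Sensitive-looking names must be explicitly marked as sensitive.
        if any(s in name.upper() for s in SENSITIVE_NAMES) and not var.get("sensitive"):
            flags.append(f"env var {name} not marked as sensitive")
    if len(definition.get("tools", [])) > 10:
        flags.append("more than 10 tools with no clear justification")
    return flags
```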
Best Practices for Running Agents Safely
Use a sandbox
Run agents in containers, VMs, or sandboxed environments. Never give production credentials to an untested agent.
Apply least privilege
Grant only the minimum permissions needed. Remove shell_execute if the agent doesn't need it.
Keep humans in the loop
Require approval for destructive operations, file modifications, and network requests until you trust the agent.
Verify MCP servers
Only connect to MCP servers from official registries. Inspect the server source code before trusting it.
Review environment variables
Understand every env var the agent requests. Never provide credentials for services the agent shouldn't access.
Monitor agent actions
Enable logging for all agent actions. Review logs regularly and set up alerts for unexpected behaviors.
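An audit trail can be as simple as one structured JSON line per tool call. The field names below are an illustrative format, not a standard:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")

def audit(tool: str, args: dict, outcome: str) -> str:
    """Emit one JSON audit line per agent action and return it."""
    entry = json.dumps({"ts": time.time(), "tool": tool,
                        "args": args, "outcome": outcome})
    logging.info(entry)
    return entry
```

JSON-per-line logs are easy to grep, ship to a log aggregator, and alert on when an agent invokes a tool you did not expect.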
Resources
- OWASP Top 10 for Agentic Applications — Comprehensive risk framework for AI agent systems.
- NIST AI Risk Management Framework — Guidelines for managing AI system risks in organizational contexts.
- Model Context Protocol (MCP) Security Guide — Best practices for securing MCP server connections and tool access.