Security Documentation

Agent Safety

AI agents can read files, run commands, access the network, and interact with external services. Before trusting any agent definition, you need to understand exactly what it can do — and what could go wrong.

Why Agent Security Matters

Research from Cisco found that 12% of shared AI agent definitions on OpenClaw contain malicious content, with over 40,000 agents lacking basic security measures. Prompt injection attacks embedded in tool descriptions can hijack agent behavior, exfiltrate data, or execute arbitrary commands on your machine.

The OWASP Top 10 for Agentic Applications (December 2025) identifies critical risks including agent goal hijacking, tool misuse, supply chain attacks, and excessive permissions — all of which apply to shared agent definitions.

KillerSkills scans every agent definition with both static pattern analysis and AI-powered semantic review before making it available to users.

Risk Levels

Every agent definition receives a risk level based on its declared capabilities, permission scopes, and detected behaviors.

Low

No tool access, informational only. The agent provides guidance, templates, or instructions but cannot execute any actions.

Example: A prompt template that structures how to ask an AI for code reviews.

Medium

Read-only tool access, no network. The agent can read local files and project structure but cannot modify anything or communicate externally.

Example: A code analysis agent that reads your codebase and provides suggestions.

High

Write access, network access, or shell commands. The agent can modify files, access the internet, or execute commands on your system.

Example: A deployment agent that pushes code and runs build commands.

Critical

Autonomous execution, credential handling, or multi-tool chaining. The agent operates with minimal human oversight and can access sensitive resources.

Example: An autonomous DevOps agent with SSH access, database credentials, and shell execution.

How KillerSkills Scans Agent Definitions

Static Pattern Analysis

Every agent definition is scanned against 60+ regex patterns covering XSS, command injection, data exfiltration, credential harvesting, prompt injection, and agent-specific threats like tool poisoning and supply chain risks.

AI Semantic Analysis

An AI model performs a thorough security review, analyzing the full content for hidden instructions, misleading descriptions, obfuscated payloads, and social engineering attempts that static patterns might miss.

Risk Profile Computation

A risk profile is computed across five dimensions: tool risk, network risk, data risk, autonomy risk, and supply chain risk. Each produces a 0-100 score with human-readable risk factors.

Classification

Based on the combined analysis, the agent receives a safety status (safe, warning, or dangerous), a safety score (0-100), and a risk level (low, medium, high, or critical).

OWASP Top 10 for Agentic Applications

Published December 2025, these are the most critical security risks for AI agent systems. KillerSkills scans for all of them.

01
Prompt Injection — Malicious inputs that manipulate agent behavior, including indirect injection via tool outputs and cross-agent contamination.
02
Tool Misuse — Agents invoking tools in unintended ways, such as executing destructive commands or accessing files outside scope.
03
Agent Goal Hijacking — Attacks that redirect an agent from its intended task to perform actions beneficial to the attacker.
04
Excessive Permissions — Agents granted more access than needed, violating the principle of least privilege.
05
Insecure Tool Design — Tools that accept unsanitized inputs, enabling command injection or path traversal attacks.
06
Supply Chain Vulnerabilities — Compromised dependencies, MCP servers, or packages that introduce malicious code.
07
Data Exfiltration — Agents leaking sensitive data through network requests, tool outputs, or side channels.
08
Inadequate Sandboxing — Agents running without proper isolation, allowing filesystem and network access beyond intent.
09
Missing Human-in-the-Loop — Autonomous agents making consequential decisions without human approval checkpoints.
10
Insufficient Logging — Lack of audit trails for agent actions, making incident response and forensics difficult.

Red Flags Checklist

When evaluating any third-party agent definition, watch for these warning signs:

✗Autonomous execution combined with shell access and network write
✗Pipe-to-shell patterns: curl or wget piped to bash, sh, python, or node
✗Raw IP addresses in external endpoints instead of domain names
✗MCP servers from unknown or non-well-known registries
✗Environment variables named SECRET, TOKEN, or KEY not marked as sensitive
✗npx invocation of packages you don't recognize
✗Access to sensitive paths: ~/.ssh, ~/.aws, .env, /etc/passwd
✗Hidden instructions in tool descriptions (tool poisoning)
✗Auto-update or dynamic fetch configurations that load code at runtime
✗More than 10 tools or permission scopes with no clear justification

Best Practices for Running Agents Safely

Use a sandbox

Run agents in containers, VMs, or sandboxed environments. Never give production credentials to an untested agent.

Apply least privilege

Grant only the minimum permissions needed. Remove shell_execute if the agent doesn't need it.

Keep humans in the loop

Require approval for destructive operations, file modifications, and network requests until you trust the agent.

Verify MCP servers

Only connect to MCP servers from official registries. Inspect the server source code before trusting it.

Review environment variables

Understand every env var the agent requests. Never provide credentials for services the agent shouldn't access.

Monitor agent actions

Enable logging for all agent actions. Review logs regularly and set up alerts for unexpected behaviors.

Resources

OWASP Top 10 for Agentic Applications — Comprehensive risk framework for AI agent systems.
NIST AI Risk Management Framework — Guidelines for managing AI system risks in organizational contexts.
Model Context Protocol (MCP) Security Guide — Best practices for securing MCP server connections and tool access.

Start Learning About AI Agent Safety