Why You Need a Checklist
AI agents are shipping to production faster than security practices can keep up. In a 2026 survey by Anthropic, 67% of organizations deploying AI agents reported having no formal security review process for agent tools and skills. The OWASP LLM Top 10 now lists prompt injection as the number-one risk for LLM applications, and AI agents — with their tool access and autonomous behavior — amplify that risk by orders of magnitude.
This checklist covers every layer of AI agent security: pre-deployment scanning, runtime protection, fleet monitoring, and compliance. Each item is specific, actionable, and includes the exact command or configuration to implement it.
Before You Deploy
1. Scan all skills and tools
Every skill file (.md), MCP server source code, and agent config should be scanned for threats before deployment. The attack surface in skill files is broader than most developers expect. Look for:
- Prompt injection — hidden instructions that override system prompts, including encoded variants (base64, hex, ROT13, unicode homoglyphs)
- Credential exposure — hardcoded API keys, tokens, passwords, connection strings, or private keys embedded in tool definitions
- Data exfiltration patterns — outbound URLs, webhook services, DNS exfiltration via tool parameters
- Shell injection — command execution in tool definitions (
bash -c,eval(),exec(),subprocess.run()) - Deceptive descriptions — tools whose description says one thing but whose implementation does another
scandar scan ./skills/ --threshold 80 --fail-on critical
This command scans every file in the skills directory using scandar-scan's skill scanner, applies 140+ detection rules, and fails with a non-zero exit code if any finding is rated critical (trust score below 40). The threshold of 80 means any skill scoring below 80 will be flagged for review.
For CI/CD integration, use the JSON output format:
scandar scan ./skills/ --threshold 80 --fail-on critical --format json > scan-results.json
This produces machine-readable output you can parse in your pipeline, store as build artifacts, and track over time.
2. Scan your system prompts
Your system prompt is your first line of defense. It tells the model who it is, what it should do, and — critically — what it should refuse. A weak system prompt is like leaving your front door unlocked. System prompt scanning checks for both presence and absence threats:
Presence threats (dangerous content that IS there):- Hidden instructions embedded by copy-paste from untrusted sources
- Extraction vulnerabilities that allow users to extract the full system prompt
- Role hijacking patterns that redefine the agent's identity
- Conflicting instructions that create exploitable ambiguity
- Missing injection defenses — no instruction telling the model to ignore injected content
- No refusal boundaries — the agent has no concept of what it should refuse
- No scope limitations — the agent will attempt any task, including dangerous ones
- No output constraints — the agent can output arbitrary content including credentials
scandar scan --type prompt "Your system prompt text here"
A good system prompt scores 85+ on the Scandar trust scale. Below 70, you have critical gaps that an attacker can exploit. Check the detailed findings to see exactly what's missing and how to fix it.
3. Scan your MCP config
Your MCP configuration file decides what tools run on your machine and how they communicate. A misconfigured MCP setup can give an attacker direct shell access, file system access, or network access from inside your agent's execution environment. Config scanning checks for:
- Insecure transports — HTTP instead of HTTPS for remote servers, exposing tool traffic to interception
- Dangerous commands —
bash -c,sudo,curl | sh,evalin server launch commands - Untrusted sources — raw GitHub URLs, unknown npm packages, unverified Docker images
- Dangerous server combinations — file access + network access = exfiltration risk; shell access + any other tool = arbitrary code execution
- Missing environment isolation — servers running with full host access instead of sandboxed environments
scandar scan ./claude_desktop_config.json --type config
Pay special attention to the combination analysis. Individual tools might be safe, but together they create compound risks. A file-reading tool paired with an HTTP tool gives an agent everything it needs to exfiltrate sensitive data — even without any malicious intent in either tool.
4. Set a trust score threshold
Don't deploy anything with a trust score below 80. Critical findings (score < 40) should block deployment entirely. Integrate the scan into your CI/CD pipeline so it runs on every commit that modifies agent configurations, skill files, or system prompts.
# Example GitHub Actions step
- name: Scan agent skills
run: npx scandar-scan ./skills/ --threshold 80 --fail-on critical
Track trust scores over time. A skill that scored 90 last week but scores 60 today has been modified — investigate immediately.
5. Audit third-party tools before adoption
Before adding any third-party tool, skill, or MCP server to your agent:
- Scan the source code with scandar-scan
- Review the tool's permission requirements (what does it access?)
- Check for recent security disclosures
- Verify the publisher's identity and track record
- Test in a sandboxed environment first
The ClawHavoc attack succeeded because developers installed tools without scanning them. Don't be the next statistic.
In Production
6. Wrap every agent with Guard
The scandar-guard SDK wraps your LLM client with zero code changes. It inspects every message, tool call, and response flowing through your agent — catching runtime attacks that pre-deployment scanning can't see.
Python:from scandar_guard import guard
client = guard(Anthropic())
TypeScript:
import { guard } from 'scandar-guard';
const client = guard(new Anthropic());
Go:
import "github.com/scandar-ai/scandar-guard-go"
client := guard.Wrap(anthropicClient)
Guard is available on all plans, including Free. There's no reason not to use it. See the full SDK documentation for configuration options, custom rules, and integration guides.
7. Enable block mode for production
Guard runs in two modes: observe and block. Start with observe mode during development to understand what Guard detects without interrupting your workflow. Then switch to block mode for production:
from scandar_guard import guard, GuardConfig
# Development: observe and log
client = guard(Anthropic(), GuardConfig(mode="observe"))
# Production: block critical threats
client = guard(Anthropic(), GuardConfig(mode="block", block_on=["critical"]))
In block mode, Guard intercepts messages that contain critical threats before they reach the model. The blocked message is logged, an alert is fired (if configured), and the agent receives a safe fallback response instead of the malicious content.
8. Monitor your fleet with Overwatch
For organizations with multiple agents, Scandar Overwatch provides the fleet-wide security layer:
- Real-time agent inventory with trust scores, tool inventories, and session histories for every agent in your organization
- Policy engine that blocks dangerous tool combinations, enforces trust score thresholds, and gates agent behavior based on your security requirements
- Alert routing to Slack, PagerDuty, email, and custom webhooks — with configurable severity thresholds and deduplication
- Compliance reports for EU AI Act, SOC 2, ISO 42001, and NIST AI RMF frameworks
- Kill chain detection that traces attack paths across your agent fleet and calculates blast radius automatically
- Graph time-travel to replay agent interactions and understand exactly what happened during an incident
9. Configure security policies
At minimum, create policies for these three high-risk patterns:
- PII + Outbound HTTP — any agent that handles personally identifiable information AND has access to outbound HTTP tools is a data exfiltration risk. Policy: block outbound HTTP calls that contain PII patterns.
- Shell execution — any agent with shell or command execution tools should be heavily restricted. Policy: allowlist specific commands, block everything else.
- High threat scores — agents or sessions with composite threat scores above your threshold should be automatically quarantined. Policy: quarantine agents with threat score > 70, alert on > 50.
Additional recommended policies:
- File access + network access — the compound risk mentioned above. Require explicit approval for this combination.
- Credential access — block tools from accessing environment variables, key stores, or config files containing secrets.
- Trust score degradation — alert when an agent's trust score drops more than 20 points between scans.
10. Set up alert routing
Connect at least one alert channel (Slack recommended) so your security team is notified within seconds of a policy violation. Configure:
- Critical alerts to PagerDuty or your on-call rotation
- High alerts to a dedicated Slack channel
- Medium alerts to email digest (daily summary)
- Low alerts to the Overwatch dashboard only (review during business hours)
Test your alert routing monthly. An alert channel that nobody monitors is worse than no alerting — it creates false confidence.
11. Generate compliance reports
If you operate under regulatory requirements (EU AI Act, SOC 2, NIST AI RMF), generate compliance reports weekly. Scandar Overwatch auto-scores your fleet against 4 frameworks and produces PDF exports with specific findings, remediation steps, and evidence trails that auditors expect.
Don't wait until audit time to check your compliance score. A weekly cadence means you catch compliance drift early — before it becomes a finding in your annual audit.
The One-Line Summary
Scan before deployment. Guard at runtime. Monitor your fleet. That's the full stack of AI agent security — and every item on this checklist maps to one of those three layers.
Start with the free tier and work through this checklist from top to bottom. By item 7, your agents will be more secure than 95% of what's in production today. By item 11, you'll be audit-ready.