GUIDE

The AI Agent Security Checklist for 2026

Scandar Security Team
AI agent security research and product updates.
2026-03-18
11 min read

Why You Need a Checklist

AI agents are shipping to production faster than security practices can keep up. In a 2026 survey by Anthropic, 67% of organizations deploying AI agents reported having no formal security review process for agent tools and skills. The OWASP LLM Top 10 now lists prompt injection as the number-one risk for LLM applications, and AI agents — with their tool access and autonomous behavior — amplify that risk by orders of magnitude.

This checklist covers every layer of AI agent security: pre-deployment scanning, runtime protection, fleet monitoring, and compliance. Each item is specific, actionable, and includes the exact command or configuration to implement it.

11 items · 3 layers: Pre-deployment (items 1-5) · Runtime (items 6-8) · Fleet & Compliance (items 9-11)

Before You Deploy

1. Scan all skills and tools

Every skill file (.md), MCP server source code, and agent config should be scanned for threats before deployment. The attack surface in skill files is broader than most developers expect. Look for:

  • Prompt injection — hidden instructions that override system prompts, including encoded variants (base64, hex, ROT13, unicode homoglyphs)
  • Credential exposure — hardcoded API keys, tokens, passwords, connection strings, or private keys embedded in tool definitions
  • Data exfiltration patterns — outbound URLs, webhook services, DNS exfiltration via tool parameters
  • Shell injection — command execution in tool definitions (bash -c, eval(), exec(), subprocess.run())
  • Deceptive descriptions — tools whose description says one thing but whose implementation does another
scandar scan ./skills/ --threshold 80 --fail-on critical

This command scans every file in the skills directory using scandar-scan's skill scanner, applies 140+ detection rules, and fails with a non-zero exit code if any finding is rated critical (trust score below 40). The threshold of 80 means any skill scoring below 80 will be flagged for review.
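The encoded variants mentioned above are worth understanding concretely. A toy sketch of one such check — decoding base64-looking runs and testing them for injection phrases — looks like this (this is an illustration of the technique, not scandar's actual detection rules):

```python
import base64
import re

# Toy injection-phrase pattern; a real scanner uses far broader rule sets.
SUSPICIOUS = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

def find_encoded_injection(text: str) -> list[str]:
    """Return decoded base64 fragments that contain injection phrases."""
    hits = []
    # Look for long base64-alphabet runs with optional padding.
    for candidate in re.findall(r"[A-Za-z0-9+/]{24,}={0,2}", text):
        try:
            decoded = base64.b64decode(candidate).decode("utf-8")
        except Exception:
            continue  # not valid base64, or not UTF-8 text once decoded
        if SUSPICIOUS.search(decoded):
            hits.append(decoded)
    return hits
```

Plain keyword matching on the raw file would miss these payloads entirely, which is why encoded-variant handling matters in a skill scanner.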

For CI/CD integration, use the JSON output format:

scandar scan ./skills/ --threshold 80 --fail-on critical --format json > scan-results.json

This produces machine-readable output you can parse in your pipeline, store as build artifacts, and track over time.
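A minimal CI gate over that JSON might look like the sketch below. The field names (`findings`, `severity`, `skills`, `trust_score`) are assumptions for illustration — adjust them to match the actual schema in your scan-results.json:

```python
# Hypothetical scan-results.json schema; rename fields to match yours.
def gate(results: dict, threshold: int = 80) -> tuple[bool, list[str]]:
    """Return (passed, reasons) for a CI gate over scan results."""
    reasons = []
    # Any critical finding blocks the build outright.
    for finding in results.get("findings", []):
        if finding.get("severity") == "critical":
            reasons.append(f"critical finding: {finding.get('rule', 'unknown')}")
    # Any skill under the trust threshold is flagged for review.
    for skill in results.get("skills", []):
        if skill.get("trust_score", 100) < threshold:
            reasons.append(
                f"{skill.get('path')} scored {skill['trust_score']} < {threshold}"
            )
    return (not reasons, reasons)
```

In a pipeline, load the file with `json.load(open("scan-results.json"))`, call `gate`, and `sys.exit(1)` when it doesn't pass.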

2. Scan your system prompts

Your system prompt is your first line of defense. It tells the model who it is, what it should do, and — critically — what it should refuse. A weak system prompt is like leaving your front door unlocked. System prompt scanning checks for both presence and absence threats:

Presence threats (dangerous content that IS there):
  • Hidden instructions embedded by copy-paste from untrusted sources
  • Extraction vulnerabilities that allow users to extract the full system prompt
  • Role hijacking patterns that redefine the agent's identity
  • Conflicting instructions that create exploitable ambiguity
Absence threats (critical defenses that are NOT there):
  • Missing injection defenses — no instruction telling the model to ignore injected content
  • No refusal boundaries — the agent has no concept of what it should refuse
  • No scope limitations — the agent will attempt any task, including dangerous ones
  • No output constraints — the agent can output arbitrary content including credentials
scandar scan --type prompt "Your system prompt text here"

A good system prompt scores 85+ on the Scandar trust scale. Below 70, you have critical gaps that an attacker can exploit. Check the detailed findings to see exactly what's missing and how to fix it.
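To make the absence-threat idea concrete, here is a rough heuristic that checks whether a prompt appears to contain three of those defenses. The patterns are deliberately naive — an illustration of the concept, not the scanner's rules:

```python
import re

# Naive keyword patterns for three of the defenses listed above.
REQUIRED_DEFENSES = {
    "injection defense": r"ignore.*(instructions|content).*(user|untrusted|injected)|treat.*untrusted",
    "refusal boundary": r"\brefuse\b|\bdecline\b|\bdo not\b.*\b(reveal|execute|share)\b",
    "scope limitation": r"\bonly\b.*\b(task|scope|purpose)\b|out of scope",
}

def missing_defenses(system_prompt: str) -> list[str]:
    """Return the names of defenses the prompt appears to lack."""
    lowered = system_prompt.lower()
    return [name for name, pattern in REQUIRED_DEFENSES.items()
            if not re.search(pattern, lowered)]
```

A bare "You are a helpful assistant." fails all three checks — which is exactly the kind of gap an absence scan surfaces.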

3. Scan your MCP config

Your MCP configuration file decides what tools run on your machine and how they communicate. A misconfigured MCP setup can give an attacker direct shell access, file system access, or network access from inside your agent's execution environment. Config scanning checks for:

  • Insecure transports — HTTP instead of HTTPS for remote servers, exposing tool traffic to interception
  • Dangerous commands — bash -c, sudo, curl | sh, eval in server launch commands
  • Untrusted sources — raw GitHub URLs, unknown npm packages, unverified Docker images
  • Dangerous server combinations — file access + network access = exfiltration risk; shell access + any other tool = arbitrary code execution
  • Missing environment isolation — servers running with full host access instead of sandboxed environments
scandar scan ./claude_desktop_config.json --type config

Pay special attention to the combination analysis. Individual tools might be safe, but together they create compound risks. A file-reading tool paired with an HTTP tool gives an agent everything it needs to exfiltrate sensitive data — even without any malicious intent in either tool.
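The combination logic itself is simple to sketch. The capability labels below are made up for illustration — a real scanner derives capabilities from the server definitions themselves:

```python
# Made-up capability labels; real tools infer these from server definitions.
DANGEROUS_COMBOS = [
    ({"file_read", "network"}, "exfiltration risk: file access + outbound network"),
    ({"shell"}, "arbitrary code execution: shell access escalates every other tool"),
]

def combo_findings(server_capabilities: dict[str, set[str]]) -> list[str]:
    """Flag dangerous capability combinations across all configured servers."""
    if not server_capabilities:
        return []
    # Combine capabilities fleet-wide: the agent can chain tools across servers.
    all_caps = set().union(*server_capabilities.values())
    return [reason for required, reason in DANGEROUS_COMBOS
            if required <= all_caps]
```

Note that the union is taken across servers: a file-reading server and a separate HTTP server still combine into an exfiltration path.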

4. Set a trust score threshold

Don't deploy anything with a trust score below 80. Critical findings (score < 40) should block deployment entirely. Integrate the scan into your CI/CD pipeline so it runs on every commit that modifies agent configurations, skill files, or system prompts.

# Example GitHub Actions step
- name: Scan agent skills
  run: npx scandar-scan ./skills/ --threshold 80 --fail-on critical

Track trust scores over time. A skill that scored 90 last week but scores 60 today has been modified — investigate immediately.
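That drift check is a few lines of code once you keep score history. The history format here (skill path → score) is an assumption for illustration, not a scandar output format:

```python
# Assumed history format: {skill_path: trust_score} per scan run.
def score_regressions(previous: dict[str, int], current: dict[str, int],
                      max_drop: int = 20) -> list[str]:
    """Return skills whose trust score dropped more than max_drop points."""
    return [skill for skill, score in current.items()
            if skill in previous and previous[skill] - score > max_drop]
```

Run it between consecutive scan runs and open an investigation for anything it returns.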

5. Audit third-party tools before adoption

Before adding any third-party tool, skill, or MCP server to your agent:

  • Scan the source code with scandar-scan
  • Review the tool's permission requirements (what does it access?)
  • Check for recent security disclosures
  • Verify the publisher's identity and track record
  • Test in a sandboxed environment first

The ClawHavoc attack succeeded because developers installed tools without scanning them. Don't be the next statistic.

In Production

6. Wrap every agent with Guard

The scandar-guard SDK wraps your LLM client with zero code changes. It inspects every message, tool call, and response flowing through your agent — catching runtime attacks that pre-deployment scanning can't see.

Python:
from anthropic import Anthropic
from scandar_guard import guard

client = guard(Anthropic())

TypeScript:
import Anthropic from '@anthropic-ai/sdk';
import { guard } from 'scandar-guard';

const client = guard(new Anthropic());

Go:
import "github.com/scandar-ai/scandar-guard-go"

client := guard.Wrap(anthropicClient)

Guard is available on all plans, including Free. There's no reason not to use it. See the full SDK documentation for configuration options, custom rules, and integration guides.
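If you are wondering how a one-line wrap can inspect traffic, the underlying pattern is a proxy around the client. The sketch below is not scandar-guard's internals — just a minimal illustration of the interception pattern such SDKs use:

```python
# Minimal proxy pattern: inspect the payload, then delegate to the real client.
class GuardedClient:
    def __init__(self, client, inspect):
        self._client = client
        self._inspect = inspect  # callable: messages -> list of findings

    def create(self, messages, **kwargs):
        findings = self._inspect(messages)
        if findings:
            # A real SDK would log, alert, and return a safe fallback here.
            raise ValueError(f"blocked: {findings}")
        return self._client.create(messages, **kwargs)
```

Because the wrapper exposes the same interface as the wrapped client, the rest of your code never changes — which is what makes zero-code-change adoption possible.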

7. Enable block mode for production

Guard runs in two modes: observe and block. Start with observe mode during development to understand what Guard detects without interrupting your workflow. Then switch to block mode for production:

from anthropic import Anthropic
from scandar_guard import guard, GuardConfig

# Development: observe and log
client = guard(Anthropic(), GuardConfig(mode="observe"))

# Production: block critical threats
client = guard(Anthropic(), GuardConfig(mode="block", block_on=["critical"]))

In block mode, Guard intercepts messages that contain critical threats before they reach the model. The blocked message is logged, an alert is fired (if configured), and the agent receives a safe fallback response instead of the malicious content.
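The observe/block decision itself reduces to a small function. Severity names and the fallback text below are illustrative, not Guard's actual behavior:

```python
# Placeholder fallback text; configure your own in a real deployment.
SAFE_FALLBACK = "This message was withheld by a security policy."

def handle(message: str, severities: list[str], mode: str,
           block_on: frozenset = frozenset({"critical"})) -> tuple[str, bool]:
    """Return (message_to_deliver, was_blocked) for one inbound message."""
    blocked = mode == "block" and any(s in block_on for s in severities)
    if blocked:
        # In production you would also log the event and fire an alert here.
        return SAFE_FALLBACK, True
    # Observe mode (or no matching severity): deliver unchanged.
    return message, False
```

The key property: observe mode never alters traffic, so you can run it safely in development while you tune thresholds.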

8. Monitor your fleet with Overwatch

For organizations with multiple agents, Scandar Overwatch provides the fleet-wide security layer:

  • Real-time agent inventory with trust scores, tool inventories, and session histories for every agent in your organization
  • Policy engine that blocks dangerous tool combinations, enforces trust score thresholds, and gates agent behavior based on your security requirements
  • Alert routing to Slack, PagerDuty, email, and custom webhooks — with configurable severity thresholds and deduplication
  • Compliance reports for EU AI Act, SOC 2, ISO 42001, and NIST AI RMF frameworks
  • Kill chain detection that traces attack paths across your agent fleet and calculates blast radius automatically
  • Graph time-travel to replay agent interactions and understand exactly what happened during an incident

9. Configure security policies

At minimum, create policies for these three high-risk patterns:

  • PII + Outbound HTTP — any agent that handles personally identifiable information AND has access to outbound HTTP tools is a data exfiltration risk. Policy: block outbound HTTP calls that contain PII patterns.
  • Shell execution — any agent with shell or command execution tools should be heavily restricted. Policy: allowlist specific commands, block everything else.
  • High threat scores — agents or sessions with composite threat scores above your threshold should be automatically quarantined. Policy: quarantine agents with threat score > 70, alert on > 50.

Additional recommended policies:

  • File access + network access — the compound risk mentioned above. Require explicit approval for this combination.
  • Credential access — block tools from accessing environment variables, key stores, or config files containing secrets.
  • Trust score degradation — alert when an agent's trust score drops more than 20 points between scans.
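The PII + outbound HTTP policy is the easiest to prototype. The patterns below are toy examples — production detectors need far broader coverage (names, addresses, locale-specific IDs):

```python
import re

# Toy PII patterns; production detectors cover many more data types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def block_outbound(payload: str) -> list[str]:
    """Return PII types found in an outbound HTTP payload (block if non-empty)."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(payload)]
```

Hook a check like this into the boundary where your agent's HTTP tool serializes its request body, and deny the call when the list is non-empty.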

10. Set up alert routing

Connect at least one alert channel (Slack recommended) so your security team is notified within seconds of a policy violation. Configure:

  • Critical alerts to PagerDuty or your on-call rotation
  • High alerts to a dedicated Slack channel
  • Medium alerts to email digest (daily summary)
  • Low alerts to the Overwatch dashboard only (review during business hours)

Test your alert routing monthly. An alert channel that nobody monitors is worse than no alerting — it creates false confidence.
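The routing scheme above can be expressed as a simple severity-to-channel map. The channel names are placeholders for your actual Slack, PagerDuty, and email integrations:

```python
# Placeholder channel names; wire these to real integrations.
ROUTES = {
    "critical": ["pagerduty"],
    "high": ["slack:#security-alerts"],
    "medium": ["email-digest"],
    "low": ["dashboard"],
}

def route(severity: str) -> list[str]:
    """Return the channels an alert of this severity should be sent to."""
    # Unknown severities degrade safely to dashboard-only visibility.
    return ROUTES.get(severity, ["dashboard"])
```

Keeping the map in config rather than code makes the monthly routing test easier: assert the map, then fire one synthetic alert per severity.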

11. Generate compliance reports

If you operate under regulatory requirements (EU AI Act, SOC 2, NIST AI RMF), generate compliance reports weekly. Scandar Overwatch auto-scores your fleet against 4 frameworks and produces PDF exports with specific findings, remediation steps, and evidence trails that auditors expect.

Don't wait until audit time to check your compliance score. A weekly cadence means you catch compliance drift early — before it becomes a finding in your annual audit.

The One-Line Summary

Scan before deployment. Guard at runtime. Monitor your fleet. That's the full stack of AI agent security — and every item on this checklist maps to one of those three layers.

Start with the free tier and work through this checklist from top to bottom. By item 7, your agents will be more secure than 95% of what's in production today. By item 11, you'll be audit-ready.

FREQUENTLY ASKED QUESTIONS

What's the minimum security setup for a production AI agent?

At minimum: scan all skills and tools before deployment (items 1-3), wrap your agent with Guard in block mode (items 6-7), and set up at least one alert channel (item 10). This takes under 30 minutes and covers the most critical attack surfaces.

Is the free tier enough for production security?

The free tier includes Guard (runtime protection) for unlimited inspections and 10 scans per month. For a single agent in production, this provides solid baseline security. For teams with multiple agents, Pro ($49/mo) adds unlimited scans, and Overwatch (from $349/mo) adds fleet monitoring, policies, and compliance.

How do I prioritize these 11 items?

Start with items 1 (scan skills), 6 (install Guard), and 7 (enable block mode). These three give you pre-deployment scanning and runtime protection — the two most impactful security layers. Then add items 9-11 (policies, alerts, compliance) as your fleet grows.