Detection

None of the six deployed defense tools we tested detects structural covert channels. phantom-detect is the first purpose-built detector for this vulnerability class.

The Defense Gap

We tested 6 widely deployed LLM security tools against PHANTOM encoding. Every tool returned 0% detection. The failure is structural: these tools scan for known patterns in content, not statistical deviations in formatting.

Detection results for all six tools against PHANTOM-encoded outputs (zero detection across the board)
Tool	Type	Detection	Failure Mode
LLM Guard	Input/Output Scanner	0%	Scans for known prompt injection patterns. Structural formatting is not a pattern match.
Rebuff	Prompt Injection Detector	0%	Heuristic + LLM classifier. No training data for structural encoding.
Vigil	Prompt Injection Scanner	0%	Regex and embedding similarity. Structural channels have no injection signature.
NeMo Guardrails	LLM Safety Framework	0%	Topical and safety guardrails. Does not analyze output formatting structure.
Lakera Guard	AI Security API	0%	Trained on injection attacks. In-band structural encoding is invisible to classifiers.
Microsoft Presidio	PII/Data Loss Prevention	0%	Detects PII patterns. Structural formatting exfiltration carries no PII signatures.

Root cause: All six tools assume exfiltration uses an out-of-band vector (URLs, DNS, images, prompt injection signatures). PHANTOM encodes data in-band, within the formatting of the natural language output itself. There is no URL to strip, no image to block, and no injection pattern to match.
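To make the root cause concrete, here is a minimal sketch of why an out-of-band filter has nothing to act on. The encoding shown (one bit per sentence, set by whether the sentence contains a contraction) is a toy assumption for illustration only, not PHANTOM's actual scheme, and the function names are hypothetical.

```python
import re

# Patterns an out-of-band filter typically looks for.
URL_RE = re.compile(r"https?://\S+")
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")

def out_of_band_findings(text):
    """Return URLs and markdown images found in the text."""
    return URL_RE.findall(text) + MD_IMAGE_RE.findall(text)

def toy_decode(text):
    """Toy in-band channel (assumption): 1 if a sentence has a contraction, else 0."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [1 if re.search(r"\w'\w", s) else 0 for s in sentences]

response = "It's a common pattern. The cache is warm. You'll see lower latency."
print(out_of_band_findings(response))  # []
print(toy_decode(response))            # [1, 0, 1]
```

The response contains nothing a URL or image filter could remove, yet a receiver who knows the convention can still read bits from its structure.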

phantom-detect

Purpose-built detector for structural covert channels in LLM outputs. Open source.

Detection Rate: 100%

False Positive Rate: 10%

Analysis Stages

How It Works

Feature Extraction

Extracts structural features from LLM output: contraction ratio, hedging density, punctuation patterns, sentence complexity, section count, list format, rhetorical question presence, transition word frequency, capitalization patterns, and reasoning topology.
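As a rough illustration of this stage, the sketch below computes a handful of the listed features with simple heuristics. It is not phantom-detect's implementation: the word lists, feature names, and the `extract_features` function are all assumptions made for the example.

```python
import re

# Hypothetical vocabularies; the tool's real word lists are not given here.
HEDGES = {"perhaps", "maybe", "possibly", "likely", "might", "arguably"}
TRANSITIONS = {"however", "therefore", "moreover", "furthermore", "consequently"}

def extract_features(text):
    """Return a small dict of structural features for one LLM output."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lines = text.splitlines()
    return {
        # Approximate: any word with an apostrophe counts, including possessives.
        "contraction_ratio": sum("'" in w for w in words) / n_words,
        "hedging_density": sum(w in HEDGES for w in words) / n_words,
        "transition_frequency": sum(w in TRANSITIONS for w in words) / n_words,
        "has_rhetorical_question": any(s.endswith("?") for s in sentences),
        "numbered_list_items": sum(bool(re.match(r"\s*\d+[.)]\s", ln)) for ln in lines),
        "bulleted_list_items": sum(bool(re.match(r"\s*[-*•]\s", ln)) for ln in lines),
    }

sample = "It's likely fine. However, isn't it odd?\n1. First\n- second"
features = extract_features(sample)
print(features)
```

Ratios are normalized by word count so that outputs of different lengths remain comparable.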

Baseline Comparison

Compares extracted features against a baseline profile of the target model's natural output distribution. Each model has a distinct structural fingerprint.
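One simple way to picture a baseline profile is as a per-feature mean and standard deviation computed over known-clean outputs from the same model. The sketch below assumes that representation and a z-score comparison; the actual profile format used by phantom-detect is not described here, and `build_baseline` and `z_scores` are hypothetical names.

```python
from statistics import mean, pstdev

def build_baseline(feature_rows):
    """Summarise known-clean outputs as {feature: (mean, std)}."""
    keys = feature_rows[0].keys()
    return {
        k: (mean([r[k] for r in feature_rows]), pstdev([r[k] for r in feature_rows]))
        for k in keys
    }

def z_scores(observed, baseline, eps=1e-9):
    """Distance of each observed feature from the baseline, in standard deviations."""
    return {k: (observed[k] - mu) / (sigma + eps) for k, (mu, sigma) in baseline.items()}

# Illustrative numbers only, not measurements from any model.
clean = [{"hedging_density": 0.02}, {"hedging_density": 0.04}]
baseline = build_baseline(clean)
scores = z_scores({"hedging_density": 0.06}, baseline)
print(scores)
```

Because each model has its own structural fingerprint, a baseline built for one model should not be reused to score outputs from another.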

Statistical Divergence

Computes KL-divergence between the observed feature distribution and the baseline. Encoding directives create measurable statistical deviations that natural variation does not.
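For a discrete feature, the KL-divergence between an observed distribution P and a baseline Q is the sum of p * log(p / q) over all categories. The sketch below applies that formula to a categorical feature such as list format. The category names, the Laplace smoothing, and the sample data are assumptions for illustration, and no detection threshold is implied.

```python
import math
from collections import Counter

def distribution(samples, categories, alpha=1.0):
    """Laplace-smoothed categorical distribution, so no probability is zero."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(categories)
    return [(counts[c] + alpha) / total for c in categories]

def kl_divergence(p, q):
    """KL(P || Q) in nats for two distributions over the same categories."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

categories = ["none", "bulleted", "numbered"]
# Illustrative samples: list format seen in baseline vs. observed outputs.
baseline_q = distribution(["none"] * 6 + ["bulleted"] * 3 + ["numbered"], categories)
observed_p = distribution(["numbered"] * 7 + ["bulleted"] * 3, categories)
print(kl_divergence(observed_p, baseline_q))
```

Smoothing matters here: without it, a category that never appears in the baseline would make the divergence infinite.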

Installation

bash

pip install phantom-detect

# Analyze a single LLM output
phantom-detect analyze --input response.txt --model claude-3.5-sonnet

# Batch analysis
phantom-detect batch --dir ./outputs/ --model gpt-4o --format json
