Detection
None of the deployed defense tools we tested detects structural covert channels. phantom-detect is the first purpose-built detector for this vulnerability class.
The Defense Gap
We tested six widely deployed LLM security tools against PHANTOM encoding. Every tool returned 0% detection. The failure is structural: these tools scan content for known patterns, not formatting for statistical deviations.
| Tool | Type | Detection | Failure Mode |
|---|---|---|---|
| LLM Guard | Input/Output Scanner | 0% | Scans for known prompt injection patterns. Structural formatting is not a pattern match. |
| Rebuff | Prompt Injection Detector | 0% | Heuristic + LLM classifier. No training data for structural encoding. |
| Vigil | Prompt Injection Scanner | 0% | Regex and embedding similarity. Structural channels have no injection signature. |
| NeMo Guardrails | LLM Safety Framework | 0% | Topical and safety guardrails. Does not analyze output formatting structure. |
| Lakera Guard | AI Security API | 0% | Trained on injection attacks. In-band structural encoding is invisible to classifiers. |
| Microsoft Presidio | PII/Data Loss Prevention | 0% | Detects PII patterns. Structural formatting exfiltration carries no PII signatures. |
Root cause: All six tools assume exfiltration uses an out-of-band vector (URLs, DNS, images, prompt injection signatures). PHANTOM encodes data in-band, within the natural language output itself. There is no URL to strip, no image to block, and no injection pattern to match.
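To make the in-band point concrete, here is a toy Python illustration. It assumes a made-up scheme in which a single bit is carried by whether the text uses contractions; this is not PHANTOM's actual encoding. The pattern list and the names `content_scan` and `uses_contractions` are hypothetical and do not reflect the internals of any tool in the table.

```python
import re

# Hypothetical out-of-band signatures a content scanner might look for.
OUT_OF_BAND_PATTERNS = [
    re.compile(r"https?://\S+"),                       # URLs
    re.compile(r"!\[[^\]]*\]\([^)]*\)"),               # Markdown images
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


def content_scan(text: str) -> list[str]:
    """Return every substring that matches a known out-of-band pattern."""
    return [m.group(0) for p in OUT_OF_BAND_PATTERNS for m in p.finditer(text)]


def uses_contractions(text: str) -> bool:
    """Toy structural feature: does the text contain any contraction?"""
    return re.search(r"\b\w+'(t|s|re|ve|ll|d|m)\b", text) is not None


# Same meaning, different structure. In this toy scheme the choice carries one bit.
bit_one = "It's safe to retry the request. You don't need a new token."
bit_zero = "It is safe to retry the request. You do not need a new token."

for text in (bit_one, bit_zero):
    # The content scan finds nothing in either text; only the structural
    # feature distinguishes them.
    print(content_scan(text), uses_contractions(text))
```

Both texts pass the content scan with no matches, yet they differ in a structural property that a receiver who knows the scheme could read back.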
phantom-detect
Purpose-built detector for structural covert channels in LLM outputs. Open source.
Detection Rate: 100%
False Positive Rate: 10%
Analysis Stages: 3
How It Works
Feature Extraction
Extracts structural features from LLM output: contraction ratio, hedging density, punctuation patterns, sentence complexity, section count, list format, rhetorical question presence, transition word frequency, capitalization patterns, and reasoning topology.
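A minimal Python sketch of this kind of extraction, covering only a subset of the features listed above. The function name `extract_features`, the word lists, and the exact feature definitions are illustrative assumptions, not phantom-detect's implementation.

```python
import re

# Hypothetical word lists; a real detector would likely use larger ones.
HEDGES = {"might", "may", "perhaps", "possibly", "likely", "could", "seems"}
TRANSITIONS = {"however", "therefore", "moreover", "furthermore",
               "additionally", "consequently"}


def extract_features(text: str) -> dict[str, float]:
    """Compute a few structural features of a single LLM output."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lines = text.splitlines()
    n_words = max(len(words), 1)          # avoid division by zero on empty input
    n_sentences = max(len(sentences), 1)
    return {
        # Crude: any word containing an apostrophe counts, possessives included.
        "contraction_ratio": sum("'" in w for w in words) / n_words,
        "hedging_density": sum(w in HEDGES for w in words) / n_words,
        "transition_frequency": sum(w in TRANSITIONS for w in words) / n_words,
        "mean_sentence_length": len(words) / n_sentences,
        "rhetorical_question": float(any(s.endswith("?") for s in sentences)),
        # Lines that start with "-", "*", or "1." style markers.
        "list_item_count": float(
            sum(bool(re.match(r"\s*([-*]|\d+\.)\s+", line)) for line in lines)
        ),
    }


sample = "However, it's likely fine. Isn't that enough?"
features = extract_features(sample)
print(features)
```

Each feature is a plain number, so the same dictionary shape can be computed for known-clean outputs and for outputs under test.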
Baseline Comparison
Compares extracted features against a baseline profile of the target model's natural output distribution. Each model has a distinct structural fingerprint.
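One simple way to express such a comparison is a per-feature z-score against a profile built from known-clean outputs. The sketch below assumes that approach; `build_baseline`, `z_scores`, the `min_std` floor, and the sample values are all hypothetical and are not taken from phantom-detect.

```python
from statistics import mean, pstdev


def build_baseline(samples: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Summarize each feature across known-clean outputs as (mean, std)."""
    keys = samples[0].keys()
    return {
        k: (mean([s[k] for s in samples]), pstdev([s[k] for s in samples]))
        for k in keys
    }


def z_scores(
    observed: dict[str, float],
    baseline: dict[str, tuple[float, float]],
    min_std: float = 1e-6,
) -> dict[str, float]:
    """Distance of each observed feature from the baseline mean, in std units.

    min_std is an assumed floor that avoids division by zero when a feature
    never varies in the baseline; with such a feature, any deviation at all
    produces a very large score.
    """
    return {
        k: (observed[k] - mu) / max(sd, min_std)
        for k, (mu, sd) in baseline.items()
    }


# Made-up feature vectors standing in for a model's clean outputs.
clean = [
    {"contraction_ratio": 0.04, "hedging_density": 0.02},
    {"contraction_ratio": 0.06, "hedging_density": 0.02},
    {"contraction_ratio": 0.04, "hedging_density": 0.02},
    {"contraction_ratio": 0.06, "hedging_density": 0.02},
]
baseline = build_baseline(clean)
scores = z_scores({"contraction_ratio": 0.09, "hedging_density": 0.02}, baseline)
print(scores)
```

Because each model has its own structural fingerprint, a profile like `baseline` would need to be built separately per model.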
Statistical Divergence
Computes KL-divergence between the observed feature distribution and the baseline. Encoding directives create measurable statistical deviations that natural variation does not.
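For a categorical feature such as list format, the divergence can be sketched in Python as below. The function name `kl_divergence`, the additive smoothing constant `alpha`, and the sample data are assumptions for illustration; the source does not specify how phantom-detect bins or smooths its distributions.

```python
import math
from collections import Counter


def kl_divergence(observed: list[str], baseline: list[str], alpha: float = 1.0) -> float:
    """D_KL(P_observed || P_baseline) in bits over categorical values.

    Additive smoothing (alpha) keeps every probability nonzero, so a category
    that appears in only one of the two samples does not cause a log of zero.
    """
    categories = set(observed) | set(baseline)
    obs_counts, base_counts = Counter(observed), Counter(baseline)
    obs_total = len(observed) + alpha * len(categories)
    base_total = len(baseline) + alpha * len(categories)
    total = 0.0
    for c in categories:
        p = (obs_counts[c] + alpha) / obs_total
        q = (base_counts[c] + alpha) / base_total
        total += p * math.log2(p / q)
    return total


# Made-up data: list format chosen in each of 100 responses.
baseline_formats = ["bullets"] * 50 + ["numbered"] * 50
natural = ["bullets"] * 50 + ["numbered"] * 50
skewed = ["bullets"] * 90 + ["numbered"] * 10

print(kl_divergence(natural, baseline_formats))
print(kl_divergence(skewed, baseline_formats))
```

A sample that matches the baseline scores zero, while the skewed sample scores clearly above it. Where to place a decision threshold is a separate calibration question that this sketch does not address.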
Installation
pip install phantom-detect
# Analyze a single LLM output
phantom-detect analyze --input response.txt --model claude-3.5-sonnet
# Batch analysis
phantom-detect batch --dir ./outputs/ --model gpt-4o --format json