MWRASP

Detection

None of the deployed defense tools we tested detects structural covert channels. phantom-detect is the first purpose-built detector for this vulnerability class.

The Defense Gap

We tested 6 widely deployed LLM security tools against PHANTOM encoding. Every tool returned 0% detection. The failure is structural: these tools scan for known patterns in content, not statistical deviations in formatting.

All tools tested against PHANTOM-encoded outputs: zero detection

| Tool | Type | Detection | Failure Mode |
|---|---|---|---|
| LLM Guard | Input/Output Scanner | 0% | Scans for known prompt injection patterns. Structural formatting is not a pattern match. |
| Rebuff | Prompt Injection Detector | 0% | Heuristic + LLM classifier. No training data for structural encoding. |
| Vigil | Prompt Injection Scanner | 0% | Regex and embedding similarity. Structural channels have no injection signature. |
| NeMo Guardrails | LLM Safety Framework | 0% | Topical and safety guardrails. Does not analyze output formatting structure. |
| Lakera Guard | AI Security API | 0% | Trained on injection attacks. In-band structural encoding is invisible to classifiers. |
| Microsoft Presidio | PII/Data Loss Prevention | 0% | Detects PII patterns. Structural formatting exfiltration carries no PII signatures. |

Root cause: All 6 tools assume exfiltration uses an out-of-band vector (URLs, DNS, images, prompt injection signatures). PHANTOM encodes data in-band — within the natural language output itself. There is no URL to strip, no image to block, no injection pattern to match.
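To make the root cause concrete, the sketch below runs a simple out-of-band scanner over two responses that say the same thing but differ in formatting style. The regexes and the one-bit "contractions vs. no contractions" convention are illustrative assumptions; they are not PHANTOM's actual encoding, nor the rule set of any tool listed above.

```python
import re

# Hypothetical out-of-band indicators a content scanner might look for.
OUT_OF_BAND_PATTERNS = [
    re.compile(r"https?://\S+"),                               # URLs
    re.compile(r"!\[[^\]]*\]\([^)]*\)"),                       # Markdown images
    re.compile(r"ignore (all )?previous instructions", re.I),  # injection phrase
]

# Rough contraction matcher (also matches possessives such as "model's").
CONTRACTION = re.compile(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", re.I)


def scan_out_of_band(text: str) -> list[str]:
    """Return every substring that matches a known out-of-band pattern."""
    hits = []
    for pattern in OUT_OF_BAND_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits


def count_contractions(text: str) -> int:
    """Count contractions, a purely structural property of the text."""
    return len(CONTRACTION.findall(text))


# Same meaning, different style. In a hypothetical one-bit channel,
# style A could stand for 0 and style B for 1.
response_a = "It is a good idea to rotate keys. You do not need downtime."
response_b = "It's a good idea to rotate keys. You don't need downtime."

for name, text in [("A", response_a), ("B", response_b)]:
    print(name, scan_out_of_band(text), count_contractions(text))
```

Both responses pass the content scan with no hits, while the contraction count differs between them. That difference is the kind of signal a content-pattern scanner never inspects.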

phantom-detect

Purpose-built detector for structural covert channels in LLM outputs. Open source.

Detection Rate: 100%

False Positive Rate: 10%

Analysis Stages: 3

How It Works

1. Feature Extraction

Extracts structural features from LLM output: contraction ratio, hedging density, punctuation patterns, sentence complexity, section count, list format, rhetorical question presence, transition word frequency, capitalization patterns, and reasoning topology.
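A minimal sketch of this kind of feature extraction, covering a subset of the features listed above. The word lists, regexes, feature names, and the use of Markdown headings as section markers are all assumptions made for illustration; this is not phantom-detect's actual implementation.

```python
import re

# Hypothetical word lists; a real detector would likely use larger ones.
HEDGES = {"might", "may", "could", "perhaps", "possibly", "likely", "generally"}
TRANSITIONS = {"however", "therefore", "moreover", "furthermore",
               "additionally", "consequently"}
# Rough contraction matcher (also matches possessives such as "model's").
CONTRACTION = re.compile(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", re.I)


def extract_features(text: str) -> dict[str, float]:
    """Map one LLM output to a small vector of structural features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lines = text.splitlines()
    n_words = max(len(words), 1)      # avoid division by zero on empty input
    n_sentences = max(len(sentences), 1)
    return {
        "contraction_ratio": len(CONTRACTION.findall(text)) / n_words,
        "hedging_density": sum(w in HEDGES for w in words) / n_words,
        "transition_freq": sum(w in TRANSITIONS for w in words) / n_words,
        "avg_sentence_len": len(words) / n_sentences,
        # Assumes Markdown-style "#" headings mark sections.
        "section_count": float(sum(bool(re.match(r"#{1,6}\s", l)) for l in lines)),
        "bullet_lines": float(sum(bool(re.match(r"\s*[-*]\s", l)) for l in lines)),
        "numbered_lines": float(sum(bool(re.match(r"\s*\d+[.)]\s", l)) for l in lines)),
        # Any sentence ending in "?" is used as a proxy for a rhetorical question.
        "has_question": float(any(s.rstrip().endswith("?") for s in sentences)),
    }
```

Each feature is normalized or counted independently of what the text says, so two outputs with identical meaning can still produce different vectors.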

2. Baseline Comparison

Compares extracted features against a baseline profile of the target model's natural output distribution. Each model has a distinct structural fingerprint.
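One simple way to compare an output against a per-model baseline is a z-score per feature. The sketch below assumes the baseline is stored as a mean and standard deviation for each feature; the `FeatureStats` type, the example numbers, and the threshold of 3 are hypothetical and not taken from phantom-detect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureStats:
    """Baseline mean and standard deviation of one structural feature."""
    mean: float
    std: float


def z_scores(observed: dict[str, float],
             baseline: dict[str, FeatureStats],
             min_std: float = 1e-6) -> dict[str, float]:
    """Standardized deviation of each observed feature from the baseline."""
    scores = {}
    for name, stats in baseline.items():
        if name not in observed:
            continue  # feature missing from this output; skip it
        # Floor the std so a near-constant baseline feature cannot divide by zero.
        scores[name] = (observed[name] - stats.mean) / max(stats.std, min_std)
    return scores


def flagged(scores: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Names of features whose deviation meets or exceeds the threshold."""
    return sorted(n for n, z in scores.items() if abs(z) >= threshold)


# Made-up baseline for an imaginary model, for illustration only.
baseline = {
    "contraction_ratio": FeatureStats(mean=0.04, std=0.01),
    "hedging_density": FeatureStats(mean=0.02, std=0.01),
}
observed = {"contraction_ratio": 0.0, "hedging_density": 0.025}
print(flagged(z_scores(observed, baseline)))
```

Because each model has its own fingerprint, the baseline dictionary would need to be built separately per model from outputs known to be unmanipulated.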

3. Statistical Divergence

Computes KL-divergence between the observed feature distribution and the baseline. Encoding directives create measurable statistical deviations that natural variation does not.
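For discrete distributions P (observed) and Q (baseline), the KL-divergence is D_KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x)). The sketch below computes it over categorical counts with add-alpha smoothing so that categories unseen on one side do not yield a division by zero. The list-format categories and counts are invented for illustration, and how phantom-detect bins its features is not specified here.

```python
import math


def kl_divergence(observed_counts: dict[str, int],
                  baseline_counts: dict[str, int],
                  alpha: float = 1.0) -> float:
    """D_KL(observed || baseline) in nats, with add-alpha smoothing."""
    categories = sorted(set(observed_counts) | set(baseline_counts))
    k = len(categories)
    n_obs = sum(observed_counts.values()) + alpha * k
    n_base = sum(baseline_counts.values()) + alpha * k
    kl = 0.0
    for c in categories:
        p = (observed_counts.get(c, 0) + alpha) / n_obs
        q = (baseline_counts.get(c, 0) + alpha) / n_base
        kl += p * math.log(p / q)
    return kl


# Hypothetical counts of which list format each response used.
baseline = {"bullets": 50, "numbered": 30, "none": 20}
natural = {"bullets": 5, "numbered": 3, "none": 2}    # resembles the baseline
skewed = {"bullets": 0, "numbered": 10, "none": 0}    # one format forced every time

print(kl_divergence(natural, baseline))
print(kl_divergence(skewed, baseline))
```

A sample that follows the baseline proportions scores close to zero, while a sample pinned to one format scores much higher. A detection threshold would sit between the two and would need tuning against the false positive rate.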

Installation

```bash
# Install from PyPI
pip install phantom-detect

# Analyze a single LLM output
phantom-detect analyze --input response.txt --model claude-3.5-sonnet

# Batch analysis
phantom-detect batch --dir ./outputs/ --model gpt-4o --format json
```