Live interactive demo of AEGIS, a three-layer defense proxy that intercepts prompt injections, jailbreaks, and PHI extraction attempts against healthcare AI agents — all running locally, all HIPAA-safe.
A BERT-Tiny model exported to ONNX classifies every prompt into one of five categories in under 10ms. High-confidence benign prompts fast-path directly to the agent.
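The fast-path decision can be sketched as a softmax over the classifier's logits plus a confidence threshold. This is a minimal stand-alone sketch: the label set shown here is an assumption, and the `onnxruntime` session that would produce the logits is omitted so the example stays self-contained.

```python
import math

# Hypothetical label set -- the actual five categories are defined by the model.
LABELS = ["BENIGN", "PROMPT_INJECTION", "JAILBREAK", "PHI_EXTRACTION", "OTHER"]

def softmax(logits):
    """Convert raw classifier logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, fast_path_threshold=0.85):
    """Return (label, confidence, fast_path) for one prompt's logits.

    fast_path is True only for a BENIGN top class at or above the
    0.85 confidence threshold described in the pipeline steps below.
    """
    probs = softmax(logits)
    conf = max(probs)
    label = LABELS[probs.index(conf)]
    fast_path = (label == "BENIGN" and conf >= fast_path_threshold)
    return label, conf, fast_path
```

In the real proxy the logits would come from the exported ONNX model; only the thresholding logic is illustrated here.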
Ambiguous or flagged prompts are evaluated by a local LLM (Ollama) that checks against security policies and returns a structured verdict with reasoning.
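Because the auditor's verdict drives the routing decision, parsing it defensively matters. The JSON schema below (`verdict` / `confidence` / `reasoning` keys) is an assumption for illustration; the key point is failing closed when the LLM's output is malformed.

```python
import json

def parse_verdict(raw: str) -> dict:
    """Parse the auditor LLM's structured verdict.

    The verdict schema here is a hypothetical example. Any response that
    cannot be parsed falls back to a conservative HOLD so a flaky LLM
    never silently passes a flagged prompt.
    """
    try:
        data = json.loads(raw)
        return {
            "verdict": str(data["verdict"]).upper(),
            "confidence": float(data["confidence"]),
            "reasoning": str(data.get("reasoning", "")),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fail closed: unparseable auditor output is held for human review.
        return {"verdict": "HOLD", "confidence": 0.0,
                "reasoning": "unparseable auditor output"}
```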
Before forwarding to the agent, patient identifiers (names, SSN, MRN, DOB) are stripped from the prompt using NER + regex. The agent only sees operational data.
Agent responses are scanned for Protected Health Information using NER and regex. Detected entities are redacted before reaching the user, preventing data leaks.
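The regex half of the sanitizer can be sketched as a table of patterns mapped to typed placeholders. The patterns below are illustrative assumptions (the MRN format in particular varies by institution), and unstructured identifiers like names are left to the NER pass, which is out of scope here.

```python
import re

# Illustrative patterns for structured identifiers. Real deployments would
# tune these per data source; names are handled by the NER pass instead.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected identifier with a typed placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The same routine serves both directions: stripping identifiers from prompts before they reach the agent, and scrubbing agent responses before they reach the user.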
1. Layer 1 classifies the prompt. If class == BENIGN and confidence ≥ 0.85, the prompt fast-paths to the agent.
2. Otherwise, Layer 2 (LLM auditor) evaluates the prompt against security policies.
3. The L2 verdict confidence is compared against thresholds:
• confidence ≥ auto_proceed → PASS
• confidence ≥ hold_and_notify (0.60) → HOLD for human review
• confidence < hold_and_notify → BLOCK
4. For PHI-related operations, all thresholds are multiplied by phi_multiplier (1.5×), making the system stricter.
5. If the prompt passes, the Input Sanitizer strips any patient identifiers (names, SSN, MRN, DOB) using NER + regex before forwarding to the agent.
6. The agent response goes through Layer 3 (output sanitizer) which scans for PHI leakage before reaching the user.
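The threshold routing in steps 3–4 can be sketched as a small pure function. Only the hold_and_notify value (0.60) and the 1.5× PHI multiplier are stated above; the auto_proceed value used here (0.85) is an assumption.

```python
# auto_proceed value is an assumption; hold_and_notify (0.60) and the
# 1.5x PHI multiplier come from the pipeline description above.
THRESHOLDS = {"auto_proceed": 0.85, "hold_and_notify": 0.60}
PHI_MULTIPLIER = 1.5

def route(l2_confidence: float, phi_related: bool) -> str:
    """Map an L2 verdict confidence to PASS / HOLD / BLOCK.

    For PHI-related operations every threshold is scaled by 1.5x. Note
    that this pushes auto_proceed above 1.0, so with confidence capped
    at 1.0 a PHI operation can at best reach HOLD -- a human is always
    in the loop for PHI.
    """
    mult = PHI_MULTIPLIER if phi_related else 1.0
    if l2_confidence >= THRESHOLDS["auto_proceed"] * mult:
        return "PASS"
    if l2_confidence >= THRESHOLDS["hold_and_notify"] * mult:
        return "HOLD"
    return "BLOCK"
```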
Breakdown of final verdicts across all tested prompts
Layer 1 classification distribution
Per-request latency split across L1, L2, and L3 (last 10 requests)
| Time | Prompt | L1 Class | L1 Conf | L2 Verdict | Input PHI | Final | Latency |
|---|---|---|---|---|---|---|---|