AEGIS — Healthcare AI Security Observatory

Live interactive demo of AEGIS, a three-layer defense proxy that intercepts prompt injections, jailbreaks, and PHI extraction attempts against healthcare AI agents — all running locally, all HIPAA-safe.

To run the stack locally: make run-ner, make run-mock-agent, and make run-proxy.
Tabs: Overview · Live Testing · Attack Gallery · Dashboard · Audit Trail
The problem: Healthcare AI agents handle clinical decisions, insurance claims, and patient data — but research shows a 94.4% prompt injection success rate against medical LLMs (Yoo et al., JAMA 2025). AEGIS is a real-time interceptor that sits between users and healthcare AI agents, enforcing HIPAA-aligned security policies with sub-100ms latency.
Pipeline: User Prompt → L1: ONNX Classifier → L2: LLM Auditor → Input Sanitizer → Healthcare Agent → L3: Output Sanitizer
Layer 1: ONNX Prompt Classifier

A BERT-Tiny model exported to ONNX classifies every prompt into five categories in under 10ms. High-confidence benign prompts fast-path directly to the agent.

Stack: ONNX Runtime · BERT-Tiny · <10ms
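The fast-path rule above can be sketched as a small routing gate. This is a minimal illustration assuming hypothetical names (`l1_route`, `BENIGN_FAST_PATH_THRESHOLD`); the real Layer 1 is a BERT-Tiny model served via ONNX Runtime, not shown here.

```python
# Sketch of the Layer 1 routing gate. The 0.85 threshold matches the
# fast-path rule in the decision logic; names are illustrative only.
BENIGN_FAST_PATH_THRESHOLD = 0.85

def l1_route(label: str, confidence: float) -> str:
    """Fast-path high-confidence benign prompts; escalate everything else to L2."""
    if label == "BENIGN" and confidence >= BENIGN_FAST_PATH_THRESHOLD:
        return "fast_path"
    return "escalate_to_l2"
```

Note that any non-benign label escalates regardless of confidence; only benign prompts can skip the LLM auditor.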
Layer 2: LLM Policy Auditor

Ambiguous or flagged prompts are evaluated by a local LLM (Ollama) that checks against security policies and returns a structured verdict with reasoning.

Stack: Ollama · Llama 3.1 8B · Local Only
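The auditor's "structured verdict with reasoning" might be validated like this. The field names and JSON shape below are assumptions for illustration, not the project's actual schema.

```python
import json

# Hypothetical shape of the L2 auditor's structured verdict: a verdict
# label, a confidence score, and free-text reasoning.
def parse_verdict(raw: str) -> dict:
    verdict = json.loads(raw)
    if verdict["verdict"] not in {"PASS", "HOLD", "BLOCK"}:
        raise ValueError(f"unexpected verdict: {verdict['verdict']}")
    return verdict

example = parse_verdict(
    '{"verdict": "HOLD", "confidence": 0.62,'
    ' "reasoning": "Prompt asks the agent to ignore its system instructions."}'
)
```

Validating the verdict label at the boundary keeps a malformed or off-policy LLM response from silently passing a prompt through.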
Input Sanitizer: PHI Input Redaction

Before forwarding to the agent, patient identifiers (names, SSN, MRN, DOB) are stripped from the prompt using NER + regex. The agent only sees operational data.

Stack: ClinicalBERT NER · Regex · HIPAA 18 identifiers
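The regex half of the sanitizer can be sketched as below. The real sanitizer combines ClinicalBERT NER with regexes; these three patterns are illustrative examples, not the project's actual rules.

```python
import re

# Minimal regex-only sketch of input PHI redaction. Each matched
# identifier is replaced with a bracketed type tag before the prompt
# is forwarded to the agent.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def redact_input(prompt: str) -> str:
    for label, pattern in PHI_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt
```

Regexes catch well-formatted identifiers (SSNs, MRNs, dates); the NER model is what catches free-text identifiers like patient names, which regexes cannot reliably match.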
Layer 3: PHI Output Sanitizer

Agent responses are scanned for Protected Health Information using NER and regex. Detected entities are redacted before reaching the user, preventing data leaks.

Stack: ClinicalBERT NER · Regex · Block / Redact
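A "Block / Redact" policy for agent output might look like the sketch below. The threshold of 5 entities is an assumption chosen for illustration; the project's actual block-versus-redact criterion is not stated on this page.

```python
# Hypothetical Layer 3 policy: redact isolated PHI hits, but block the
# whole response when leakage is pervasive. BLOCK_THRESHOLD is an
# illustrative assumption, not the project's real setting.
BLOCK_THRESHOLD = 5

def output_action(detected_entities: list) -> str:
    if not detected_entities:
        return "pass"
    if len(detected_entities) >= BLOCK_THRESHOLD:
        return "block"
    return "redact"
```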

Classification Categories

BENIGN: legitimate medical queries
DIRECT INJECTION: explicit prompt overrides
INDIRECT INJECTION: hidden in clinical data
JAILBREAK: persona/mode bypass
PHI EXTRACTION: patient data exfiltration

Confidence-Gated Decision Logic

1. Layer 1 classifies the prompt. If class == BENIGN and confidence ≥ 0.85, the prompt fast-paths to the agent.

2. Otherwise, Layer 2 (LLM auditor) evaluates the prompt against security policies.

3. The L2 verdict confidence is compared against thresholds:

confidence ≥ auto_proceed → PASS
confidence ≥ hold_and_notify (0.60) → HOLD for human review
confidence < hold_and_notify → BLOCK

4. For PHI-related operations, all thresholds are multiplied by phi_multiplier (1.5×), making the system stricter.

5. If the prompt passes, the Input Sanitizer strips any patient identifiers (names, SSN, MRN, DOB) using NER + regex before forwarding to the agent.

6. The agent response goes through Layer 3 (output sanitizer) which scans for PHI leakage before reaching the user.
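Steps 1 through 4 can be condensed into one decision function. The page gives hold_and_notify (0.60) and phi_multiplier (1.5) but never states the auto_proceed value, so it is left as a required argument here rather than guessed; the function name is illustrative.

```python
# Sketch of the confidence-gated verdict logic from steps 1-4.
def l2_decision(confidence: float, auto_proceed: float,
                hold_and_notify: float = 0.60,
                phi_related: bool = False,
                phi_multiplier: float = 1.5) -> str:
    if phi_related:
        # PHI operations raise every threshold, capped at 1.0.
        auto_proceed = min(auto_proceed * phi_multiplier, 1.0)
        hold_and_notify = min(hold_and_notify * phi_multiplier, 1.0)
    if confidence >= auto_proceed:
        return "PASS"
    if confidence >= hold_and_notify:
        return "HOLD"
    return "BLOCK"
```

Note the effect of the multiplier: a verdict confident enough to auto-proceed on an ordinary request can drop to HOLD, or even BLOCK, once the request touches PHI.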

How it works: type or paste a prompt below and click "Send to AEGIS". The prompt runs through the full three-layer pipeline, and you'll see how each layer responds in real time. Press Ctrl+Enter to send.

Send a Prompt

Each request is traced through every stage: Layer 1 (ONNX Classifier), Layer 2 (LLM Auditor), Input Sanitizer (PHI Redaction), Layer 3 (Output Sanitizer), and the final Decision (Verdict).
Session Dashboard: Aggregated metrics from all prompts tested during this session. Send prompts from the Live Testing or Attack Gallery tabs to populate the charts.
Headline metrics: Total Requests · Attack Detection Rate · Avg L1 Latency · Avg Total Latency

Verdict Distribution

Breakdown of final verdicts across all tested prompts

Detected Classes

Layer 1 classification distribution

Layer Latency Breakdown

Per-request latency split across L1, L2, and L3 (last 10 requests)

Audit Trail

Columns: Time · Prompt · L1 Class · L1 Conf · L2 Verdict · Input PHI · Final · Latency
No requests yet. Send prompts from the Live Testing or Attack Gallery tabs.