Live interactive demo of AEGIS, a three-layer defense proxy that intercepts prompt injections, jailbreaks, and PHI extraction attempts against healthcare AI agents — all running locally, all HIPAA-safe.
A BERT-Tiny model exported to ONNX classifies every prompt into one of five categories in under 10ms. High-confidence benign prompts fast-path directly to the agent.
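The fast-path decision can be sketched as a softmax over the classifier's logits plus a confidence threshold. This is a minimal stand-alone sketch: the label set shown here is an assumption, and the `onnxruntime` session that would produce the logits is omitted so the example stays self-contained.

```python
import math

# Hypothetical label set -- the actual five categories are defined by the model.
LABELS = ["BENIGN", "PROMPT_INJECTION", "JAILBREAK", "PHI_EXTRACTION", "OTHER"]

def softmax(logits):
    """Convert raw classifier logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, fast_path_threshold=0.85):
    """Return (label, confidence, fast_path) for one prompt's logits.

    fast_path is True only for a BENIGN top class at or above the
    0.85 confidence threshold described in the pipeline steps below.
    """
    probs = softmax(logits)
    conf = max(probs)
    label = LABELS[probs.index(conf)]
    fast_path = (label == "BENIGN" and conf >= fast_path_threshold)
    return label, conf, fast_path
```

In the real proxy the logits would come from the exported ONNX model; only the thresholding logic is illustrated here.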
Ambiguous or flagged prompts are evaluated by a local LLM (Ollama) that checks against security policies and returns a structured verdict with reasoning.
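Because the auditor's verdict drives the routing decision, parsing it defensively matters. The JSON schema below (`verdict` / `confidence` / `reasoning` keys) is an assumption for illustration; the key point is failing closed when the LLM's output is malformed.

```python
import json

def parse_verdict(raw: str) -> dict:
    """Parse the auditor LLM's structured verdict.

    The verdict schema here is a hypothetical example. Any response that
    cannot be parsed falls back to a conservative HOLD so a flaky LLM
    never silently passes a flagged prompt.
    """
    try:
        data = json.loads(raw)
        return {
            "verdict": str(data["verdict"]).upper(),
            "confidence": float(data["confidence"]),
            "reasoning": str(data.get("reasoning", "")),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fail closed: unparseable auditor output is held for human review.
        return {"verdict": "HOLD", "confidence": 0.0,
                "reasoning": "unparseable auditor output"}
```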
Before forwarding to the agent, patient identifiers (names, SSN, MRN, DOB) are stripped from the prompt using NER + regex. The agent only sees operational data.
Agent responses are scanned for Protected Health Information using NER and regex. Detected entities are redacted before reaching the user, preventing data leaks.
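The regex half of the sanitizer can be sketched as a table of patterns mapped to typed placeholders. The patterns below are illustrative assumptions (the MRN format in particular varies by institution), and unstructured identifiers like names are left to the NER pass, which is out of scope here.

```python
import re

# Illustrative patterns for structured identifiers. Real deployments would
# tune these per data source; names are handled by the NER pass instead.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "DOB": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected identifier with a typed placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

The same routine serves both directions: stripping identifiers from prompts before they reach the agent, and scrubbing agent responses before they reach the user.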
1. Layer 1 classifies the prompt. If class == BENIGN and confidence ≥ 0.85, the prompt fast-paths to the agent.
2. Otherwise, Layer 2 (LLM auditor) evaluates the prompt against security policies.
3. The L2 verdict confidence is compared against thresholds:
• confidence ≥ auto_proceed → PASS
• confidence ≥ hold_and_notify (0.60) → HOLD for human review
• confidence < hold_and_notify → BLOCK
4. For PHI-related operations, all thresholds are multiplied by phi_multiplier (1.5×), making the system stricter.
5. If the prompt passes, the Input Sanitizer strips any patient identifiers (names, SSN, MRN, DOB) using NER + regex before forwarding to the agent.
6. The agent response goes through Layer 3 (output sanitizer) which scans for PHI leakage before reaching the user.
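The threshold routing in steps 3–4 can be sketched as a small pure function. Only the hold_and_notify value (0.60) and the 1.5× PHI multiplier are stated above; the auto_proceed value used here (0.85) is an assumption.

```python
# auto_proceed value is an assumption; hold_and_notify (0.60) and the
# 1.5x PHI multiplier come from the pipeline description above.
THRESHOLDS = {"auto_proceed": 0.85, "hold_and_notify": 0.60}
PHI_MULTIPLIER = 1.5

def route(l2_confidence: float, phi_related: bool) -> str:
    """Map an L2 verdict confidence to PASS / HOLD / BLOCK.

    For PHI-related operations every threshold is scaled by 1.5x. Note
    that this pushes auto_proceed above 1.0, so with confidence capped
    at 1.0 a PHI operation can at best reach HOLD -- a human is always
    in the loop for PHI.
    """
    mult = PHI_MULTIPLIER if phi_related else 1.0
    if l2_confidence >= THRESHOLDS["auto_proceed"] * mult:
        return "PASS"
    if l2_confidence >= THRESHOLDS["hold_and_notify"] * mult:
        return "HOLD"
    return "BLOCK"
```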
Breakdown of final verdicts across all tested prompts
Layer 1 classification distribution
Per-request latency split across L1, L2, and L3 (last 10 requests)
| Time | Prompt | L1 Class | L1 Conf | L2 Verdict | Input PHI | Final | Latency |
|---|---|---|---|---|---|---|---|