Zentrafuge Labs — AI Safety Infrastructure
Production-grade safety middleware for AI products handling sensitive, emotional, or high-stakes conversations. Drop-in. Auditable. Model-agnostic.
Even well-prompted models can produce harmful responses when users are distressed or adversarial.
Regulators and insurers expect documented, reviewable safety decisions — not black-box inference.
When a user is in distress, your AI cannot wait for a human. It needs to act correctly, immediately.
Most teams re-invent the same guardrails. Zentrafuge Labs extracts that work into tested infrastructure.
Product — v0.1.0
A structured safety layer that sits between a user message and your LLM's draft response. Evaluates risk, enforces safe language, and returns an auditable decision — before anything reaches the user.
Detects distress signals, intensity, and masked emotional states beyond simple sentiment scoring.
Four-level system: low → medium → high → critical. Each level triggers defined, documented actions.
Every decision produces human-readable audit notes. Log them, store them, show them to insurers.
High-risk responses are automatically patched with safety-aware language and crisis resources.
Works with OpenAI, Anthropic, Mistral, or any model. No vendor lock-in. Pure Python.
No database. No authentication layer. No background jobs. Install and integrate in hours.
from guardrails_core import evaluate_guardrails

decision = evaluate_guardrails(
    user_id="user-abc",
    user_message="I don't want to be here anymore.",
    assistant_draft="I'm really sorry you're feeling this way.",
    conversation_context=[],
    debug=True,
)
print(decision.to_dict())
{
"approved": false,
"risk_level": "critical",
"actions_taken": ["force_safety_footer", "override_response"],
"modified_output": "I hear you, and I'm glad you said that...",
"audit_notes": "Critical risk detected. Safety override applied. Crisis resources appended."
}
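Your application acts on that decision before replying. A minimal sketch of a handler, assuming only the `to_dict()` fields shown in the example output (`handle_decision` itself is a hypothetical application-side helper, not part of the Guardrails API):

```python
def handle_decision(decision: dict, draft: str) -> str:
    """Return the text to send to the user, persisting the audit trail."""
    # Always record the audit notes, whatever the outcome.
    print(f"[audit] risk={decision['risk_level']}: {decision['audit_notes']}")

    if decision["approved"]:
        # Low/medium risk: the original draft passes through unchanged.
        return draft
    # High/critical risk: the middleware supplies a patched response.
    return decision["modified_output"]


# Example using the decision shown above.
decision = {
    "approved": False,
    "risk_level": "critical",
    "actions_taken": ["force_safety_footer", "override_response"],
    "modified_output": "I hear you, and I'm glad you said that...",
    "audit_notes": "Critical risk detected. Safety override applied.",
}
reply = handle_decision(decision, "I'm really sorry you're feeling this way.")
```

Because the decision is a plain dict, the same pattern works whether your draft came from OpenAI, Anthropic, or any other model.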
How it works
Every message is assessed against emotional and safety signals. The result is a structured decision your application can act on immediately.
| Level | Trigger | Action taken | Audit output |
|---|---|---|---|
| low | Neutral or positive message, no distress signals | Response approved as-is | Pass noted, no intervention |
| medium | Mild emotional distress or frustration detected | Response approved, context flag added | Emotional signal logged for review |
| high | Significant distress, hopelessness, or risk language | Response patched with safety-aware language | Full decision trail, intervention noted |
| critical | Active crisis indicators — self-harm, suicidal ideation | Response overridden, crisis resources appended | Full audit trail, escalation recommended |
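The table above maps cleanly onto a small application-side dispatch. A sketch, assuming you want to trigger your own follow-ups per level (the hook names `flag_for_review`, `log_full_trail`, and `escalate_to_human` are illustrative placeholders, not Guardrails functions):

```python
# Application-side follow-ups keyed by the four documented risk levels.
LEVEL_ACTIONS = {
    "low": [],
    "medium": ["flag_for_review"],
    "high": ["flag_for_review", "log_full_trail"],
    "critical": ["log_full_trail", "escalate_to_human"],
}


def follow_up_actions(risk_level: str) -> list:
    """Return the follow-up hooks for a risk level; fail safe on unknowns."""
    return LEVEL_ACTIONS.get(risk_level, ["log_full_trail"])
```

Keeping this mapping in your own code (rather than buried in the middleware) means your escalation policy stays reviewable alongside the audit notes Guardrails already emits.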
Who it's for
Licensing
Guardrails v0 is source-available and commercially licensed. All tiers include the full codebase, documentation, and a written licence agreement.
Evaluation
Free
For developers evaluating Guardrails in a non-production environment.
Commercial
£2,500 / year
For startups and growing products handling sensitive user conversations.
Enterprise
Custom
For larger organisations, NHS-adjacent services, or multi-product deployments.
Roadmap
Guardrails is the first release from Zentrafuge Labs. Additional modules are being extracted from production and hardened for commercial release.
Heuristic-driven emotional tone detection. Detects masked states, intensity, and regulation level.
Micro, super, and persistent memory for AI companions. Storage-agnostic, GDPR-compliant.
Learns communication style and emotional preferences over time. Works alongside any memory layer.
Timing intelligence for AI check-ins. Determines when — and how — to initiate proactively.
Interested in early access to any of these modules? Get in touch.
Zentrafuge Labs is the R&D arm of Zentrafuge Limited — a UK company building AI companions for veterans. Every module we license was built for real users, in a real product, handling genuinely sensitive conversations.
Guardrails v0 powers the safeguarding layer of Radio Check, a live mental health platform for UK veterans. It has been tested against real crisis language, integrated with professional counselling workflows, and reviewed against BACP ethical guidelines.
Founded by Anthony Donnelly, Medway, UK. Company No. 16669197.
Get in touch
For pricing, integration questions, or to discuss a bespoke arrangement, email us directly.
labs@zentrafuge.com
We typically respond within one business day.