Zentrafuge Labs — AI Safety Infrastructure

Make your AI safer
before it ships.

Production-grade safety middleware for AI products handling sensitive, emotional, or high-stakes conversations. Drop-in. Auditable. Model-agnostic.

See Guardrails v0 · Licence enquiry →
⚠️

LLMs say unsafe things

Even well-prompted models can produce harmful responses when users are distressed or adversarial.

📋

Compliance teams need audit trails

Regulators and insurers expect documented, reviewable safety decisions — not black-box inference.

🚨
Crisis moments need instant response

When a user is in distress, your AI cannot wait for a human. It needs to act correctly, immediately.

🔌

Safety shouldn't be rebuilt per product

Most teams reinvent the same guardrails from scratch. Zentrafuge Labs extracts that work into tested infrastructure.

Product — v0.1.0

guardrails_core stable · v0.1.0

A structured safety layer that sits between a user message and your LLM's draft response. Evaluates risk, enforces safe language, and returns an auditable decision — before anything reaches the user.

🧠

Emotional context analysis

Detects distress signals, intensity, and masked emotional states beyond simple sentiment scoring.

🛡️

Risk classification

Four-level system: low → medium → high → critical. Each level triggers defined, documented actions.

📝

Structured audit output

Every decision produces human-readable audit notes. Log them, store them, show them to insurers.

🔧

Response enforcement

High-risk responses are automatically patched with safety-aware language and crisis resources.

🔌

Model-agnostic

Works with OpenAI, Anthropic, Mistral, or any model. No vendor lock-in. Pure Python.

No infrastructure required

No database. No authentication layer. No background jobs. Install and integrate in hours.

from guardrails_core import evaluate_guardrails

decision = evaluate_guardrails(
    user_id="user-abc",
    user_message="I don't want to be here anymore.",
    assistant_draft="I'm really sorry you're feeling this way.",
    conversation_context=[],
    debug=True,
)

print(decision.to_dict())

# Example output:
{
  "approved":        false,
  "risk_level":      "critical",
  "actions_taken":   ["force_safety_footer", "override_response"],
  "modified_output": "I hear you, and I'm glad you said that...",
  "audit_notes":     "Critical risk detected. Safety override applied. Crisis resources appended."
}
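A sketch of how an application might consume that decision. The field names are taken from the example output above; `select_reply` is a hypothetical helper, not part of guardrails_core:

```python
# Hypothetical helper (not part of guardrails_core): choose the text
# to show the user, based on a decision dict shaped like the example above.
def select_reply(decision: dict, draft: str) -> str:
    if decision["approved"]:
        # Low/medium risk: the original draft passes through unchanged.
        return draft
    # High/critical risk: the safety-patched output replaces the draft.
    return decision["modified_output"]

decision = {
    "approved": False,
    "risk_level": "critical",
    "modified_output": "I hear you, and I'm glad you said that...",
}
print(select_reply(decision, "I'm really sorry you're feeling this way."))
# → I hear you, and I'm glad you said that...
```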

How it works

Four risk levels. Clear, documented actions.

Every message is assessed against emotional and safety signals. The result is a structured decision your application can act on immediately.
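Because the decision is a plain dict, persisting the audit trail is straightforward. A minimal sketch, assuming the decision shape shown earlier (the JSON Lines pattern and `log_decision` helper are illustrative, not a guardrails_core API):

```python
import json
from datetime import datetime, timezone

# Illustrative pattern (not a guardrails_core API): append each decision
# dict to a JSON Lines file so every safety call can be replayed later
# for compliance review.
def log_decision(decision: dict, path: str = "guardrails_audit.jsonl") -> None:
    record = {"logged_at": datetime.now(timezone.utc).isoformat(), **decision}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```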

low
  Trigger: Neutral or positive message, no distress signals
  Action taken: Response approved as-is
  Audit output: Pass noted, no intervention

medium
  Trigger: Mild emotional distress or frustration detected
  Action taken: Response approved, context flag added
  Audit output: Emotional signal logged for review

high
  Trigger: Significant distress, hopelessness, or risk language
  Action taken: Response patched with safety-aware language
  Audit output: Full decision trail, intervention noted

critical
  Trigger: Active crisis indicators — self-harm, suicidal ideation
  Action taken: Response overridden, crisis resources appended
  Audit output: Full audit trail, escalation recommended
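One way to act on the level in application code. This is a sketch: the follow-up action names below are illustrative, not library constants:

```python
# Illustrative mapping from risk level to an application-side follow-up.
# These action names are examples, not part of guardrails_core.
FOLLOW_UP = {
    "low": None,                      # approved as-is, nothing extra
    "medium": "flag_for_review",      # emotional signal logged
    "high": "notify_safety_queue",    # intervention happened, surface it
    "critical": "escalate_to_human",  # crisis resources sent, escalate
}

def follow_up_action(risk_level: str):
    try:
        return FOLLOW_UP[risk_level]
    except KeyError:
        # Fail closed: treat an unknown level as critical.
        return FOLLOW_UP["critical"]
```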

Who it's for

Any AI product where conversations matter.

🧘 Mental health & wellbeing apps
🤝 AI coaching platforms
🏢 HR & employee support tools
💬 Customer service AI
🎓 EdTech & student support
🏥 Healthcare-adjacent AI
🎖️ Veteran & armed forces services
🤖 Enterprise LLM copilots

Licensing

Simple, transparent pricing.

Guardrails v0 is source-available and commercially licensed. All tiers include the full codebase, documentation, and a written licence agreement.

Evaluation

Free

For developers evaluating Guardrails in a non-production environment.

  • Full source code access
  • Integration documentation
  • Community support via email
  • Non-commercial use only
Request access

Enterprise

Custom

For larger organisations, NHS-adjacent services, or multi-product deployments.

  • Everything in Commercial
  • Custom policy configuration
  • SLA and uptime commitments
  • Joint compliance review
  • White-label options available
  • Dedicated support contact
Talk to us

Roadmap

More modules in development.

Guardrails is the first release from Zentrafuge Labs. Additional modules are being extracted from production and hardened for commercial release.

Emotion Parser

Heuristic-driven emotional tone detection. Detects masked states, intensity, and regulation level.

Three-Tier Memory SDK

Micro, super, and persistent memory for AI companions. Storage-agnostic, GDPR-compliant.

Personalization Engine

Learns communication style and emotional preferences over time. Works alongside any memory layer.

Proactive Engagement

Timing intelligence for AI check-ins. Determines when — and how — to initiate proactively.

Interested in early access to any of these modules? Get in touch.

Built in production.
Not in theory.

Zentrafuge Labs is the R&D arm of Zentrafuge Limited — a UK company building AI companions for veterans. Every module we license was built for real users, in a real product, handling genuinely sensitive conversations.

Guardrails v0 powers the safeguarding layer of Radio Check, a live mental health platform for UK veterans. It has been tested against real crisis language, integrated with professional counselling workflows, and reviewed against BACP ethical guidelines.

Founded by Anthony Donnelly, Medway, UK. Company No. 16669197.

4 · Risk levels with distinct enforcement logic
0 · Infrastructure dependencies required
Live · Deployed in production on a veteran support platform
UK · Built and supported from Medway, England

Get in touch

Licence enquiry or early access

For pricing, integration questions, or to discuss a bespoke arrangement, email us directly.

labs@zentrafuge.com

We typically respond within one business day.