Infrastructure Layer

Guardrails & Safety

Three-stage guard pipeline with prompt injection detection, PII filtering, content moderation, and capability-based sandboxing with default-deny policies.

3-Stage Pipeline · Prompt Injection · PII · Sandboxing · Default-Deny

Overview

Production AI systems require safety guarantees that go beyond prompt engineering. Beluga AI provides a three-stage guard pipeline that inspects every interaction at the input, output, and tool execution boundaries. Each stage operates independently, so you can mix and match guards from different providers or write your own.

The guard system includes built-in detectors for common threats: prompt injection attacks (heuristic, classifier-based, and Spotlighting strategies), PII leakage (with configurable redaction), and content moderation (toxicity, hate speech, inappropriate content). Guards can block, modify, or flag content depending on your policy configuration.

For tool execution, Beluga enforces capability-based sandboxing with a default-deny policy. Every tool must declare the capabilities it requires -- such as CapToolExec, CapMemoryRead, or CapNetworkAccess -- and the runtime grants only the minimum privileges needed. This prevents unauthorized data access and limits the blast radius of any single tool failure.

Capabilities

Three-Stage Pipeline

Guards execute at three distinct boundaries: Input guards validate user messages before they reach the LLM, Output guards check model responses before they reach the user, and Tool guards verify tool calls and results. Each stage is independent and composable.

Prompt Injection Detection

Multiple detection strategies work together to catch injection attempts. Heuristic rules detect known patterns. Classifier-based detection uses trained models for nuanced attacks. Spotlighting transforms input to make injected instructions visible to the detector. Strategies are composable and can run in parallel.

PII Filtering

Detect and redact personally identifiable information before it reaches LLMs or appears in logs. Supports names, emails, phone numbers, addresses, SSNs, credit cards, and custom patterns. Redaction is configurable: mask, hash, or replace with entity tags.

Content Moderation

Filter toxic, hateful, or inappropriate content from both inputs and outputs. Configurable severity thresholds let you tune sensitivity per use case. Supports custom category definitions for domain-specific moderation rules.

Capability-Based Sandboxing

Every tool operates under a default-deny security model. Tools must declare required capabilities: CapToolExec, CapMemoryRead, CapMemoryWrite, CapNetworkAccess, CapFileRead, CapFileWrite. The runtime grants only what is explicitly permitted by policy.

Safety Provider Adapters

Integrate with dedicated safety platforms via provider adapters. Each adapter normalizes the provider's API into the Beluga guard interface, so you can swap providers without changing application code.

Architecture

User Input
  → Input Guards (Injection / PII / Moderation)
  → LLM / Agent
      ↳ Tool Guards (Sandboxing / Caps) on every tool call
  → Output Guards (PII / Moderation)
  → Safe Response

Providers & Implementations

Name                    | Priority | Key Differentiator
Built-in Guards         | P0       | Zero-dependency heuristic detectors for injection, PII, and moderation
NVIDIA NeMo Guardrails  | P1       | Programmable guardrails with Colang dialog rules and topical control
Guardrails AI           | P1       | Validator-based approach with a large hub of community validators
LLM Guard               | P2       | Open-source scanner suite for prompt injection, toxicity, and bias
Lakera Guard            | P2       | Real-time API for prompt injection with continuously updated threat models
Azure AI Content Safety | P2       | Microsoft-hosted moderation with severity scoring and custom categories

Full Example

Set up a three-stage guard pipeline with input, output, and tool guards:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/lookatitude/beluga-ai/agent"
    "github.com/lookatitude/beluga-ai/guard"
    "github.com/lookatitude/beluga-ai/llm"
)

func main() {
    ctx := context.Background()

    // Create guard pipeline with three stages
    pipeline := guard.NewPipeline(
        // Stage 1: Input guards
        guard.WithInputGuards(
            guard.NewPromptInjectionDetector(
                guard.WithStrategy(guard.StrategyHeuristic),
                guard.WithStrategy(guard.StrategyClassifier),
                guard.WithStrategy(guard.StrategySpotlighting),
            ),
            guard.NewPIIFilter(
                guard.WithEntities("email", "phone", "ssn", "credit_card"),
                guard.WithRedactionMode(guard.RedactMask),
            ),
            guard.NewContentModerator(
                guard.WithCategories("toxicity", "hate_speech"),
                guard.WithThreshold(0.8),
            ),
        ),

        // Stage 2: Output guards
        guard.WithOutputGuards(
            guard.NewPIIFilter(
                guard.WithEntities("email", "phone", "ssn"),
                guard.WithRedactionMode(guard.RedactReplace),
            ),
            guard.NewContentModerator(
                guard.WithThreshold(0.7),
            ),
        ),

        // Stage 3: Tool guards with capability sandboxing
        guard.WithToolGuards(
            guard.NewCapabilitySandbox(
                guard.WithDefaultDeny(),
                guard.WithAllowedCapabilities(
                    guard.CapToolExec,
                    guard.CapMemoryRead,
                    // CapNetworkAccess deliberately omitted
                ),
            ),
        ),
    )

    // Create model and agent with guard pipeline
    model, err := llm.New("openai", llm.WithModel("gpt-4o"))
    if err != nil {
        log.Fatal(err)
    }

    myAgent, err := agent.New("safe-agent",
        agent.WithModel(model),
        agent.WithGuardPipeline(pipeline),
        agent.WithGuardHooks(guard.Hooks{
            OnBlock: func(ctx context.Context, stage string, reason string) {
                fmt.Printf("[BLOCKED] Stage: %s, Reason: %s\n", stage, reason)
            },
            OnRedact: func(ctx context.Context, stage string, count int) {
                fmt.Printf("[REDACTED] Stage: %s, Items: %d\n", stage, count)
            },
        }),
    )
    if err != nil {
        log.Fatal(err)
    }

    result, err := myAgent.Run(ctx, "Analyze this document for key insights")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result)
}

Related Features