LLM API — ChatModel, Router, Middleware

import "github.com/lookatitude/beluga-ai/llm"

Package llm provides the LLM abstraction layer for the Beluga AI framework.

It defines the ChatModel interface that all LLM providers implement, a provider registry for dynamic instantiation, composable middleware, lifecycle hooks, structured output parsing, context window management, tokenization, rate limiting, and multi-backend routing.

type ChatModel interface {
	Generate(ctx context.Context, msgs []schema.Message, opts ...GenerateOption) (*schema.AIMessage, error)
	Stream(ctx context.Context, msgs []schema.Message, opts ...GenerateOption) iter.Seq2[schema.StreamChunk, error]
	BindTools(tools []schema.ToolDefinition) ChatModel
	ModelID() string
}

  • Generate sends messages and returns a complete *schema.AIMessage.
  • Stream sends messages and returns an iter.Seq2[schema.StreamChunk, error] iterator.
  • BindTools returns a new ChatModel with the given tool definitions included in every request. The original model is not modified.
  • ModelID returns the underlying model identifier (e.g. "gpt-4o").

Providers register themselves in init(), so importing a provider package (typically with a blank import, as below) is enough to make it available through the registry. llm.New takes the provider name and a config.ProviderConfig that configures the provider:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/lookatitude/beluga-ai/config"
	"github.com/lookatitude/beluga-ai/llm"
	_ "github.com/lookatitude/beluga-ai/llm/providers/openai" // blank import registers the "openai" provider
	"github.com/lookatitude/beluga-ai/schema"
)

func main() {
	ctx := context.Background()

	// Look up the provider by name and build a model from its config.
	model, err := llm.New("openai", config.ProviderConfig{
		APIKey: "sk-...",
		Model:  "gpt-4o",
	})
	if err != nil {
		log.Fatal(err)
	}

	msgs := []schema.Message{schema.NewHumanMessage("What is 2+2?")}
	resp, err := model.Generate(ctx, msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Text())
}

Use Register to add a custom provider factory, New to create a ChatModel by name, and List to discover all registered providers:

// Register a custom provider (call from init())
llm.Register("my-provider", func(cfg config.ProviderConfig) (llm.ChatModel, error) {
	return NewMyProvider(cfg)
})

providers := llm.List() // sorted provider names
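The registry pattern described above can be sketched in a few lines. This is a minimal illustration only, not the package's implementation: ChatModel, ProviderConfig and echoModel are reduced local stand-ins, and the mutex-guarded map is an assumption about how such a registry is commonly built.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ChatModel is a local stand-in for llm.ChatModel, reduced to one method.
type ChatModel interface {
	ModelID() string
}

// ProviderConfig is a local stand-in for config.ProviderConfig.
type ProviderConfig struct {
	APIKey string
	Model  string
}

// Factory builds a ChatModel from a provider configuration.
type Factory func(cfg ProviderConfig) (ChatModel, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register stores a factory under a provider name.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// New looks up the factory by name and invokes it with cfg.
func New(name string, cfg ProviderConfig) (ChatModel, error) {
	mu.RLock()
	f, ok := factories[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return f(cfg)
}

// List returns the registered provider names in sorted order.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(factories))
	for name := range factories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// echoModel is a hypothetical provider used only in this sketch.
type echoModel struct{ id string }

func (m echoModel) ModelID() string { return m.id }
```

Because registration is keyed by name, a provider package only needs to call Register from its init() for callers to reach it through New.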

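Stream returns an iter.Seq2, which a for ... range loop can consume directly (range-over-func, Go 1.23+). Each iteration yields a chunk and an error; check the error before using the chunk: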

for chunk, err := range model.Stream(ctx, msgs) {
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(chunk.Delta)
}

GenerateOption functional options configure individual Generate and Stream calls:

| Option | Type | Description |
| --- | --- | --- |
| WithTemperature(t float64) | GenerateOption | Sampling temperature (0.0–2.0). |
| WithMaxTokens(n int) | GenerateOption | Maximum tokens to generate. |
| WithTopP(p float64) | GenerateOption | Nucleus sampling (0.0–1.0). |
| WithStopSequences(seqs ...string) | GenerateOption | Stop generation on these strings. |
| WithResponseFormat(format ResponseFormat) | GenerateOption | Output format (text, json_object, json_schema). |
| WithToolChoice(choice ToolChoice) | GenerateOption | ToolChoiceAuto, ToolChoiceNone, ToolChoiceRequired. |
| WithSpecificTool(name string) | GenerateOption | Force the model to call the named tool. |
| WithMetadata(kv map[string]any) | GenerateOption | Provider-specific options. |
| WithReasoning(cfg ReasoningConfig) | GenerateOption | Full reasoning configuration. |
| WithReasoningEffort(effort ReasoningEffort) | GenerateOption | Reasoning effort level. |
| WithReasoningBudget(tokens int) | GenerateOption | Reasoning token budget. |
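For readers unfamiliar with functional options, the following sketch shows how options of this shape are typically folded into a settings value. The generateOptions struct and applyOptions helper are hypothetical; this document does not show the package's internal representation, so treat the field layout as an assumption.

```go
package main

// generateOptions is a hypothetical per-call settings struct.
type generateOptions struct {
	Temperature   float64
	MaxTokens     int
	StopSequences []string
}

// GenerateOption mutates the per-call settings.
type GenerateOption func(*generateOptions)

// WithTemperature sets the sampling temperature.
func WithTemperature(t float64) GenerateOption {
	return func(o *generateOptions) { o.Temperature = t }
}

// WithMaxTokens caps the number of generated tokens.
func WithMaxTokens(n int) GenerateOption {
	return func(o *generateOptions) { o.MaxTokens = n }
}

// WithStopSequences appends stop strings.
func WithStopSequences(seqs ...string) GenerateOption {
	return func(o *generateOptions) {
		o.StopSequences = append(o.StopSequences, seqs...)
	}
}

// applyOptions applies the options in order to a zero-valued struct,
// so a later option overrides an earlier one that sets the same field.
func applyOptions(opts ...GenerateOption) generateOptions {
	var o generateOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```

In this sketch, applyOptions(WithTemperature(0.2), WithTemperature(0.7)) ends with a temperature of 0.7 because options are applied left to right.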

Beluga AI supports reasoning/chain-of-thought models such as OpenAI o-series and Claude with extended thinking. Use ReasoningConfig and the associated functional options to control reasoning behaviour:

type ReasoningEffort string

const (
	ReasoningEffortLow    ReasoningEffort = "low"
	ReasoningEffortMedium ReasoningEffort = "medium"
	ReasoningEffortHigh   ReasoningEffort = "high"
)

type ReasoningConfig struct {
	Effort       ReasoningEffort
	BudgetTokens int
}

| Option | Type | Description |
| --- | --- | --- |
| WithReasoning(cfg ReasoningConfig) | GenerateOption | Set the full reasoning configuration. |
| WithReasoningEffort(effort ReasoningEffort) | GenerateOption | Set reasoning effort level (creates config if nil). |
| WithReasoningBudget(tokens int) | GenerateOption | Set reasoning token budget (creates config if nil). |

Example:

resp, err := model.Generate(ctx, msgs,
	llm.WithReasoningEffort(llm.ReasoningEffortHigh),
	llm.WithReasoningBudget(10000),
)
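The "creates config if nil" behaviour noted in the table is what lets the two options above be combined in either order. A minimal sketch of that idea follows; callOptions, ensureReasoning and resolve are hypothetical names, and holding the config behind a pointer is an assumption, not a statement about the package's internals.

```go
package main

type ReasoningEffort string

const ReasoningEffortHigh ReasoningEffort = "high"

type ReasoningConfig struct {
	Effort       ReasoningEffort
	BudgetTokens int
}

// callOptions is a hypothetical holder for per-call settings.
type callOptions struct {
	Reasoning *ReasoningConfig // nil until a reasoning option is applied
}

type GenerateOption func(*callOptions)

// ensureReasoning allocates the config on first use and returns it.
func ensureReasoning(o *callOptions) *ReasoningConfig {
	if o.Reasoning == nil {
		o.Reasoning = &ReasoningConfig{}
	}
	return o.Reasoning
}

// WithReasoningEffort sets only the effort, keeping any existing budget.
func WithReasoningEffort(e ReasoningEffort) GenerateOption {
	return func(o *callOptions) { ensureReasoning(o).Effort = e }
}

// WithReasoningBudget sets only the budget, keeping any existing effort.
func WithReasoningBudget(tokens int) GenerateOption {
	return func(o *callOptions) { ensureReasoning(o).BudgetTokens = tokens }
}

// resolve applies the options in order to an empty callOptions.
func resolve(opts ...GenerateOption) callOptions {
	var o callOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```

With no reasoning option applied, Reasoning stays nil in this sketch, which gives a provider a simple way to tell "not requested" apart from a zero-valued config.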

Reasoning tokens are tracked in schema.Usage.ReasoningTokens, and reasoning content appears as schema.ThinkingPart in the response’s content parts. During streaming, reasoning deltas arrive in schema.StreamChunk.ReasoningDelta.

Use the OnReasoning hook to observe reasoning deltas as they stream:

hooks := llm.Hooks{
	OnReasoning: func(ctx context.Context, delta string) {
		fmt.Print(delta) // stream reasoning to console
	},
}

Middleware has the signature func(ChatModel) ChatModel. Built-in middleware:

  • WithHooks(hooks Hooks) Middleware — invokes lifecycle callbacks around Generate and Stream.
  • WithLogging(logger *slog.Logger) Middleware — logs Generate and Stream calls.
  • WithFallback(fallback ChatModel) Middleware — falls back to an alternative model on retryable errors.
  • WithProviderLimits(limits ProviderLimits) Middleware — enforces RPM, TPM, and concurrency limits.

Apply middleware with ApplyMiddleware. The first middleware in the list becomes the outermost wrapper and executes first, so in the example below WithLogging sees each call before WithFallback and WithProviderLimits:

package main

import (
	"log/slog"
	"os"

	"github.com/lookatitude/beluga-ai/llm"
)

func applyMiddleware(model llm.ChatModel, backup llm.ChatModel) llm.ChatModel {
	logger := slog.New(slog.NewTextHandler(os.Stdout, nil))
	return llm.ApplyMiddleware(model,
		llm.WithLogging(logger),
		llm.WithFallback(backup),
		llm.WithProviderLimits(llm.ProviderLimits{
			RPM:           60,
			MaxConcurrent: 5,
		}),
	)
}
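The ordering rule is easiest to see with a toy version. The sketch below uses a one-method stand-in for ChatModel and a hypothetical tag middleware that records entry and exit; wrapping in reverse order is one common way to make the first middleware outermost, and is assumed here rather than taken from the package source.

```go
package main

// ChatModel is a local stand-in for llm.ChatModel, reduced to one method.
type ChatModel interface {
	Generate(prompt string) string
}

// Middleware wraps a model and returns the wrapped model.
type Middleware func(ChatModel) ChatModel

// modelFunc adapts a plain function to the ChatModel stand-in.
type modelFunc func(prompt string) string

func (f modelFunc) Generate(prompt string) string { return f(prompt) }

// ApplyMiddleware wraps from last to first so that mws[0] ends up outermost.
func ApplyMiddleware(m ChatModel, mws ...Middleware) ChatModel {
	for i := len(mws) - 1; i >= 0; i-- {
		m = mws[i](m)
	}
	return m
}

// tag is a hypothetical middleware that appends "<name>:in" before the
// inner call and "<name>:out" after it.
func tag(name string, trace *[]string) Middleware {
	return func(next ChatModel) ChatModel {
		return modelFunc(func(prompt string) string {
			*trace = append(*trace, name+":in")
			out := next.Generate(prompt)
			*trace = append(*trace, name+":out")
			return out
		})
	}
}
```

Applying tag("a") and then tag("b") records a:in, b:in, the model call, b:out, a:out: the first middleware runs first on the way in and last on the way out.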

Hooks provides optional callbacks invoked during LLM operations. All fields are optional; nil hooks are skipped. Use ComposeHooks to merge multiple Hooks values:

type Hooks struct {
	BeforeGenerate func(ctx context.Context, msgs []schema.Message) error
	AfterGenerate  func(ctx context.Context, resp *schema.AIMessage, err error)
	OnStream       func(ctx context.Context, chunk schema.StreamChunk)
	OnToolCall     func(ctx context.Context, call schema.ToolCall)
	OnReasoning    func(ctx context.Context, delta string)
	OnError        func(ctx context.Context, err error) error
}

BeforeGenerate can abort the call by returning an error. OnError can suppress an error by returning nil. Example:

package main

import (
	"context"
	"log"

	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"
)

// loggingHooks logs each request and each error without changing the outcome.
func loggingHooks() llm.Hooks {
	return llm.Hooks{
		BeforeGenerate: func(ctx context.Context, msgs []schema.Message) error {
			log.Printf("generating with %d messages", len(msgs))
			return nil // nil lets the call proceed
		},
		OnError: func(ctx context.Context, err error) error {
			log.Printf("llm error: %v", err)
			return err // returned unchanged, so the error still propagates
		},
	}
}
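The abort and suppress semantics can be sketched with a simplified wrapper. Everything here is a stand-in: Hooks carries only two fields with context-free signatures, generateFunc replaces the model, and returning an empty response when OnError suppresses an error is an assumption of this sketch.

```go
package main

import "errors"

// Hooks is a reduced stand-in for llm.Hooks with simplified signatures.
type Hooks struct {
	BeforeGenerate func(msgs []string) error
	OnError        func(err error) error
}

// generateFunc is a stand-in for a model's Generate method.
type generateFunc func(msgs []string) (string, error)

// Hypothetical errors used to exercise both paths.
var (
	errEmpty   = errors.New("no messages")
	errBackend = errors.New("backend unavailable")
)

// withHooks wraps next with the two callbacks; nil callbacks are skipped.
func withHooks(next generateFunc, h Hooks) generateFunc {
	return func(msgs []string) (string, error) {
		if h.BeforeGenerate != nil {
			if err := h.BeforeGenerate(msgs); err != nil {
				return "", err // abort before the model is called
			}
		}
		resp, err := next(msgs)
		if err != nil && h.OnError != nil {
			err = h.OnError(err) // a nil return suppresses the error
		}
		return resp, err
	}
}
```

A BeforeGenerate hook that returns errEmpty for an empty slice prevents the model from being called at all, while an OnError hook that returns nil turns a failed call into an empty, error-free result in this sketch.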

StructuredOutput[T] wraps a ChatModel to produce typed Go values. It generates a JSON Schema from T, instructs the model to respond in JSON, parses the response, and retries on parse failures (default: 2 retries):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"
)

type Sentiment struct {
	Label string  `json:"label"`
	Score float64 `json:"score"`
}

func analyzeSentiment(ctx context.Context, model llm.ChatModel, text string) (Sentiment, error) {
	so := llm.NewStructured[Sentiment](model, llm.WithMaxRetries(3))
	msgs := []schema.Message{
		schema.NewHumanMessage("Analyze the sentiment of: " + text),
	}
	return so.Generate(ctx, msgs)
}

func main() {
	ctx := context.Background()
	// model := ... (create a ChatModel)
	result, err := analyzeSentiment(ctx, model, "I love this product!")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Label: %s, Score: %.2f\n", result.Label, result.Score)
}

NewStructured[T] accepts optional StructuredOption values. The only built-in option is WithMaxRetries(n int).
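The parse-and-retry loop at the heart of structured output can be sketched without the library. This illustration leaves out JSON Schema generation entirely and uses a hypothetical generateTyped helper with a plain function standing in for the model; feeding the parse error back into the prompt is an assumption about how a retry might be steered, not documented behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateFunc is a stand-in for a model call that returns raw text.
type generateFunc func(prompt string) (string, error)

// Sentiment mirrors the type used in the example above.
type Sentiment struct {
	Label string  `json:"label"`
	Score float64 `json:"score"`
}

// generateTyped decodes the model's reply into T. On a parse failure it
// retries, up to maxRetries extra attempts, with the error appended to
// the prompt. Errors from gen itself are returned immediately.
func generateTyped[T any](gen generateFunc, prompt string, maxRetries int) (T, error) {
	var zero T
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		raw, err := gen(prompt)
		if err != nil {
			return zero, err
		}
		var out T
		if err := json.Unmarshal([]byte(raw), &out); err != nil {
			lastErr = err
			prompt = fmt.Sprintf("%s\nThe previous reply was not valid JSON (%v). Reply with JSON only.", prompt, err)
			continue
		}
		return out, nil
	}
	return zero, fmt.Errorf("no valid JSON after %d attempts: %w", maxRetries+1, lastErr)
}
```

With maxRetries set to 2, the helper makes at most three calls; the first reply that decodes cleanly ends the loop.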

ContextManager fits a message sequence within a token budget. Two strategies are built in: "truncate" (drops oldest non-system messages) and "sliding" (keeps the most recent messages that fit):

package main

import (
	"context"

	"github.com/lookatitude/beluga-ai/llm"
	"github.com/lookatitude/beluga-ai/schema"
)

// fitMessages trims msgs to a 4096-token budget using the sliding strategy.
func fitMessages(ctx context.Context, msgs []schema.Message) ([]schema.Message, error) {
	cm := llm.NewContextManager(
		llm.WithContextStrategy("sliding"),
		llm.WithKeepSystemMessages(true),
	)
	fitted, err := cm.Fit(ctx, msgs, 4096)
	if err != nil {
		return nil, err
	}
	return fitted, nil
}

NewContextManager options:

| Option | Default | Description |
| --- | --- | --- |
| WithContextStrategy(name string) | "truncate" | Strategy: "truncate" or "sliding". |
| WithTokenizer(t Tokenizer) | SimpleTokenizer | Tokenizer for counting tokens. |
| WithKeepSystemMessages(keep bool) | true | Never remove system messages. |

ContextManager is an interface with a single method:

type ContextManager interface {
	Fit(ctx context.Context, msgs []schema.Message, budget int) ([]schema.Message, error)
}

Tokenizer provides token counting. SimpleTokenizer is a built-in approximation (1 token per 4 characters) suitable when a model-specific tokenizer is unavailable.
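A rough sketch of how a sliding-window fit might combine with a four-characters-per-token estimate is shown below. Message, countTokens and fitSliding are hypothetical stand-ins; rounding the estimate up and stopping at the first non-system message that does not fit are assumptions of this sketch, not documented behaviour.

```go
package main

import "unicode/utf8"

// Message is a local stand-in for schema.Message.
type Message struct {
	Role string // "system", "human" or "ai"
	Text string
}

// countTokens estimates one token per four characters, rounding up.
func countTokens(s string) int {
	return (utf8.RuneCountInString(s) + 3) / 4
}

// fitSliding keeps every system message, then walks backwards from the
// newest message and keeps non-system messages while they fit in budget.
// Once one message does not fit, all older non-system messages are dropped.
func fitSliding(msgs []Message, budget int) []Message {
	used := 0
	for _, m := range msgs {
		if m.Role == "system" {
			used += countTokens(m.Text)
		}
	}
	keep := make([]bool, len(msgs))
	full := false
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role == "system" {
			keep[i] = true
			continue
		}
		cost := countTokens(msgs[i].Text)
		if full || used+cost > budget {
			full = true
			continue
		}
		used += cost
		keep[i] = true
	}
	// Rebuild the slice in the original order.
	out := make([]Message, 0, len(msgs))
	for i, m := range msgs {
		if keep[i] {
			out = append(out, m)
		}
	}
	return out
}
```

Note that this sketch returns the system messages even when they alone exceed the budget; a real implementation would need to decide whether that case is an error.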

Router implements ChatModel by delegating to one of several backend models chosen by a pluggable ModelSelector. Built-in strategies:

  • RoundRobin — selects models in round-robin order.
  • FailoverChain — always returns the first model (use FailoverRouter for actual failover).

FailoverRouter retries across models on retryable errors:

package main

import (
	"github.com/lookatitude/beluga-ai/llm"
)

// makeRouter alternates between the two models on successive calls.
func makeRouter(modelA, modelB llm.ChatModel) llm.ChatModel {
	return llm.NewRouter(
		llm.WithModels(modelA, modelB),
		llm.WithStrategy(&llm.RoundRobin{}),
	)
}

// makeFailover tries primary first and falls back to backup on retryable errors.
func makeFailover(primary, backup llm.ChatModel) llm.ChatModel {
	return llm.NewFailoverRouter(primary, backup)
}

NewRouter returns a *Router. If no strategy is set, RoundRobin is used. NewFailoverRouter returns a *FailoverRouter that tries models in order, falling back on retryable errors.
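The difference between the two routing styles can be sketched with stand-in types. ChatModel is reduced to one method, stub is a hypothetical fixed-reply model, and errRetryable is an invented sentinel: how the real package classifies retryable errors is not described here, so that check is an assumption.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// ChatModel is a local stand-in for llm.ChatModel, reduced to one method.
type ChatModel interface {
	Generate(prompt string) (string, error)
}

// Hypothetical error classes for the sketch.
var (
	errRetryable = errors.New("retryable")
	errPermanent = errors.New("permanent")
)

// roundRobin hands out models in rotation; safe for concurrent use.
type roundRobin struct{ next atomic.Uint64 }

// Select returns the next model. It assumes models is non-empty.
func (r *roundRobin) Select(models []ChatModel) ChatModel {
	n := r.next.Add(1) - 1
	return models[n%uint64(len(models))]
}

// failover tries each model in order and moves on only after a
// retryable error; any other error is returned immediately.
func failover(models []ChatModel, prompt string) (string, error) {
	lastErr := errors.New("no models configured")
	for _, m := range models {
		out, err := m.Generate(prompt)
		if err == nil {
			return out, nil
		}
		if !errors.Is(err, errRetryable) {
			return "", err
		}
		lastErr = err
	}
	return "", lastErr
}

// stub is a hypothetical model that returns a fixed reply or error.
type stub struct {
	reply string
	err   error
}

func (s stub) Generate(string) (string, error) { return s.reply, s.err }
```

Round-robin spreads load without inspecting results, whereas failover keeps a strict preference order and only consults later models when an earlier one fails in a way worth retrying.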

WithProviderLimits returns middleware that enforces per-provider limits:

type ProviderLimits struct {
	RPM             int           // requests per minute
	TPM             int           // tokens per minute
	MaxConcurrent   int           // max concurrent requests
	CooldownOnRetry time.Duration // wait before retry after hitting limit
}
  • core — Runnable, BatchInvoke, errors
  • agent — Agent runtime using ChatModel
  • tool — Tool interface and BindTools
  • docs/providers.md — Full provider list and extension guide