Reasoning Models
Problem
You want to use reasoning models that expose their chain-of-thought process (such as OpenAI o3/o4-mini or Claude with extended thinking), control how much reasoning they perform, and observe or display the thinking process in real time.
Solution
Beluga AI provides first-class support for reasoning models through ReasoningConfig, streaming ReasoningDelta, the ThinkingPart content type, and the OnReasoning hook.
Basic Usage with ReasoningEffort
Set the reasoning effort level to control how much the model thinks before answering:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/lookatitude/beluga-ai/config"
        "github.com/lookatitude/beluga-ai/llm"
        "github.com/lookatitude/beluga-ai/schema"
        _ "github.com/lookatitude/beluga-ai/llm/providers/openai"
    )

    func main() {
        ctx := context.Background()

        model, err := llm.New("openai", config.ProviderConfig{
            APIKey: "sk-...",
            Model:  "o3",
        })
        if err != nil {
            log.Fatal(err)
        }

        msgs := []schema.Message{
            schema.NewHumanMessage("Prove that the square root of 2 is irrational."),
        }

        resp, err := model.Generate(ctx, msgs,
            llm.WithReasoningEffort(llm.ReasoningEffortHigh),
        )
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(resp.Text())
        fmt.Printf("Reasoning tokens used: %d\n", resp.Usage.ReasoningTokens)
    }

Setting a Token Budget
Limit the number of tokens the model can spend on reasoning:

    resp, err := model.Generate(ctx, msgs,
        llm.WithReasoning(llm.ReasoningConfig{
            Effort:       llm.ReasoningEffortHigh,
            BudgetTokens: 10000,
        }),
    )

Or use the individual options, which compose correctly:

    resp, err := model.Generate(ctx, msgs,
        llm.WithReasoningEffort(llm.ReasoningEffortMedium),
        llm.WithReasoningBudget(5000),
    )

Streaming Reasoning Output
Stream the model’s thinking process to the console in real time:

    // This fragment also needs "os" in the import list.
    for chunk, err := range model.Stream(ctx, msgs,
        llm.WithReasoningEffort(llm.ReasoningEffortHigh),
    ) {
        if err != nil {
            log.Fatal(err)
        }

        // Print reasoning deltas (the model's thinking) to stderr
        if chunk.ReasoningDelta != "" {
            fmt.Fprintf(os.Stderr, "[thinking] %s", chunk.ReasoningDelta)
        }

        // Print the final answer deltas to stdout
        if chunk.Delta != "" {
            fmt.Print(chunk.Delta)
        }
    }

Using the OnReasoning Hook
The OnReasoning hook fires for each reasoning delta during streaming, keeping reasoning observation separate from your main stream consumer:

    hooks := llm.Hooks{
        OnReasoning: func(ctx context.Context, delta string) {
            // Log reasoning to a file, send to a UI, etc.
            // reasoningLog is any io.Writer you have opened beforehand.
            fmt.Fprintf(reasoningLog, "%s", delta)
        },
    }

    model = llm.ApplyMiddleware(model, llm.WithHooks(hooks))

    // Now stream normally; the hook captures reasoning separately
    for chunk, err := range model.Stream(ctx, msgs,
        llm.WithReasoningEffort(llm.ReasoningEffortHigh),
    ) {
        if err != nil {
            log.Fatal(err)
        }
        fmt.Print(chunk.Delta)
    }

Accessing Thinking Content in Responses
After a non-streaming Generate call, reasoning content appears as a ThinkingPart in the response’s content parts:

    resp, err := model.Generate(ctx, msgs,
        llm.WithReasoningEffort(llm.ReasoningEffortHigh),
    )
    if err != nil {
        log.Fatal(err)
    }

    for _, part := range resp.Parts {
        switch p := part.(type) {
        case schema.ThinkingPart:
            fmt.Printf("[Reasoning]\n%s\n\n", p.Text)
        case schema.TextPart:
            fmt.Printf("[Answer]\n%s\n", p.Text)
        }
    }

Claude with Extended Thinking
The same API works with Anthropic’s Claude models that support extended thinking:

    import _ "github.com/lookatitude/beluga-ai/llm/providers/anthropic"

    model, err := llm.New("anthropic", config.ProviderConfig{
        APIKey: "sk-ant-...",
        Model:  "claude-sonnet-4-20250514",
    })
    if err != nil {
        log.Fatal(err)
    }

    resp, err := model.Generate(ctx, msgs,
        llm.WithReasoningBudget(16000),
    )

How It Works
- ReasoningConfig is passed through GenerateOptions to the provider.
- The provider maps Effort and BudgetTokens to its native API parameters.
- During streaming, providers emit ReasoningDelta in StreamChunk.
- After completion, providers include ThinkingPart in the response’s Parts and report ReasoningTokens in Usage.
- The OnReasoning hook in llm.Hooks fires for each non-empty ReasoningDelta during streaming.
- Observability attributes gen_ai.usage.reasoning_tokens and gen_ai.request.reasoning_effort are available for tracing.
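
The flow in the list above can be sketched in plain Go without importing Beluga. This is a minimal illustration under stated assumptions, not the library's implementation: the ReasoningConfig and StreamChunk types are local stand-ins, and openAIParams, anthropicParams, and consume are hypothetical helper names. The request field names (reasoning_effort for OpenAI, a thinking block with budget_tokens for Anthropic) come from the vendors' public APIs, but how Beluga's providers actually build their requests is assumed here.

```go
package main

import (
	"fmt"
	"strings"
)

// ReasoningConfig is a local stand-in with the two fields named in the text.
// It is not Beluga's actual type definition.
type ReasoningConfig struct {
	Effort       string // "low", "medium", or "high"; empty means unset
	BudgetTokens int    // 0 means no explicit budget
}

// StreamChunk is a local stand-in carrying answer and reasoning deltas.
type StreamChunk struct {
	Delta          string
	ReasoningDelta string
}

// openAIParams (hypothetical) maps the config to an OpenAI-style
// "reasoning_effort" request field.
func openAIParams(cfg ReasoningConfig) map[string]any {
	params := map[string]any{}
	if cfg.Effort != "" {
		params["reasoning_effort"] = cfg.Effort
	}
	return params
}

// anthropicParams (hypothetical) maps the config to an Anthropic-style
// "thinking" block carrying a token budget.
func anthropicParams(cfg ReasoningConfig) map[string]any {
	params := map[string]any{}
	if cfg.BudgetTokens > 0 {
		params["thinking"] = map[string]any{
			"type":          "enabled",
			"budget_tokens": cfg.BudgetTokens,
		}
	}
	return params
}

// consume (hypothetical) walks the chunks in order, calls onReasoning for
// every non-empty ReasoningDelta, and returns the concatenated answer text.
func consume(chunks []StreamChunk, onReasoning func(delta string)) string {
	var answer strings.Builder
	for _, c := range chunks {
		if c.ReasoningDelta != "" && onReasoning != nil {
			onReasoning(c.ReasoningDelta)
		}
		answer.WriteString(c.Delta)
	}
	return answer.String()
}

func main() {
	cfg := ReasoningConfig{Effort: "high", BudgetTokens: 16000}
	fmt.Println(openAIParams(cfg))
	fmt.Println(anthropicParams(cfg))

	// A hand-written chunk sequence standing in for a provider stream.
	chunks := []StreamChunk{
		{ReasoningDelta: "Assume sqrt(2) = p/q. "},
		{ReasoningDelta: ""}, // empty deltas do not trigger the hook
		{Delta: "sqrt(2) is "},
		{Delta: "irrational."},
	}
	var thoughts []string
	answer := consume(chunks, func(d string) { thoughts = append(thoughts, d) })
	fmt.Printf("reasoning deltas: %d\nanswer: %s\n", len(thoughts), answer)
}
```

Keeping the hook call inside the chunk loop is what lets a consumer ignore ReasoningDelta entirely and still have reasoning logged elsewhere, which matches the separation described for OnReasoning.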
Related
- LLM API Reference: full ChatModel API
- Streaming Tool Calls: handling tool calls in streams
- Token Counting: tracking token usage