Advanced Inference Options
Default LLM settings work for general queries, but specific tasks require tuning. Code generation needs low temperature for determinism. Creative writing benefits from higher temperature for variety. Beluga AI v2 provides per-call GenerateOption functions that control these parameters across all providers through a unified API. This functional options pattern (WithX()) means you specify only the parameters you want to change — unset options use provider defaults, and the same option functions work regardless of which LLM provider is active.
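For instance, the same call shape works unchanged across providers. A sketch (the "anthropic" provider name and the model IDs are illustrative assumptions; error handling elided):

```go
// Identical option functions against two different providers.
gptModel, _ := llm.New("openai", config.ProviderConfig{
    "model": "gpt-4o", "api_key": os.Getenv("OPENAI_API_KEY"),
})
claudeModel, _ := llm.New("anthropic", config.ProviderConfig{
    "model": "claude-3-5-sonnet", "api_key": os.Getenv("ANTHROPIC_API_KEY"),
})

respA, _ := gptModel.Generate(ctx, msgs, llm.WithTemperature(0.2))
respB, _ := claudeModel.Generate(ctx, msgs, llm.WithTemperature(0.2))
```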
What You Will Build
Configurations for different use cases — deterministic code generation, creative writing, and structured JSON output — using Beluga AI’s GenerateOption system.
Prerequisites
- A configured LLM provider (OpenAI, Anthropic, or any registered provider)
- Understanding of the ChatModel interface
Generate Options
Options are applied per-call to Generate or Stream. This per-call design is intentional — it allows the same model instance to serve different use cases without creating separate model objects. A single model can generate deterministic code in one call and creative prose in the next.
```go
resp, err := model.Generate(ctx, msgs,
    llm.WithTemperature(0.2),
    llm.WithMaxTokens(500),
)
```

All options are collected into a GenerateOptions struct that providers read to configure their API requests.
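This is the standard Go functional options pattern. A minimal sketch of the mechanism, with assumed field and function names rather than Beluga AI's actual definitions:

```go
// Sketch only: illustrative names, not Beluga AI's real types.
type GenerateOptions struct {
    Temperature *float64 // nil means "use the provider default"
    MaxTokens   *int
}

type GenerateOption func(*GenerateOptions)

func WithTemperature(t float64) GenerateOption {
    return func(o *GenerateOptions) { o.Temperature = &t }
}

// Providers apply each option to a zero-value struct, so unset
// fields stay nil and fall back to provider defaults.
func collect(opts ...GenerateOption) GenerateOptions {
    var g GenerateOptions
    for _, opt := range opts {
        opt(&g)
    }
    return g
}
```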
Step 1: Temperature and TopP
Temperature controls randomness in token selection. At temperature 0, the model always picks the most likely token (greedy decoding). Higher temperatures flatten the probability distribution, making less likely tokens more probable. TopP (nucleus sampling) takes a different approach: it restricts the token pool to the smallest set whose cumulative probability exceeds the threshold, then samples from that set.
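To make nucleus sampling concrete, here is a small standalone sketch (illustrative only, not part of Beluga AI) that computes which tokens survive a TopP cutoff for a toy distribution:

```go
package main

import (
    "fmt"
    "sort"
)

func main() {
    // Toy next-token distribution, already normalized.
    tokens := []string{"fmt", "func", "var", "goroutine", "banana"}
    probs := []float64{0.45, 0.30, 0.15, 0.07, 0.03}

    // TopP = 0.9: sort tokens by probability, then keep the smallest
    // prefix whose cumulative probability reaches the threshold.
    idx := make([]int, len(probs))
    for i := range idx {
        idx[i] = i
    }
    sort.Slice(idx, func(a, b int) bool { return probs[idx[a]] > probs[idx[b]] })

    cum := 0.0
    for _, i := range idx {
        cum += probs[i]
        fmt.Printf("keep %q (cumulative %.2f)\n", tokens[i], cum)
        if cum >= 0.9 {
            break // "goroutine" and "banana" never get sampled
        }
    }
}
```

The full example below exercises both knobs against a live model: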
```go
package main

import (
    "context"
    "fmt"
    "os"

    "github.com/lookatitude/beluga-ai/config"
    "github.com/lookatitude/beluga-ai/llm"
    _ "github.com/lookatitude/beluga-ai/llm/providers/openai"
    "github.com/lookatitude/beluga-ai/schema"
)

func main() {
    ctx := context.Background()

    model, err := llm.New("openai", config.ProviderConfig{
        "model":   "gpt-4o",
        "api_key": os.Getenv("OPENAI_API_KEY"),
    })
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    msgs := []schema.Message{
        schema.NewHumanMessage("Write a haiku about Go programming"),
    }

    // High creativity: temperature 1.2, TopP 0.9
    creative, err := model.Generate(ctx, msgs,
        llm.WithTemperature(1.2),
        llm.WithTopP(0.9),
    )
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    fmt.Printf("Creative: %s\n\n", creative.Text())

    // Deterministic: temperature 0
    deterministic, err := model.Generate(ctx, msgs,
        llm.WithTemperature(0.0),
    )
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    fmt.Printf("Deterministic: %s\n", deterministic.Text())
}
```

Temperature guidelines:
- 0.0 — Nearly deterministic. Use for code generation, data extraction, classification.
- 0.3-0.7 — Balanced. Use for Q&A, summarization, analysis.
- 0.8-1.5 — Creative. Use for writing, brainstorming, storytelling.
Step 2: Max Tokens and Stop Sequences
Control output length and stopping conditions. MaxTokens prevents runaway generation that wastes tokens and cost. Stop sequences tell the model to halt when it produces specific text, which is useful for constraining output format — for example, stopping at a closing code fence so the model cannot append explanatory text after a code block.
```go
// Limit response to 100 tokens
resp, err := model.Generate(ctx, msgs,
    llm.WithMaxTokens(100),
)
```
````go
// Stop generating when specific sequences appear
resp, err = model.Generate(ctx, msgs,
    llm.WithStopSequences("```", "\n\n"),
)
````
Step 3: Structured JSON Output

Force the model to produce valid JSON using ResponseFormat. JSON mode guarantees syntactically valid JSON output, eliminating the need for error-prone parsing of free-text responses. JSON Schema mode goes further by constraining the output to match a specific schema, which is how Beluga AI’s structured output feature ensures type-safe responses.
```go
// JSON mode — model returns valid JSON
resp, err := model.Generate(ctx, msgs,
    llm.WithResponseFormat(llm.ResponseFormat{
        Type: "json_object",
    }),
)
```
```go
// JSON Schema — model output conforms to a specific schema
resp, err = model.Generate(ctx, msgs,
    llm.WithResponseFormat(llm.ResponseFormat{
        Type: "json_schema",
        Schema: map[string]any{
            "type": "object",
            "properties": map[string]any{
                "sentiment": map[string]any{
                    "type": "string",
                    "enum": []any{"positive", "negative", "neutral"},
                },
                "confidence": map[string]any{
                    "type": "number",
                },
            },
            "required": []any{"sentiment", "confidence"},
        },
    }),
)
```
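Once the model returns JSON, the response text can be unmarshaled directly with the standard library. A sketch (assumes encoding/json is imported; the SentimentResult struct is hypothetical and mirrors the schema above):

```go
// Mirrors the JSON Schema defined above.
type SentimentResult struct {
    Sentiment  string  `json:"sentiment"`
    Confidence float64 `json:"confidence"`
}

var result SentimentResult
if err := json.Unmarshal([]byte(resp.Text()), &result); err != nil {
    // Still validate: plain json_object mode guarantees valid JSON,
    // not a matching shape.
    fmt.Printf("unexpected response shape: %v\n", err)
    return
}
fmt.Printf("sentiment=%s confidence=%.2f\n", result.Sentiment, result.Confidence)
```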
Step 4: Tool Choice Control

When tools are bound to a model, control how the model selects them. Tool choice is important for agents that need predictable behavior — ToolChoiceRequired forces the model to call at least one tool (useful in tool-use loops), while ToolChoiceNone prevents tool calls when you want a natural language response even though tools are available.
```go
// Let the model decide (default)
resp, err := model.Generate(ctx, msgs, llm.WithToolChoice(llm.ToolChoiceAuto))

// Prevent tool calls
resp, err = model.Generate(ctx, msgs, llm.WithToolChoice(llm.ToolChoiceNone))

// Force at least one tool call
resp, err = model.Generate(ctx, msgs, llm.WithToolChoice(llm.ToolChoiceRequired))

// Force a specific tool
resp, err = model.Generate(ctx, msgs, llm.WithSpecificTool("calculator"))
```
Step 5: Provider-Specific Options

Use WithMetadata for options that are specific to a particular provider. The metadata map passes through to the provider’s API request builder, allowing access to provider-specific features without requiring the core option set to account for every possible parameter across all providers.
```go
resp, err := model.Generate(ctx, msgs,
    llm.WithTemperature(0.5),
    llm.WithMetadata(map[string]any{
        "frequency_penalty": 0.5,
        "presence_penalty":  0.3,
        "seed":              42,
    }),
)
```
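On the receiving end, a provider's request builder might consume these keys roughly like this. This is a hypothetical sketch of the passthrough idea; opts and req are invented names, not Beluga AI's internals:

```go
// Hypothetical provider-internal code: forward known metadata keys
// to the wire request, ignore the rest.
if v, ok := opts.Metadata["seed"]; ok {
    if seed, ok := v.(int); ok {
        req.Seed = &seed
    }
}
```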
Recommended Configurations by Task

| Task | Temperature | Max Tokens | Other |
|---|---|---|---|
| Code generation | 0.0 | 2000 | Stop on `` ``` `` |
| Data extraction | 0.0 | 500 | JSON mode |
| Q&A | 0.3 | 1000 | — |
| Summarization | 0.5 | 500 | — |
| Creative writing | 1.0 | 2000 | TopP 0.9 |
| Brainstorming | 1.2 | 1000 | Presence penalty 0.5 |
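These rows translate naturally into reusable option bundles. A sketch using the option functions from this guide (the helper names and the llm.GenerateOption type name are assumptions):

````go
// Hypothetical preset helpers derived from the table above.
func CodeGenOptions() []llm.GenerateOption {
    return []llm.GenerateOption{
        llm.WithTemperature(0.0),
        llm.WithMaxTokens(2000),
        llm.WithStopSequences("```"),
    }
}

func BrainstormOptions() []llm.GenerateOption {
    return []llm.GenerateOption{
        llm.WithTemperature(1.2),
        llm.WithMaxTokens(1000),
        llm.WithMetadata(map[string]any{"presence_penalty": 0.5}),
    }
}
````

A bundle is spread at the call site: model.Generate(ctx, msgs, CodeGenOptions()...).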
Verification
- Generate responses with temperature 0.0 — verify they are consistent across runs.
- Generate with temperature 1.2 — verify more variation.
- Use JSON mode — verify the response is valid JSON. A quick harness for these checks is sketched below.
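A minimal harness for the determinism and JSON checks (error handling elided; model and msgs as in Step 1; json.Valid is from encoding/json):

```go
// Determinism check: two temperature-0 calls should match, or come
// very close; some providers are not perfectly deterministic even at 0.
a, _ := model.Generate(ctx, msgs, llm.WithTemperature(0.0))
b, _ := model.Generate(ctx, msgs, llm.WithTemperature(0.0))
fmt.Println("identical:", a.Text() == b.Text())

// JSON mode check: json.Valid reports whether the bytes are valid JSON.
j, _ := model.Generate(ctx, msgs,
    llm.WithResponseFormat(llm.ResponseFormat{Type: "json_object"}),
)
fmt.Println("valid JSON:", json.Valid([]byte(j.Text())))
```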
Next Steps
- Adding a New LLM Provider — Extend the framework with custom providers
- Multi-provider Chat — Use multiple providers together