Prompt Management & Engineering
Prompts are the programming interface to language models. The difference between a mediocre AI feature and a reliable one often comes down to prompt design — how you structure instructions, provide examples, and order content for cache efficiency. Beluga AI’s prompt package provides three tools for production prompt management: Template for parameterized prompt rendering, PromptManager for versioned template storage and retrieval, and Builder for assembling message sequences in cache-optimal order.
What You’ll Learn
This guide covers:
- Designing effective prompts and system messages
- Using the PromptManager and template system
- Building cache-optimized message sequences with Builder
- Creating reusable persona libraries
- Few-shot learning patterns
- Prompt versioning and A/B testing
When Prompt Engineering Matters
For simple, one-off interactions, inline string prompts work fine. Invest in the prompt management system when:
- Consistency is critical — multiple agents or endpoints share the same prompt logic, and a change must propagate everywhere
- Non-technical teams need to update AI behavior without modifying Go code
- Testing variations — you want to A/B test prompt versions with metrics tracking
- Minimizing costs — prompt caching can reduce token costs significantly, but requires careful content ordering
- Audit and compliance — you need to track which prompt version produced each response
Prerequisites
Before starting this guide:
- Complete Working with LLMs
- Understand Go templates (text/template)
- Familiarity with JSON and YAML
The PromptManager
The PromptManager interface separates prompt content from application code. Templates are stored in an external source (filesystem, database, or custom backend), loaded by name and version, and rendered with variables at runtime. This separation means prompt authors can iterate on wording without redeploying the application, and prompt changes are tracked independently of code changes.
The prompt/providers/file package implements PromptManager for filesystem-backed templates, making it easy to version prompts alongside your code in git.
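For illustration, a greeting template matching the example below might live at ./prompts/greeting.tmpl. The file name, extension, and how message roles are encoded on disk are assumptions here, not a documented contract of the file provider:

```text
Hello {{.Name}}! Welcome to {{.Service}}. How can I help you today?
```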
```go
package main

import (
	"fmt"
	"log"

	"github.com/lookatitude/beluga-ai/prompt/providers/file"
)

func main() {
	// Load templates from a directory
	manager, err := file.New("./prompts")
	if err != nil {
		log.Fatal(err)
	}

	// Render a template by name with variables
	messages, err := manager.Render("greeting", map[string]any{
		"Name":    "Alice",
		"Service": "Beluga AI",
	})
	if err != nil {
		log.Fatal(err)
	}

	// messages is []schema.Message, ready to send to a ChatModel
	fmt.Println(messages)
}
```
Template Syntax
Templates use Go’s text/template syntax, which provides conditionals, loops, and variable substitution. This is a deliberate choice over custom template languages — Go developers already know the syntax, and the standard library template engine is well-tested and performant.
```go
// Version: 1.0
// System message template with conditionals and loops
systemTemplate := `You are a customer support agent for {{.Company}}.

Guidelines:
- Be polite and empathetic
- Provide accurate information
- If unsure, escalate to human support
- Keep responses under 3 paragraphs

{{if .Policies}}Company policies:
{{range .Policies}}- {{.}}
{{end}}{{end}}`
```
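Rendering such a template yourself takes only standard library code. A minimal sketch (the renderTemplate helper is illustrative; PromptManager does this for you when you use it):

```go
import (
	"bytes"
	"text/template"
)

// renderTemplate parses a template string and executes it against a variable map.
func renderTemplate(tmplText string, vars map[string]any) (string, error) {
	tmpl, err := template.New("system").Parse(tmplText)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, vars); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// Usage:
//   rendered, err := renderTemplate(systemTemplate, map[string]any{
//       "Company":  "Acme",
//       "Policies": []string{"No refunds after 30 days"},
//   })
```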
Building Cache-Optimized Prompts with Builder
LLM providers cache prompt prefixes to avoid reprocessing repeated content. The prompt.Builder enforces a message ordering that maximizes cache hit rates by placing the most static content first and the most dynamic content last. This ordering can significantly reduce both latency and cost for high-volume applications.
The builder organizes content into six ordered slots:
- System prompt — most static, rarely changes across requests
- Tool definitions — semi-static, change only when tools are added or removed
- Static context — reference documents, retrieved knowledge, deployment-specific content
- Cache breakpoint — explicit marker where the cached prefix ends
- Dynamic context — conversation history, session-specific messages
- User input — always changes, placed last
This ordering is not arbitrary. Each slot is ordered by decreasing stability: the system prompt is identical across all requests, tool definitions change only on code deployment, static context changes per deployment or retrieval cycle, and user input is unique per request. By placing stable content first, the LLM provider can cache and reuse the maximum prefix length.
import ( "github.com/lookatitude/beluga-ai/prompt" "github.com/lookatitude/beluga-ai/schema")
msgs := prompt.NewBuilder( prompt.WithSystemPrompt("You are an expert Go developer."), prompt.WithToolDefinitions(tools), prompt.WithStaticContext([]string{ "Go style guide: use gofmt, prefer short variable names...", "Project conventions: all errors must be wrapped with fmt.Errorf...", }), prompt.WithCacheBreakpoint(), prompt.WithDynamicContext(conversationHistory), prompt.WithUserInput(schema.NewHumanMessage("Review this function for bugs.")),).Build()
// msgs is []schema.Message in cache-optimal orderresp, err := model.Generate(ctx, msgs)Creating Reusable Personas
Creating Reusable Personas
When multiple agents share similar behavioral patterns, define personas as reusable templates. This approach centralizes behavior definitions so changes propagate consistently. The Persona struct in the agent package uses a Role-Goal-Backstory framework, but for prompt-level personas you can use templates with variables for customization points.
The example below defines a persona library as a map of templates. Each persona specifies not just the system prompt text but also recommended generation parameters (temperature, max tokens) that match the persona’s purpose — a code reviewer needs low temperature for precision, while a creative writer needs high temperature for variety.
```go
type Persona struct {
	Name         string
	Description  string
	SystemPrompt string
	Temperature  float64
	MaxTokens    int
}

// Define persona library
var PersonaLibrary = map[string]Persona{
	"code_reviewer": {
		Name:        "Code Reviewer",
		Description: "Senior engineer reviewing code for best practices",
		SystemPrompt: `You are a senior software engineer reviewing code.

Focus areas:
- Code quality and readability
- Security vulnerabilities
- Performance issues
- Best practices for {{.Language}}

Output format:
1. Summary (2-3 sentences)
2. Issues found (list)
3. Recommendations (list)

Be constructive and specific.`,
		Temperature: 0.3,
		MaxTokens:   1000,
	},
	"creative_writer": {
		Name:        "Creative Writer",
		Description: "Novelist creating engaging narratives",
		SystemPrompt: `You are a creative writer specializing in {{.Genre}}.

Style guidelines:
- Use vivid, descriptive language
- Show, don't tell
- Develop complex characters
- Create engaging dialogue
- Maintain consistent tone

Target audience: {{.Audience}}`,
		Temperature: 0.9,
		MaxTokens:   2000,
	},
	"data_analyst": {
		Name:        "Data Analyst",
		Description: "Analyst extracting insights from data",
		SystemPrompt: `You are a data analyst specializing in {{.Domain}}.

Analysis approach:
- State clear hypotheses
- Use statistical reasoning
- Identify patterns and trends
- Provide actionable insights
- Visualize findings when helpful

Always cite specific data points.`,
		Temperature: 0.2,
		MaxTokens:   1500,
	},
}

// Render persona template with variables
func GetPersonaMessage(personaName string, variables map[string]any) (schema.Message, error) {
	persona, ok := PersonaLibrary[personaName]
	if !ok {
		return nil, fmt.Errorf("unknown persona: %s", personaName)
	}

	tmpl, err := template.New(persona.Name).Parse(persona.SystemPrompt)
	if err != nil {
		return nil, err
	}

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, variables); err != nil {
		return nil, err
	}

	return schema.NewSystemMessage(buf.String()), nil
}
```
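A persona is then applied in two steps: render its system prompt, then pass its generation parameters to the model call. A brief sketch, reusing the llm option helpers that appear in the few-shot example later in this guide:

```go
sysMsg, err := GetPersonaMessage("code_reviewer", map[string]any{
	"Language": "Go",
})
if err != nil {
	log.Fatal(err)
}

// Pair the rendered system prompt with the persona's recommended parameters
persona := PersonaLibrary["code_reviewer"]
resp, err := model.Generate(ctx,
	[]schema.Message{
		sysMsg,
		schema.NewHumanMessage("Review this function for bugs."),
	},
	llm.WithTemperature(persona.Temperature),
	llm.WithMaxTokens(persona.MaxTokens),
)
```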
Few-Shot Learning
Few-shot prompts teach the model by example rather than by instruction. This is effective when the desired behavior is hard to describe in words but easy to demonstrate through input-output pairs. The model learns the pattern from the examples and applies it to the new input. For classification tasks, 3-5 examples typically provide sufficient signal. For more complex generation tasks, you may need more examples or a combination of instructions and examples.
In Beluga’s message-based API, few-shot examples are naturally represented as alternating HumanMessage and AIMessage pairs in the conversation history, placed between the system prompt and the actual user input.
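That message-pair layout looks like the following sketch, built from the schema constructors used elsewhere in this guide; the string-based helper below is an alternative that packs everything into a single prompt:

```go
messages := []schema.Message{
	schema.NewSystemMessage("Classify the sentiment of the text as Positive, Negative, or Neutral."),

	// Few-shot examples as alternating human/AI turns
	schema.NewHumanMessage("This product is amazing! Best purchase ever."),
	schema.NewAIMessage("Positive"),
	schema.NewHumanMessage("Terrible quality. Waste of money."),
	schema.NewAIMessage("Negative"),

	// The actual input always comes last
	schema.NewHumanMessage("It's okay, nothing special."),
}
```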
```go
type FewShotExample struct {
	Input  string
	Output string
}

func CreateFewShotPrompt(task string, examples []FewShotExample, newInput string) string {
	var prompt strings.Builder

	// Task description
	prompt.WriteString(fmt.Sprintf("Task: %s\n\n", task))

	// Examples
	prompt.WriteString("Examples:\n")
	for i, ex := range examples {
		prompt.WriteString(fmt.Sprintf("\nExample %d:\n", i+1))
		prompt.WriteString(fmt.Sprintf("Input: %s\n", ex.Input))
		prompt.WriteString(fmt.Sprintf("Output: %s\n", ex.Output))
	}

	// New input
	prompt.WriteString(fmt.Sprintf("\nNow complete this:\nInput: %s\nOutput:", newInput))

	return prompt.String()
}

// Example: Sentiment classification with few-shot examples
func ClassifySentimentFewShot(ctx context.Context, model llm.ChatModel, text string) (string, error) {
	examples := []FewShotExample{
		{
			Input:  "This product is amazing! Best purchase ever.",
			Output: "Positive",
		},
		{
			Input:  "Terrible quality. Waste of money.",
			Output: "Negative",
		},
		{
			Input:  "It's okay, nothing special.",
			Output: "Neutral",
		},
	}

	prompt := CreateFewShotPrompt(
		"Classify the sentiment of the following text as Positive, Negative, or Neutral.",
		examples,
		text,
	)

	messages := []schema.Message{
		schema.NewHumanMessage(prompt),
	}

	resp, err := model.Generate(ctx, messages,
		llm.WithTemperature(0.0),
		llm.WithMaxTokens(10),
	)
	if err != nil {
		return "", err
	}

	return strings.TrimSpace(resp.Text()), nil
}
```
Dynamic Prompt Building
When prompts need to be assembled from multiple sources at runtime — system instructions, constraints, context documents, few-shot examples, and user input — a builder pattern keeps the assembly logic organized and composable. Each With method adds content to a specific section, and Build assembles them into a properly ordered message sequence. This pattern is complementary to prompt.Builder: use prompt.Builder when you need cache-optimal slot ordering, and a custom builder when you need flexible content assembly.
```go
type PromptBuilder struct {
	systemInstructions []string
	fewShotExamples    []FewShotExample
	contextDocs        []string
	constraints        []string
	outputFormat       string
}

func NewPromptBuilder() *PromptBuilder {
	return &PromptBuilder{}
}

func (pb *PromptBuilder) WithSystemInstruction(instruction string) *PromptBuilder {
	pb.systemInstructions = append(pb.systemInstructions, instruction)
	return pb
}

func (pb *PromptBuilder) WithFewShotExample(input, output string) *PromptBuilder {
	pb.fewShotExamples = append(pb.fewShotExamples, FewShotExample{
		Input:  input,
		Output: output,
	})
	return pb
}

func (pb *PromptBuilder) WithContext(doc string) *PromptBuilder {
	pb.contextDocs = append(pb.contextDocs, doc)
	return pb
}

func (pb *PromptBuilder) WithConstraint(constraint string) *PromptBuilder {
	pb.constraints = append(pb.constraints, constraint)
	return pb
}

func (pb *PromptBuilder) WithOutputFormat(format string) *PromptBuilder {
	pb.outputFormat = format
	return pb
}

func (pb *PromptBuilder) Build(userQuery string) []schema.Message {
	var systemPrompt strings.Builder

	// System instructions
	if len(pb.systemInstructions) > 0 {
		systemPrompt.WriteString("Instructions:\n")
		for _, inst := range pb.systemInstructions {
			systemPrompt.WriteString(fmt.Sprintf("- %s\n", inst))
		}
		systemPrompt.WriteString("\n")
	}

	// Constraints
	if len(pb.constraints) > 0 {
		systemPrompt.WriteString("Constraints:\n")
		for _, constraint := range pb.constraints {
			systemPrompt.WriteString(fmt.Sprintf("- %s\n", constraint))
		}
		systemPrompt.WriteString("\n")
	}

	// Output format
	if pb.outputFormat != "" {
		systemPrompt.WriteString(fmt.Sprintf("Output format: %s\n\n", pb.outputFormat))
	}

	// Build messages
	messages := []schema.Message{
		schema.NewSystemMessage(systemPrompt.String()),
	}

	// Add few-shot examples as conversation history
	for _, ex := range pb.fewShotExamples {
		messages = append(messages,
			schema.NewHumanMessage(ex.Input),
			schema.NewAIMessage(ex.Output),
		)
	}

	// Add context documents
	if len(pb.contextDocs) > 0 {
		var contextMsg strings.Builder
		contextMsg.WriteString("Relevant context:\n\n")
		for i, doc := range pb.contextDocs {
			contextMsg.WriteString(fmt.Sprintf("[Document %d]\n%s\n\n", i+1, doc))
		}
		messages = append(messages, schema.NewSystemMessage(contextMsg.String()))
	}

	// Add user query
	messages = append(messages, schema.NewHumanMessage(userQuery))

	return messages
}

// Usage
func main() {
	builder := NewPromptBuilder().
		WithSystemInstruction("You are a helpful SQL assistant").
		WithSystemInstruction("Generate valid PostgreSQL syntax only").
		WithConstraint("Do not use deprecated functions").
		WithConstraint("Include comments explaining complex logic").
		WithOutputFormat("SQL code block with explanation").
		WithFewShotExample(
			"Get all users who signed up last month",
			"SELECT * FROM users WHERE created_at >= DATE_TRUNC('month', NOW() - INTERVAL '1 month')",
		).
		WithContext("Schema: users (id, name, email, created_at)")

	messages := builder.Build("Find users who haven't logged in for 30 days")
	_ = messages // ... send to LLM
}
```
Prompt Versioning
Versioning prompts is essential for reproducibility and safe iteration. When a prompt change degrades output quality, you need to know which version caused the regression and roll back to the previous one. The pattern below shows a thread-safe prompt registry that stores multiple versions of each prompt and supports “latest” lookups. In production, back this with a database or the filesystem-based PromptManager rather than an in-memory map.
```go
type VersionedPrompt struct {
	ID          string
	Version     string
	Name        string
	Template    string
	CreatedAt   time.Time
	Performance *PromptMetrics
}

type PromptMetrics struct {
	AvgLatency       time.Duration
	SuccessRate      float64
	AvgTokens        int
	UserSatisfaction float64
}

type PromptRegistry struct {
	prompts map[string]map[string]VersionedPrompt // name -> version -> prompt
	mu      sync.RWMutex
}

func NewPromptRegistry() *PromptRegistry {
	return &PromptRegistry{
		prompts: make(map[string]map[string]VersionedPrompt),
	}
}

func (pr *PromptRegistry) Register(prompt VersionedPrompt) {
	pr.mu.Lock()
	defer pr.mu.Unlock()

	if _, ok := pr.prompts[prompt.Name]; !ok {
		pr.prompts[prompt.Name] = make(map[string]VersionedPrompt)
	}

	pr.prompts[prompt.Name][prompt.Version] = prompt
}

func (pr *PromptRegistry) Get(name, version string) (VersionedPrompt, error) {
	pr.mu.RLock()
	defer pr.mu.RUnlock()

	versions, ok := pr.prompts[name]
	if !ok {
		return VersionedPrompt{}, fmt.Errorf("prompt not found: %s", name)
	}

	// If version is "latest", get the newest
	if version == "latest" {
		var latest VersionedPrompt
		var latestTime time.Time

		for _, p := range versions {
			if p.CreatedAt.After(latestTime) {
				latest = p
				latestTime = p.CreatedAt
			}
		}

		if latestTime.IsZero() {
			return VersionedPrompt{}, fmt.Errorf("no versions found for: %s", name)
		}

		return latest, nil
	}

	prompt, ok := versions[version]
	if !ok {
		return VersionedPrompt{}, fmt.Errorf("version not found: %s@%s", name, version)
	}

	return prompt, nil
}

func (pr *PromptRegistry) ListVersions(name string) []string {
	pr.mu.RLock()
	defer pr.mu.RUnlock()

	versions, ok := pr.prompts[name]
	if !ok {
		return nil
	}

	result := make([]string, 0, len(versions))
	for v := range versions {
		result = append(result, v)
	}

	sort.Strings(result)
	return result
}
```
Prompts can be loaded from YAML files, making them easy to version in git alongside your code:
```yaml
- id: code-review-v1
  version: "1.0"
  name: code_reviewer
  template: |
    You are a code reviewer.
    Review the following code for bugs and best practices.

    Code:
    {{.Code}}
  created_at: 2025-01-01T00:00:00Z

- id: code-review-v2
  version: "2.0"
  name: code_reviewer
  template: |
    You are an expert code reviewer. Analyze this code for:
    - Security vulnerabilities
    - Performance issues
    - Code style violations
    - Best practices

    Code:
    {{.Code}}

    Provide specific line numbers and recommendations.
  created_at: 2025-02-01T00:00:00Z
```
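A loader for such a file is a few lines of code. The sketch below assumes gopkg.in/yaml.v3 and illustrative struct tags, neither of which is part of Beluga's API:

```go
import (
	"fmt"
	"os"
	"time"

	"gopkg.in/yaml.v3"
)

// promptFile mirrors one YAML entry; created_at is parsed manually below.
type promptFile struct {
	ID        string `yaml:"id"`
	Version   string `yaml:"version"`
	Name      string `yaml:"name"`
	Template  string `yaml:"template"`
	CreatedAt string `yaml:"created_at"`
}

func LoadPromptsFromYAML(path string, registry *PromptRegistry) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}

	var entries []promptFile
	if err := yaml.Unmarshal(data, &entries); err != nil {
		return err
	}

	for _, e := range entries {
		createdAt, err := time.Parse(time.RFC3339, e.CreatedAt)
		if err != nil {
			return fmt.Errorf("invalid created_at for %s@%s: %w", e.Name, e.Version, err)
		}
		registry.Register(VersionedPrompt{
			ID:        e.ID,
			Version:   e.Version,
			Name:      e.Name,
			Template:  e.Template,
			CreatedAt: createdAt,
		})
	}
	return nil
}
```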
A/B Testing Prompts
Prompt A/B testing lets you compare two versions quantitatively before committing to a rollout. The pattern below randomly assigns each request to variant A or B, tracks metrics for each variant, and provides a simple scoring function to determine the winner. In production, integrate this with your observability system to collect metrics automatically via LLM hooks.
```go
type ABTest struct {
	Name       string
	VariantA   VersionedPrompt
	VariantB   VersionedPrompt
	SplitRatio float64 // 0.5 = 50/50 split
	MetricsA   *PromptMetrics
	MetricsB   *PromptMetrics
}

func (test *ABTest) GetVariant() VersionedPrompt {
	if rand.Float64() < test.SplitRatio {
		return test.VariantA
	}
	return test.VariantB
}

func (test *ABTest) RecordMetrics(variant string, latency time.Duration, success bool, tokens int) {
	metrics := test.MetricsA
	if variant == "B" {
		metrics = test.MetricsB
	}

	// Update metrics (use proper aggregation in production)
	metrics.AvgLatency = (metrics.AvgLatency + latency) / 2
	if success {
		metrics.SuccessRate = (metrics.SuccessRate + 1.0) / 2
	} else {
		metrics.SuccessRate = metrics.SuccessRate / 2
	}
	metrics.AvgTokens = (metrics.AvgTokens + tokens) / 2
}

func (test *ABTest) GetWinner() string {
	// Simple scoring: higher success rate and lower latency wins
	scoreA := test.MetricsA.SuccessRate / test.MetricsA.AvgLatency.Seconds()
	scoreB := test.MetricsB.SuccessRate / test.MetricsB.AvgLatency.Seconds()

	if scoreA > scoreB {
		return "A"
	}
	return "B"
}
```
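Putting it together for a single request might look like the sketch below. Message rendering and the token estimate are deliberate stand-ins for your own wiring:

```go
// Choose versions to compare (loaded earlier into the registry)
variantA, _ := registry.Get("code_reviewer", "1.0")
variantB, _ := registry.Get("code_reviewer", "2.0")

test := &ABTest{
	Name:       "code_reviewer_rollout",
	VariantA:   variantA,
	VariantB:   variantB,
	SplitRatio: 0.5, // half of requests go to variant A
	MetricsA:   &PromptMetrics{},
	MetricsB:   &PromptMetrics{},
}

// Per request: pick a variant, time the call, record the outcome
variant := test.GetVariant()
label := "A"
if variant.Version == test.VariantB.Version { // assumes A and B have distinct versions
	label = "B"
}

start := time.Now()
resp, err := model.Generate(ctx, messages) // messages rendered from variant.Template
tokens := 0
if err == nil {
	tokens = len(resp.Text()) / 4 // crude estimate; use real usage counts in production
}
test.RecordMetrics(label, time.Since(start), err == nil, tokens)
```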
Production Best Practices
When engineering prompts for production:
- Separate prompts from code — use PromptManager or YAML files so prompt changes do not require code deploys
- Version all prompts — track which version produced each response for debugging and compliance
- Use prompt.Builder for cache optimization — place static content first, dynamic content last
- Test systematically — use the eval package to measure prompt quality across diverse inputs
- Monitor token usage — track input and output tokens per prompt version to detect bloat
- Validate template variables — ensure all required variables are provided before rendering to avoid template errors at runtime (see the sketch after this list)
- Document prompt intent — describe what each prompt is designed to achieve and what assumptions it makes
- Implement gradual rollouts — use A/B testing to validate prompt changes before full deployment
- Keep prompts focused — one prompt per task; avoid multi-purpose prompts that are hard to optimize
- Store prompts in version control — treat prompts as code artifacts with the same review and deployment rigor
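One minimal way to validate variables, assuming you track each template's required keys alongside it (the helper below is illustrative, not part of Beluga's API):

```go
// ValidateVariables checks a variable map against a template's declared keys.
// Catching missing keys up front turns a confusing "<no value>" in rendered
// output into an explicit error.
func ValidateVariables(required []string, vars map[string]any) error {
	var missing []string
	for _, key := range required {
		if _, ok := vars[key]; !ok {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing template variables: %s", strings.Join(missing, ", "))
	}
	return nil
}
```
Call it before executing the template. For stricter enforcement, text/template's Option("missingkey=error") makes execution itself fail on absent map keys.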
Next Steps
Now that you understand prompt engineering:
- Learn about Structured Output for typed responses
- Explore Working with LLMs for model configuration
- Read Building Your First Agent for using prompts with agents