LLM Error Handling
Problem
Section titled “Problem”You are calling an LLM API and need to handle rate limits, timeouts, and API errors gracefully without crashing your application or losing user requests.
Solution
Section titled “Solution”Implement a layered error handling strategy with retry logic, exponential backoff, and proper error classification. Transient errors are retried automatically while permanent errors fail fast with clear messages.
Why This Matters
Section titled “Why This Matters”LLM APIs are external services with well-known failure modes: rate limits (HTTP 429), server errors (HTTP 5xx), and network timeouts. Each failure type requires a different response strategy. Retrying an authentication error (HTTP 401) wastes API calls and delays the real fix. Not retrying a rate limit error drops a request that would succeed after a brief wait.
Beluga AI’s core.IsRetryable() function provides the first layer of error classification by inspecting the error’s ErrorCode. This recipe adds a second layer by pattern-matching on error messages for cases where the error isn’t wrapped in Beluga’s error types (e.g., raw HTTP errors from provider SDKs). Exponential backoff with jitter prevents the thundering herd problem — when many clients hit a rate limit simultaneously, jitter ensures they don’t all retry at the same time and trigger another rate limit.
Code Example
Section titled “Code Example”package main
import ( "context" "errors" "fmt" "log" "math/rand" "strings" "time"
"github.com/lookatitude/beluga-ai/core" "github.com/lookatitude/beluga-ai/llm" "github.com/lookatitude/beluga-ai/schema")
// RetryConfig configures retry behavior for LLM calls.type RetryConfig struct { MaxRetries int InitialBackoff time.Duration MaxBackoff time.Duration BackoffFactor float64 Jitter float64}
var DefaultRetryConfig = RetryConfig{ MaxRetries: 3, InitialBackoff: 1 * time.Second, MaxBackoff: 30 * time.Second, BackoffFactor: 2.0, Jitter: 0.1,}
// LLMClient wraps an LLM with retry and error handling.type LLMClient struct { model llm.ChatModel config RetryConfig}
func NewLLMClient(model llm.ChatModel, config RetryConfig) *LLMClient { return &LLMClient{model: model, config: config}}
// GenerateWithRetry calls the LLM with automatic retry on transient errors.func (c *LLMClient) GenerateWithRetry(ctx context.Context, messages []schema.Message) (schema.Message, error) { var lastErr error backoff := c.config.InitialBackoff
for attempt := 0; attempt <= c.config.MaxRetries; attempt++ { if ctx.Err() != nil { return nil, fmt.Errorf("context cancelled before attempt %d: %w", attempt, ctx.Err()) }
response, err := c.model.Generate(ctx, messages) if err == nil { return response, nil }
lastErr = err
if !isRetryable(err) { return nil, fmt.Errorf("permanent error (not retrying): %w", err) }
if attempt < c.config.MaxRetries { jitteredBackoff := c.calculateBackoff(backoff) log.Printf("LLM call failed (attempt %d/%d): %v. Retrying in %v", attempt+1, c.config.MaxRetries+1, err, jitteredBackoff)
select { case <-ctx.Done(): return nil, fmt.Errorf("context cancelled during backoff: %w", ctx.Err()) case <-time.After(jitteredBackoff): }
backoff = time.Duration(float64(backoff) * c.config.BackoffFactor) if backoff > c.config.MaxBackoff { backoff = c.config.MaxBackoff } } }
return nil, fmt.Errorf("max retries (%d) exceeded: %w", c.config.MaxRetries, lastErr)}
func (c *LLMClient) calculateBackoff(base time.Duration) time.Duration { if c.config.Jitter == 0 { return base } jitter := float64(base) * c.config.Jitter * (rand.Float64()*2 - 1) return time.Duration(float64(base) + jitter)}
func isRetryable(err error) bool { if core.IsRetryable(err) { return true } if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) { return false }
errMsg := strings.ToLower(err.Error()) retryablePatterns := []string{ "rate limit", "429", "too many requests", "temporarily unavailable", "service unavailable", "503", "connection reset", "connection refused", "timeout", } for _, pattern := range retryablePatterns { if strings.Contains(errMsg, pattern) { return true } } return false}
func main() { ctx := context.Background()
model, err := llm.New("openai", llm.ProviderConfig{ APIKey: "your-api-key", }) if err != nil { log.Fatalf("Failed to create LLM: %v", err) }
client := NewLLMClient(model, DefaultRetryConfig)
messages := []schema.Message{ schema.NewHumanMessage("Hello, how are you?"), }
response, err := client.GenerateWithRetry(ctx, messages) if err != nil { log.Fatalf("LLM call failed: %v", err) }
fmt.Printf("Response: %s\n", response.GetContent())}Explanation
Section titled “Explanation”-
Error classification — The
isRetryablefunction checks Beluga AI’s error types usingcore.IsRetryable()first, then falls back to pattern matching on error messages. Rate limits and server errors are retried; authentication errors and client errors are not. This two-layer approach handles both framework-wrapped errors and raw provider SDK errors. -
Exponential backoff with jitter — The backoff doubles each time (1s, 2s, 4s…) but never exceeds
MaxBackoff. Random jitter (controlled by theJitterconfig parameter) prevents multiple clients from retrying simultaneously, which would cause another rate limit spike. The jitter range is +-10% of the backoff by default. -
Context awareness — The function checks
ctx.Err()before each attempt and during backoff waits usingselect. This respects timeouts and cancellations from the calling code, preventing wasted retries when the caller has already given up. -
MaxBackoff cap — Without a cap, exponential backoff grows unbounded (1s, 2s, 4s, 8s, 16s, 32s, 64s…). The
MaxBackoffof 30 seconds ensures the wait never becomes unreasonable, even after many retries.
Variations
Section titled “Variations”Circuit Breaker Pattern
Section titled “Circuit Breaker Pattern”If experiencing sustained failures, add a circuit breaker to stop retrying entirely and fail fast:
type CircuitBreaker struct { failures int lastFailure time.Time threshold int resetAfter time.Duration}
func (cb *CircuitBreaker) Allow() bool { if cb.failures >= cb.threshold { if time.Since(cb.lastFailure) < cb.resetAfter { return false // Circuit is open } cb.failures = 0 // Reset after cooldown } return true}Fallback Response
Section titled “Fallback Response”For non-critical features, provide a cached or template fallback when the LLM is unavailable:
func (c *LLMClient) GenerateWithFallback(ctx context.Context, messages []schema.Message, fallback string) string { response, err := c.GenerateWithRetry(ctx, messages) if err != nil { log.Printf("LLM failed, using fallback: %v", err) return fallback } return response.GetContent()}Related Recipes
Section titled “Related Recipes”- Custom Agent Patterns — Use error handling in custom agents
- Global Retry Wrappers — Framework-level retry strategies
- Context Timeout Management — Timeout handling for operations