# Observability & Monitoring
AI applications are uniquely difficult to debug. An agent might produce the wrong answer because of a prompt issue, a retrieval miss, a tool error, or a model hallucination — and without visibility into the full execution chain, you are guessing. Observability gives you the data to understand what happened, why it happened, and how much it cost.
Beluga AI provides built-in observability through OpenTelemetry (OTel) using GenAI semantic conventions, structured logging via slog, health checks, and LLM-specific trace exporters. The o11y package is the central integration point, and it works with any OTel-compatible backend.
## Observability Architecture

```text
Beluga AI Application
├── OTel Traces (gen_ai.* attributes)
│   ├── Jaeger / Tempo
│   ├── Datadog APM
│   └── Grafana Cloud
├── OTel Metrics (gen_ai.usage.*)
│   ├── Prometheus
│   └── Datadog Metrics
├── Structured Logs (slog)
│   ├── stdout / stderr
│   └── Log aggregator
└── LLM Trace Exporters
    ├── Langfuse
    ├── LangSmith (Opik)
    └── Arize Phoenix
```

## OpenTelemetry Setup
Beluga uses OTel SDK v1.40.0 with GenAI semantic conventions (semconv v1.39.0).
### Basic OTel Configuration
Section titled “Basic OTel Configuration”package main
import ( "context" "log"
"github.com/lookatitude/beluga-ai/o11y")
func main() { ctx := context.Background()
// Initialize OTel with OTLP exporter shutdown, err := o11y.Init(ctx, o11y.Config{ ServiceName: "my-agent", ServiceVersion: "1.0.0", OTLPEndpoint: "localhost:4317", }) if err != nil { log.Fatal(err) } defer shutdown(ctx)
// All Beluga operations now emit traces and metrics automatically}GenAI Attributes
Beluga traces use OTel GenAI semantic conventions:
| Attribute | Description | Example |
|---|---|---|
| `gen_ai.system` | Provider system | `"openai"` |
| `gen_ai.operation.name` | Operation type | `"chat"`, `"embed"` |
| `gen_ai.request.model` | Requested model | `"gpt-4o"` |
| `gen_ai.response.model` | Actual model used | `"gpt-4o-2024-08-06"` |
| `gen_ai.usage.input_tokens` | Input tokens | `150` |
| `gen_ai.usage.output_tokens` | Output tokens | `89` |
| `gen_ai.agent.name` | Agent name | `"customer-support"` |
| `gen_ai.tool.name` | Tool invoked | `"search_database"` |
### Custom Span Attributes
```go
tracer := o11y.Tracer("my-agent")

ctx, span := tracer.Start(ctx, "process_request", o11y.Attrs{
	"gen_ai.operation.name": "chat",
	"gen_ai.request.model":  "gpt-4o",
	"tenant.id":             tenantID,
})
defer span.End()

// After LLM call completes
span.SetAttributes(
	o11y.AttrInputTokens, 150,
	o11y.AttrOutputTokens, 89,
)
```

## Metrics
Beluga emits OTel metrics for LLM operations, latency, and resource usage.
### Prometheus Integration
Configure Prometheus scraping with the OTel Prometheus exporter:
```go
shutdown, err := o11y.Init(ctx, o11y.Config{
	ServiceName: "my-agent",
	MetricsPort: 9090,
	MetricsPath: "/metrics",
})
```

Key metrics exposed:
| Metric | Type | Description |
|---|---|---|
| `gen_ai_client_operation_duration` | Histogram | LLM call latency |
| `gen_ai_client_token_usage` | Counter | Token consumption |
| `gen_ai_server_request_duration` | Histogram | Server-side latency |
| `beluga_agent_invocations_total` | Counter | Agent execution count |
| `beluga_tool_calls_total` | Counter | Tool invocations |
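These are the framework's built-in instruments. Application-level metrics can ride the same pipeline through the standard OTel metric API. The sketch below assumes o11y.Init registers the global OTel MeterProvider (not confirmed by these docs), and the agent_escalations_total counter is a hypothetical example:

```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// Hypothetical app-level counter. The global meter late-binds to whatever
// MeterProvider is registered, which we assume o11y.Init sets up.
var escalations, _ = otel.Meter("my-agent").Int64Counter(
	"agent_escalations_total",
	metric.WithDescription("Conversations escalated to a human"),
)

func onEscalation(ctx context.Context, reason string) {
	escalations.Add(ctx, 1, metric.WithAttributes(
		attribute.String("reason", reason),
	))
}
```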
### Grafana Dashboards
With Prometheus as a data source, build Grafana dashboards for:
- LLM Performance: Latency percentiles (p50, p95, p99) by model and provider
- Token Usage: Input/output tokens over time, cost estimation
- Agent Activity: Invocations, tool calls, error rates
- RAG Pipeline: Embedding latency, search latency, retrieval quality
Example PromQL queries:
```promql
# P95 latency by model
histogram_quantile(0.95, rate(gen_ai_client_operation_duration_bucket[5m]))

# Token usage rate by provider
rate(gen_ai_client_token_usage[5m])

# Error rate
rate(beluga_agent_invocations_total{status="error"}[5m]) / rate(beluga_agent_invocations_total[5m])
```

## LLM Trace Exporters
The o11y package provides a TraceExporter interface for sending detailed LLM call data to specialized observability platforms.
```go
type TraceExporter interface {
	ExportLLMCall(ctx context.Context, data LLMCallData) error
}
```
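Any value satisfying this interface can act as a sink. As a minimal sketch, assuming LLMCallData marshals cleanly with encoding/json, here is a hypothetical exporter (not part of Beluga) that writes one JSON line per call to stdout:

```go
import (
	"context"
	"encoding/json"
	"os"

	"github.com/lookatitude/beluga-ai/o11y"
)

// stdoutExporter is a hypothetical TraceExporter for local debugging:
// each LLM call becomes one JSON object on stdout.
type stdoutExporter struct {
	enc *json.Encoder
}

func newStdoutExporter() *stdoutExporter {
	return &stdoutExporter{enc: json.NewEncoder(os.Stdout)}
}

func (e *stdoutExporter) ExportLLMCall(ctx context.Context, data o11y.LLMCallData) error {
	// Encode appends a trailing newline, giving JSON-lines output.
	return e.enc.Encode(data)
}
```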
### Langfuse

Langfuse provides open-source LLM observability with prompt management and evaluation.
```go
import _ "github.com/lookatitude/beluga-ai/o11y/providers/langfuse"

exporter, err := o11y.NewTraceExporter("langfuse", config.ProviderConfig{
	Options: map[string]any{
		"public_key": os.Getenv("LANGFUSE_PUBLIC_KEY"),
		"secret_key": os.Getenv("LANGFUSE_SECRET_KEY"),
		"host":       "https://cloud.langfuse.com",
	},
})
```

### LangSmith (Opik)
```go
import _ "github.com/lookatitude/beluga-ai/o11y/providers/opik"

exporter, err := o11y.NewTraceExporter("opik", config.ProviderConfig{
	Options: map[string]any{
		"api_key": os.Getenv("OPIK_API_KEY"),
		"project": "my-project",
	},
})
```

### Arize Phoenix
Arize Phoenix provides open-source LLM tracing with embedding visualization.
```go
import _ "github.com/lookatitude/beluga-ai/o11y/providers/phoenix"

exporter, err := o11y.NewTraceExporter("phoenix", config.ProviderConfig{
	Options: map[string]any{
		"endpoint": "http://localhost:6006",
	},
})
```

### Multi-Exporter
Export to multiple backends simultaneously:
```go
multi := o11y.NewMultiExporter(langfuseExporter, phoenixExporter)

err := multi.ExportLLMCall(ctx, o11y.LLMCallData{
	Model:        "gpt-4o",
	Provider:     "openai",
	InputTokens:  150,
	OutputTokens: 89,
	Duration:     450 * time.Millisecond,
	Cost:         0.0023,
	Messages:     serializedMessages,
	Response:     serializedResponse,
})
```
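Exports run on the request path, so a slow backend adds latency to every LLM call. One mitigation is a fire-and-forget wrapper that moves the export off the hot path. This is a sketch under the assumption that your exporters tolerate concurrent calls; it is not a built-in Beluga type:

```go
import (
	"context"
	"log/slog"

	"github.com/lookatitude/beluga-ai/o11y"
)

// asyncExporter is a hypothetical wrapper that exports in the background
// so a slow trace backend never blocks the LLM call itself.
type asyncExporter struct {
	inner o11y.TraceExporter
}

func (a *asyncExporter) ExportLLMCall(ctx context.Context, data o11y.LLMCallData) error {
	go func() {
		// Detach from the request context so request cancellation
		// does not abort the export (context.WithoutCancel, Go 1.21+).
		if err := a.inner.ExportLLMCall(context.WithoutCancel(ctx), data); err != nil {
			slog.Warn("trace export failed", "error", err)
		}
	}()
	return nil
}
```

Usage: wrap the multi-exporter once, e.g. `exporter := &asyncExporter{inner: multi}`, and pass it wherever a TraceExporter is expected.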
## Structured Logging

Beluga uses Go’s slog package for structured logging.
### Configuration
```go
import "log/slog"

// JSON output for production
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))
slog.SetDefault(logger)
```

### Log Attributes
Beluga middleware adds structured fields to log entries:
```go
model = llm.ApplyMiddleware(model, llm.WithLogging(logger))

// Produces logs like:
// {"level":"INFO","msg":"llm.generate","model":"gpt-4o","input_tokens":150,"output_tokens":89,"duration_ms":450}
```
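To join these log lines with traces in your backend, you can stamp the active trace ID onto every record. Below is a minimal sketch using only the standard slog and OTel APIs, not a built-in Beluga feature; it assumes you log with the *Context methods (slog.InfoContext and friends) so the span context reaches the handler:

```go
import (
	"context"
	"log/slog"
	"os"

	"go.opentelemetry.io/otel/trace"
)

// traceHandler decorates every record with the active OTel trace ID so
// logs and traces can be correlated in the backend.
type traceHandler struct {
	slog.Handler
}

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if sc := trace.SpanContextFromContext(ctx); sc.HasTraceID() {
		r.AddAttrs(slog.String("trace_id", sc.TraceID().String()))
	}
	return h.Handler.Handle(ctx, r)
}

// Preserve the wrapper when loggers derive children via With/WithGroup.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{h.Handler.WithGroup(name)}
}

func init() {
	slog.SetDefault(slog.New(traceHandler{
		slog.NewJSONHandler(os.Stdout, nil),
	}))
}
```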
## Health Checks

The o11y package provides health check endpoints for load balancers and orchestrators.
```go
health := o11y.NewHealthChecker()
health.Register("llm", func(ctx context.Context) error {
	_, err := model.Generate(ctx, []schema.Message{
		schema.NewUserMessage(schema.Text("ping")),
	})
	return err
})
health.Register("vectorstore", func(ctx context.Context) error {
	_, err := store.Search(ctx, zeroVec, 1)
	return err
})

// Expose at /healthz
http.Handle("/healthz", health.Handler())
```

## Datadog Integration
Datadog receives telemetry through the OTel Collector with the Datadog exporter:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  datadog:
    api:
      key: ${DD_API_KEY}
    traces:
      span_name_as_resource_name: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [datadog]
    metrics:
      receivers: [otlp]
      exporters: [datadog]
```

Point Beluga’s OTLP endpoint to the collector:
```go
shutdown, err := o11y.Init(ctx, o11y.Config{
	ServiceName:  "my-agent",
	OTLPEndpoint: "localhost:4317",
})
```

## Choosing an Observability Stack
| Need | Recommended |
|---|---|
| Full APM + infrastructure | Datadog |
| Open-source, self-hosted | Grafana + Tempo + Prometheus |
| LLM-specific debugging | Langfuse or Arize Phoenix |
| Quick local development | Arize Phoenix (local) |
| Enterprise with existing OTel | Your existing OTel collector |