OpenTelemetry Tracing Tutorial
Metrics tell you what happened (error rate increased). Traces tell you why (Agent A called Tool B, which timed out calling API C). In complex AI workflows with multiple LLM calls, tool invocations, and retrieval steps, distributed tracing is essential for debugging and performance analysis. Without traces, diagnosing a slow agent requires guessing which of its many internal operations caused the delay.
Beluga AI uses the OpenTelemetry GenAI semantic conventions (gen_ai.* attribute namespace) for LLM observability. These conventions are an emerging standard that ensures consistent attribute naming across providers, making it possible to build dashboards and alerts that work regardless of which LLM provider is in use.
What You Will Build
An instrumented application with OpenTelemetry tracing, including custom spans, automatic LLM instrumentation, and Jaeger visualization.
Prerequisites
- Go 1.23+
- Docker (for running Jaeger)
Step 1: Install Dependencies
    go get go.opentelemetry.io/otel
    go get go.opentelemetry.io/otel/sdk
    go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp

Step 2: Initialize the Tracer Provider
Set up an OTLP exporter that sends traces to a collector or backend (Jaeger, Grafana Tempo, etc.). OTLP is the vendor-neutral OpenTelemetry wire protocol, so the same exporter code works with Jaeger, Grafana Tempo, Datadog, and other OTLP-capable backends; typically only the endpoint and credentials change. The resource attaches service metadata to every span, which enables filtering traces by service name and version in the visualization UI.
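Because the backend is selected by endpoint rather than by code, it can help to keep the endpoint out of the source. The sketch below is one possible way to do that: it reads the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable and splits it into the host:port and insecure flag that options such as `otlptracehttp.WithEndpoint` and `otlptracehttp.WithInsecure` expect. The helper name `otlpTarget` and the fallback value are assumptions made for illustration. As far as the package documentation describes, the exporter also reads this variable on its own when no endpoint option is passed, so a helper like this is only needed if you want to inspect or log the value.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// otlpTarget is a hypothetical helper (not part of OpenTelemetry or Beluga AI).
// It turns an endpoint URL such as "http://collector:4318" into a host:port
// string and an insecure flag. An empty value, or one without a scheme and
// host, falls back to the local default used in this tutorial.
func otlpTarget(raw string) (hostPort string, insecure bool) {
	const fallback = "localhost:4318"
	if raw == "" {
		return fallback, true
	}
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return fallback, true
	}
	// Only plain "http" is treated as insecure; "https" keeps TLS enabled.
	return u.Host, u.Scheme == "http"
}

func main() {
	hostPort, insecure := otlpTarget(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"))
	fmt.Printf("endpoint=%s insecure=%t\n", hostPort, insecure)
}
```

With the variable unset, this resolves to the same `localhost:4318` endpoint that the tutorial hard-codes.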
    package main

    import (
    	"context"
    	"fmt"
    	"log"

    	"go.opentelemetry.io/otel"
    	"go.opentelemetry.io/otel/attribute"
    	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    	"go.opentelemetry.io/otel/sdk/resource"
    	sdktrace "go.opentelemetry.io/otel/sdk/trace"
    	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
    	"go.opentelemetry.io/otel/trace" // used by trace.WithAttributes in Step 4
    )

    // initTracer creates an OTLP/HTTP exporter and registers a global TracerProvider.
    func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    	// Send traces over plain HTTP to a collector on localhost.
    	exporter, err := otlptracehttp.New(ctx,
    		otlptracehttp.WithEndpoint("localhost:4318"),
    		otlptracehttp.WithInsecure(),
    	)
    	if err != nil {
    		return nil, fmt.Errorf("create exporter: %w", err)
    	}

    	// Service metadata attached to every exported span.
    	res, err := resource.New(ctx,
    		resource.WithAttributes(
    			semconv.ServiceName("beluga-agent"),
    			semconv.ServiceVersion("1.0.0"),
    		),
    	)
    	if err != nil {
    		return nil, fmt.Errorf("create resource: %w", err)
    	}

    	// Batch spans before export to reduce network overhead.
    	tp := sdktrace.NewTracerProvider(
    		sdktrace.WithBatcher(exporter),
    		sdktrace.WithResource(res),
    	)

    	otel.SetTracerProvider(tp)
    	return tp, nil
    }

Step 3: Instrument Your Application
Create a root span and pass the context through your pipeline. The root span represents the entire workflow, and Beluga AI components automatically create child spans when they receive a traced context. LLM calls, tool executions, and retrieval operations therefore appear as child spans without explicit instrumentation code; you only need to propagate the context.
    func main() {
    	ctx := context.Background()

    	tp, err := initTracer(ctx)
    	if err != nil {
    		log.Fatalf("init tracer: %v", err)
    	}
    	// Shutdown flushes any spans still buffered by the batcher.
    	defer func() {
    		if err := tp.Shutdown(ctx); err != nil {
    			log.Printf("shutdown tracer: %v", err)
    		}
    	}()

    	tracer := otel.Tracer("main")

    	// Create root span for the entire workflow
    	ctx, span := tracer.Start(ctx, "agent-workflow")
    	defer span.End()

    	// Pass ctx to Beluga AI components; they attach child spans automatically.
    	// runPipeline stands in for your own workflow entry point.
    	if err := runPipeline(ctx); err != nil {
    		span.RecordError(err)
    		log.Printf("pipeline error: %v", err)
    	}
    }

Step 4: Add Custom Spans
Instrument your own logic with custom spans and attributes. Each span represents a unit of work in the trace timeline. Adding attributes (like document.id and document.type) enables filtering and grouping in the trace viewer, for example finding all traces that processed PDF documents or identifying which document ID caused a failure.
    func processDocument(ctx context.Context, docID string) error {
    	tracer := otel.Tracer("document-processor")
    	ctx, span := tracer.Start(ctx, "process-document")
    	defer span.End()

    	// Add attributes for filtering and analysis
    	span.SetAttributes(
    		attribute.String("document.id", docID),
    		attribute.String("document.type", "pdf"),
    	)

    	// Step 1: Parse. The returned context is discarded so that the next span
    	// stays a sibling under process-document instead of nesting under
    	// parse-document. Keep it in a named variable if the parsing logic
    	// starts spans of its own.
    	_, parseSpan := tracer.Start(ctx, "parse-document")
    	// ... parsing logic ...
    	parseSpan.End()

    	// Step 2: Embed
    	_, embedSpan := tracer.Start(ctx, "embed-document")
    	// ... embedding logic ...
    	embedSpan.AddEvent("embedding-complete", trace.WithAttributes(
    		attribute.Int("dimensions", 1536),
    	))
    	embedSpan.End()

    	return nil
    }

Step 5: GenAI Semantic Conventions
Beluga AI’s o11y package uses the OpenTelemetry GenAI conventions. These standardized attribute names ensure that dashboards, alerts, and analysis queries work across all LLM providers without provider-specific logic.
| Attribute | Description |
|---|---|
| `gen_ai.system` | Provider name (openai, anthropic, etc.) |
| `gen_ai.request.model` | Model ID (gpt-4o, claude-3-opus, etc.) |
| `gen_ai.request.temperature` | Sampling temperature |
| `gen_ai.request.max_tokens` | Max token limit |
| `gen_ai.usage.input_tokens` | Prompt token count |
| `gen_ai.usage.output_tokens` | Completion token count |
| `gen_ai.response.finish_reason` | Stop reason (stop, length, tool_calls) |
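As a rough illustration of how these attributes fit together on a single LLM call span, the sketch below assembles them into a plain map. The `llmCall` struct, the `genAIAttributes` function, and the sample values are hypothetical and are not Beluga AI's API. With the OpenTelemetry SDK, the same key/value pairs would be passed to `span.SetAttributes` using constructors from the `attribute` package.

```go
package main

import (
	"fmt"
	"sort"
)

// llmCall is a hypothetical record of one model invocation.
type llmCall struct {
	System       string
	Model        string
	Temperature  float64
	MaxTokens    int
	InputTokens  int
	OutputTokens int
	FinishReason string
}

// genAIAttributes maps a call onto the gen_ai.* attribute names listed in the table.
func genAIAttributes(c llmCall) map[string]any {
	return map[string]any{
		"gen_ai.system":                 c.System,
		"gen_ai.request.model":          c.Model,
		"gen_ai.request.temperature":    c.Temperature,
		"gen_ai.request.max_tokens":     c.MaxTokens,
		"gen_ai.usage.input_tokens":     c.InputTokens,
		"gen_ai.usage.output_tokens":    c.OutputTokens,
		"gen_ai.response.finish_reason": c.FinishReason,
	}
}

func main() {
	// Sample values are made up for the example.
	attrs := genAIAttributes(llmCall{
		System:       "openai",
		Model:        "gpt-4o",
		Temperature:  0.2,
		MaxTokens:    512,
		InputTokens:  120,
		OutputTokens: 48,
		FinishReason: "stop",
	})

	// Print in a stable order; Go map iteration order is randomized.
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, attrs[k])
	}
}
```

Because the keys are the same for every provider, a dashboard query that sums `gen_ai.usage.input_tokens` grouped by `gen_ai.request.model` needs no provider-specific branches.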
Step 6: Run Jaeger
Start Jaeger to collect and visualize traces. Jaeger’s all-in-one image includes the collector, storage, and UI in a single container, making it suitable for local development.
    docker run -d --name jaeger \
      -p 16686:16686 \
      -p 4317:4317 \
      -p 4318:4318 \
      jaegertracing/all-in-one:latest

Access the Jaeger UI at http://localhost:16686. Select the beluga-agent service to view traces.
Verification
- Run your instrumented application.
- Open the Jaeger UI at http://localhost:16686.
- Select the beluga-agent service.
- Find traces and inspect the span timeline, verifying parent-child relationships across LLM calls, tool executions, and retrieval steps.
Next Steps
- Prometheus and Grafana: Metrics visualization
- Health Checks: Component health monitoring