Datadog AI Dashboards

Datadog suits teams that want a unified view of AI operations alongside traditional infrastructure metrics. Routing Beluga AI’s OpenTelemetry telemetry to Datadog puts LLM latency, token usage, and agent performance data in the same dashboards where you monitor CPU, memory, and request rates, so you can correlate AI behavior with infrastructure health without switching tools. This guide shows how to export that telemetry to Datadog and set up dashboards for monitoring AI application health and performance.

Before you begin, make sure you have:

  • Go 1.23 or later
  • A Beluga AI application with the o11y package configured
  • A Datadog account with an API key
  • Datadog Agent installed (optional, for local development)

Install the OpenTelemetry OTLP exporters for traces and metrics:

go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
go get go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp

Set the required environment variables:

export DD_API_KEY="your-datadog-api-key"
export DD_OTLP_ENDPOINT="https://trace-intake.datadoghq.com" # optional, this is the default

| Variable | Description | Default | Required |
|---|---|---|---|
| DD_API_KEY | Datadog API key | - | Yes |
| DD_OTLP_ENDPOINT | Datadog OTLP intake endpoint | https://trace-intake.datadoghq.com | No |
| DD_SITE | Datadog site (e.g., datadoghq.eu) | datadoghq.com | No |
| DD_SERVICE | Service name tag | beluga-ai | No |
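
The exporter snippets in this guide read only DD_API_KEY and DD_OTLP_ENDPOINT. As a minimal sketch of how all four variables could be resolved with their documented defaults, the helper below takes a lookup function so it can be exercised without touching the process environment. The `datadogConfig` type and `resolveConfig` function are hypothetical names, not part of Beluga AI, and deriving the endpoint as `https://trace-intake.<DD_SITE>` is an assumption based on the default and EU endpoints quoted in this guide.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// datadogConfig is a hypothetical holder for the documented settings.
type datadogConfig struct {
	APIKey   string
	Endpoint string
	Service  string
}

// resolveConfig applies the documented defaults. getenv is injected so the
// function can be tested with a plain map instead of the real environment.
func resolveConfig(getenv func(string) string) (datadogConfig, error) {
	apiKey := getenv("DD_API_KEY")
	if apiKey == "" {
		return datadogConfig{}, errors.New("DD_API_KEY is required")
	}

	site := getenv("DD_SITE")
	if site == "" {
		site = "datadoghq.com"
	}

	// An explicit DD_OTLP_ENDPOINT wins; otherwise derive it from the site.
	// Assumption: the intake host follows the trace-intake.<site> pattern.
	endpoint := getenv("DD_OTLP_ENDPOINT")
	if endpoint == "" {
		endpoint = "https://trace-intake." + site
	}

	service := getenv("DD_SERVICE")
	if service == "" {
		service = "beluga-ai"
	}

	return datadogConfig{APIKey: apiKey, Endpoint: endpoint, Service: service}, nil
}

func main() {
	cfg, err := resolveConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	// Print only non-secret values; never log the API key itself.
	fmt.Printf("endpoint=%s service=%s\n", cfg.Endpoint, cfg.Service)
}
```

The resolved `Endpoint` and `Service` values could then be passed to the exporter and resource setup instead of the hard-coded strings.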

Set up an OTLP trace exporter pointed at the Datadog intake endpoint:

package main

import (
	"context"
	"fmt"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

// setupDatadogTracing registers a global tracer provider that exports
// spans to the Datadog OTLP intake over HTTP.
func setupDatadogTracing(ctx context.Context) (*sdktrace.TracerProvider, error) {
	endpoint := os.Getenv("DD_OTLP_ENDPOINT")
	if endpoint == "" {
		endpoint = "https://trace-intake.datadoghq.com"
	}

	exporter, err := otlptracehttp.New(ctx,
		// WithEndpointURL takes a full URL including the scheme.
		// WithEndpoint expects a bare host[:port], so it does not fit here.
		otlptracehttp.WithEndpointURL(endpoint),
		otlptracehttp.WithHeaders(map[string]string{
			"DD-API-KEY": os.Getenv("DD_API_KEY"),
		}),
	)
	if err != nil {
		return nil, fmt.Errorf("create trace exporter: %w", err)
	}

	// The resource attributes identify this service on every exported span.
	res, err := resource.New(ctx,
		resource.WithAttributes(
			semconv.ServiceName("beluga-ai"),
			semconv.ServiceVersion("1.0.0"),
		),
	)
	if err != nil {
		return nil, fmt.Errorf("create resource: %w", err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter), // batch spans before export
		sdktrace.WithResource(res),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}

Set up an OTLP metrics exporter for Datadog:

package main

import (
	"context"
	"fmt"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// setupDatadogMetrics registers a global meter provider that periodically
// exports metrics to the Datadog OTLP intake over HTTP.
func setupDatadogMetrics(ctx context.Context) (*sdkmetric.MeterProvider, error) {
	endpoint := os.Getenv("DD_OTLP_ENDPOINT")
	if endpoint == "" {
		endpoint = "https://trace-intake.datadoghq.com"
	}

	exporter, err := otlpmetrichttp.New(ctx,
		// WithEndpointURL takes a full URL including the scheme.
		// WithEndpoint expects a bare host[:port], so it does not fit here.
		otlpmetrichttp.WithEndpointURL(endpoint),
		otlpmetrichttp.WithHeaders(map[string]string{
			"DD-API-KEY": os.Getenv("DD_API_KEY"),
		}),
	)
	if err != nil {
		return nil, fmt.Errorf("create metrics exporter: %w", err)
	}

	mp := sdkmetric.NewMeterProvider(
		// The periodic reader collects and exports on a fixed interval.
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
	)
	otel.SetMeterProvider(mp)
	return mp, nil
}

Wire both exporters into your application:

package main

import (
	"context"
	"log"
)

func main() {
	ctx := context.Background()

	tp, err := setupDatadogTracing(ctx)
	if err != nil {
		log.Fatalf("failed to setup tracing: %v", err)
	}
	defer func() {
		if err := tp.Shutdown(ctx); err != nil {
			log.Printf("trace provider shutdown error: %v", err)
		}
	}()

	mp, err := setupDatadogMetrics(ctx)
	if err != nil {
		log.Fatalf("failed to setup metrics: %v", err)
	}
	defer func() {
		if err := mp.Shutdown(ctx); err != nil {
			log.Printf("meter provider shutdown error: %v", err)
		}
	}()

	// Beluga AI components now emit traces and metrics to Datadog
	// via the global OTel providers.
	log.Println("Datadog telemetry export configured")
}
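
Shutting down with `context.Background()` places no upper bound on how long the final flush may take, which can delay process exit when the intake is slow or unreachable. A minimal sketch of a bounded shutdown follows. The names `shutdowner`, `shutdownAll`, and `fakeProvider` are hypothetical, and the five-second timeout is an arbitrary illustration. Both `*sdktrace.TracerProvider` and `*sdkmetric.MeterProvider` expose a `Shutdown(context.Context) error` method, so either could be passed in place of the fake.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// shutdowner is the subset of the provider API this sketch needs.
type shutdowner interface {
	Shutdown(ctx context.Context) error
}

// shutdownAll shuts down every provider under one shared deadline and
// returns all errors joined together, so one failure does not hide another.
func shutdownAll(timeout time.Duration, providers ...shutdowner) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var errs []error
	for _, p := range providers {
		if err := p.Shutdown(ctx); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when no provider failed
}

// errFlush and fakeProvider stand in for real providers so the sketch
// runs on its own without the OpenTelemetry SDK.
var errFlush = errors.New("flush failed")

type fakeProvider struct{ err error }

func (f fakeProvider) Shutdown(context.Context) error { return f.err }

func main() {
	err := shutdownAll(5*time.Second,
		fakeProvider{},
		fakeProvider{err: errFlush},
	)
	fmt.Println("shutdown result:", err)
}
```

In an application, a single deferred `shutdownAll(5*time.Second, tp, mp)` call could replace the two separate deferred functions.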

Import this JSON into Datadog to create an LLM Operations dashboard with key widgets:

{
  "title": "Beluga AI - LLM Operations",
  "layout_type": "ordered",
  "widgets": [
    {
      "definition": {
        "type": "timeseries",
        "requests": [
          {
            "q": "sum:gen_ai.client.operation.duration.count{*}.as_count()",
            "display_type": "line"
          }
        ],
        "title": "LLM Call Count"
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "requests": [
          {
            "q": "avg:gen_ai.client.operation.duration{*}",
            "display_type": "line"
          }
        ],
        "title": "Average LLM Latency"
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "requests": [
          {
            "q": "sum:gen_ai.client.token.usage{token.type:input}.as_count()",
            "display_type": "bars"
          }
        ],
        "title": "Input Token Usage"
      }
    },
    {
      "definition": {
        "type": "timeseries",
        "requests": [
          {
            "q": "sum:gen_ai.client.token.usage{token.type:output}.as_count()",
            "display_type": "bars"
          }
        ],
        "title": "Output Token Usage"
      }
    }
  ]
}

Beluga AI uses the OpenTelemetry gen_ai.* semantic conventions for all LLM telemetry. Adjust the metric names in the dashboard queries to match any custom metrics your application emits.
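
When a dashboard grows beyond a handful of widgets, generating the JSON can be less error-prone than editing it by hand. The sketch below builds one latency widget per model with `encoding/json`. The types and the `latencyByModel` helper are hypothetical names that mirror only the fields used in this guide, not the full Datadog dashboard schema. The model names are placeholders, and the `layout_type` value is an assumption: the Datadog dashboards API expects that field, and `ordered` is one of its accepted values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror only the fields used in this guide's dashboard JSON.
type request struct {
	Q           string `json:"q"`
	DisplayType string `json:"display_type"`
}

type definition struct {
	Type     string    `json:"type"`
	Requests []request `json:"requests"`
	Title    string    `json:"title"`
}

type widget struct {
	Definition definition `json:"definition"`
}

type dashboard struct {
	Title      string   `json:"title"`
	LayoutType string   `json:"layout_type"`
	Widgets    []widget `json:"widgets"`
}

// latencyByModel returns a dashboard with one average-latency widget per
// model, filtering on the gen_ai.request.model tag.
func latencyByModel(models []string) dashboard {
	d := dashboard{Title: "Beluga AI - LLM Latency by Model", LayoutType: "ordered"}
	for _, m := range models {
		q := fmt.Sprintf("avg:gen_ai.client.operation.duration{gen_ai.request.model:%s}", m)
		d.Widgets = append(d.Widgets, widget{Definition: definition{
			Type:     "timeseries",
			Requests: []request{{Q: q, DisplayType: "line"}},
			Title:    "Average LLM Latency (" + m + ")",
		}})
	}
	return d
}

// render serializes the dashboard as indented JSON ready for import.
func render(d dashboard) (string, error) {
	b, err := json.MarshalIndent(d, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// "model-a" and "model-b" are placeholder model identifiers.
	out, err := render(latencyByModel([]string{"model-a", "model-b"}))
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```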

To reduce data volume in high-throughput applications, configure a sampling strategy on the tracer provider:

tp := sdktrace.NewTracerProvider(
	sdktrace.WithBatcher(exporter),
	sdktrace.WithResource(res),
	sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.1)), // Sample 10% of traces
)
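
Hard-coding the ratio means a rebuild for every adjustment. A minimal sketch of reading it from configuration follows. `parseSampleRatio` and the `BELUGA_TRACE_SAMPLE_RATIO` variable are hypothetical names, not settings that Beluga AI or Datadog define. Invalid input falls back to the 10% used above, and out-of-range values are clamped to the interval from 0 to 1.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"strconv"
)

// defaultSampleRatio matches the 10% ratio used in this guide.
const defaultSampleRatio = 0.1

// parseSampleRatio converts a raw setting into a ratio in [0, 1].
// Empty, unparsable, or NaN input yields the default.
func parseSampleRatio(raw string) float64 {
	if raw == "" {
		return defaultSampleRatio
	}
	ratio, err := strconv.ParseFloat(raw, 64)
	if err != nil || math.IsNaN(ratio) {
		return defaultSampleRatio
	}
	if ratio < 0 {
		return 0
	}
	if ratio > 1 {
		return 1
	}
	return ratio
}

func main() {
	// BELUGA_TRACE_SAMPLE_RATIO is a hypothetical variable name.
	ratio := parseSampleRatio(os.Getenv("BELUGA_TRACE_SAMPLE_RATIO"))
	fmt.Printf("sampling %.0f%% of traces\n", ratio*100)
}
```

The returned value can be passed directly to `sdktrace.TraceIDRatioBased`.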

Add application-specific metrics alongside the built-in Beluga AI telemetry:

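import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
)

// recordRequest increments a custom counter on the global meter provider.
func recordRequest(ctx context.Context) {
	meter := otel.Meter("beluga.custom")
	counter, err := meter.Int64Counter("beluga.custom.requests_total")
	if err != nil {
		log.Fatalf("failed to create counter: %v", err)
	}
	counter.Add(ctx, 1)
}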

Configure Datadog monitors on key metrics:

  • LLM error rate: Alert when the share of failed LLM calls (for example, gen_ai.client.operation.duration measurements recorded with an error.type attribute) exceeds a threshold
  • Latency P95: Alert on high tail latency for LLM calls
  • Token budget: Alert when cumulative token usage approaches cost limits
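
As one illustration, the token budget monitor can be written as a Datadog metric monitor query. The sketch below formats such a query and wraps it in a JSON payload. The `monitor` type, the `tokenBudgetMonitor` helper, the one-hour window, and the threshold are illustrative assumptions, and the metric name must match how the metric actually appears in your Datadog account.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// monitor holds a small subset of the fields a Datadog monitor definition
// accepts; it is not the full schema.
type monitor struct {
	Name    string `json:"name"`
	Type    string `json:"type"`
	Query   string `json:"query"`
	Message string `json:"message"`
}

// tokenBudgetMonitor alerts when total token usage over the given window
// (for example "last_1h") exceeds the threshold.
func tokenBudgetMonitor(window string, threshold int64) monitor {
	return monitor{
		Name: "LLM token budget",
		Type: "metric alert",
		Query: fmt.Sprintf(
			"sum(%s):sum:gen_ai.client.token.usage{*}.as_count() > %d",
			window, threshold,
		),
		Message: "Token usage is approaching the configured budget.",
	}
}

func main() {
	// The one-million-token threshold is an arbitrary example value.
	m := tokenBudgetMonitor("last_1h", 1_000_000)
	b, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(b))
}
```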

Use consistent tags across all telemetry to enable effective filtering:

  • service: Application name
  • gen_ai.system: LLM provider (e.g., openai, anthropic)
  • gen_ai.request.model: Model identifier
  • environment: Deployment environment (staging, production)
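
One way to keep tagging consistent is to check at startup that every expected key has a value before telemetry is emitted. The sketch below is a hypothetical helper, `missingTags`, that reports which of the four keys listed above are absent. It uses a plain map for illustration rather than any Beluga AI or OpenTelemetry type, and the tag values in `main` are placeholders.

```go
package main

import "fmt"

// requiredTags lists the keys recommended in this guide.
var requiredTags = []string{
	"service",
	"gen_ai.system",
	"gen_ai.request.model",
	"environment",
}

// missingTags returns the required keys that are absent or empty,
// in the same order as requiredTags.
func missingTags(tags map[string]string) []string {
	var missing []string
	for _, key := range requiredTags {
		if tags[key] == "" {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	tags := map[string]string{
		"service":     "beluga-ai",
		"environment": "staging",
	}
	if missing := missingTags(tags); len(missing) > 0 {
		fmt.Println("missing tags:", missing)
	}
}
```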

If exports are rejected with an authentication error, verify that DD_API_KEY contains a valid Datadog API key. You can manage API keys under Organization Settings > API Keys in Datadog.

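If no data appears in Datadog, confirm that the OTLP endpoint is correct for your Datadog site; EU customers, for example, use https://trace-intake.datadoghq.eu. To verify that data is being sent, register an OpenTelemetry error handler with otel.SetErrorHandler so that export failures are logged.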
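
Another way to see whether exports reach the intake is to log the status of each outgoing HTTP request. The sketch below wraps an `http.RoundTripper` using only the standard library. `loggingTransport`, `roundTripperFunc`, `stubStatus`, and `probe` are hypothetical names, and the stub transport stands in for the real network so the sketch runs offline. Whether your exporter version accepts a custom `http.Client` is an assumption to verify against its documentation.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

// roundTripperFunc adapts a function to the http.RoundTripper interface.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// loggingTransport logs the outcome of every request it forwards.
// It deliberately never logs headers, which would expose the API key.
type loggingTransport struct {
	next http.RoundTripper
	logf func(format string, args ...any)
}

func (t loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		t.logf("export to %s failed: %v", req.URL.Host, err)
		return nil, err
	}
	t.logf("export to %s%s returned %d", req.URL.Host, req.URL.Path, resp.StatusCode)
	return resp, nil
}

// stubStatus answers every request with the given status code.
func stubStatus(code int) http.RoundTripper {
	return roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{
			StatusCode: code,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader("")),
			Request:    r,
		}, nil
	})
}

// probe sends one empty POST through rt and returns the status code.
func probe(rt http.RoundTripper, url string) (int, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(""))
	if err != nil {
		return 0, err
	}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// The stub simulates a rejected request; no network traffic is sent.
	rt := loggingTransport{next: stubStatus(http.StatusForbidden), logf: log.Printf}
	code, err := probe(rt, "https://trace-intake.datadoghq.com/v1/traces")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("status:", code)
}
```

To use this against the real intake, `next` would be `http.DefaultTransport` and the wrapped transport would be set on the `http.Client` that the exporter uses.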