Observability

Distributed tracing, structured logging, and metrics collection

The SDK integrates observability at every layer: LLM calls, agent steps, tool executions, and workflow nodes all emit traces, logs, and metrics.

Tracer Interface

The SDK defines a Tracer interface compatible with OpenTelemetry:

type Tracer interface {
    StartSpan(ctx context.Context, name string) (context.Context, Span)
}

type Span interface {
    End()
    SetAttribute(key string, value any)
    SetError(err error)
    Context() context.Context
}

Configuring Observability

Pass logger, metrics, and tracer when creating SDK components:

import sdk "github.com/xraph/ai-sdk"

generator := sdk.NewTextGenerator(ctx, llmManager, logger, metrics)

agent, _ := sdk.NewReactAgentBuilder("assistant").
    WithLLMManager(llmManager).
    WithLogger(logger).
    WithMetrics(metrics).
    Build()

Most constructors accept logger and metrics as parameters. Pass nil to disable.

Structured Logging

The SDK uses the github.com/xraph/go-utils/log logger interface. Key events logged:

| Event | Level | Context |
| --- | --- | --- |
| LLM request sent | Debug | Provider, model, token count |
| LLM response received | Debug | Provider, model, latency, tokens used |
| Tool execution | Debug | Tool name, parameters, duration |
| Agent step | Info | Step type, iteration, tool calls |
| Guardrail violation | Warn | Violation type, severity |
| Circuit breaker state change | Warn | Old state, new state, failure count |
| Budget alert | Warn | Budget name, current spend, limit |
| Error | Error | Operation, error details |

Metrics

The SDK emits metrics via the github.com/xraph/go-utils/metrics interface:

Counters

| Metric | Description |
| --- | --- |
| ai.sdk.llm.requests | Total LLM requests |
| ai.sdk.llm.errors | LLM request errors |
| ai.sdk.tool.executions | Tool executions |
| ai.sdk.tool.errors | Tool execution errors |
| ai.sdk.agent.steps | Agent reasoning steps |
| ai.sdk.guardrail.violations | Guardrail violations |
| ai.sdk.circuit_breaker.rejected | Circuit breaker rejections |
| ai.sdk.cache.hits | Cache hits |
| ai.sdk.cache.misses | Cache misses |

Histograms

| Metric | Description |
| --- | --- |
| ai.sdk.llm.latency | LLM request latency |
| ai.sdk.tool.duration | Tool execution duration |
| ai.sdk.agent.duration | Total agent execution time |
| ai.sdk.tokens.input | Input tokens per request |
| ai.sdk.tokens.output | Output tokens per request |

Health Checks

Providers and the LLM manager expose health checks:

// Check a specific provider
err := openaiProvider.HealthCheck(ctx)

// Check all registered providers
err = llmManager.HealthCheck(ctx)

OpenTelemetry Integration

Implement the Tracer interface with your OTel setup:

import (
    "context"
    "fmt"

    sdk "github.com/xraph/ai-sdk"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

type OTelTracer struct {
    tracer trace.Tracer
}

func (t *OTelTracer) StartSpan(ctx context.Context, name string) (context.Context, sdk.Span) {
    ctx, span := t.tracer.Start(ctx, name)
    return ctx, &OTelSpan{span: span}
}

type OTelSpan struct {
    span trace.Span
}

func (s *OTelSpan) End() { s.span.End() }

func (s *OTelSpan) SetAttribute(key string, value any) {
    switch v := value.(type) {
    case string:
        s.span.SetAttributes(attribute.String(key, v))
    case int:
        s.span.SetAttributes(attribute.Int(key, v))
    case bool:
        s.span.SetAttributes(attribute.Bool(key, v))
    default:
        s.span.SetAttributes(attribute.String(key, fmt.Sprint(v)))
    }
}

func (s *OTelSpan) SetError(err error) { s.span.RecordError(err) }

func (s *OTelSpan) Context() context.Context { return trace.ContextWithSpan(context.Background(), s.span) }

Pass it via Options:

sdkInstance := sdk.New(llmManager, &sdk.Options{
    Tracer: &OTelTracer{tracer: otel.Tracer("ai-sdk")},
})
