Observability

Distributed tracing, structured logging, and metrics collection

The SDK integrates observability at every layer: LLM calls, agent steps, tool executions, and workflow nodes all emit traces, logs, and metrics.

Tracer Interface

The SDK defines a Tracer interface compatible with OpenTelemetry:

type Tracer interface {
    StartSpan(ctx context.Context, name string) (context.Context, Span)
}

type Span interface {
    End()
    SetAttribute(key string, value any)
    SetError(err error)
    Context() context.Context
}

Configuring Observability

Pass logger, metrics, and tracer when creating SDK components:

import sdk "github.com/xraph/ai-sdk"

generator := sdk.NewTextGenerator(ctx, llmManager, logger, metrics)

agent, _ := sdk.NewReactAgentBuilder("assistant").
    WithLLMManager(llmManager).
    WithLogger(logger).
    WithMetrics(metrics).
    Build()

Most constructors accept logger and metrics as parameters. Pass nil to disable.

Structured Logging

The SDK uses the github.com/xraph/go-utils/log logger interface. Key events logged:

| Event | Level | Context |
| --- | --- | --- |
| LLM request sent | Debug | Provider, model, token count |
| LLM response received | Debug | Provider, model, latency, tokens used |
| Tool execution | Debug | Tool name, parameters, duration |
| Agent step | Info | Step type, iteration, tool calls |
| Guardrail violation | Warn | Violation type, severity |
| Circuit breaker state change | Warn | Old state, new state, failure count |
| Budget alert | Warn | Budget name, current spend, limit |
| Error | Error | Operation, error details |

Metrics

The SDK emits metrics via the github.com/xraph/go-utils/metrics interface:

Counters

| Metric | Description |
| --- | --- |
| ai.sdk.llm.requests | Total LLM requests |
| ai.sdk.llm.errors | LLM request errors |
| ai.sdk.tool.executions | Tool executions |
| ai.sdk.tool.errors | Tool execution errors |
| ai.sdk.agent.steps | Agent reasoning steps |
| ai.sdk.guardrail.violations | Guardrail violations |
| ai.sdk.circuit_breaker.rejected | Circuit breaker rejections |
| ai.sdk.cache.hits | Cache hits |
| ai.sdk.cache.misses | Cache misses |

Histograms

| Metric | Description |
| --- | --- |
| ai.sdk.llm.latency | LLM request latency |
| ai.sdk.tool.duration | Tool execution duration |
| ai.sdk.agent.duration | Total agent execution time |
| ai.sdk.tokens.input | Input tokens per request |
| ai.sdk.tokens.output | Output tokens per request |

Health Checks

Providers and the LLM manager expose health checks:

// Check a specific provider
err := openaiProvider.HealthCheck(ctx)

// Check all registered providers
err = llmManager.HealthCheck(ctx)

OpenTelemetry Integration

Implement the Tracer interface with your OTel setup:

import (
    "context"
    "fmt"

    sdk "github.com/xraph/ai-sdk"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
)

type OTelTracer struct {
    tracer trace.Tracer
}

func (t *OTelTracer) StartSpan(ctx context.Context, name string) (context.Context, sdk.Span) {
    ctx, span := t.tracer.Start(ctx, name)
    return ctx, &OTelSpan{span: span}
}

type OTelSpan struct {
    span trace.Span
}

func (s *OTelSpan) End() { s.span.End() }

func (s *OTelSpan) SetAttribute(key string, value any) {
    switch v := value.(type) {
    case string:
        s.span.SetAttributes(attribute.String(key, v))
    case int:
        s.span.SetAttributes(attribute.Int(key, v))
    case bool:
        s.span.SetAttributes(attribute.Bool(key, v))
    default:
        s.span.SetAttributes(attribute.String(key, fmt.Sprint(v)))
    }
}

func (s *OTelSpan) SetError(err error) { s.span.RecordError(err) }

func (s *OTelSpan) Context() context.Context { return trace.ContextWithSpan(context.Background(), s.span) }

Pass it via Options:

sdkInstance := sdk.New(llmManager, &sdk.Options{
    Tracer: &OTelTracer{tracer: otel.Tracer("ai-sdk")},
})
