Distributed Tracing

Understanding how Prela traces AI agent execution.

What is Tracing?

Tracing tracks the flow of execution through your application. Each operation (LLM call, tool invocation, agent step) creates a span that records:

When it started and ended
What inputs and outputs it had
Whether it succeeded or failed
How long it took

Spans are organized into traces: each trace is a tree of related spans that represents a single request or workflow.
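
As a rough illustration of what a single span records, the sketch below uses only the standard library. The `timed_span` helper and its dictionary layout are hypothetical and are not Prela's API; they only mirror the four items listed above.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_span(name, inputs=None):
    """Hypothetical helper: record start, end, status, and duration of a block."""
    record = {"name": name, "inputs": inputs, "outputs": None, "status": "PENDING"}
    record["start"] = time.time()
    t0 = time.perf_counter()
    try:
        yield record
        record["status"] = "SUCCESS"
    except Exception:
        record["status"] = "ERROR"
        raise  # the caller still sees the original exception
    finally:
        record["end"] = time.time()
        record["duration_ms"] = (time.perf_counter() - t0) * 1000.0

with timed_span("add_numbers", inputs=(2, 3)) as rec:
    rec["outputs"] = sum(rec["inputs"])

print(rec["status"], rec["outputs"])
```

The wall-clock timestamps and the duration use different clocks on purpose: `time.perf_counter()` is monotonic, so the duration is not affected by system clock adjustments.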

Anatomy of a Trace

graph TD
    A[Root Span: Agent Execution] --> B[Span: LLM Call #1]
    A --> C[Span: Tool: Search]
    C --> D[Span: LLM Call #2]
    A --> E[Span: Tool: Calculator]
    E --> F[Span: LLM Call #3]

    style A fill:#4F46E5
    style B fill:#6366F1
    style C fill:#818CF8
    style D fill:#A5B4FC
    style E fill:#818CF8
    style F fill:#A5B4FC

Each span has:

Trace ID: Unique identifier for the entire trace
Span ID: Unique identifier for this span
Parent Span ID: Links to parent span (if not root)
Name: Human-readable operation name
Type: AGENT, LLM, TOOL, RETRIEVAL, EMBEDDING, CUSTOM
Status: PENDING, SUCCESS, ERROR
Timestamps: Start and end times
Duration: Calculated from timestamps
Attributes: Key-value metadata
Events: Timestamped events during execution
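
To make the role of the IDs concrete, here is a minimal sketch of how a flat list of spans can be turned back into a tree using only `span_id` and `parent_span_id`. `SpanRecord` and `render_tree` are illustrative names, not Prela classes, and the record carries only a subset of the fields listed above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpanRecord:
    """Illustrative span record; field names follow the list above."""
    name: str
    trace_id: str
    span_type: str = "CUSTOM"
    parent_span_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    status: str = "PENDING"
    attributes: dict = field(default_factory=dict)

def render_tree(spans):
    """Return the trace as indented lines, children under their parents."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_span_id, []).append(s)

    lines = []
    def walk(parent_id, depth):
        for s in children.get(parent_id, []):
            lines.append("  " * depth + f"{s.name} [{s.span_type}]")
            walk(s.span_id, depth + 1)
    walk(None, 0)  # root spans have no parent
    return lines

trace_id = uuid.uuid4().hex
root = SpanRecord("agent_execution", trace_id, "AGENT")
tool = SpanRecord("search", trace_id, "TOOL", parent_span_id=root.span_id)
llm = SpanRecord("llm_call", trace_id, "LLM", parent_span_id=tool.span_id)
tree_lines = render_tree([llm, root, tool])  # input order does not matter
print("\n".join(tree_lines))
```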

How Prela Traces Work

1. Initialization

When you call prela.init():

import prela

prela.init(service_name="my-agent")

Prela:

Creates a global tracer
Auto-instruments installed LLM SDKs (OpenAI, Anthropic, etc.)
Sets up context propagation for threads and async
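
How Prela decides which SDKs to instrument is not described here. One common approach, sketched below as an assumption, is to check which candidate packages are importable before patching them. `CANDIDATE_SDKS` and `detect_installed` are hypothetical names.

```python
import importlib.util

# Hypothetical list of SDK module names an auto-instrumenter might look for.
CANDIDATE_SDKS = ["openai", "anthropic"]

def detect_installed(modules):
    """Return the subset of top-level module names that can be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is not None]

installed = detect_installed(CANDIDATE_SDKS)
print("Would instrument:", installed)
```

Checking with `find_spec` avoids importing an SDK just to discover whether it is present.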

2. Automatic Span Creation

When you call an LLM API:

from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Hello"}]
)

Behind the scenes:

sequenceDiagram
    participant App
    participant Instrumentation
    participant Tracer
    participant LLM API

    App->>Instrumentation: client.messages.create()
    Note over Instrumentation: Intercepted by wrapper
    Instrumentation->>Tracer: Create span
    Tracer->>Tracer: Generate trace_id, span_id
    Instrumentation->>Instrumentation: Record request attributes
    Instrumentation->>LLM API: Forward request
    LLM API-->>Instrumentation: Response
    Instrumentation->>Instrumentation: Record response attributes
    Instrumentation->>Tracer: End span
    Tracer->>Tracer: Calculate duration
    Tracer->>Tracer: Export (if root span)
    Instrumentation-->>App: Return response
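
The sequence above can be approximated with an ordinary function wrapper. This is a minimal sketch of the interception pattern, not Prela's implementation: `instrument`, `fake_create`, and the `finished_spans` list are hypothetical stand-ins, and no network call is made.

```python
import functools
import time
import uuid

finished_spans = []  # stand-in for an exporter

def instrument(func, span_name):
    """Wrap func so each call is recorded as a span-like dict (illustrative)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {
            "name": span_name,
            "span_id": uuid.uuid4().hex[:16],
            "attributes": {"llm.model": kwargs.get("model")},  # request attributes
            "status": "PENDING",
        }
        start = time.perf_counter()
        try:
            response = func(*args, **kwargs)  # forward the request
            span["attributes"]["llm.output_tokens"] = response.get("output_tokens")
            span["status"] = "SUCCESS"
            return response
        except Exception as exc:
            span["status"] = "ERROR"
            span["attributes"]["error.type"] = type(exc).__name__
            raise
        finally:
            # End the span whether the call succeeded or failed.
            span["duration_ms"] = (time.perf_counter() - start) * 1000.0
            finished_spans.append(span)
    return wrapper

def fake_create(model, messages):
    """Hypothetical stand-in for an SDK method; returns a canned response."""
    return {"text": "Hi!", "output_tokens": 2}

fake_create = instrument(fake_create, "fake.messages.create")
reply = fake_create(model="demo-model", messages=[{"role": "user", "content": "Hello"}])
print(finished_spans[0]["status"])
```

The caller's code does not change: it still calls the same function and receives the same response, which is what makes this style of instrumentation automatic.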

3. Context Propagation

Prela uses Python's contextvars to track the current span:

import prela
from anthropic import Anthropic

# Assumes prela.init() has already been called
tracer = prela.get_tracer()
client = Anthropic()

# Root span
with tracer.span("agent_execution"):
    # Child span (automatically linked)
    with tracer.span("tool_call"):
        # Grandchild span, created by the auto-instrumented client
        llm_response = client.messages.create(...)

Resulting trace tree:

agent_execution
└── tool_call
    └── anthropic.messages.create

4. Thread Safety

Prela is safe for multi-threaded applications:

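from concurrent.futures import ThreadPoolExecutor

import prela
from anthropic import Anthropic

tracer = prela.get_tracer()
client = Anthropic()
queries = ["What is tracing?", "What is a span?"]  # example inputs

def process_query(query):
    # Each thread has its own trace context
    with tracer.span("process_query"):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": query}]
        )
        return response

with ThreadPoolExecutor(max_workers=4) as executor:
    # list() consumes the lazy iterator so worker exceptions surface here
    results = list(executor.map(process_query, queries))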

5. Async Support

Prela works with async/await:

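import asyncio

import prela
from anthropic import AsyncAnthropic

tracer = prela.get_tracer()
client = AsyncAnthropic()  # one client shared by all tasks

async def process_async(query):
    # Automatic tracing in async context
    with tracer.span("async_process"):
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": query}]
        )
        return response

async def main(queries):
    # await is only valid inside a coroutine, so gather runs in main()
    return await asyncio.gather(*[process_async(q) for q in queries])

results = asyncio.run(main(["What is tracing?", "What is a span?"]))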
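
With asyncio, each task runs in a copy of the context that was current when the task was created. The standard-library sketch below (hypothetical names) shows that concurrent tasks inherit the caller's current span but cannot overwrite one another's.

```python
import asyncio
import contextvars

_current_span = contextvars.ContextVar("current_span", default=None)

async def handle(name):
    inherited = _current_span.get()  # value copied from the creating coroutine
    _current_span.set(name)          # affects only this task's copy of the context
    await asyncio.sleep(0)           # yield so the other tasks interleave
    return inherited, _current_span.get()

async def main():
    _current_span.set("root")
    seen = await asyncio.gather(*(handle(n) for n in ["a", "b", "c"]))
    return seen, _current_span.get()

seen, after = asyncio.run(main())
print(seen, after)
```

This is why a span opened before `asyncio.gather` can act as the parent of spans opened inside each task, while sibling tasks stay independent.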

Span Attributes

Spans capture rich metadata:

Service Attributes

service.name: Your service name (from prela.init())

LLM Attributes

llm.vendor: Provider (openai, anthropic)
llm.model: Model name
llm.temperature: Temperature parameter
llm.max_tokens: Max tokens parameter
llm.input_tokens: Tokens in request
llm.output_tokens: Tokens in response
llm.total_tokens: Sum of input and output tokens
llm.latency_ms: Response latency in milliseconds
llm.finish_reason: Completion reason
llm.stop_reason: Stop reason

Tool Attributes

tool.name: Tool name
tool.arguments: Tool arguments (JSON)

Error Attributes

error.type: Exception class name
error.message: Exception message
error.stacktrace: Full stack trace
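
For reference, the three error attributes can be derived from any Python exception with the standard `traceback` module. `error_attributes` is a hypothetical helper and only illustrates where each value could come from.

```python
import traceback

def error_attributes(exc):
    """Build error.* attributes from an exception (illustrative helper)."""
    return {
        "error.type": type(exc).__name__,
        "error.message": str(exc),
        "error.stacktrace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }

try:
    int("not a number")
except ValueError as exc:
    attrs = error_attributes(exc)

print(attrs["error.type"], "-", attrs["error.message"])
```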

Span Events

Events are timestamped logs within a span:

{
  "name": "llm.request",
  "timestamp": "2025-01-26T10:00:00.123456Z",
  "attributes": {
    "messages": [{"role": "user", "content": "Hello"}]
  }
}

Common events:

llm.request: LLM request sent
llm.response: LLM response received
tool.call: Tool invoked
tool.result: Tool returned result
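
An event with the shape shown in the JSON example can be built in a few lines. `make_event` is a hypothetical helper, and the attribute keys used here are made up for illustration.

```python
from datetime import datetime, timezone

def make_event(name, **attributes):
    """Return an event dict shaped like the JSON example above."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    return {
        "name": name,
        "timestamp": timestamp.replace("+00:00", "Z"),  # UTC with a Z suffix
        "attributes": attributes,
    }

events = [
    make_event("tool.call", tool="search", query="weather"),
    make_event("tool.result", result_count=3),
]
print(events[0]["timestamp"])
```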

Best Practices

1. Meaningful Span Names

Use descriptive, consistent names:

# Good
with tracer.span("fetch_user_data"):
    ...

with tracer.span("generate_response"):
    ...

# Bad
with tracer.span("function1"):
    ...

with tracer.span("do_stuff"):
    ...

2. Appropriate Span Granularity

Don't create spans for trivial operations:

# Good: One span for the whole process
def process_document(doc):
    with tracer.span("process_document"):
        text = extract_text(doc)
        summary = summarize(text)
        return summary

# Bad: Too granular
def process_document(doc):
    with tracer.span("process_document"):
        with tracer.span("extract_text"):
            text = extract_text(doc)
        with tracer.span("summarize"):
            summary = summarize(text)
        return summary

3. Add Context with Attributes

Enrich spans with relevant metadata:

with tracer.span("query_database") as span:
    span.set_attribute("query", sql_query)
    span.set_attribute("database", db_name)
    result = execute_query(sql_query)
    span.set_attribute("rows_returned", len(result))

4. Record Events for Milestones

with tracer.span("agent_workflow") as span:
    span.add_event("planning_started")
    plan = create_plan()

    span.add_event("execution_started")
    result = execute_plan(plan)

    span.add_event("validation_started")
    validate(result)

5. Handle Errors Properly

Errors are automatically captured, but you can add context:

with tracer.span("risky_operation") as span:
    try:
        result = dangerous_function()
    except ValueError as e:
        span.set_attribute("error.context", "Invalid input format")
        raise  # Re-raise to mark span as error

Trace Lifecycle

stateDiagram-v2
    [*] --> Created: tracer.span()
    Created --> Active: __enter__()
    Active --> Active: set_attribute()
    Active --> Active: add_event()
    Active --> Ended: __exit__()
    Ended --> Exported: export()
    Exported --> [*]

    Active --> Error: Exception
    Error --> Ended: __exit__()

Created: tracer.span() called
Active: Context entered, span can be modified
Error: An exception was raised inside the context; the span is marked ERROR when it ends
Ended: Context exited, span immutable
Exported: Sent to exporter (only root spans)
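
The lifecycle can be modelled as a small state machine. The sketch below is hypothetical: `LifecycleSpan` is not a Prela class, and raising on late modification is just one way to enforce the "ended spans are immutable" rule.

```python
from enum import Enum

class State(Enum):
    CREATED = "created"
    ACTIVE = "active"
    ENDED = "ended"

class LifecycleSpan:
    """Illustrative span that rejects changes once it has ended."""
    def __init__(self, name):
        self.name = name
        self.state = State.CREATED
        self.status = "PENDING"
        self.attributes = {}

    def __enter__(self):
        self.state = State.ACTIVE
        return self

    def set_attribute(self, key, value):
        if self.state is not State.ACTIVE:
            raise RuntimeError(f"span is {self.state.value}; cannot modify")
        self.attributes[key] = value

    def __exit__(self, exc_type, exc, tb):
        self.status = "ERROR" if exc_type else "SUCCESS"
        self.state = State.ENDED
        return False  # never swallow the exception

with LifecycleSpan("demo") as s:
    s.set_attribute("step", 1)

print(s.state, s.status)
```

Returning `False` from `__exit__` keeps the exception propagating, which matches the advice elsewhere on this page to re-raise so that the span is marked as an error.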

Performance Considerations

Overhead

Prela adds a small amount of overhead per span. Approximate costs:

Span creation: ~0.1ms
Attribute setting: ~0.01ms per attribute
Event recording: ~0.01ms per event
Export: Happens asynchronously (root spans only)

Total: ~0.5-2ms per span
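
These figures depend on hardware and workload, so it is worth measuring in your own environment. The sketch below times any span factory that works as a context manager; `measure_overhead_ms` and `NoopSpan` are hypothetical names, and `NoopSpan` exists only so the example runs without a tracing library.

```python
import time

def measure_overhead_ms(make_span, iterations=10_000):
    """Average wall-clock cost in ms of entering and leaving one span."""
    start = time.perf_counter()
    for _ in range(iterations):
        with make_span("overhead_probe"):
            pass
    return (time.perf_counter() - start) * 1000.0 / iterations

class NoopSpan:
    """Placeholder span so the sketch runs without a tracing library."""
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        return False

baseline_ms = measure_overhead_ms(NoopSpan)
print(f"no-op span: {baseline_ms:.5f} ms per span")
```

Passing `tracer.span` in place of `NoopSpan` and subtracting the baseline gives a rough estimate of the tracer's own cost per span.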

Sampling

In high-traffic systems, reduce overhead by recording only a fraction of traces:

prela.init(
    service_name="high-traffic-agent",
    sample_rate=0.01  # Sample 1% of traces
)

See Sampling for details.
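
How Prela makes the sampling decision is not specified on this page. A common technique, shown here purely as an assumption, is deterministic sampling keyed on the trace ID, so that every span of a trace receives the same keep-or-drop decision. `should_sample` is a hypothetical function.

```python
import hashlib

def should_sample(trace_id, sample_rate):
    """Deterministic sampling: the same trace_id always gets the same answer."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < sample_rate

kept = sum(should_sample(f"trace-{i}", 0.01) for i in range(100_000))
print(f"kept {kept} of 100000 traces")
```

Hashing the ID instead of calling a random number generator keeps the decision reproducible across processes, which matters when several services contribute spans to the same trace.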

Next Steps

Spans - Deep dive into span internals
Context Propagation - Thread and async context
Sampling - Control trace volume
Exporters - Where traces go