Prela

Observability for AI Agents

Prela is a production-ready Python SDK for tracing, monitoring, and evaluating autonomous AI agents. Get complete visibility into what your LLM-powered applications are doing with zero-code auto-instrumentation.

Production Validated

  • 21/21 core features validated with real API calls
  • 4/4 performance criteria met (SDK overhead <5%, CLI response <1s)
  • 1,068 passing tests with 100% coverage on new code
  • Ready for PyPI publication


Why Prela?

Building AI agents is hard. Understanding what they're doing is even harder. Prela solves this by providing:

  • ๐Ÿ” Automatic Tracing: Zero-code instrumentation for 10 frameworks (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, AutoGen, LangGraph, Swarm, n8n)
  • Multi-Agent Support: First-class support for multi-agent orchestration, conversations, and handoffs
  • Complete Visibility: See every LLM call, tool invocation, agent decision, and state change
  • Built-in Testing: Comprehensive evaluation framework with 21+ assertion types including security, LLM-as-Judge, multi-agent, and workflow patterns
  • Guardrails: Real-time safety enforcement with PII detection, prompt injection blocking, content filtering, and custom guards
  • Alerting: Metric-based alerts (error rate, latency, cost, token usage) with Slack, Email, and PagerDuty notifications
  • Prompt Management: Versioned prompt templates with variable substitution and stage-based promotion (development, staging, production)
  • Deterministic Replay: Re-execute traces with different models, compare outputs, test tool execution
  • Production Ready: 1,068 tests, 100% coverage on new code, type-safe, minimal dependencies
  • Minimal Overhead: ~0.5-2ms per span, smart sampling, efficient serialization

Quick Start

Get your API key at dashboard.prela.dev/api-keys, then:

pip install prela
export PRELA_API_KEY="prela_sk_..."

Add one line to your code:

import prela
from anthropic import Anthropic

# Initialize Prela - auto-instruments all LLM SDKs
# Traces are sent to Prela Cloud automatically
prela.init(service_name="my-agent")

# Use your LLM SDK normally - tracing happens automatically!
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Hello!"}]
)

# Spans are automatically created and exported

That's it! All your LLM calls are now traced with:

  • Request/response attributes (model, tokens, latency)
  • Tool calls and function invocations
  • Error capture and stack traces
  • Parent-child span relationships

Architecture Overview

graph TB
    subgraph "Your Application"
        A[Your Code] -->|calls| B[OpenAI SDK]
        A -->|calls| C[Anthropic SDK]
        A -->|calls| D[LangChain]
    end

    subgraph "Prela SDK"
        B -->|auto-instrumented| E[Tracer]
        C -->|auto-instrumented| E
        D -->|auto-instrumented| E
        E -->|creates| F[Spans]
        F -->|exports to| G[Exporter]
    end

    subgraph "Outputs"
        G -->|sends via PRELA_API_KEY| H[Prela Cloud]
        G -->|writes| I[Console]
        G -->|writes| J[Files]
    end

    style E fill:#4F46E5
    style F fill:#6366F1
    style G fill:#818CF8
    style H fill:#10B981

Key Features

Auto-Instrumentation

Works out of the box with 10 popular frameworks. A few examples:

OpenAI:

import prela
import openai

prela.init()

# Automatically traced
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Anthropic:

import prela
from anthropic import Anthropic

prela.init()

# Automatically traced
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=[{"role": "user", "content": "Hello"}]
)

LangChain:

import prela
from langchain.agents import initialize_agent

prela.init()

# Automatically traced (tools and llm are defined elsewhere)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
result = agent.run("What is the weather?")

CrewAI:

import prela
from crewai import Agent, Task, Crew

prela.init()

# Automatically traced - task delegation, agent collaboration
# (researcher, writer, research_task, and write_task are defined elsewhere)
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
result = crew.kickoff()

AutoGen:

import prela
from autogen import ConversableAgent

prela.init()

# Automatically traced - conversations, message flow
# (user_proxy and assistant are ConversableAgent instances defined elsewhere)
user_proxy.initiate_chat(assistant, message="Hello!")

LangGraph:

import prela
from langgraph.graph import StateGraph

prela.init()

# Automatically traced - state changes, node execution
graph = StateGraph(state_schema={...})
compiled = graph.compile()
result = compiled.invoke(state)

n8n:

import prela

# Start webhook receiver for n8n workflows (multi-tenant support)
prela.init(n8n_webhook_port=8787)

# Configure n8n to send webhooks to http://localhost:8787
# - All workflow executions, AI nodes, and tool calls automatically traced
# - Multi-tenant: Monitor multiple n8n instances with project isolation
# - Real-time dashboard with WebSocket updates (<1s latency)
# - Webhook URL routing: ?project=prod-n8n or header X-Prela-Project
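
The routing rule in the n8n comments (a `?project=` query parameter or an `X-Prela-Project` header) can be sketched with the standard library. This is an illustration only, not Prela's receiver: the function name `resolve_project`, the precedence (query parameter before header), and the `"default"` fallback are all assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_project(url: str, headers: dict[str, str], default: str = "default") -> str:
    """Pick a project name from the query string or the X-Prela-Project header.

    Assumption: the query parameter wins when both are present.
    """
    query = parse_qs(urlsplit(url).query)
    if query.get("project"):
        return query["project"][0]
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-prela-project", default)
```

With this sketch, `resolve_project("http://localhost:8787/?project=prod-n8n", {})` yields `"prod-n8n"`, and a request without the parameter falls back to the header.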

Complete Span Data

Every span captures:

  • Timing: Start time, end time, duration
  • Attributes: Model, temperature, max tokens, token usage
  • Events: Request sent, response received, tool calls
  • Errors: Exception type, message, stack trace
  • Context: Trace ID, span ID, parent span ID
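
To make the field list above concrete, here is a minimal sketch of a span record. The class name `Span`, its field names, and its methods are hypothetical and chosen to mirror the list; Prela's actual span type may look different.

```python
import time
import traceback
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Hypothetical span record mirroring the fields listed above."""

    name: str
    trace_id: str
    parent_span_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    error: Optional[dict] = None

    def add_event(self, name: str, **attrs: Any) -> None:
        # Events are timestamped points inside the span (request sent, etc.).
        self.events.append({"name": name, "time": time.time(), **attrs})

    def record_error(self, exc: BaseException) -> None:
        self.error = {
            "type": type(exc).__name__,
            "message": str(exc),
            "stack_trace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        }

    def end(self) -> None:
        self.end_time = time.time()

    @property
    def duration_ms(self) -> Optional[float]:
        if self.end_time is None:
            return None
        return (self.end_time - self.start_time) * 1000.0

    def child(self, name: str) -> "Span":
        """Create a child span that shares the trace ID and points at its parent."""
        return Span(name=name, trace_id=self.trace_id, parent_span_id=self.span_id)
```

The `child` method shows how the context fields tie spans together: every span in a trace shares `trace_id`, and `parent_span_id` reconstructs the tree.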

Multi-Agent Observability

First-class support for multi-agent systems:

CrewAI - Task-based orchestration:

  • Crew executions with agent collaboration
  • Task delegation tracking (assigner → assignee)
  • Tool usage per agent
  • Sequential vs hierarchical process modes

AutoGen - Conversational agents:

  • Multi-turn conversation tracking
  • Message flow (speaker → recipient)
  • Function calling with agent context
  • Turn-by-turn conversation history

LangGraph - Stateful workflows:

  • Graph execution with node tracing
  • State change detection (what each node modified)
  • Streaming support with step counting
  • Support for prebuilt agents (create_react_agent)

Swarm - Agent handoffs:

  • Handoff tracking (initial → final agent)
  • Context variable flow (privacy-safe keys)
  • Deterministic agent IDs
  • Execution isolation for concurrent runs
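
State change detection of the kind described for LangGraph can be pictured as a diff between the state before and after a node runs. The following is a minimal sketch that assumes the state is a flat dictionary; `diff_state` is a hypothetical helper, not a Prela or LangGraph function.

```python
from typing import Any


def diff_state(before: dict[str, Any], after: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Report which keys a node added, removed, or changed."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: {"old": before[k], "new": after[k]}
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Recording only the diff on each node's span keeps span payloads small compared with storing the full state at every step.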

See Multi-Agent Integrations for detailed documentation.

Evaluation Framework

Test your agents systematically:

from prela.evals import EvalSuite, EvalCase, EvalInput, EvalExpected
from prela.evals import EvalRunner
from prela.evals.reporters import ConsoleReporter

# Define test cases
suite = EvalSuite(
    name="Customer Support Bot",
    cases=[
        EvalCase(
            id="test_greeting",
            name="Responds to greeting",
            input=EvalInput(query="Hello!"),
            expected=EvalExpected(contains=["Hi", "Hello"]),
        ),
    ]
)

# Run evaluation (my_agent is your own agent callable, defined elsewhere)
runner = EvalRunner(suite, agent_function=my_agent)
result = runner.run()

# Report results
ConsoleReporter(verbose=True).report(result)
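
The `contains` expectation in the example above is a substring check on the agent's output. This page does not say whether every listed string must appear or only one, nor whether matching is case-sensitive, so the sketch below exposes both choices as parameters. `check_contains` is a hypothetical helper, not part of `prela.evals`.

```python
def check_contains(
    output: str,
    expected: list[str],
    require_all: bool = True,
    case_sensitive: bool = False,
) -> bool:
    """Return True if the output contains the expected substrings.

    require_all=True demands every substring; False accepts any one of them.
    """
    haystack = output if case_sensitive else output.lower()
    needles = expected if case_sensitive else [s.lower() for s in expected]
    hits = [needle in haystack for needle in needles]
    return all(hits) if require_all else any(hits)
```

The difference matters for the greeting case: an output of "Hello! How can I help?" passes with `require_all=False` but fails with `require_all=True`, because it never contains "Hi".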

Deterministic Replay

Re-execute captured traces with modifications for testing and experimentation:

from prela.replay import ReplayEngine, TraceLoader

# Load captured trace
trace = TraceLoader.from_file("trace.json")

# Replay with different model
engine = ReplayEngine(trace)
result = engine.replay_with_modifications(
    model="gpt-4o",           # Try different model
    temperature=0.7,          # Adjust parameters
    enable_tools=True,        # Re-execute tool calls
    enable_retrieval=True     # Re-execute vector searches
)

# Compare original vs replayed execution
comparison = result.compare_with_original()
print(f"Output similarity: {comparison.semantic_similarity:.2%}")
print(f"Token difference: {comparison.token_difference}")

Replay capabilities:

  • Switch between models (OpenAI ↔ Anthropic)
  • Adjust temperature, max tokens, and other parameters
  • Re-execute tool calls with allowlist/blocklist controls
  • Re-execute retrieval operations with vector DB support
  • Automatic retry logic with exponential backoff
  • Semantic similarity comparison with fallback methods
  • Side-by-side diff visualization
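
The "semantic similarity comparison with fallback methods" item can be illustrated as follows. This is a sketch under assumptions, not Prela's comparison code: `similarity` and its `embed_score` parameter are hypothetical, and the fallback used here is a plain character-level ratio from the standard library's `difflib`.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def similarity(
    original: str,
    replayed: str,
    embed_score: Optional[Callable[[str, str], float]] = None,
) -> float:
    """Return a similarity score in [0, 1].

    Uses the supplied semantic scorer when one is available and falls back
    to a lexical ratio if it is missing or raises.
    """
    if embed_score is not None:
        try:
            return embed_score(original, replayed)
        except Exception:
            pass  # fall through to the lexical fallback
    return SequenceMatcher(None, original, replayed).ratio()
```

A lexical ratio underestimates similarity for paraphrases, which is why it makes sense only as a fallback when an embedding-based scorer is unavailable.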

See Replay Concepts and Replay Examples for detailed documentation.

Multiple Export Formats

Choose your output:

  • Prela Cloud: Full dashboard, search, and analysis (set PRELA_API_KEY to enable)
  • Console: Pretty-printed JSON for development
  • File: JSONL format for local archiving (with rotation)
  • OTLP: OpenTelemetry protocol for third-party backends
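
As an illustration of the File option (JSONL with rotation), here is a minimal size-based rotating writer. It is a sketch only: the class name `RotatingJsonlWriter`, the `traces-0000.jsonl` naming scheme, and rotation by byte size are assumptions, since this page does not describe how Prela's file exporter rotates.

```python
import json
import os
from typing import Any


class RotatingJsonlWriter:
    """Append one JSON object per line, starting a new file past max_bytes."""

    def __init__(self, directory: str, max_bytes: int = 10_000_000, prefix: str = "traces"):
        self.directory = directory
        self.max_bytes = max_bytes
        self.prefix = prefix
        self.index = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self) -> str:
        return os.path.join(self.directory, f"{self.prefix}-{self.index:04d}.jsonl")

    def write(self, span: dict[str, Any]) -> str:
        """Write one span and return the path of the file it landed in."""
        line = json.dumps(span, separators=(",", ":")) + "\n"
        path = self._path()
        size = os.path.getsize(path) if os.path.exists(path) else 0
        # Rotate before writing if this line would push a non-empty file past the limit.
        if size > 0 and size + len(line.encode("utf-8")) > self.max_bytes:
            self.index += 1
            path = self._path()
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
        return path
```

Because each line is a complete JSON object, rotated files can be processed independently with line-oriented tools.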

What Gets Traced?

sequenceDiagram
    participant App as Your App
    participant Prela as Prela SDK
    participant LLM as LLM API
    participant Export as Exporter

    App->>Prela: prela.init()
    Note over Prela: Auto-instruments SDKs

    App->>LLM: client.messages.create(...)
    Note over Prela: Intercepts call
    Prela->>Prela: Create span
    Prela->>Prela: Record request attributes
    Prela->>LLM: Forward request
    LLM->>Prela: Return response
    Prela->>Prela: Record response attributes
    Prela->>Prela: Calculate duration
    Prela->>Export: Export span
    Prela->>App: Return response

Captured automatically:

  • Model name and provider
  • Input/output token counts
  • Request parameters (temperature, max_tokens, etc.)
  • Response metadata (finish_reason, stop_reason)
  • Tool/function calls with arguments
  • Streaming metrics (time-to-first-token)
  • Errors and exceptions
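
The intercept, record, forward, and export steps in the diagram above can be sketched as a wrapper around any callable. This is not Prela's instrumentation code: the decorator name `traced`, the span dictionary layout, and the list of captured parameters are assumptions made for illustration.

```python
import functools
import time
from typing import Any, Callable


def traced(name: str, export: Callable[[dict], None]) -> Callable:
    """Wrap a callable so each call produces one span dict passed to `export`."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Record request attributes before forwarding the call.
            captured = ("model", "temperature", "max_tokens")
            span = {
                "name": name,
                "request": {k: kwargs[k] for k in captured if k in kwargs},
                "status": "ok",
            }
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)  # forward to the real SDK method
            except Exception as exc:
                span["status"] = "error"
                span["error"] = {"type": type(exc).__name__, "message": str(exc)}
                raise  # the caller still sees the original exception
            finally:
                # Runs on success and failure, so every call is exported once.
                span["duration_ms"] = (time.perf_counter() - start) * 1000.0
                export(span)

        return wrapper

    return decorator
```

Auto-instrumentation libraries typically apply a wrapper like this by replacing the SDK method at import or init time, which is why no changes to the calling code are needed.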

Use Cases

๐Ÿ› Debugging

Find out why your agent is misbehaving:

# List recent traces
prela list

# Show detailed trace
prela show trace_abc123

# Search for errors
prela search --status error

Monitoring

Track performance in production:

prela.init(
    service_name="production-agent",
    exporter="file",
    directory="./traces",
    sample_rate=0.1  # Sample 10% of traces
)
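
A `sample_rate` of 0.1 keeps roughly one trace in ten. One common way to implement this is to hash the trace ID so that every span in a trace gets the same keep-or-drop decision. The sketch below shows that approach; it is an assumption about how sampling could work, not a description of Prela's sampler, and `should_sample` is a hypothetical name.

```python
import hashlib


def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to keep a trace.

    The same trace_id always gets the same answer, so a trace is never
    partially recorded.
    """
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Compared with calling a random number generator per span, hashing needs no shared state between processes that handle spans of the same trace.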

Testing

Validate agent behavior:

# eval_suite.yaml
name: Agent Test Suite
cases:
  - id: test_1
    name: Booking flow
    input:
      query: "Book a flight to NYC"
    expected:
      contains:
        - "flight"
        - "New York"

Run the suite and write a JUnit report:

prela eval run eval_suite.yaml --reporter junit --output results.xml
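
A JUnit-style report lets CI systems display eval cases like unit tests. The sketch below shows roughly what such a report contains, built with the standard library. The element layout follows the common JUnit XML convention and is not necessarily what Prela's reporter writes; `to_junit_xml` and the result dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name: str, results: list[dict]) -> str:
    """Render eval results as a JUnit-style <testsuite> document."""
    failures = sum(1 for r in results if not r["passed"])
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for r in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=r["name"])
        if not r["passed"]:
            # A <failure> child marks the case as failed in most CI viewers.
            failure = ET.SubElement(
                case, "failure", message=r.get("message", "assertion failed")
            )
            failure.text = r.get("message", "")
    return ET.tostring(suite, encoding="unicode")
```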

Analysis

Export evaluation results for analysis:

from prela.evals.reporters import JSONReporter

# Export to JSON for analysis
# (eval_result is the value returned by EvalRunner.run())
JSONReporter("analysis/results.json").report(eval_result)

Documentation

Getting Started

Concepts

Integrations

LLM Providers:

  • OpenAI - GPT-3.5, GPT-4, embeddings
  • Anthropic - Claude models

Agent Frameworks:

  • LangChain - Chains and agents
  • LlamaIndex - Query engines

Multi-Agent Frameworks:

  • CrewAI - Task-based orchestration
  • AutoGen - Conversational agents
  • LangGraph - Stateful workflows
  • Swarm - Agent handoffs

Workflow Automation:

  • n8n - Webhook-based workflow tracing
  • n8n Code Nodes - Python code instrumentation

Evaluation

Replay
