Alerting¶
Prela's alerting system monitors your agent metrics and sends notifications when thresholds are crossed. Alert rules are evaluated periodically in the background and can notify via Slack, email, or PagerDuty.
Supported Metrics¶
| Metric | Description | Example Use Case |
|---|---|---|
| `error_rate` | Percentage of spans with error status | Alert when error rate exceeds 5% |
| `latency_p95` | 95th percentile latency (ms) for LLM spans | Alert when P95 exceeds 3000ms |
| `latency_mean` | Average latency (ms) for LLM spans | Alert when mean latency spikes |
| `cost_per_trace` | Average cost per trace (USD) | Alert when costs exceed budget |
| `success_rate` | Percentage of successful spans | Alert when success drops below 95% |
| `token_usage` | Total tokens consumed | Alert on unusual token consumption |
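To make the metric definitions concrete, here is a minimal sketch of how three of them could be computed from a list of spans. It is not Prela's implementation: the span fields `status` and `latency_ms`, the function names, and the nearest-rank percentile method are all assumptions made for illustration. The error rate is returned as a fraction (0.05 for 5%), which matches the `threshold` used in the error rate example later on this page.

```python
from statistics import mean


def error_rate(spans):
    """Fraction of spans whose status is "error" (0.0 for an empty list)."""
    if not spans:
        return 0.0
    errors = sum(1 for span in spans if span["status"] == "error")
    return errors / len(spans)


def latency_p95(latencies_ms):
    """95th percentile using the nearest-rank method (an assumed choice)."""
    if not latencies_ms:
        raise ValueError("no latencies to summarize")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_mean(latencies_ms):
    """Arithmetic mean latency in milliseconds."""
    return mean(latencies_ms)


# Hypothetical sample: 20 spans, two of them errors, latencies 100..2000 ms.
spans = [
    {"status": "error" if i % 10 == 0 else "ok", "latency_ms": 100.0 * (i + 1)}
    for i in range(20)
]
latencies = [span["latency_ms"] for span in spans]

print(error_rate(spans), latency_p95(latencies), latency_mean(latencies))
```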
Creating an Alert Rule¶
Alert rules are managed via the REST API:
curl -X POST https://api.prela.dev/api/v1/alerts/projects/{project_id}/rules \
  -H "Authorization: Bearer $PRELA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Latency Alert",
    "metric_type": "latency_p95",
    "condition": "gt",
    "threshold": 3000,
    "evaluation_window_minutes": 60,
    "severity": "high",
    "notify_slack": true,
    "slack_webhook_url": "https://hooks.slack.com/services/...",
    "cooldown_minutes": 30
  }'
Alternatively, manage rules from the Alerts page in the Prela dashboard navigation.
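The same request can be built from Python using only the standard library. This is a sketch rather than an official client: the helper name `build_create_rule_request` is hypothetical, and the shape of the response is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request

API_BASE = "https://api.prela.dev/api/v1/alerts"


def build_create_rule_request(project_id, api_key, rule):
    """Build (but do not send) the POST request that creates an alert rule."""
    url = f"{API_BASE}/projects/{project_id}/rules"
    return urllib.request.Request(
        url,
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


rule = {
    "name": "High Latency Alert",
    "metric_type": "latency_p95",
    "condition": "gt",
    "threshold": 3000,
    "evaluation_window_minutes": 60,
    "severity": "high",
    "notify_slack": True,
    "slack_webhook_url": "https://hooks.slack.com/services/...",
    "cooldown_minutes": 30,
}

# "my-project" and "test-key" are placeholders for a real project ID and API key.
request = build_create_rule_request("my-project", "test-key", rule)
print(request.get_method(), request.full_url)

# To actually send it:
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```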
Alert Rule Configuration¶
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | Required | Human-readable alert name |
| `description` | `str` | `None` | Optional description |
| `enabled` | `bool` | `true` | Whether this rule is active |
| `metric_type` | `str` | Required | One of the 6 supported metrics |
| `condition` | `str` | Required | `gt`, `lt`, `gte`, or `lte` |
| `threshold` | `float` | Required | Numeric threshold value |
| `evaluation_window_minutes` | `int` | `60` | Time window to evaluate (5-1440 min) |
| `agent_name` | `str` | `None` | Filter to a specific agent |
| `severity` | `str` | `"medium"` | `low`, `medium`, `high`, `critical` |
| `cooldown_minutes` | `int` | `30` | Minimum time between notifications (5-1440 min) |
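The constraints in this table can be checked on the client before a rule is submitted. The following is a minimal sketch, assuming a hypothetical `AlertRule` dataclass that mirrors the fields, defaults, and ranges listed above. The server's own validation may differ, and notification fields are omitted.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORTED_METRICS = {
    "error_rate", "latency_p95", "latency_mean",
    "cost_per_trace", "success_rate", "token_usage",
}
CONDITIONS = {"gt", "lt", "gte", "lte"}
SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass
class AlertRule:
    # Required fields.
    name: str
    metric_type: str
    condition: str
    threshold: float
    # Optional fields with the defaults from the table.
    description: Optional[str] = None
    enabled: bool = True
    evaluation_window_minutes: int = 60
    agent_name: Optional[str] = None
    severity: str = "medium"
    cooldown_minutes: int = 30

    def __post_init__(self):
        if self.metric_type not in SUPPORTED_METRICS:
            raise ValueError(f"unsupported metric_type: {self.metric_type!r}")
        if self.condition not in CONDITIONS:
            raise ValueError(f"unsupported condition: {self.condition!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unsupported severity: {self.severity!r}")
        # Both windows share the documented 5-1440 minute range.
        for field_name in ("evaluation_window_minutes", "cooldown_minutes"):
            value = getattr(self, field_name)
            if not 5 <= value <= 1440:
                raise ValueError(f"{field_name} must be between 5 and 1440, got {value}")


rule = AlertRule(
    name="High Latency Alert",
    metric_type="latency_p95",
    condition="gt",
    threshold=3000,
)
print(rule.severity, rule.evaluation_window_minutes, rule.cooldown_minutes)
```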
Notification Channels¶
Slack¶
Send alerts to a Slack channel via webhook URL.
Slack messages include severity-colored headers, metric details, and a link to the dashboard.
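For illustration only, a message along these lines could be built for a Slack incoming webhook as sketched below. The function name, the color values, the field layout, and the dashboard URL are all assumptions. The actual message format Prela sends is not specified on this page.

```python
# Hypothetical severity-to-color mapping; Prela's real colors are not documented here.
SEVERITY_COLORS = {
    "low": "#2eb886",
    "medium": "#daa038",
    "high": "#e8912d",
    "critical": "#a30200",
}


def build_slack_message(rule_name, severity, metric_type, value, threshold, dashboard_url):
    """Build a JSON-serializable payload for a Slack incoming webhook."""
    return {
        "text": f"[{severity.upper()}] {rule_name}",
        "attachments": [
            {
                "color": SEVERITY_COLORS[severity],
                "text": f"<{dashboard_url}|Open in dashboard>",
                "fields": [
                    {"title": "Metric", "value": metric_type, "short": True},
                    {"title": "Value", "value": str(value), "short": True},
                    {"title": "Threshold", "value": str(threshold), "short": True},
                ],
            }
        ],
    }


message = build_slack_message(
    rule_name="Error Rate Spike",
    severity="critical",
    metric_type="error_rate",
    value=0.08,
    threshold=0.05,
    dashboard_url="https://example.com/alerts",  # placeholder URL
)
print(message["text"])
```

Posting this dictionary as JSON to the configured `slack_webhook_url` would deliver the message.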
Email¶
Send alerts to one or more email addresses. Requires SMTP configuration on the backend.
{
"notify_email": true,
"email_addresses": ["[email protected]", "[email protected]"]
}
PagerDuty¶
Trigger PagerDuty incidents using the Events API v2.
Alert severity is mapped to PagerDuty severity: critical -> critical, high -> error, medium -> warning, low -> info.
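The severity mapping above can be written as a lookup table. The sketch below also wraps it in a trigger event shaped for the PagerDuty Events API v2, which requires a routing key, an event action, and a payload with `summary`, `source`, and `severity`. The function name and the `source` value are assumptions. How Prela actually populates these fields is not documented here.

```python
# Prela severity -> PagerDuty severity, as described above.
PAGERDUTY_SEVERITY = {
    "critical": "critical",
    "high": "error",
    "medium": "warning",
    "low": "info",
}


def build_pagerduty_event(routing_key, severity, summary, source="prela"):
    """Build an Events API v2 trigger event for the given alert severity."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,  # assumed value for illustration
            "severity": PAGERDUTY_SEVERITY[severity],
        },
    }


event = build_pagerduty_event(
    routing_key="YOUR_ROUTING_KEY",  # placeholder
    severity="high",
    summary="High Latency Alert: latency_p95 above 3000",
)
print(event["payload"]["severity"])
```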
How Evaluation Works¶
- The alert evaluation loop runs as a background task in the API Gateway (configurable interval, default 5 minutes).
- For each enabled rule, the system queries ClickHouse for the metric value over the evaluation window.
- If the condition is met (e.g., `latency_p95 > 3000`), notifications are dispatched to all configured channels.
- Cooldown deduplication prevents repeated notifications: after an alert fires, it won't fire again until the cooldown period expires.
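The steps above can be sketched as a single decision function. This is a simplified illustration, not the gateway's code: it assumes rules are plain dictionaries, takes the metric value as an argument instead of querying ClickHouse, and keeps cooldown state in an in-memory dictionary keyed by rule name.

```python
import operator
from datetime import datetime, timedelta

# Map the documented condition names to comparison functions.
CONDITIONS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


def condition_met(condition, value, threshold):
    """Return True when `value <condition> threshold` holds."""
    return CONDITIONS[condition](value, threshold)


def should_notify(rule, value, now, last_fired):
    """Decide whether a rule fires, recording the fire time in `last_fired`."""
    if not rule.get("enabled", True):
        return False
    if not condition_met(rule["condition"], value, rule["threshold"]):
        return False
    cooldown = timedelta(minutes=rule.get("cooldown_minutes", 30))
    previous = last_fired.get(rule["name"])
    if previous is not None and now - previous < cooldown:
        return False  # still inside the cooldown window, so suppress
    last_fired[rule["name"]] = now
    return True


rule = {
    "name": "High Latency Alert",
    "condition": "gt",
    "threshold": 3000,
    "cooldown_minutes": 30,
}
last_fired = {}
start = datetime(2024, 1, 1, 12, 0)

first = should_notify(rule, 3500, start, last_fired)
during_cooldown = should_notify(rule, 3500, start + timedelta(minutes=10), last_fired)
after_cooldown = should_notify(rule, 3500, start + timedelta(minutes=30), last_fired)
print(first, during_cooldown, after_cooldown)
```

In this sketch a suppressed evaluation does not reset the cooldown timer; whether the real system behaves the same way is not stated here.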
API Reference¶
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/alerts/projects/{project_id}/rules` | Create an alert rule |
| `GET` | `/api/v1/alerts/projects/{project_id}/rules` | List alert rules |
| `GET` | `/api/v1/alerts/projects/{project_id}/rules/{rule_id}` | Get a specific rule |
| `PUT` | `/api/v1/alerts/projects/{project_id}/rules/{rule_id}` | Update a rule |
| `DELETE` | `/api/v1/alerts/projects/{project_id}/rules/{rule_id}` | Delete a rule |
| `GET` | `/api/v1/alerts/projects/{project_id}/triggered` | Get triggered alert history |
Example: Slack Alert for Error Rate¶
curl -X POST https://api.prela.dev/api/v1/alerts/projects/my-project/rules \
  -H "Authorization: Bearer $PRELA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Error Rate Spike",
    "metric_type": "error_rate",
    "condition": "gt",
    "threshold": 0.05,
    "evaluation_window_minutes": 30,
    "severity": "critical",
    "notify_slack": true,
    "slack_webhook_url": "https://hooks.slack.com/services/T00/B00/xxx",
    "cooldown_minutes": 60
  }'
This creates an alert that fires when the error rate exceeds 5% (expressed as a threshold of 0.05) over a 30-minute window, sends a Slack notification, and won't fire again for at least 60 minutes.