Monitoring webhook health is critical for maintaining reliable integrations. Catena provides comprehensive metrics, detailed logs, and replay capabilities to help you track delivery performance and recover from failures.
Proactive Monitoring: Track webhook success rates and response times to identify issues before they impact your integration.

Delivery Metrics

The Notifications API provides real-time and historical metrics for monitoring webhook performance and reliability. Use them to track delivery success rates and response times, and to spot performance problems before they affect your integration. Compare metrics across multiple time windows to catch both immediate issues and long-term trends, and watch three key indicators: success rate, response time, and DLQ count.
Circuit Breaker: Webhooks are automatically marked as stale and paused when delivery performance drops below acceptable thresholds. This prevents resource waste and alerts you to integration issues requiring attention.
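
As an illustration of comparing time windows, the sketch below computes success rate and average response time from a list of delivery records you have already retrieved. The `Delivery` fields and the window sizes are assumptions made for this example, not the Notifications API response schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Delivery:
    """One delivery attempt. Field names are hypothetical."""
    attempted_at: datetime
    status_code: int
    response_ms: float


def summarize(deliveries: list[Delivery], now: datetime, window: timedelta) -> dict:
    """Success rate and mean response time for attempts inside the window."""
    recent = [d for d in deliveries if now - window <= d.attempted_at <= now]
    if not recent:
        return {"attempts": 0, "success_rate": None, "avg_response_ms": None}
    ok = sum(1 for d in recent if 200 <= d.status_code < 300)
    return {
        "attempts": len(recent),
        "success_rate": ok / len(recent),
        "avg_response_ms": sum(d.response_ms for d in recent) / len(recent),
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deliveries = [
    Delivery(now - timedelta(minutes=5), 200, 120.0),
    Delivery(now - timedelta(minutes=30), 500, 900.0),
    Delivery(now - timedelta(hours=5), 200, 150.0),
    Delivery(now - timedelta(hours=20), 200, 130.0),
]

# A short window surfaces the recent failure; a long window shows the trend.
for label, window in [("1h", timedelta(hours=1)), ("24h", timedelta(hours=24))]:
    print(label, summarize(deliveries, now, window))
```

In this sample the one-hour window reports a 50% success rate while the 24-hour window reports 75%, which is the kind of gap that points to a recent regression rather than a long-standing problem.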

Delivery Logs

Access detailed logs for every webhook delivery attempt through the Notifications API. Logs provide complete visibility into delivery behavior, including timestamps, status codes, error messages, and response times. Use logs to debug failed deliveries by identifying patterns across event types, time periods, or error messages. Logs are essential for root cause analysis when troubleshooting webhook reliability issues.
Debugging Strategy: Filter logs by event type, time range, or status code to quickly isolate problematic patterns and identify the root cause of delivery failures.
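
The filtering strategy can be sketched on log entries you have already fetched. The entry keys (`event_type`, `timestamp`, `status_code`) and the event type names are placeholders chosen for this example; check the Notifications API reference for the real field names and for any server-side filter parameters.

```python
from datetime import datetime, timezone


def filter_logs(logs, event_type=None, start=None, end=None, status_code=None):
    """Return entries matching every filter that was supplied (None = no filter)."""
    return [
        entry for entry in logs
        if (event_type is None or entry["event_type"] == event_type)
        and (start is None or entry["timestamp"] >= start)
        and (end is None or entry["timestamp"] < end)
        and (status_code is None or entry["status_code"] == status_code)
    ]


def ts(hour: int) -> datetime:
    """Helper for building sample timestamps on a single day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# Hypothetical log entries; event type names are placeholders.
logs = [
    {"event_type": "example.created", "timestamp": ts(9), "status_code": 200},
    {"event_type": "example.updated", "timestamp": ts(10), "status_code": 503},
    {"event_type": "example.updated", "timestamp": ts(11), "status_code": 503},
    {"event_type": "example.updated", "timestamp": ts(15), "status_code": 200},
]

# Narrow to one event type, one time range, and one status code.
morning_503s = filter_logs(logs, event_type="example.updated",
                           start=ts(8), end=ts(12), status_code=503)
print(len(morning_503s))
```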

Event Replay

Recover from delivery failures by replaying events from the Dead Letter Queue (DLQ). When events fail all automatic retry attempts, they’re stored in the DLQ for up to 14 days, giving you time to fix issues and replay them. The replay functionality redelivers all DLQ events for a webhook subscription, allowing you to recover from temporary outages, application bugs, or configuration issues without losing data.

Common Replay Scenarios

Endpoint Downtime

Recover events lost during maintenance windows or infrastructure outages.

Application Errors

Reprocess events after fixing bugs in your webhook handler.

Configuration Issues

Replay events after correcting webhook URL or authentication problems.

Data Recovery

Reprocess historical events after resolving integration issues.

How to Replay Events

1. Identify Failed Events

Use metrics and logs to determine which events are in the DLQ and need replay.

2. Fix the Root Cause

Resolve the underlying issue that caused delivery failures before replaying.

3. Initiate Replay

Call the replay endpoint to redeliver all DLQ events to your webhook.

4. Monitor Redelivery

Watch logs and metrics to verify replayed events are successfully delivered.
14-Day Retention: Events are permanently deleted from the DLQ after 14 days. Replay critical events before they expire to avoid data loss.
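
To make the retention deadline concrete, here is a minimal sketch that flags DLQ events close to expiry. It assumes the 14-day clock starts when an event enters the DLQ and that each event record carries an `id` and an `entered_dlq_at` timestamp; both field names and the two-day warning margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DLQ_RETENTION = timedelta(days=14)  # retention period stated in this guide


def time_until_expiry(entered_dlq_at: datetime, now: datetime) -> timedelta:
    """Time left before the event is deleted (negative if already expired)."""
    return entered_dlq_at + DLQ_RETENTION - now


def expiring_soon(events, now, warn_within=timedelta(days=2)):
    """IDs of events that are still in retention but expire within warn_within."""
    return [
        e["id"] for e in events
        if timedelta(0) < time_until_expiry(e["entered_dlq_at"], now) <= warn_within
    ]


now = datetime(2024, 1, 20, tzinfo=timezone.utc)
events = [
    {"id": "evt_a", "entered_dlq_at": now - timedelta(days=13)},  # 1 day left
    {"id": "evt_b", "entered_dlq_at": now - timedelta(days=3)},   # 11 days left
    {"id": "evt_c", "entered_dlq_at": now - timedelta(days=15)},  # already expired
]
print(expiring_soon(events, now))
```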

Webhook Lifecycle Management

Catena automatically manages webhook health to maintain reliable delivery and alert you to integration problems.

Circuit Breaker

Webhooks are automatically paused when delivery performance drops below acceptable thresholds. This prevents wasted resources and signals that the integration needs immediate attention. When the circuit breaker pauses a webhook:
  • Event delivery is paused
  • New events are queued but not delivered
  • Webhook status changes to stale

Reactivation

After identifying and fixing the root cause of delivery failures, reactivate the webhook to resume event delivery:
1. Investigate the Issue

Review metrics and logs to understand why delivery performance degraded.

2. Resolve Problems

Fix endpoint issues, application bugs, or configuration errors.

3. Reactivate Webhook

Update the webhook to change its status back to active and resume delivery.

4. Monitor Recovery

Track metrics closely to ensure performance improves and remains healthy.
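
One way to decide that a reactivated webhook has recovered is to require several consecutive healthy samples instead of trusting a single good reading. The sketch below assumes you sample the success rate at regular intervals; the 0.95 threshold and the three-sample requirement are illustrative values, not Catena's circuit breaker thresholds.

```python
def is_recovered(success_rates: list[float], threshold: float = 0.95,
                 required: int = 3) -> bool:
    """True when the most recent `required` samples all meet the threshold."""
    if len(success_rates) < required:
        return False  # not enough data since reactivation
    return all(rate >= threshold for rate in success_rates[-required:])


# Samples collected after reactivation, oldest first.
print(is_recovered([0.60, 0.97, 0.99, 0.98]))  # last three are healthy
print(is_recovered([0.97, 0.99, 0.80]))        # latest sample dipped
```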

Monitoring Best Practices

Set Up Alerts

Configure automated alerts for success rate and response time degradation to catch issues early.
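
A simple alert rule can be sketched as a comparison between a metrics snapshot and a set of limits. The metric keys and the threshold values here are assumptions for illustration; tune them to your own traffic and wire the result into whatever alerting system you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertThresholds:
    """Illustrative limits; not values published by Catena."""
    min_success_rate: float = 0.99
    max_avg_response_ms: float = 2000.0
    max_dlq_count: int = 0


def evaluate(metrics: dict, limits: AlertThresholds = AlertThresholds()) -> list[str]:
    """Return one message per limit that the snapshot violates."""
    alerts = []
    if metrics["success_rate"] < limits.min_success_rate:
        alerts.append("success rate below threshold")
    if metrics["avg_response_ms"] > limits.max_avg_response_ms:
        alerts.append("response time above threshold")
    if metrics["dlq_count"] > limits.max_dlq_count:
        alerts.append("events waiting in the DLQ")
    return alerts


snapshot = {"success_rate": 0.97, "avg_response_ms": 450.0, "dlq_count": 12}
print(evaluate(snapshot))
```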

Track Long-Term Trends

Review metrics across multiple time windows to identify patterns and seasonal variations.

Optimize Performance

Keep response times low by processing webhooks asynchronously and returning acknowledgments quickly.
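
The acknowledge-then-process pattern can be sketched with the standard library alone. Here `handle_webhook` stands in for your HTTP handler and returns the status code to send; the payload shape and the in-memory queue are assumptions for the example.

```python
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()
processed: list[str] = []


def handle_webhook(payload: dict) -> int:
    """Request path: enqueue and acknowledge. No database or API calls here."""
    events.put(payload)
    return 200  # acknowledge immediately


def worker() -> None:
    """Background path: slow work happens here, off the request path."""
    while True:
        payload = events.get()
        try:
            processed.append(payload["id"])  # stand-in for real processing
        finally:
            events.task_done()


threading.Thread(target=worker, daemon=True).start()

status = handle_webhook({"id": "evt_1"})
events.join()  # demo only: wait for the worker to drain the queue
print(status, processed)  # prints: 200 ['evt_1']
```

An in-memory queue loses its contents if the process restarts, so a production handler would more likely hand events to a durable queue or job system before acknowledging.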

Monitor the DLQ

Regularly check for events in the Dead Letter Queue and replay them before the retention period expires.

Analyze Failure Patterns

Use delivery logs to identify recurring issues and address root causes systematically.
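
Recurring failures stand out once failed attempts are grouped and counted. This sketch groups already-fetched log entries by event type, status code, and error message; the entry keys and sample values are hypothetical.

```python
from collections import Counter


def failure_patterns(logs) -> Counter:
    """Count failed attempts by (event_type, status_code, error)."""
    failures = (e for e in logs if not 200 <= e["status_code"] < 300)
    return Counter((e["event_type"], e["status_code"], e["error"]) for e in failures)


# Hypothetical log entries; event type names are placeholders.
logs = [
    {"event_type": "example.updated", "status_code": 503, "error": "upstream unavailable"},
    {"event_type": "example.updated", "status_code": 503, "error": "upstream unavailable"},
    {"event_type": "example.created", "status_code": 401, "error": "invalid signature"},
    {"event_type": "example.created", "status_code": 200, "error": None},
]

# The most frequent pattern is usually the first root cause to investigate.
pattern, count = failure_patterns(logs).most_common(1)[0]
print(pattern, count)
```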

Validate Configuration

Periodically verify that webhook URLs, filters, and secrets remain correct and up to date.

Troubleshooting

Low Success Rate

Common Causes:
  • Endpoint responding too slowly or timing out
  • Application errors causing failed responses
  • Network connectivity or infrastructure issues
  • Insufficient server resources to handle load
Solutions:
  • Optimize endpoint performance with asynchronous processing
  • Fix application bugs and handle errors gracefully
  • Scale infrastructure to accommodate webhook volume
  • Review logs to identify specific error patterns

Events in the DLQ

Common Causes:
  • Extended endpoint downtime or outages
  • Persistent application errors
  • Misconfigured webhook URL or authentication
Solutions:
  • Verify endpoint accessibility and correct configuration
  • Fix application issues preventing successful processing
  • Replay DLQ events after resolving the root cause

Slow Response Times

Common Causes:
  • Synchronous processing in the webhook handler
  • Database operations or external API calls in the request path
  • Insufficient server resources
Solutions:
  • Return acknowledgment immediately and process asynchronously
  • Move heavy operations to background jobs or queues
  • Optimize database queries and reduce blocking operations

Webhook Marked as Stale

Common Causes:
  • Delivery performance dropped below circuit breaker thresholds
  • Persistent endpoint unavailability or misconfiguration
Solutions:
  • Review metrics to identify when and why performance degraded
  • Check logs for error patterns and failure modes
  • Fix the underlying issues before reactivating
  • Monitor closely after reactivation to ensure sustained health