Monitoring
How OpsKitty monitors your websites, handles failures, and scales across regions.
Check Intervals
Each monitored endpoint has a configurable check interval that determines how frequently OpsKitty verifies its status. The minimum allowed interval depends on your plan.
| Plan | Minimum Interval | Check Mode | Regions |
|---|---|---|---|
| Free | 30 minutes | Local | — |
| Launch | 1 minute | Local | — |
| Growth | 30 seconds | Multi-region | 3 regions |
| Pro | 20 seconds | Multi-region | 11 regions |
| Scale | 15 seconds | Multi-region | 29 regions |
A global minimum of 30 seconds is enforced regardless of configuration to prevent excessive requests to monitored targets.
Success Criteria
Each check evaluates conditions to determine if an endpoint is healthy.
Default Behavior
When no custom conditions are configured, OpsKitty applies sensible defaults:
- HTTP status code must be less than 400 (i.e., 2xx or 3xx responses are considered healthy)
- The connection must be established successfully
Custom Conditions
You can configure custom conditions using placeholders:
| Placeholder | Description | Example |
|---|---|---|
| [STATUS] | HTTP status code | [STATUS] == 200 |
| [BODY] | Response body text | [BODY] contains "ok" |
| [RESPONSE_TIME] | Response time in ms | [RESPONSE_TIME] < 2000 |
| [CONNECTED] | Connection established | [CONNECTED] == true |
| [CERTIFICATE_EXPIRATION] | TLS cert expiry (ms) | [CERTIFICATE_EXPIRATION] > 604800000 |
All conditions use AND logic — every condition must pass for the check to be considered successful.
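The AND semantics and the default behavior can be illustrated with a minimal sketch. This is not OpsKitty's evaluator: the function names, the `values` dictionary, and the small set of operators handled here (`==`, `<`, `>`, `contains`) are assumptions chosen only to cover the examples in the table above.

```python
import operator
import re

# Operators used by the example conditions in the table; the full set that
# OpsKitty supports is not stated in this document.
OPERATORS = {
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "contains": lambda left, right: right in left,
}

# Matches conditions shaped like: [PLACEHOLDER] <operator> <literal>
CONDITION = re.compile(r"^\[(\w+)\]\s+(==|<|>|contains)\s+(.+)$")


def parse_literal(text):
    """Turn the right-hand side of a condition into a Python value."""
    text = text.strip()
    if text in ("true", "false"):
        return text == "true"
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return int(text)


def evaluate(condition, values):
    """Evaluate one condition against the values collected by a check."""
    match = CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    placeholder, op, literal = match.groups()
    return OPERATORS[op](values[placeholder], parse_literal(literal))


def is_healthy(conditions, values):
    """Apply the defaults when no conditions are given, otherwise AND them."""
    if not conditions:
        # Defaults described above: connected, and status code below 400.
        return bool(values.get("CONNECTED")) and values.get("STATUS", 0) < 400
    return all(evaluate(c, values) for c in conditions)
```

With a hypothetical result such as `{"STATUS": 200, "BODY": "status: ok", "RESPONSE_TIME": 150, "CONNECTED": True}`, the list `['[STATUS] == 200', '[RESPONSE_TIME] < 100']` evaluates to unhealthy, because the second condition fails even though the first passes.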
Failure Handling & Backoff
When an endpoint fails consecutively, OpsKitty applies exponential backoff to reduce load on the target and avoid being blocked by firewalls or WAFs.
How It Works
- First 3 failures: checks continue at the normal interval
- After 3 failures: the interval doubles with each additional failure
- Maximum backoff: 15 minutes (900 seconds)
- On recovery (successful check): the interval resets to normal immediately
Backoff Progression
Example with a base interval of 60 seconds:
| Consecutive Failures | Next Check In | Multiplier |
|---|---|---|
| 1–3 | 60s | 1x (normal) |
| 4 | 120s | 2x |
| 5 | 240s | 4x |
| 6 | 480s | 8x |
| 7 | 900s | 15 min cap |
| 8+ | 900s | 15 min cap |
This prevents your monitoring from being flagged as abusive traffic while still continuing to check whether the endpoint recovers.
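The progression in the table can be reproduced with a short function. This is a sketch under stated assumptions rather than OpsKitty's scheduler: the function and constant names are illustrative, and the handling of base intervals longer than the 15-minute cap (returning the base interval unchanged) is an assumption, since the text above does not cover that case.

```python
MAX_BACKOFF_SECONDS = 900      # 15-minute cap from the rules above
FAILURES_BEFORE_BACKOFF = 3    # the first 3 failures keep the normal interval


def next_check_delay(base_interval, consecutive_failures):
    """Return the delay in seconds before the next check."""
    if consecutive_failures <= FAILURES_BEFORE_BACKOFF:
        return base_interval
    # One doubling for each failure beyond the third.
    doublings = consecutive_failures - FAILURES_BEFORE_BACKOFF
    capped = min(base_interval * 2 ** doublings, MAX_BACKOFF_SECONDS)
    # Assumption: backoff never makes checks more frequent than the base
    # interval, which matters when the base is longer than the cap.
    return max(base_interval, capped)


# Reproduces the table for a 60-second base interval.
delays = [next_check_delay(60, failures) for failures in range(1, 9)]
```

A successful check corresponds to resetting `consecutive_failures` to 0, which returns the delay to the base interval immediately.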
Multi-Region Monitoring
For Growth, Pro, and Scale plans, OpsKitty checks your endpoints from multiple AWS regions to detect regional outages and provide global uptime visibility.
Region Rotation
For plans with many regions (e.g., Scale with 29 regions), OpsKitty uses region rotation instead of checking from all regions simultaneously:
- Each check cycle uses a subset of 5 regions
- Regions rotate deterministically so all are covered over multiple cycles
- Full global coverage is achieved over ceil(total_regions / 5) cycles
- This prevents the target from seeing simultaneous requests from 29 IPs
Coverage Example
For a Scale plan with 29 regions and 60s interval:
| Cycle | Regions Checked | Cumulative Coverage |
|---|---|---|
| 1 (0s) | 5 regions | ~17% |
| 2 (60s) | 5 regions | ~34% |
| 3 (120s) | 5 regions | ~52% |
| 4 (180s) | 5 regions | ~69% |
| 5 (240s) | 5 regions | ~86% |
| 6 (300s) | 4 regions | 100% |
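One way to picture a deterministic rotation that yields this coverage is to slice an ordered region list into fixed groups of 5. The sketch below is only an illustration: the actual ordering and grouping OpsKitty uses is not specified here, the region names are placeholders, and the cycle index is 0-based (cycle 0 corresponds to row 1 in the table).

```python
import math

REGIONS_PER_CYCLE = 5


def regions_for_cycle(regions, cycle):
    """Return the subset of regions checked in a given 0-based cycle."""
    if not regions:
        return []
    cycles_for_full_coverage = math.ceil(len(regions) / REGIONS_PER_CYCLE)
    # Wrap around so the rotation repeats after every region has been used.
    start = (cycle % cycles_for_full_coverage) * REGIONS_PER_CYCLE
    return regions[start:start + REGIONS_PER_CYCLE]


# Placeholder names standing in for the 29 regions of the Scale plan.
regions = [f"region-{i}" for i in range(29)]

covered = set()
coverage = []
for cycle in range(6):
    covered.update(regions_for_cycle(regions, cycle))
    coverage.append(len(covered))
```

After the loop, `coverage` holds the cumulative number of distinct regions checked per cycle, which reaches all 29 on the sixth cycle, and cycle 6 starts the rotation over with the first group.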
Status Aggregation
When results come back from multiple regions, OpsKitty uses an "any success" strategy:
- If any region returns a successful check, the endpoint is marked UP
- Only if all regions fail is the endpoint marked DOWN
- This avoids false alarms from isolated regional issues
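The "any success" rule reduces to a single `any()` over the per-region results. In this sketch the function name and the shape of `region_results` (region name mapped to a pass/fail boolean) are assumptions, as is treating an empty result set as DOWN.

```python
def aggregate_status(region_results):
    """Combine per-region check results into a single endpoint status.

    region_results maps a region name to True (check passed) or False.
    """
    # UP if at least one region succeeded; DOWN only when every region failed.
    # Assumption: no results at all is treated as DOWN.
    return "UP" if any(region_results.values()) else "DOWN"
```

For example, `{"region-a": False, "region-b": True, "region-c": False}` aggregates to UP, while the same mapping with every value set to False aggregates to DOWN.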
Status Changes & Alerts
OpsKitty tracks status transitions and can trigger alerts when an endpoint goes down or recovers.
- Status changes (UP → DOWN or DOWN → UP) are recorded as events
- Each event includes a timestamp and duration of the previous state
- Alerts fire on status change based on your configured alert rules
- Backoff applies to monitoring frequency but not to alert delivery
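The event model described in the list above can be sketched as a small tracker that emits an event only when the status actually changes. The class and field names are hypothetical and do not reflect OpsKitty's data model; timestamps are plain seconds for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class StatusEvent:
    timestamp: float          # when the transition was observed
    new_status: str           # "UP" or "DOWN"
    previous_status: str
    previous_duration: float  # seconds spent in the previous state


@dataclass
class StatusTracker:
    status: str
    since: float
    events: list = field(default_factory=list)

    def observe(self, status, now):
        """Record a check result; return an event only on a status change."""
        if status == self.status:
            return None
        event = StatusEvent(now, status, self.status, now - self.since)
        self.events.append(event)
        self.status, self.since = status, now
        return event
```

An alerting layer would react to the returned event, independently of how often checks are scheduled. This matches the note above that backoff affects monitoring frequency but not alert delivery.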
Supported Protocols
OpsKitty supports monitoring endpoints using multiple protocols: