Errors, retries & rate limits

Every non-2xx response from the API throws a typed error. Every error carries the machine-readable error.code when the API sent one, and 429s come with a Retry-After you should respect. This page covers how to handle all of it in production code.

The error hierarchy

All SDK errors extend SlothboxError. HTTP errors are an APIError (or a status-specific subclass) exposing status, code, message, and requestId — the gateway request id, worth logging for support. Requests that never produced a response (DNS, TLS, resets) throw APIConnectionError.

The hierarchy is identical in both SDKs — same class names, same semantics. Only the attribute casing differs: Python exposes request_id, retry_after, and retry_context where TypeScript has requestId, retryAfter, and retryContext.

| Class | Status | Notes |
| --- | --- | --- |
| BadRequestError | 400 | Carries issues with field-level validation details. |
| AuthenticationError | 401 | Missing, mistyped, or revoked key. |
| PlanRequiredError | 402 | The operation needs an active API plan. |
| PermissionDeniedError | 403 | Authenticated, but the caller lacks the required role. |
| NotFoundError | 404 | Doesn't exist, or isn't visible to this caller. |
| ConflictError | 409 | The request conflicts with current state; discriminate on code. |
| RateLimitError | 429 | Exposes retryAfter (whole seconds, from Retry-After). |
| APIError | any | Base class; also thrown directly for anything unmapped (e.g. 500). |
| APIConnectionError | | No HTTP response at all; the underlying error is on cause. |
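
As a rough illustration of what is worth logging from each branch of the hierarchy, the sketch below flattens an arbitrary thrown value into a log record. To stay self-contained it inspects property shapes (status, code, message, requestId, cause) instead of importing the SDK classes. The names toLogRecord and ErrorLogRecord are hypothetical and not part of the SDK; in real code, instanceof checks against the classes in the table are likely the better fit.

```typescript
// Hypothetical helper, not an SDK API. It relies only on the fields this page
// describes: status, code, message, requestId, and cause.
interface ErrorLogRecord {
  kind: "http" | "connection" | "unknown";
  status?: number;
  code?: string;
  requestId?: string;
  message: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function toLogRecord(err: unknown): ErrorLogRecord {
  if (!isRecord(err)) {
    return { kind: "unknown", message: String(err) };
  }
  const { status, code, requestId, cause } = err;
  const rawMessage = err.message;
  const message = typeof rawMessage === "string" ? rawMessage : "unknown error";

  if (typeof status === "number") {
    // HTTP failure: keep the gateway request id so support can trace it.
    return {
      kind: "http",
      status,
      code: typeof code === "string" ? code : undefined,
      requestId: typeof requestId === "string" ? requestId : undefined,
      message,
    };
  }
  if (cause !== undefined) {
    // No HTTP response at all: surface the underlying error's message.
    const causeMessage = isRecord(cause) ? cause.message : undefined;
    const detail = typeof causeMessage === "string" ? causeMessage : String(cause);
    return { kind: "connection", message: `${message} (cause: ${detail})` };
  }
  return { kind: "unknown", message };
}
```

Calling toLogRecord(err) in a catch block and passing the result to your logger keeps the request id next to the status and code in one structured entry.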

The classes tell you what kind of failure; the code tells you which one.

The error.code taxonomy

The API attaches a stable, machine-readable code to the error envelope for the failure modes an integration must branch on. The SDK surfaces it as error.code:

| code | Thrown as | Meaning | What to do |
| --- | --- | --- | --- |
| seat_ceiling_exceeded | ConflictError | The seat's concurrently-active-box ceiling is full. | Stop a box you're done with, or move scaled fan-out onto the API plan. Don't retry blindly: the launch will keep failing until headroom frees up. |
| no_active_aws_connection | ConflictError | The org has no active AWS connection to launch into. | An owner needs to connect (or fix) the org's AWS account; retrying won't help. |
| template_not_baked | ConflictError | The template's bundle isn't ready yet. | Wait for the bake to finish (waitUntilBaked), or rebake a bundle_failed template, then launch. |
| environment_terminated | ConflictError | Lifecycle call on a box that's terminated. | Terminal; launch a new box instead. |
| environment_launching | ConflictError | Lifecycle call on a box still launching. | Wait until it's running (waiters), then retry the call. |
| api_plan_required | PlanRequiredError | The operation needs the org to be on the API plan. | An owner can turn it on; see API plan. |
| api_plan_lapsed | PlanRequiredError | The org's API plan has lapsed, so service-account keys are paused. | Restore the subscription; the keys resume without re-minting. |
| rate_limited | RateLimitError | Too many requests; see rate limits below. | Wait retryAfter seconds, then retry. |

Two forward-compatibility rules:

  • The set of codes grows. Treat unknown codes as their class (an unrecognised 409 code is still a ConflictError) rather than erroring.
  • code can be absent. Routes the error-code rollout hasn't reached yet return the envelope without a code — fall back to status and message.
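
One way to honour both rules is to map the known 409 codes to a coarse action and let everything else, including a missing code, fall through to generic conflict handling. This is a minimal sketch: the Action names and actionForConflict are hypothetical, chosen for illustration, and only the code strings come from the table above.

```typescript
// Hypothetical decision helper; the action names are illustrative only.
type Action =
  | "free_capacity"
  | "fix_setup"
  | "wait_then_retry"
  | "launch_new"
  | "treat_as_conflict";

// Known 409 codes from the taxonomy table, mapped to a coarse action.
const CONFLICT_ACTIONS: Record<string, Action> = {
  seat_ceiling_exceeded: "free_capacity",
  no_active_aws_connection: "fix_setup",
  template_not_baked: "wait_then_retry",
  environment_launching: "wait_then_retry",
  environment_terminated: "launch_new",
};

function actionForConflict(code: string | undefined): Action {
  // Absent code: the error-code rollout hasn't reached this route yet.
  if (code === undefined) return "treat_as_conflict";
  // Own-property check so strings like "constructor" never match by accident.
  const known = Object.prototype.hasOwnProperty.call(CONFLICT_ACTIONS, code)
    ? CONFLICT_ACTIONS[code]
    : undefined;
  // A code added after this table was written is still just a conflict.
  return known ?? "treat_as_conflict";
}
```

The point of the design is that the default branch never throws: a new server-side code degrades to the behaviour of its class instead of breaking the integration.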

Putting it together:

```typescript
import { ConflictError, RateLimitError } from "@slothbox/sdk";

// Assumes slothbox (a configured client), orgId, templateId, and
// idempotencyKey are already in scope.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

try {
  await slothbox.environments.launch(
    { orgId, body: { templateId } },
    { idempotencyKey },
  );
} catch (err) {
  if (err instanceof ConflictError && err.code === "seat_ceiling_exceeded") {
    // Stop fanning out: free a box or move this workload to the API plan.
  } else if (err instanceof RateLimitError) {
    // The SDK's built-in retries (below) have already been exhausted by the
    // time you see this. Back off for longer, then retry, reusing the SAME
    // idempotency key so a launch that actually went through isn't repeated.
    await sleep((err.retryAfter ?? 1) * 1000);
  } else {
    throw err; // log err.requestId when reporting persistent failures
  }
}
```
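
The rate-limit branch above stops at the sleep; the retry itself could be wrapped in a small outer loop along the following lines. This is a sketch under stated assumptions, not an SDK feature: withRateLimitBackoff and its parameters are hypothetical, and the only thing it assumes about a rate-limit error is a status of 429 plus an optional retryAfter in whole seconds.

```typescript
// Hypothetical wrapper, not an SDK API. A structural check stands in for
// `err instanceof RateLimitError` so the sketch runs without the SDK.
interface RateLimited {
  status: 429;
  retryAfter?: number; // whole seconds, when the API sent Retry-After
}

function isRateLimited(err: unknown): err is RateLimited {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

async function withRateLimitBackoff<T>(
  call: () => Promise<T>,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Anything other than a 429, or the final attempt, goes to the caller.
      if (!isRateLimited(err) || attempt >= maxAttempts) throw err;
      // Wait as long as the API asked; fall back to 1 s when it didn't say.
      await sleep((err.retryAfter ?? 1) * 1000);
    }
  }
}
```

Passing a closure such as () => slothbox.environments.launch({ orgId, body: { templateId } }, { idempotencyKey }) means every attempt reuses the one idempotencyKey captured outside the loop, which is what keeps a launch that already went through from being repeated.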

Retries

Both clients retry transient failures by default, with the same deliberately conservative policy:

  • GET/HEAD/PUT/DELETE are retried on 429s, 5xx responses, and network errors — up to 3 retries by default, with capped, full-jitter exponential backoff (base 500 ms, ceiling 30 s).
  • Retry-After always wins. When a 429 says how long to wait, that wait is honoured exactly instead of the computed backoff.
  • POST and PATCH are never blind-retried. A POST that timed out may still have gone through, and a duplicated launch provisions a second EC2 box on your AWS bill — so a POST is only retried when it carries an Idempotency-Key (as environments.launch does when you pass idempotencyKey / idempotency_key, and launchAndWait / launch_and_wait always do).
  • The retry budget is configurable: new Slothbox({ maxRetries }) / Slothbox(max_retries=…) for the client, and per request ({ maxRetries } in TypeScript, RequestOptions(max_retries=…) in Python); 0 disables retries. See Configuration for where these options live.
  • Exhausted retries throw the normal typed error with a retry context attached (retryContext / retry_context) — the attempts made, the total time slept, and the last Retry-After the API sent — so your logs show what the SDK already tried.
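
To make the policy above concrete, here is a sketch of the two decisions it implies: whether a failed request is eligible for a retry, and how long to wait. It is not the SDK's source. shouldRetry and backoffMs are hypothetical names, the zero-based attempt index is an assumption, and the delay uses the common full-jitter formula (a uniform draw between 0 and the capped exponential step), which may differ in detail from what the clients do. PATCH is never retried here because the page states the idempotency-key exception only for POST.

```typescript
// A sketch of the described policy, not the SDK's actual implementation.
const BASE_MS = 500;
const CEILING_MS = 30_000;

// Methods the page lists as retried without needing an idempotency key.
const RETRIED_METHODS = ["GET", "HEAD", "PUT", "DELETE"];

// status === null stands for a network error that produced no HTTP response.
function shouldRetry(
  method: string,
  hasIdempotencyKey: boolean,
  status: number | null,
): boolean {
  const verb = method.toUpperCase();
  const transient =
    status === null || status === 429 || (status >= 500 && status <= 599);
  const replayable =
    RETRIED_METHODS.indexOf(verb) !== -1 ||
    (verb === "POST" && hasIdempotencyKey);
  return transient && replayable;
}

// attempt is zero-based: 0 for the first retry (an assumption of this sketch).
function backoffMs(
  attempt: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
): number {
  // A Retry-After from the server replaces the computed delay entirely.
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  // Full jitter: scale a uniform draw by the capped exponential step.
  const cappedStep = Math.min(CEILING_MS, BASE_MS * 2 ** attempt);
  return random() * cappedStep;
}
```

Injecting random keeps the function deterministic under test; with the default Math.random, each wait lands somewhere between zero and the capped step, which spreads out clients that failed at the same moment.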

Rate limits

Limits are tracked per caller, and the budgets differ by operation — the full table lives on Rate limiting, which is the authoritative page. The shape of it, from an SDK user's point of view:

  • Most calls share a generous per-key budget. Status polling, listing, reads — you'll only ever notice the limit from a tight loop.
  • The expensive operations have tight org-wide budgets. Launching boxes and baking templates share one small per-organization budget (per the limits table, a handful per minute and an hourly cap on top) — a launch fan-out hits that wall long before any per-key limit. The same goes for other org-scoped tiers like AWS connection checks and GitHub-backed calls. Spread launches out, reuse boxes instead of re-launching, and treat a 429 on launch as backpressure, not failure.
  • Read Retry-After, don't hard-code a delay. The SDKs parse it onto RateLimitError.retryAfter / .retry_after (whole seconds) and the retry middleware honours it automatically.
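
"Spread launches out" can be as simple as computing evenly spaced start offsets before fanning out. The helper below is a hypothetical sketch: launchOffsetsMs is not an SDK function, and perMinute is a value you would read from the limits table on the Rate limiting page, since no real budget figure is assumed here. It paces against the per-minute budget only and does not account for the hourly cap, so a 429 can still occur and should still be handled as backpressure.

```typescript
// Hypothetical pacing helper, not an SDK API. perMinute must come from the
// limits table; this sketch does not know the real budget.
function launchOffsetsMs(count: number, perMinute: number): number[] {
  if (!(perMinute > 0)) {
    throw new RangeError("perMinute must be a positive number");
  }
  // Round the gap up so pacing never runs faster than the stated budget.
  const gapMs = Math.ceil(60000 / perMinute);
  const offsets: number[] = [];
  for (let i = 0; i < count; i++) {
    offsets.push(i * gapMs);
  }
  return offsets;
}
```

Each offset is the delay, in milliseconds from the start of the batch, before issuing the corresponding launch.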

A 429 is a normal operating condition for automation at scale — handled with backoff (and the org-tier realities above), it costs you latency, not correctness.