Rate limits & fair use
Slothbox is built so that everyday work — a person at the keyboard, a handful of boxes, a light script on the side — just works, without you ever thinking about quotas. Limits exist only to keep one runaway caller from crowding out everyone else, and to draw a clear line between interactive use and automation that runs at scale.
This page is the plain-English framing: who gets what, and why. For the concrete per-caller numbers, the 429 response, and the Retry-After header, see Rate limiting.
Interactive surfaces are paced for people
The web app, the CLI, and the MCP server are all driven by a human signing in, and they all authenticate with a session token rather than a service-account key. Because a person is at the controls, these surfaces get a generous, human-paced rate limit, comfortably above what hands-on work ever needs; you'd have to be running a tight loop to notice it. So you can browse, drive the CLI, and let an editor's MCP integration make calls on your behalf as you work, without watching a budget.
The same applies to the personal API key that comes with a seat: it's there for light, occasional scripting alongside your interactive work, so it shares the gentle, low-throughput posture rather than the high-volume tier built for unattended automation. The two key types are described on Authentication.
A ceiling on concurrently active boxes per seat
Alongside request rate, there's a separate guardrail on how many boxes a seat can have running at the same time. This is a fairness and safety measure: a seat is sized for a person's hands-on work, and a person works with a small, bounded set of live boxes at once. The cap is generous for that, and it isn't a cap on how many boxes you can create over time or keep stopped — only on how many are concurrently active.
If you ask to start or launch a box beyond that ceiling, the API turns the request down rather than spinning up the box; see Errors for the exact response. The fix is either to stop a box you're no longer using, or — if you genuinely need many boxes live at once — to move that workload onto the API plan, where this ceiling is raised or removed. The seat ceiling is about interactive head-room, not about capping automation.
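When a start request is turned down at the ceiling, the fix described above is to stop a box you no longer need. The sketch below shows one way to choose that box on the client side, assuming you already have a list of your boxes with a running flag and a last-used timestamp. The `Box` type, the `box_to_stop_before_start` helper, and the `ceiling` parameter are all hypothetical: this page gives neither the ceiling's value nor the shape of a box listing. Only running boxes count, matching the rule that the cap applies to concurrently active boxes and not to stopped ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    # Hypothetical shape of a box record for this sketch only.
    box_id: str
    running: bool
    last_used: float  # e.g. a Unix timestamp of the last interaction


def box_to_stop_before_start(boxes: list[Box], ceiling: int) -> Optional[str]:
    """Return the id of a running box to stop so a new one can start,
    or None if there is already head-room under the (hypothetical) ceiling."""
    if ceiling < 1:
        raise ValueError("ceiling must be at least 1")
    # Stopped boxes never count toward the concurrent-box ceiling.
    running = [b for b in boxes if b.running]
    if len(running) < ceiling:
        return None  # head-room available; start the new box directly
    # At (or over) the ceiling: suggest the least recently used running box.
    least_recent = min(running, key=lambda b: b.last_used)
    return least_recent.box_id
```

You would call this before asking the API to start a box; if it returns an id, stop that box first and then retry the start. Picking the least recently used box is just one reasonable policy for an interactive session.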
Automation at scale belongs on the API plan
The dividing line across all of this is the fair-use rule at the heart of the plans: interactive, human-driven use lives on a seat; unattended automation that runs at scale belongs on the API plan. A developer reaching for the CLI or an editor's MCP integration is using their seat exactly as intended. A CI pipeline, an agent loop, or a backend service that runs on its own — fanning out boxes, calling the API in volume, with no person watching — is automation, and that's what the API plan is for.
Putting that workload on the API plan is what lifts you onto the higher throughput tier and raises or removes the per-seat concurrent-box ceiling, so automation isn't squeezed by limits sized for one person's hands-on session. What the plan unlocks, and how an owner turns it on, are covered on API plan; the headless credential and setup path is on Headless authentication.
Long-lived keys for automation, session tokens for people
The same split runs through the credentials themselves, and it's deliberate:
- Service-account sk_ keys are long-lived. They stay valid until you revoke them (no browser to redirect to, no token to refresh), which is exactly what an unattended process needs. These are the org-level keys you embed in CI, in agents, and in backend services on the API plan.
- OAuth / JWT sessions are short-lived. The tokens behind the web app, the CLI, and the MCP server are session- and TTL-scoped: they expire on a fixed window and are refreshed silently while a person is signed in. That's right for interactive use and unsuitable for anything that runs on its own.
So the credential a caller holds lines up with which side of the boundary it sits on: a long-lived key for automation, a refreshing session for a person at the keyboard. The full credential model is on Authentication.
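A minimal sketch of that split, assuming an unattended process reads its service-account key from an environment variable. The `Credential` type, the `credential_for_unattended_run` helper, and the `SLOTHBOX_API_KEY` variable name are hypothetical; only the `sk_` prefix and the long-lived versus TTL-scoped behavior come from this page.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    token: str
    expires_at: Optional[float]  # None means "valid until revoked"

    @property
    def is_long_lived(self) -> bool:
        return self.expires_at is None

    def needs_refresh(self, now: float, margin_s: float = 60.0) -> bool:
        """TTL-scoped session tokens should be refreshed shortly before expiry."""
        if self.expires_at is None:
            return False  # a service-account key never needs refreshing
        return now >= self.expires_at - margin_s


def credential_for_unattended_run(env: Mapping[str, str]) -> Credential:
    """Pick the credential for CI, an agent, or a backend service.

    SLOTHBOX_API_KEY is a hypothetical variable name used only in this sketch.
    """
    key = env.get("SLOTHBOX_API_KEY", "")
    if not key.startswith("sk_"):
        raise RuntimeError(
            "unattended runs need a service-account sk_ key; session tokens "
            "expire and are only refreshed while a person is signed in"
        )
    return Credential(token=key, expires_at=None)
```

In a CI job you might call `credential_for_unattended_run(os.environ)` once at startup. Because the returned key has no expiry, nothing has to refresh it mid-run, which is the property that makes it the right fit for automation.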
Staying within the limits
- Read Retry-After, don't hard-code a delay. When you do hit a limit, the API tells you how long to wait. Rate limiting shows the 429 response and a small back-off helper; let the header drive your retries.
- Stop boxes you're done with. Concurrent-box head-room frees up as soon as you stop a box, so an interactive session rarely needs to come close to the ceiling.
- Move scaled automation to the API plan. If a workload is unattended and growing, the plan is the supported home for it — higher throughput and a lifted box ceiling — rather than pushing a seat past what it's paced for.
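The Rate limiting page has its own back-off helper; what follows is an independent sketch of the same idea in Python, not that helper. It honors Retry-After in both forms the HTTP specification allows (a number of seconds or an HTTP-date) and falls back to capped exponential back-off only when the header is missing or unparseable. The names `retry_delay`, `call_with_retries`, and `send` are hypothetical, and `send` stands in for whatever HTTP client call you actually make.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def retry_delay(retry_after: Optional[str], attempt: int,
                now: Optional[datetime] = None,
                base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before retrying after a 429."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)  # delay-seconds form
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # No usable header: capped exponential back-off.
    return min(cap_s, base_s * (2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-429 status or attempts run out.

    Assumes header keys are spelled "Retry-After"; with a case-insensitive
    header mapping (as most HTTP clients provide) any casing works.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        sleep(retry_delay(headers.get("Retry-After"), attempt))
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the loop testable without real waiting. Giving the server's value priority over the local back-off schedule is the point of the advice above: the API knows when your budget refills, and a fixed delay can only guess.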