Guide · Reliability · 2026-04-22 · 7 min read

Errors & rate limits.

An integration that retries forever is worse than one that gives up cleanly. This guide is about building the retry loop that does the right thing on every HTTP status Trooply can return.

The error envelope

Every 4xx and 5xx response from Trooply is JSON with a consistent shape:

{
  "error":      "rate_limited",
  "message":    "Request rate exceeded. Retry after 12 seconds.",
  "request_id": "a8f2b8715a01"
}

Three fields to act on:

error — a short snake_case code. Use this in your error-handling logic, not the HTTP status alone.
message — human-readable, safe to log. Not safe to surface to shoppers verbatim — it may mention internal hints.
request_id — also in the X-Request-ID response header. Quote this in any support email; it's how we find your call in our logs.

The full enumeration of error codes lives at /api-errors.
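
As a rough sketch of acting on those three fields, the helper below turns a decoded error body into a small typed object. `TrooplyError` and `parse_error` are hypothetical names used for illustration, not part of any Trooply SDK, and the sketch assumes you have already decoded the JSON body into a dict.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TrooplyError:
    """Hypothetical container for the three envelope fields."""
    status: int
    code: str                  # the snake_case `error` value; branch on this
    message: str               # log it, but don't show it to shoppers verbatim
    request_id: Optional[str]  # quote this in support emails


def parse_error(status: int, body: Mapping[str, Any],
                headers: Mapping[str, str]) -> TrooplyError:
    # Prefer the body's request_id; fall back to the X-Request-ID header.
    # A plain dict is case-sensitive, unlike most HTTP client header objects.
    request_id = body.get("request_id") or headers.get("X-Request-ID")
    return TrooplyError(
        status=status,
        code=str(body.get("error", "unknown")),
        message=str(body.get("message", "")),
        request_id=request_id,
    )
```

Branching on `code` rather than `status` keeps the handler stable if two different failures ever share an HTTP status.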

What each status actually means

Status	Meaning	Retry?
200 / 201 / 204	Success.	n/a
400	Malformed request body or query. JSON parse error, wrong field types.	No. Retrying won't change the outcome.
401	Missing or invalid credentials.	Only if you can refresh — see Authentication.
403	Credentials valid but missing scope, or widget Origin not allow-listed.	No. The client is wrong.
404	Resource doesn't exist. Wrong `product_id`, unknown `job_id`, etc.	No.
409	Idempotency conflict — same idempotency key with a different body.	No. Use a fresh key.
410	Endpoint deprecated and removed (most commonly the pre-§0.6 `/v1/users/*` routes). The body's `replacement` field names the new path.	No. Switch your call site.
413	Request body too large (typically >10 MB on upload endpoints).	No.
422	Schema validation failed. Pydantic enumerates the offending field in `detail`.	No.
429	Rate limit exceeded — see the next section for two distinct flavours.	Yes, respecting `Retry-After`.
500	Server bug. Retry once; escalate if it persists.	Yes, once.
502 / 503 / 504	Upstream or overload. Transient.	Yes, with backoff.
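
One way to encode the table is a single lookup function, sketched below. The `Retry` enum and `retry_policy` are illustrative names, and treating any status the table does not list as non-retriable is an assumption of this sketch.

```python
from enum import Enum


class Retry(Enum):
    NO = "no"
    ONCE = "once"
    BACKOFF = "backoff"
    AFTER_HEADER = "respect Retry-After"
    IF_REFRESHABLE = "only after refreshing credentials"


def retry_policy(status: int) -> Retry:
    """Map an HTTP status to the retry guidance in the table above."""
    if status == 429:
        return Retry.AFTER_HEADER
    if status == 401:
        return Retry.IF_REFRESHABLE
    if status == 500:
        return Retry.ONCE
    if status in (502, 503, 504):
        return Retry.BACKOFF
    # 2xx needs no retry; the remaining 4xx won't change on retry.
    # Statuses not in the table also land here (an assumption).
    return Retry.NO
```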

Auth errors with extra branches (PLAN.md §0.6)

A handful of auth failures carry error codes that mean something more specific than "credentials are wrong". Your client should branch on these rather than treat every 401 (or login 429) the same:

`error`	Status	Meaning	Right response
`requires_2fa`	401	Email + password were correct, but the user has 2FA enabled. Returned by `/oauth/login` when `totp_code` is missing from the body.	Prompt for the 6-digit authenticator code, retry the same login with `totp_code` populated. Don't show "wrong password" — the password was fine.
`invalid_2fa`	401	The supplied `totp_code` didn't verify.	Re-prompt with a fresh code (codes refresh every 30 seconds). Counts toward the per-identity throttle below.
`too_many_requests`	429	Per-identity throttle: 5 failed login attempts per email (or 10 per `client_id`) within a 15-minute window. Distinct from the regular per-IP rate limit. Resets on a successful auth.	Back off for `details.retry_after` seconds. A botnet rotating IPs hits this same bucket — don't keep retrying.
generic `unauthorized`	401	Wrong password, deactivated account, or no membership. Intentionally indistinguishable so you can't enumerate users by response shape.	Show "Invalid email or password" to the user. If you're sure the credentials are right, check the per-identity throttle hasn't tripped.
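
The branching in that table might look like the sketch below. The returned action names are made up for illustration, and the sketch assumes `details.retry_after` arrives as a `details` object inside the JSON body.

```python
from typing import Any, Mapping


def login_next_step(status: int, body: Mapping[str, Any]) -> str:
    """Return a hypothetical UI action name for a failed /oauth/login call."""
    code = body.get("error")
    if status == 401 and code == "requires_2fa":
        return "prompt_totp"        # password was fine; ask for the 6-digit code
    if status == 401 and code == "invalid_2fa":
        return "reprompt_totp"      # code didn't verify; ask for a fresh one
    if status == 429 and code == "too_many_requests":
        # Assumed location of the wait hint: body["details"]["retry_after"].
        wait = (body.get("details") or {}).get("retry_after", 0)
        return f"locked_out:{int(wait)}"  # back off; do not retry sooner
    if status == 401:
        return "show_invalid_credentials"  # generic on purpose
    return "unexpected_error"
```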

Rate limits

Trooply rate-limits per-account and per-endpoint-family. The exact numbers depend on your plan — see /pricing. The mechanics are the same across tiers:

Token bucket, refilled continuously. You get bursts up to your bucket size and then steady-state limiting once drained.
Separate budgets for search calls (high-throughput, hot path) and write calls (indexing, config). A slow catalog sync doesn't eat into your search budget.
Public keys have their own per-key budget — a compromised widget key can't burn your server-side search budget.
Multi-store isolation (§0.6). When the JWT is for a user with memberships across many stores, the bucket key includes the active store (resolved from the URL slug or the X-Trooply-Active-Tenant header). Acting as Store A doesn't drain Store B's budget.
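
To make the mechanics concrete, here is a toy token bucket with one bucket per (store, endpoint family) pair. It is a sketch of the general technique, not Trooply's implementation; the capacity and refill rate are made-up numbers, and `bucket_for` is a hypothetical helper.

```python
class TokenBucket:
    """Illustrative token bucket; capacity and refill rate are made-up numbers."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity   # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill continuously for the elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}


def bucket_for(store: str, family: str) -> TokenBucket:
    # Separate budgets: the key includes both the active store and the family.
    key = (store, family)
    if key not in buckets:
        buckets[key] = TokenBucket(capacity=3, refill_per_sec=1.0)
    return buckets[key]
```

Draining `bucket_for("store-a", "search")` leaves `bucket_for("store-b", "search")` and `bucket_for("store-a", "write")` untouched, which is the isolation property described above.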

The per-identity throttle on /oauth/login + /oauth/token

The plan-tier rate limit above is keyed by IP. That blocks the dumb single-IP brute-force, but a botnet of 1000 IPs each trying once would defeat it. So /oauth/login and /oauth/token have a second bucket keyed on the identity being attacked, not the source IP:

5 failed logins per email per 15 minutes. Once that limit is exceeded, every subsequent attempt returns 429 too_many_requests until the window rolls. A correct password hitting a tripped bucket also fails; that's the point.
10 failed client_credentials grants per client_id per 15 minutes — same pattern for machine integrations.
Successful auth wipes the counter. A user who backspaced their password earlier doesn't carry strikes forward.

If your client treats every 401 as "retry with a different password", you'll burn the bucket and lock yourself out. Honour details.retry_after on a 429 with error: too_many_requests — the timer is per-account, not per-IP, so spinning up a new client doesn't help.
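
A small model of that second bucket can help when reasoning about client behaviour. The sketch below is not the server's code: it assumes a sliding 15-minute window and assumes the bucket trips once the fifth failure is recorded, neither of which the text above pins down. `IdentityThrottle` is a hypothetical name.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60
MAX_FAILURES = 5   # per email; the text gives 10 per client_id


class IdentityThrottle:
    """Toy failed-attempt counter keyed on the identity, not the source IP."""

    def __init__(self, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
        self.max_failures = max_failures
        self.window = window
        self.failures = defaultdict(deque)  # identity -> failure timestamps

    def _prune(self, identity, now):
        # Drop failures that have aged out of the window.
        q = self.failures[identity]
        while q and now - q[0] >= self.window:
            q.popleft()
        return q

    def is_blocked(self, identity, now):
        return len(self._prune(identity, now)) >= self.max_failures

    def record_failure(self, identity, now):
        self._prune(identity, now).append(now)

    def record_success(self, identity):
        # Successful auth wipes the counter.
        self.failures.pop(identity, None)
```

Because the key is the email or client_id, changing the source IP or starting a new client process does nothing to `is_blocked`.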

The response headers you care about

Every successful call carries these:

X-RateLimit-Limit:     200     // your per-minute budget
X-RateLimit-Remaining: 187     // left in the current window
X-RateLimit-Reset:     47      // seconds until the window rolls

A 429 response also sets Retry-After (in seconds) — wait at least that long. Don't retry sooner, even if the budget looks like it should be available; the server's view is authoritative.
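
A client can use these headers to pace itself before it ever sees a 429. The helper below is a minimal sketch: `seconds_to_wait` is a hypothetical name, it takes headers as a plain (case-sensitive) mapping, and it assumes Retry-After is a number of seconds as described above.

```python
from typing import Mapping


def seconds_to_wait(status: int, headers: Mapping[str, str]) -> float:
    """How long to pause before the next call, based on the headers above."""
    if status == 429:
        # Retry-After is authoritative on a 429.
        return float(headers.get("Retry-After", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining <= 0:
        # Budget drained: wait for the window to roll instead of earning a 429.
        return float(headers.get("X-RateLimit-Reset", 0))
    return 0.0
```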

Retry logic that actually works

Retries are deceptively tricky. Done wrong they DDoS your own upstream. The rules that keep them sane:

Only retry safe statuses: 429 and 5xx. Never any other 4xx. Apart from a 401 you can clear by refreshing credentials, a 4xx is a bug in your code, not a transient condition.
Respect Retry-After. If the server says 12 seconds, wait 12 seconds. Hard-coding your own delay ignores the server's load signal.
Exponential backoff with jitter. For 5xx without a Retry-After, start at ~1 second, double each retry, randomise ±20% so simultaneous failures don't sync up.
Cap retries. Three attempts is usually right. After three 5xxs, give up and log — something is wrong on our end and you'll hear about it via status.trooply.ai.
Fail the caller fast. For a shopper-facing search, a 5xx→retry→5xx→retry→5xx loop adds multiple seconds of latency before the shopper sees anything. Better to return an empty result set in 200ms and let them refresh.

# Minimal retry wrapper (Python, httpx)
import random
import time

import httpx


def call(client: httpx.Client, method, url, max_attempts=3, **kw):
    for attempt in range(max_attempts):
        r = client.request(method, url, **kw)
        if r.status_code < 500 and r.status_code != 429:
            return r  # success or non-retriable 4xx
        if attempt == max_attempts - 1:
            break  # out of attempts: don't sleep before giving up
        # Server-dictated wait if present, else exponential backoff with ±20% jitter.
        try:
            delay = float(r.headers.get("retry-after", ""))
        except ValueError:
            # Header absent, or in HTTP-date form: fall back to 1s, 2s, 4s, ...
            delay = (2 ** attempt) * random.uniform(0.8, 1.2)
        time.sleep(delay)
    return r  # last response, caller decides
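
For the shopper-facing case in the "fail the caller fast" rule, a time budget matters more than an attempt count. The sketch below caps both. `search_with_deadline` and its parameters are hypothetical; `do_search` stands in for whatever function performs one search call and returns `(status, results)`. It returns the failing status alongside the empty list so the caller can still tell "no matches" from "search failed".

```python
import time
from typing import Callable, List, Optional, Tuple


def search_with_deadline(
    do_search: Callable[[], Tuple[int, List]],
    budget_seconds: float = 0.2,
    max_attempts: int = 2,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List, Optional[int]]:
    """Retry a 5xx only while time remains; otherwise fail fast.

    Returns (results, failed_status); failed_status is None on success.
    """
    deadline = clock() + budget_seconds
    status = 0
    for _ in range(max_attempts):
        status, results = do_search()
        if status == 200:
            return results, None
        if status < 500 or clock() >= deadline:
            break  # non-retriable status, or out of time
    return [], status  # degraded: empty list plus the status to log
```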

Anti-pattern

The classic "infinite retry loop until success" fails in exactly the case where you need restraint most — a persistent 5xx from a real incident. Your workers pile up retries, their memory grows, eventually the queue consumer falls over, and recovery takes a human. Always cap attempts, always fail the outer call.

Idempotency on writes

Writes to Trooply are idempotent by default on the natural key (product_id for products, rule_id for merchandising rules, etc.): repeating the same POST with the same ID is safe because it upserts.

For operations that don't have a natural key (bulk jobs, search feedback events), pass an idempotency key in the Idempotency-Key header:

POST /v1/products/bulk
Idempotency-Key: sync-2026-04-22-run-7

{"products": [...]}

Replays with the same key within 24 hours return the original response without re-doing the work. Replays with the same key but a different body return 409 — that's the "you're about to lie about a previous request" guardrail.

Generate idempotency keys from a combination of the logical operation and a stable run identifier. Don't generate a fresh uuid4() on each attempt: a retry would then carry a new key, the server would treat it as a new request, and the key would protect nothing.
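
One possible way to derive such a key is to hash the operation name together with the run identifier, as sketched below. The `idempotency_key` helper and the key layout are illustrative choices, not a format Trooply requires.

```python
import hashlib


def idempotency_key(operation: str, run_id: str) -> str:
    """Derive a stable key: the same operation and run always map to the same key."""
    digest = hashlib.sha256(f"{operation}:{run_id}".encode("utf-8")).hexdigest()
    # Keep the operation readable in logs; the digest suffix makes it unique per run.
    return f"{operation}-{digest[:16]}"
```

Every retry of the same run recomputes the same key, so a replay within the 24-hour window returns the original response instead of redoing the work.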

What 422 tells you

Unlike a generic 400, a 422 carries structured validation detail:

{
  "error":   "validation_error",
  "message": "Request body failed validation.",
  "detail": [
    {
      "loc":  ["body", "price"],
      "msg":  "Input should be a valid number",
      "type": "decimal_parsing"
    }
  ]
}

The detail array is FastAPI's standard Pydantic error format — if you're using an OpenAPI-generated SDK, your client library probably already maps it into a typed exception. If you're hand-rolling, the loc path tells you exactly which field the shopper or upstream system handed you in a bad shape.
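
If you are hand-rolling, a few lines are enough to turn `detail` into per-field messages. The sketch below assumes the body has the shape shown above; `field_errors` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Mapping


def field_errors(body: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Group validation messages by dotted field path, dropping the leading 'body'."""
    out: Dict[str, List[str]] = {}
    for item in body.get("detail", []):
        loc = [str(part) for part in item.get("loc", [])]
        if loc and loc[0] == "body":
            loc = loc[1:]
        path = ".".join(loc) or "<root>"
        out.setdefault(path, []).append(item.get("msg", ""))
    return out
```

For the example response above this yields a single entry keyed on `price`.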

Observability

Every response includes X-Request-ID. Log it, always, alongside your own trace ID. When something weird happens you can quote the request ID in a support email and we'll pull the full server-side trace — log entries, middleware timings, retry context. Without it, we're guessing.
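
A sketch of that logging habit, using the standard library only. The logger name and the `log_response` helper are hypothetical, and `trace_id` stands for whatever identifier your own tracing produces.

```python
import logging
from typing import Mapping

log = logging.getLogger("trooply.client")  # hypothetical logger name


def log_response(status: int, headers: Mapping[str, str], trace_id: str) -> str:
    """Log the Trooply request ID next to our own trace ID; return the line for reuse."""
    request_id = headers.get("X-Request-ID", "missing")
    line = f"trooply status={status} request_id={request_id} trace_id={trace_id}"
    # Errors at WARNING so they stand out; successes at INFO.
    log.log(logging.WARNING if status >= 400 else logging.INFO, line)
    return line
```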

One more rule

When in doubt, log the message field and the status. Don't silently swallow errors and don't convert "we couldn't tell" into "no results". A failing search that looks like "0 results" to the shopper is the worst kind of outage — no one notices until conversion metrics start dropping next week.

This is the last of the core guides. From here, the blog's feature deep-dives cover merchandising, promo banners, and the drop-in widget in context.