Allocation & Usage Enforcement

GhostPour uses dollar-based allocation instead of request counts. Each tier has a monthly cost limit, and every LLM request's cost is tracked against it.

Per-Request Enforcement

Every POST /v1/chat request passes through this pipeline:

  1. Resolve user from JWT (tier is always read from DB, never from JWT)
  2. Auto model routing — provider: "auto", model: "auto" resolves to the tier's default model
  3. Model access check — is the provider/model allowed for this tier? Are images within limit?
  4. Rate limit — in-memory token bucket per user, tier.requests_per_minute
  5. Quota check — monthly_used_usd vs monthly_cost_limit_usd
  6. Feature gating — Context Quilt and other feature checks
  7. Route to LLM — send to upstream provider
  8. Calculate cost — LiteLLM pricing with cached token discount
  9. Record cost — increment monthly_used_usd
  10. Return response with allocation headers
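The quota and cost-recording steps can be sketched as follows. This is a minimal illustration, not the actual GhostPour implementation; the Tier/User classes and function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    requests_per_minute: int
    monthly_cost_limit_usd: float


@dataclass
class User:
    tier: Tier
    monthly_used_usd: float = 0.0


def check_quota(user: User) -> None:
    """Step 5: reject before routing if the monthly allocation is spent."""
    if user.monthly_used_usd >= user.tier.monthly_cost_limit_usd:
        # Surfaced to the client as a structured HTTP 429 (see below).
        raise RuntimeError("allocation_exhausted")


def record_cost(user: User, cost_usd: float) -> None:
    """Step 9: increment monthly_used_usd after the upstream call returns."""
    user.monthly_used_usd += cost_usd
```

Note that the quota check happens before routing to the LLM, so an exhausted user never incurs upstream cost, while cost recording happens after, using the actual token counts.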

Response Headers

Every chat response includes allocation state:

X-Allocation-Percent: "45.3"   // % of monthly limit used
X-Allocation-Warning: "true"   // present when >= 80%
X-Monthly-Used: "1.1325"       // dollars used this period
X-Monthly-Limit: "2.50"        // dollars allowed this period

Your iOS app can use these to show a usage bar or warning indicator without making a separate API call.
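A client can turn those headers into UI state with a few lines. The header names are from the docs above; the helper itself is a hypothetical sketch (shown in Python for brevity, though the same logic applies in Swift).

```python
def allocation_state(headers: dict) -> dict:
    """Derive usage-bar state from the allocation response headers."""
    return {
        "percent": float(headers["X-Allocation-Percent"]),
        # X-Allocation-Warning is only present at >= 80% usage.
        "show_warning": headers.get("X-Allocation-Warning") == "true",
        "used_usd": float(headers["X-Monthly-Used"]),
        "limit_usd": float(headers["X-Monthly-Limit"]),
    }
```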

Exhausted Response

When allocation runs out, the response is a structured 429:

{
  "detail": {
    "code": "allocation_exhausted",
    "message": "Monthly allocation exhausted ($2.50/$2.50). Upgrade your plan for more hours.",
    "details": {
      "monthly_used": 2.5,
      "monthly_limit": 2.5,
      "fallback": "on_device"
    }
  }
}

Usage Reporting

GET /v1/usage/me returns everything your iOS app needs:

{
  "tier": "pro",
  "tier_display_name": "Pro",
  "allocation": {
    "monthly_limit_usd": 2.50,
    "monthly_used_usd": 0.4567,
    "monthly_remaining_usd": 2.0433,
    "percent_used": 18.3,
    "resets_at": "2026-04-23T15:30:00+00:00"
  },
  "hours": {
    "used": 9.1,
    "limit": 50.0,
    "remaining": 40.9
  },
  "this_month": {
    "requests": 42,
    "input_tokens": 15234,
    "output_tokens": 8901,
    "cached_tokens": 1200,
    "cost_usd": 0.4567
  },
  "summary_mode": "delta",
  "summary_interval_minutes": 10,
  "max_images_per_request": 2,
  "features": { "context_quilt": "enabled" },
  "is_trial": false,
  "trial_end": null
}

Hours conversion: hours = cost / model_cost_per_hour (Haiku = $0.05/hr, Sonnet = $0.19/hr). This gives users an intuitive unit instead of raw dollars.
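The conversion is a single division. Using the rates stated above, the figures in the example payload check out: $0.4567 on Haiku is 9.1 hours used, and a $2.50 limit is 50 hours.

```python
# Per-hour rates from the docs: Haiku = $0.05/hr, Sonnet = $0.19/hr.
MODEL_COST_PER_HOUR = {"haiku": 0.05, "sonnet": 0.19}


def cost_to_hours(cost_usd: float, model: str) -> float:
    """hours = cost / model_cost_per_hour"""
    return cost_usd / MODEL_COST_PER_HOUR[model]
```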

Tier-Locked Settings

When your iOS app uses GhostPour as the provider, these settings are server-controlled:

| Setting | Bring-Your-Own-Key | GhostPour Managed |
| --- | --- | --- |
| Model selection | User choice | Locked: "auto" (server picks) |
| Auto-summary interval | User choice (2-15 min) | Locked per tier: 10 or 15 min |
| Summary mode | User choice | Locked per tier: "delta" or "choice" |
| Max images per query | 5 | Locked per tier: 1-5 |
| Context Quilt | User choice | Tier-gated: enabled/teaser/off |

The iOS app reads these from the /v1/usage/me response and locks the corresponding UI controls.

Why summary intervals vary by tier

Auto-summaries are full chat requests that consume allocation. Sonnet is ~4x more expensive per token than Haiku, so Sonnet tiers default to 15-minute intervals instead of 10 to prevent users from burning through hours unexpectedly.
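The back-of-envelope arithmetic behind that choice: stretching the interval from 10 to 15 minutes cuts summaries per hour from 6 to 4, offsetting a third of Sonnet's higher per-token cost. The 4x multiplier below is the approximate ratio stated above, not an exact price.

```python
def summaries_per_hour(interval_min: int) -> int:
    """Auto-summaries fire once per interval, so 60 / interval per hour."""
    return 60 // interval_min


def relative_summary_cost(interval_min: int, token_price_multiplier: float) -> float:
    """Hourly summary cost relative to Haiku at a 10-minute interval."""
    baseline = summaries_per_hour(10) * 1.0
    return (summaries_per_hour(interval_min) * token_price_multiplier) / baseline
```

At a 15-minute interval, Sonnet summaries still cost roughly 2.7x the Haiku baseline per hour; a 10-minute interval would make that 4x.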

On-Device Fallback

When allocation is exhausted, offer a graceful degradation path: detect the allocation_exhausted 429 and honor its fallback: "on_device" hint by routing new requests to a local model.

This keeps your app functional even when the user has used all their cloud hours.
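A client-side sketch of that path, keyed off the structured 429 body shown earlier. The run_on_device callback is a placeholder for whatever local inference your app ships, and the success-path response shape is an assumption (OpenAI-style choices), not documented here.

```python
def handle_chat_response(status: int, body: dict, run_on_device) -> str:
    """Return a reply, falling back to on-device when allocation is spent."""
    detail = body.get("detail", {})
    if status == 429 and detail.get("code") == "allocation_exhausted":
        if detail.get("details", {}).get("fallback") == "on_device":
            # Cloud allocation is gone; degrade to the local model.
            return run_on_device()
    if status != 200:
        raise RuntimeError(f"chat request failed: {status}")
    # Assumed success shape; adjust to the actual /v1/chat response.
    return body["choices"][0]["message"]["content"]
```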

Future: Overage Credits

The database infrastructure for overage credit purchases exists but is not yet active:

monthly_allocation → overage_balance → 429 fallback to on-device

The overage_balance_usd column and response fields are in place. When ready, add a StoreKit consumable product for credit packs and wire up the deduction logic.