Allocation & Usage Enforcement
GhostPour uses dollar-based allocation instead of request counts. Each tier has a monthly cost limit, and every LLM request's cost is tracked against it.
Per-Request Enforcement
Every POST /v1/chat request passes through this pipeline:
- Resolve user from JWT (tier is always read from the DB, never from the JWT)
- Auto model routing: `provider: "auto", model: "auto"` resolves to the tier's default model
- Model access check: is the provider/model allowed for this tier? Are images within limit?
- Rate limit: in-memory token bucket per user, capped at `tier.requests_per_minute`
- Quota check: `monthly_used_usd` vs `monthly_cost_limit_usd`
- Feature gating: Context Quilt and other feature checks
- Route to LLM: send the request to the upstream provider
- Calculate cost: LiteLLM pricing with the cached-token discount applied
- Record cost: increment `monthly_used_usd`
- Return response with allocation headers
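The rate-limit, quota-check, and record-cost steps above can be sketched as follows. This is an illustrative model only: `Tier`, `User`, and the function names are assumptions, not GhostPour's actual classes.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Tier:
    requests_per_minute: int
    monthly_cost_limit_usd: float

@dataclass
class User:
    tier: Tier
    monthly_used_usd: float = 0.0
    _tokens: float = 0.0
    _last_refill: float = field(default_factory=time.monotonic)

    def __post_init__(self):
        # Start with a full bucket so the first requests go through.
        self._tokens = float(self.tier.requests_per_minute)

def allow_request(user: User) -> bool:
    """In-memory token bucket: refills at requests_per_minute / 60 tokens per second."""
    now = time.monotonic()
    rate = user.tier.requests_per_minute / 60.0
    user._tokens = min(float(user.tier.requests_per_minute),
                       user._tokens + (now - user._last_refill) * rate)
    user._last_refill = now
    if user._tokens < 1.0:
        return False
    user._tokens -= 1.0
    return True

def check_quota(user: User) -> bool:
    """Quota check: monthly_used_usd vs monthly_cost_limit_usd."""
    return user.monthly_used_usd < user.tier.monthly_cost_limit_usd

def record_cost(user: User, cost_usd: float) -> None:
    """Record cost: increment monthly_used_usd after the upstream call returns."""
    user.monthly_used_usd += cost_usd
```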
Response Headers
Every chat response includes allocation state:
X-Allocation-Percent: "45.3" // % of monthly limit used
X-Allocation-Warning: "true" // present when >= 80%
X-Monthly-Used: "1.1325" // dollars used this period
X-Monthly-Limit: "2.50" // dollars allowed this period
Your iOS app can use these to show a usage bar or warning indicator without making a separate API call.
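For example, a client can derive everything a usage bar needs from those headers alone. A minimal sketch (header parsing only, no real HTTP call; the helper name is illustrative):

```python
def allocation_state(headers: dict[str, str]) -> dict:
    """Turn GhostPour allocation headers into values a UI can render."""
    used = float(headers["X-Monthly-Used"])
    limit = float(headers["X-Monthly-Limit"])
    return {
        "fraction": used / limit if limit else 0.0,             # drives the usage bar
        "percent": float(headers["X-Allocation-Percent"]),
        "warn": headers.get("X-Allocation-Warning") == "true",  # present when >= 80%
    }

state = allocation_state({
    "X-Allocation-Percent": "45.3",
    "X-Monthly-Used": "1.1325",
    "X-Monthly-Limit": "2.50",
})
```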
Exhausted Response
When allocation runs out, the response is a structured 429:
{
"detail": {
"code": "allocation_exhausted",
"message": "Monthly allocation exhausted ($2.50/$2.50). Upgrade your plan for more hours.",
"details": {
"monthly_used": 2.5,
"monthly_limit": 2.5,
"fallback": "on_device"
}
}
}
Usage Reporting
GET /v1/usage/me returns everything your iOS app needs:
{
"tier": "pro",
"tier_display_name": "Pro",
"allocation": {
"monthly_limit_usd": 2.50,
"monthly_used_usd": 0.4567,
"monthly_remaining_usd": 2.0433,
"percent_used": 18.3,
"resets_at": "2026-04-23T15:30:00+00:00"
},
"hours": {
"used": 9.1,
"limit": 50.0,
"remaining": 40.9
},
"this_month": {
"requests": 42,
"input_tokens": 15234,
"output_tokens": 8901,
"cached_tokens": 1200,
"cost_usd": 0.4567
},
"summary_mode": "delta",
"summary_interval_minutes": 10,
"max_images_per_request": 2,
"features": { "context_quilt": "enabled" },
"is_trial": false,
"trial_end": null
}
Hours conversion: hours = cost / model_cost_per_hour (Haiku = $0.05/hr, Sonnet = $0.19/hr). This gives users an intuitive unit instead of raw dollars.
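Worked through with the rates quoted above, the numbers line up with the `/v1/usage/me` example:

```python
# Per-hour rates from the docs; hours = cost / model_cost_per_hour
MODEL_COST_PER_HOUR = {"haiku": 0.05, "sonnet": 0.19}

def cost_to_hours(cost_usd: float, model: str) -> float:
    return cost_usd / MODEL_COST_PER_HOUR[model]

# A $2.50 monthly limit on Haiku is the 50.0-hour limit shown above,
# and $0.4567 used rounds to the 9.1 hours shown above.
limit_hours = cost_to_hours(2.50, "haiku")           # 50.0
used_hours = round(cost_to_hours(0.4567, "haiku"), 1)  # 9.1
```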
Tier-Locked Settings
When your iOS app uses GhostPour as the provider, these settings are server-controlled:
| Setting | Bring-Your-Own-Key | GhostPour Managed |
|---|---|---|
| Model selection | User choice | Locked: "auto" (server picks) |
| Auto-summary interval | User choice (2-15 min) | Locked per tier: 10 or 15 min |
| Summary mode | User choice | Locked per tier: "delta" or "choice" |
| Max images per query | 5 | Locked per tier: 1-5 |
| Context Quilt | User choice | Tier-gated: enabled/teaser/off |
The iOS app reads these from the /v1/usage/me response and locks the corresponding UI controls.
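A sketch of how a client might extract the server-controlled values from the `/v1/usage/me` payload. Field names come from the example response above; `LockedSettings` is an illustrative container, not a real type in the app.

```python
from dataclasses import dataclass

@dataclass
class LockedSettings:
    summary_mode: str
    summary_interval_minutes: int
    max_images_per_request: int
    context_quilt: str

def settings_from_usage(usage: dict) -> LockedSettings:
    """Server-controlled values the UI should render as read-only controls."""
    return LockedSettings(
        summary_mode=usage["summary_mode"],
        summary_interval_minutes=usage["summary_interval_minutes"],
        max_images_per_request=usage["max_images_per_request"],
        context_quilt=usage["features"]["context_quilt"],
    )
```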
Auto-summaries are full chat requests that consume allocation. Sonnet is ~4x more expensive per token than Haiku, so Sonnet tiers default to 15-minute intervals instead of 10 to prevent users from burning through hours unexpectedly.
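Back-of-the-envelope arithmetic for why the interval matters, normalizing one Haiku summary to 1 cost unit (the 4x multiplier is the ratio stated above; per-summary token counts are assumed equal):

```python
def summaries_per_hour(interval_minutes: int) -> int:
    return 60 // interval_minutes

haiku_burn  = summaries_per_hour(10) * 1  # Haiku tier at 10 min: 6 units/hr
sonnet_10   = summaries_per_hour(10) * 4  # Sonnet at 10 min would be 24 units/hr
sonnet_burn = summaries_per_hour(15) * 4  # Sonnet tier at 15 min: 16 units/hr
```

Stretching the interval to 15 minutes cuts the Sonnet summary burn by a third relative to a 10-minute interval.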
On-Device Fallback
When allocation is exhausted, offer a graceful degradation path: detect the `allocation_exhausted` 429, honor its `"fallback": "on_device"` hint, and route the request to a local model. This keeps your app functional even when the user has used all their cloud hours.
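One way to wire the fallback, as a sketch: `send_to_ghostpour` and `run_on_device` are hypothetical callables (your HTTP client and local model), and the 429 body shape is the one shown above.

```python
def chat_with_fallback(send_to_ghostpour, run_on_device, prompt: str):
    """Try the cloud first; fall back to on-device on allocation_exhausted."""
    status, body = send_to_ghostpour(prompt)
    if status == 429 and body.get("detail", {}).get("code") == "allocation_exhausted":
        # details.fallback tells the client that on-device is the intended path
        if body["detail"].get("details", {}).get("fallback") == "on_device":
            return run_on_device(prompt)
    return body
```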
Future: Overage Credits
The database infrastructure for overage credit purchases exists but is not yet active:
monthly_allocation → overage_balance → 429 fallback to on-device
The overage_balance_usd column and response fields are in place. When ready, add a StoreKit consumable product for credit packs and wire up the deduction logic.
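When that lands, the deduction order sketched above (monthly allocation first, then `overage_balance_usd`, then the 429) might look like the following. This is illustrative only, not the shipped logic:

```python
from dataclasses import dataclass

@dataclass
class Account:
    monthly_used_usd: float
    monthly_limit_usd: float
    overage_balance_usd: float

def charge(acct: Account, cost_usd: float) -> str:
    """monthly_allocation -> overage_balance -> 429 fallback to on-device."""
    headroom = acct.monthly_limit_usd - acct.monthly_used_usd
    if headroom >= cost_usd:
        acct.monthly_used_usd += cost_usd
        return "monthly"
    spill = cost_usd - max(headroom, 0.0)
    if acct.overage_balance_usd >= spill:
        # Exhaust the monthly allocation, then draw the rest from credits.
        acct.monthly_used_usd = acct.monthly_limit_usd
        acct.overage_balance_usd -= spill
        return "overage"
    return "exhausted"  # caller returns the structured 429
```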