AI cost and energy optimization
Most savings come from systems-level decisions, not a single model swap. lowtide_ focuses on routing order, output limits, and cache reuse to cut the cost of repeated inference.
High-impact controls
Route to smaller models first for default workloads (see the routing sketch below).
Cap output tokens by default and raise the cap only on explicit user escalation.
Cache deterministic requests to avoid redundant generation (caching sketch below).
Block arbitrary model IDs unless explicitly enabled by plan policy (policy sketch below).
Track per-workspace usage and enforce monthly token limits.
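Routing order can be expressed as an ordered list of routes tried smallest-first, with a default output cap that only an explicit escalation flag can raise. A minimal TypeScript sketch; the `CompletionFn` shape, the `escalate` flag, and the fallback-on-error behavior are illustrative assumptions, not lowtide_'s actual API:

```typescript
type CompletionFn = (prompt: string, maxTokens: number) => Promise<string>;

interface Route {
  model: string;        // model identifier (assumed shape)
  complete: CompletionFn;
}

const DEFAULT_MAX_OUTPUT_TOKENS = 512; // capped by default

async function routeCompletion(
  routes: Route[],      // ordered cheapest/smallest first
  prompt: string,
  opts: { escalate?: boolean; maxTokens?: number } = {},
): Promise<{ model: string; text: string }> {
  // Without explicit escalation, requested caps cannot exceed the default.
  const maxTokens = opts.escalate
    ? opts.maxTokens ?? DEFAULT_MAX_OUTPUT_TOKENS
    : Math.min(opts.maxTokens ?? DEFAULT_MAX_OUTPUT_TOKENS, DEFAULT_MAX_OUTPUT_TOKENS);

  // Try routes in order; move to the next (larger) route only on failure.
  for (const route of routes) {
    try {
      const text = await route.complete(prompt, maxTokens);
      return { model: route.model, text };
    } catch {
      // Provider error: fall through to the next route.
    }
  }
  throw new Error("all routes failed");
}
```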
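Deterministic requests (temperature 0) can be keyed by a hash over everything that determines the output, so a repeat request returns the stored completion instead of regenerating it. A sketch assuming an in-memory `Map`; a production deployment would use a shared store:

```typescript
import { createHash } from "node:crypto";

interface InferenceRequest {
  model: string;
  prompt: string;
  temperature: number;
}

const cache = new Map<string, string>();

function cacheKey(req: InferenceRequest): string {
  // Key over every field that determines the output.
  return createHash("sha256")
    .update(JSON.stringify([req.model, req.prompt, req.temperature]))
    .digest("hex");
}

async function cachedComplete(
  req: InferenceRequest,
  complete: (req: InferenceRequest) => Promise<string>,
): Promise<{ text: string; cacheHit: boolean }> {
  // Only deterministic requests are safe to reuse.
  if (req.temperature !== 0) {
    return { text: await complete(req), cacheHit: false };
  }
  const key = cacheKey(req);
  const hit = cache.get(key);
  if (hit !== undefined) return { text: hit, cacheHit: true };
  const text = await complete(req);
  cache.set(key, text);
  return { text, cacheHit: false };
}
```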
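Plan policy and monthly limits reduce to two checks before dispatch: an explicit model allowlist and a per-workspace token counter. The plan shape, limit values, and in-memory counter below are illustrative assumptions, not lowtide_'s schema:

```typescript
interface PlanPolicy {
  allowedModels: Set<string>;   // explicit per-plan allowlist
  monthlyTokenLimit: number;    // input + output tokens per workspace
}

const usage = new Map<string, number>(); // workspaceId -> tokens this month

function checkRequest(
  workspaceId: string,
  model: string,
  policy: PlanPolicy,
): void {
  // Arbitrary model IDs are rejected unless the plan enables them.
  if (!policy.allowedModels.has(model)) {
    throw new Error(`model ${model} not enabled for this plan`);
  }
  // Enforce the monthly budget before dispatching the request.
  if ((usage.get(workspaceId) ?? 0) >= policy.monthlyTokenLimit) {
    throw new Error("monthly token limit reached");
  }
}

function recordUsage(workspaceId: string, tokens: number): void {
  usage.set(workspaceId, (usage.get(workspaceId) ?? 0) + tokens);
}
```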
Savings transparency
Store input/output tokens, provider, model, routing profile, and cache-hit status on every request.
Compute estimated kgCO2e from explicit emission factors and publish the formula (see the reporting sketch below).
Present comparisons against a baseline factor as estimates, not physical measurements.
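The published formula is simple once the per-request record exists: estimated kgCO2e = (tokens / 1000) × factor. A minimal sketch of the record and the estimate; the factor values below are placeholder assumptions for illustration, not measured or published figures:

```typescript
interface RequestRecord {
  inputTokens: number;
  outputTokens: number;
  provider: string;
  model: string;
  profile: string;   // routing profile used
  cacheHit: boolean;
}

// Placeholder: kgCO2e per 1,000 tokens for the routed model.
// Real factors must be published alongside the formula.
const FACTOR_KG_PER_1K_TOKENS = 0.0003;
// Placeholder baseline factor for a larger default model, for comparison only.
const BASELINE_KG_PER_1K_TOKENS = 0.0012;

function estimateKgCO2e(rec: RequestRecord): { estimated: number; baseline: number } {
  // Cache hits skip generation, so their marginal estimate is zero.
  const tokens = rec.cacheHit ? 0 : rec.inputTokens + rec.outputTokens;
  return {
    estimated: (tokens / 1000) * FACTOR_KG_PER_1K_TOKENS,
    baseline: ((rec.inputTokens + rec.outputTokens) / 1000) * BASELINE_KG_PER_1K_TOKENS,
  };
}
```

Keeping both numbers side by side makes the comparison honest: the baseline is itself an estimate built from a factor, not a measurement of avoided emissions.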