AI API Bill Horror Stories of 2026

There is a specific feeling that comes from opening a billing dashboard and watching a four-digit number tick over to five. There is a worse one that comes from opening it and finding a six-digit number that wasn't there yesterday. In the last year, several people have hit each of those numbers in public and written about it afterwards, which is generous of them.

This post collects five of those — every figure named, every source linked, no embellishment — and walks through what specifically went wrong and the one guardrail that, in each case, would have changed the headline.

Method: Sources only, no embellishment. Every dollar figure below is as reported by the linked outlet, not independently audited. Where I had to estimate or paraphrase, I say so. All facts verified May 27, 2026. If you spot something off, tell us at support@aisaasfactory.io and we'll fix it.

Case file 01

The $1.3M Codex month

Who: Peter Steinberger / OpenClaw (3-person team) · When: May 2026 · Provider: OpenAI

$1,305,088.81 reported · in one month

Per Tom's Hardware, Steinberger's three-person team building OpenClaw ran roughly one hundred parallel Codex instances, and the OpenAI bill at the end of the month read $1,305,088.81. The reported volume was around 603 billion tokens across ~7.6 million requests. The reason this is interesting isn't that the bill exists — they were funded for it — it's the shape: a near-linear curve from "this is a normal AI tooling spend" to "this is a six-figure-per-week burn" with no single moment that obviously broke.
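To make the "shape" concrete, here is the back-of-envelope arithmetic on the reported figures as a small Python sketch. The inputs are the rounded numbers from the Tom's Hardware report. The 31-day month is my assumption, and `bill_ratios` is just an illustrative name.

```python
# Back-of-envelope ratios from the figures Tom's Hardware reported.
# Token and request counts are rounded ("around", "~") in the source,
# so every derived number here is approximate.
REPORTED_BILL_USD = 1_305_088.81
REPORTED_TOKENS = 603e9        # ~603 billion tokens
REPORTED_REQUESTS = 7.6e6      # ~7.6 million requests
DAYS_IN_MONTH = 31             # assumption: a 31-day month (May)


def bill_ratios(bill: float, tokens: float, requests: float, days: int) -> dict:
    """Blended per-unit costs implied by a single monthly bill."""
    return {
        "tokens_per_request": tokens / requests,
        "usd_per_million_tokens": bill / (tokens / 1e6),
        "usd_per_request": bill / requests,
        "usd_per_week": bill / days * 7,
    }


for name, value in bill_ratios(REPORTED_BILL_USD, REPORTED_TOKENS,
                               REPORTED_REQUESTS, DAYS_IN_MONTH).items():
    print(f"{name:>24}: {value:,.2f}")
```

On those inputs it works out to roughly 79,000 tokens per request, about $2.16 per million tokens blended, and just under $295,000 a week. That weekly figure is where "six figures per week" comes from.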

When you run a hundred agents in parallel on a three-person team, each one's reasonable behaviour adds up to a category of spend that nothing in the standard "I'll just check the dashboard" loop is going to catch. By the time you notice, you've burned a Tesla.

Cooked metaphor: A hundred agents on one card is a hundred burners on one stove. Hard daily spend caps per instance would, by my rough estimate, have turned a $1.3M month into a $50k one without changing what they shipped.
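Here is a minimal sketch of a per-instance daily cap that you enforce yourself, in front of the provider. Everything in it is hypothetical. `DailySpendCap` and `monthly_ceiling` are illustrative names, not a provider API. The guard also assumes you can estimate each call's cost before making it, for example from the prompt size and `max_tokens`. A provider-side limit is still better, because client-side code can be bypassed or forgotten.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailySpendCap:
    """Hypothetical client-side guard: refuse a call once an instance's spend
    for the day would exceed its cap. This is not a provider feature."""

    def __init__(self, cap_usd_per_day: float):
        self.cap = cap_usd_per_day
        # (instance_id, day) -> USD already committed that day
        self._spent = defaultdict(float)

    def try_spend(self, instance_id: str, est_cost_usd: float,
                  day: Optional[date] = None) -> bool:
        """Reserve budget for one call; False means: do not make the request."""
        day = day or date.today()
        key = (instance_id, day)
        if self._spent[key] + est_cost_usd > self.cap:
            return False  # hard stop, not a notification
        self._spent[key] += est_cost_usd
        return True


def monthly_ceiling(instances: int, cap_usd_per_day: float, days: int = 30) -> float:
    """Worst-case spend if every instance hits its cap on every day."""
    return instances * cap_usd_per_day * days


# Usage sketch: check the guard before every call an agent makes.
guard = DailySpendCap(cap_usd_per_day=16.67)
if guard.try_spend("agent-042", est_cost_usd=0.17):
    pass  # make the API call here
print(f"${monthly_ceiling(100, 16.67):,.0f} maximum per 30-day month")
```

The ceiling arithmetic is the point. With 100 instances, a cap of about $16.67 per instance per day bounds a 30-day month at roughly $50,000, whatever the agents decide to do.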

Source: Tom's Hardware — "OpenClaw creator burns through $1.3 million in OpenAI API tokens in a single month" (May 2026).

Case file 02

The A$25,672 Gemini bill from a forgotten public Cloud Run URL

Who: Jesse Davies · When: May 2026 · Provider: Google Gemini via Cloud Run · A$10 budget

A$25,672.86 reported · ~US$18,392 · on a A$10 budget

This one is consistently mis-described as "a stolen key." It wasn't. Per Tom's Hardware, Davies had published a tiny Cloud Run service straight from AI Studio months earlier — the kind of one-click deploy that creates a public URL with Gemini calls behind it. He moved on. The key was never leaked. What got abused was the service: an attacker found the public Cloud Run URL and started sending requests, and Google's own proxy dutifully signed every one of them with the embedded credential.

The reported damage: 60,000+ requests, A$25,672.86 in charges (~US$18,392) against a configured A$10 budget, and a A$1,400 spending cap that Google itself raised automatically mid-incident to keep serving the traffic. The cap was a billing alert, not a hard limit.
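The difference between an alert and a limit is easiest to see by replaying the reported numbers. This is a deliberately crude model. It assumes every one of the 60,000 requests cost the same average amount, because the real per-request costs weren't published. `replay` is an illustrative function, not anything Google exposes.

```python
# Crude replay of the reported incident: 60,000 requests, A$25,672.86 total.
# Assumption: every request costs the average, since real costs weren't published.
REQUESTS = 60_000
TOTAL_AUD = 25_672.86
AVG_COST_AUD = TOTAL_AUD / REQUESTS
BUDGET_AUD = 10.0


def replay(requests: int, cost: float, budget: float, hard_limit: bool):
    """Return (requests served, total spend, whether an over-budget alert fired)."""
    spent, served, alerted = 0.0, 0, False
    for _ in range(requests):
        if hard_limit and spent + cost > budget:
            break  # a limit stops serving traffic
        spent += cost
        served += 1
        if spent > budget:
            alerted = True  # an alert only notifies; traffic keeps flowing
    return served, round(spent, 2), alerted


for mode in (False, True):
    served, spent, alerted = replay(REQUESTS, AVG_COST_AUD, BUDGET_AUD, hard_limit=mode)
    label = "hard limit" if mode else "alert only"
    print(f"{label:>10}: {served:>6} requests, A${spent:,.2f}, alert fired: {alerted}")
```

With the alert, all 60,000 requests are served and the full A$25,672.86 accrues. With a hard A$10 limit, the model stops after 23 requests at about A$9.84.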

This is the most relatable story on the page, and the most misunderstood. Everyone reading has a half-forgotten deploy somewhere with a public endpoint that calls a paid API. The mistake isn't running the service — the mistake is shipping a public proxy with no hard project-level quota.

Cooked metaphor: A forgotten public endpoint is a gas line with no shut-off valve. Hard quotas on the Google Cloud project (not budget alerts, which can be silently raised) would likely have capped the damage at a couple of hundred dollars.

Source: Tom's Hardware — "Google Cloud customer wakes up to $18,000 bill… attacker put in 60,000 requests" (May 2026).

Case file 03

The $82,314 Gemini key, stolen and drained in 48 hours

Who: "RatonVaquero" (Mexican 3-person team) · When: February 11–12, 2026 (reported March 3) · Provider: Google Gemini

$82,314.44 reported · 48 hours · ~46,000% spike

Per The Register, a three-person studio that normally spent ~$180/month on Gemini had a key leaked — and over February 11 and 12, 2026, an attacker ran $82,314.44 of inference against it in roughly 48 hours. That's about a 46,000% spike on their baseline. Google's anomaly detection did fire, but the dollar damage was already done by the time anyone could intervene.

The lesson here is that the speed of modern inference + the speed of credit card billing + the absence of hard caps = there is no human in the loop fast enough. Detection that happens after the fact is a coroner, not a defense.
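The "no human is fast enough" point is just arithmetic. Here is a small sketch with the reported figures. It assumes the spend accrued at a constant rate across the 48 hours; that is my assumption, not the report's. The function names are illustrative.

```python
BASELINE_USD_PER_MONTH = 180.0
INCIDENT_USD = 82_314.44
INCIDENT_HOURS = 48


def spike_percent(incident: float, baseline: float) -> float:
    """Percentage increase of the incident total over one normal month."""
    return (incident - baseline) / baseline * 100


def damage_before_response(incident: float, hours: float, reaction_minutes: float) -> float:
    """Spend accrued while a human reacts, assuming a constant burn rate."""
    per_minute = incident / (hours * 60)
    return per_minute * reaction_minutes


print(f"spike: {spike_percent(INCIDENT_USD, BASELINE_USD_PER_MONTH):,.0f}%")
for minutes in (5, 30, 8 * 60):
    cost = damage_before_response(INCIDENT_USD, INCIDENT_HOURS, minutes)
    print(f"{minutes:>4} min to react: ${cost:,.2f}")
```

The constant-rate assumption gives roughly $28.58 a minute. Even a 30-minute reaction lets about $857 through, and an eight-hour night about $13,700. The spike comes out near 45,600%, consistent with the reported ~46,000%.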

Cooked metaphor: A leaked key is a credit card pinned to a public bulletin board. Hard project-level quotas (not just billing alerts), set before the leak, are the only thing fast enough.

Source: The Register — "Gemini API key drained: $82,314 charge in 48 hours" (March 3, 2026).

Case file 04

Replit AI and the SaaStr production database

Who: SaaStr (Jason Lemkin) · When: July 2025 · Provider: Replit AI

~1,200 execs + ~1,200 companies · two tables · deleted by an agent

This one is the canonical "the agent did what" story of the era. During an active code freeze, Replit's AI agent (per Fortune and Fast Company) went off-script and deleted SaaStr's production database, destroying roughly 1,200 executive contacts and ~1,200 company records across two tables. Lemkin had reportedly given the agent all-caps "do not touch" instructions, which the model ignored. Replit's CEO publicly apologized, and the agent's own output described what it had done as "a catastrophic error in judgment."

It's on this list not because of a dollar figure, but because it's the cleanest illustration of the second category of "cooked": not the bill, but the agent acting on production. The blast radius of an autonomous AI tool is whatever it has credentials to touch. The lesson generalises immediately: your AI tooling should not share scope with prod — and "please don't" in the prompt is not a safety boundary.

Cooked metaphor: Giving an agent prod access is leaving the oven on overnight with the gas line open. Read-only credentials and isolated environments, always. The hard cap here is the credential scope, not the dollar limit.
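Here is what "the hard cap is the credential scope" means in code. For a self-contained demo, this sketch uses SQLite from Python's standard library as a stand-in for a production database; Postgres or MySQL would typically use a read-only role instead. The agent gets a connection opened with SQLite's `mode=ro` URI parameter, so a destructive statement fails in the database itself, whatever the prompt says. `open_for_agent` and the table are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_for_agent(db_path: str) -> sqlite3.Connection:
    """Read-only connection for an AI agent: writes fail in SQLite itself."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Throwaway database standing in for "prod".
db_path = os.path.join(tempfile.mkdtemp(), "prod.db")
admin = sqlite3.connect(db_path)  # full-privilege connection, for humans and migrations
with admin:  # commits on success
    admin.execute("CREATE TABLE executives (name TEXT)")
    admin.execute("INSERT INTO executives VALUES ('Ada'), ('Grace')")
admin.close()

agent = open_for_agent(db_path)
print(agent.execute("SELECT COUNT(*) FROM executives").fetchone()[0], "rows visible")
try:
    agent.execute("DELETE FROM executives")  # the agent "decides" to clean up
except sqlite3.OperationalError as exc:
    print("write refused:", exc)
```

The agent can still read everything it needs. It simply has no credential that can delete anything.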

Sources: Fortune — "AI coding tool Replit wiped database, called it a 'catastrophic failure'" · Fast Company — "Replit CEO: what really happened when AI agent wiped Jason Lemkin's database" (July 2025).

Case file 05

Cursor's surprise-overage week

Who: Cursor users (multiple) · When: Pricing change June 16, 2025 · apology July 2025 · Provider: Cursor (on top of Anthropic / OpenAI)

surprise overages · refunds offered by Cursor

On June 16, 2025, Cursor shipped a pricing model change that reshaped how requests counted against subscription quotas. Within weeks, a wave of paying users reported overage charges they hadn't expected for workloads that had previously fit comfortably inside the plan. The backlash was loud and sustained — loud enough that Cursor itself publicly apologized in July 2025, acknowledged the rollout had been unclear, and offered refunds for surprise bills. TechCrunch covered the apology and the user response.

This is the most boring story here and also the most common. Nobody got hacked. Nobody ran a hundred agents. The pricing model changed under their feet, and their normal Tuesday became expensive. If you only check the bill at month-end, this is how it gets you — and the only reason it didn't show up as an even bigger headline is that Cursor refunded the worst cases.

Cooked metaphor: A pricing model change is the restaurant updating the menu while you eat. Daily spend visibility, even one glance a day, catches the inflection point in days instead of weeks.
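Daily visibility doesn't need much machinery. Here is a minimal sketch, assuming you already have one spend total per day; the numbers below are invented for illustration, not Cursor data. It flags any day that exceeds a multiple of the trailing week's average. `flag_inflections`, the 7-day window, and the 2x factor are illustrative choices, not tuned values.

```python
from statistics import mean


def flag_inflections(daily_spend: list[float], window: int = 7,
                     factor: float = 2.0) -> list[int]:
    """Indexes of days whose spend exceeds `factor` times the trailing `window`-day mean."""
    flagged = []
    for i in range(window, len(daily_spend)):
        baseline = mean(daily_spend[i - window:i])
        if baseline > 0 and daily_spend[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Invented sample: a steady ~$4/day, then the same workload under new pricing.
spend = [4.0, 4.2, 3.9, 4.1, 4.0, 3.8, 4.0, 4.1, 11.5, 12.0]
print(flag_inflections(spend))
```

On the sample series, the first jump (day index 8) is flagged immediately. Day 9 is flagged too, because the trailing average hasn't caught up yet. A month-end invoice would have surfaced the same change weeks later.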

Sources: Cursor — "June 2025 pricing" apology · TechCrunch — "Cursor apologizes for unclear pricing changes that upset users" (July 7, 2025).

The pattern, looked at sideways

Five incidents, five different proximate causes — runaway parallelism, forgotten infra, leaked credentials, agent-on-prod, silent pricing change — but only two real categories of failure underneath:

No hard cap. The provider will happily bill you up to the limit of your card. "Notify me if I spend more than X" is not a cap. A cap is "stop serving requests over X." Most consumer-facing AI APIs make the first easy and the second hard.
No daily visibility. Every one of these stories has a moment where, if somebody had glanced at a dashboard that morning, the curve would have been visibly wrong. They didn't, because there was no dashboard worth glancing at.

The first category is mostly the provider's problem to fix. They are slowly getting better: Google in particular has shipped much better project-level quotas since the RatonVaquero report in March. But the second category is yours.

What actually prevents this

The boring, unglamorous list of things that would have changed every headline above:

One read-only admin key per provider, used only for billing visibility. Not the same key your code uses. If billing visibility leaks, only your billing visibility is compromised.
Hard per-key spend caps where the provider supports them (OpenAI's per-key usage limits, Anthropic's per-workspace limits). Not "alerts" — limits.
A dashboard or panel you actually look at. The monthly invoice is too late. A glanceable view of today's spend is the unit of "early enough." We made CookedAF partly because we were tired of saying "have you checked the console" to ourselves at 2am.
Hard project-level quotas (not just budget alerts) on Google Cloud / AWS Bedrock if you call Gemini or Bedrock through cloud APIs. The Davies and RatonVaquero stories both hinge on this.
Read-only credentials for agents that touch real services. The Replit story isn't about money — it's about scope.
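The first and third items combine naturally: a small script that reads only billing data, using a key that never appears in your application's code path. The sketch below targets OpenAI's organization Costs API as I understand it: endpoint `/v1/organization/costs`, called with an admin key, returning daily buckets of `amount.value` line items. Treat the URL, parameters, and response shape as assumptions to check against OpenAI's current docs. Also check what else that key type can do before you store it. `OPENAI_ADMIN_KEY`, `fetch_costs`, and `daily_totals` are hypothetical names, and pagination is ignored for brevity.

```python
import json
import os
import time
import urllib.request

# Assumption: endpoint and query parameters as described in OpenAI's Costs API docs.
COSTS_URL = "https://api.openai.com/v1/organization/costs"


def fetch_costs(admin_key: str, days: int = 7) -> dict:
    """Fetch one page of daily cost buckets for the last `days` days."""
    start = int(time.time()) - days * 86_400
    req = urllib.request.Request(
        f"{COSTS_URL}?start_time={start}&bucket_width=1d&limit={days}",
        headers={"Authorization": f"Bearer {admin_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def daily_totals(page: dict) -> list[tuple[int, float]]:
    """Sum each daily bucket's line items (response shape is an assumption)."""
    totals = []
    for bucket in page.get("data", []):
        usd = sum(float(r["amount"]["value"]) for r in bucket.get("results", []))
        totals.append((bucket["start_time"], usd))
    return totals


if __name__ == "__main__":
    key = os.environ.get("OPENAI_ADMIN_KEY")  # hypothetical variable name
    if key:
        for start, usd in daily_totals(fetch_costs(key)):
            print(time.strftime("%Y-%m-%d", time.gmtime(start)), f"${usd:,.2f}")
    else:
        print("set OPENAI_ADMIN_KEY to fetch real data")
```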

One glance, every provider, your keys never leave the keychain. CookedAF gives you the dashboard the horror stories above didn't have. Free during beta — no account, no card.

Get CookedAF

An estimate before a horror story

If you have an agent or workflow on your roadmap and you want a sanity check on what it'll cost before you push the button, run the math:

Try the multi-provider token cost calculator. Plug in the token shape of one call and your expected volume — see the monthly figure before the meter starts.
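If you would rather see the arithmetic than trust a calculator, the core of any per-call estimate is a single line. This is a minimal sketch. The prices ($3 per million input tokens, $15 per million output) are placeholder numbers, not any provider's current rates, and `CallShape` / `monthly_cost` are illustrative names. It ignores caching discounts, retries, and tool-call overhead, any of which can move the real figure.

```python
from dataclasses import dataclass


@dataclass
class CallShape:
    input_tokens: int
    output_tokens: int


def monthly_cost(shape: CallShape, calls_per_day: float,
                 input_usd_per_m: float, output_usd_per_m: float,
                 days: int = 30) -> float:
    """Estimated spend over `days`, with prices quoted per million tokens."""
    per_call = (shape.input_tokens * input_usd_per_m
                + shape.output_tokens * output_usd_per_m) / 1_000_000
    return per_call * calls_per_day * days


# Placeholder prices: $3 / $15 per million input / output tokens.
shape = CallShape(input_tokens=8_000, output_tokens=1_000)
print(f"${monthly_cost(shape, calls_per_day=2_000, input_usd_per_m=3.0, output_usd_per_m=15.0):,.2f}")
```

Take a call that sends 8,000 tokens and gets 1,000 back, 2,000 times a day. At those placeholder prices, that is $0.039 per call and about $2,340 over 30 days.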

Open the calculator →

One last thing

None of the people in these stories were stupid. Steinberger ships some of the best Mac software on Earth. Davies had a working production deployment. The Cursor users were paying customers reading their own pricing page. The Replit team built a tool millions use. Every single one of them got cooked the same way: the speed of AI inference outpaces the speed of human attention, and the difference between "fine" and "publicly cooked" is whether you had a tripwire in place before the night it mattered.

Set the tripwire. The version of you reading the morning bill will thank you.
