A Technical Deep Dive Into Usage-Based Billing for AI and SaaS Companies
• 10 min read

Ayush Parchure
Content Writing Intern, Flexprice

From the outside, usage-based billing looks like a clean line item on an invoice. But underneath sits an engineering layer where many different systems have to work in sync every time a customer makes a request.
That balance breaks more often than anyone wants to talk about. A duplicate event slips through, and a customer gets charged twice. A late event arrives after the invoice has been finalized, and the numbers stop matching. A pricing change that should have shipped in a day stretches into three weeks, while the team that is meant to be building product debugs billing instead.
The real reason is that usage-based billing was never one product. It is six layers wired together, where the decisions at the bottom quietly shape everything above.
This piece walks you through all six layers with real numbers and real code, so the build versus buy call stops feeling like a guess.
TL;DR
Usage-based billing is six layers stacked on top of each other: event ingestion, storage, metering, pricing and rating, entitlement, and invoicing. Each layer has traps that cost months to fix. This piece walks through every one of them with real numbers, real code, and real Flexprice examples.
How usage-based billing works end-to-end
Most teams picture usage-based billing as one thing. In reality, it is a chain of six stages, and the output of each one feeds the next: ingestion, storage, metering, rating, entitlement, and invoicing.
Around 61% of SaaS companies have already shifted to usage-based pricing. That is a lot of companies trusting this architecture.
Let's go through it layer by layer.
Event ingestion
This is the layer that captures every customer action as a structured JSON event: an API call, a token used, a GB stored. Each action becomes a small record that your system can count later.
Example
Say a customer hits your AI API at 10:14 AM. Your app fires off an event with an ID, a timestamp, a customer reference, and properties like tokens used and the model called. That single event eventually becomes a tiny piece of an invoice at the end of the month.
Now multiply this by a few million customers, and you can see why this layer matters.
Usage-based businesses leak 4 to 9% of revenue on average, and most of it traces back to gaps in this exact layer.
Four things that you need to get right
Throughput:
At 500k events per second and 200 bytes per event, you are pushing roughly 100 MB per second. That number decides how many Kafka or Kinesis partitions you need on day one.
Idempotency:
Every event carries a unique event_id. Dedupe in Redis with a 7-day TTL, or use ClickHouse ReplacingMergeTree at the storage layer. Without this, retries quietly double your bills.
Schema enforcement:
Unknown events get quarantined instead of being silently dropped. Schema drift is the kind of leak that runs for weeks before anyone notices.
Two timestamps:
Track event_time (when the action happened) and ingestion_time (when you received it). Late events are normal. Your pipeline needs to backfill them into already-aggregated windows.
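To make the idempotency, schema, and timestamp points concrete, here is a minimal in-memory sketch in Python. It is not how Flexprice or any particular pipeline implements ingestion: the Ingestor class, the REQUIRED_FIELDS set, and the return strings are hypothetical names chosen for illustration, and the dict stands in for a Redis key with a 7-day TTL.

```python
from dataclasses import dataclass, field

# Assumed minimal schema for this sketch; a real pipeline would validate more.
REQUIRED_FIELDS = {"event_id", "event_name", "external_customer_id", "timestamp"}
DEDUPE_TTL_SECONDS = 7 * 24 * 3600  # the 7-day window mentioned above


@dataclass
class Ingestor:
    seen: dict = field(default_factory=dict)         # event_id -> expiry (epoch seconds)
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)

    def ingest(self, event: dict, now: float) -> str:
        # Schema enforcement: malformed events are quarantined, never dropped.
        if not REQUIRED_FIELDS.issubset(event):
            self.quarantined.append(event)
            return "quarantined"
        # Idempotency: a retry with the same event_id inside the TTL is a no-op.
        expiry = self.seen.get(event["event_id"])
        if expiry is not None and expiry > now:
            return "duplicate"
        self.seen[event["event_id"]] = now + DEDUPE_TTL_SECONDS
        # Two timestamps: keep the client's event time, stamp our own receipt time.
        self.accepted.append(
            {**event, "event_time": event["timestamp"], "ingestion_time": now}
        )
        return "accepted"
```

Calling ingest twice with the same event_id returns "accepted" and then "duplicate", so a client retry cannot double a bill. An event missing a required field lands in quarantined where someone can inspect it later.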
How Flexprice handles event ingestion
Flexprice has a single ingestion endpoint. It accepts events asynchronously and returns 202 with a server-side event_id. Bulk ingest supports up to 1,000 events per request.
curl --request POST \
  --url https://api.cloud.flexprice.io/v1/events \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <your_api_key>' \
  --data '{
    "event_name": "model.usage",
    "external_customer_id": "cust_acme",
    "event_id": "evt_01HX3K9...",
    "timestamp": "2026-05-13T10:14:22Z",
    "properties": {
      "input_tokens": 1843
    }
  }'
The event_id gives you idempotency, timestamp is your event_time, and properties carries everything you want to bill on later.
Event storage
This is the long-term store that holds every raw event. You need it for aggregation, audit, and replay.
Example
Say your AI product produces 50 million events a day. That works out to about 18 billion events a year, or roughly 7 TB of compressed storage at a 5x ratio, which implies raw events of roughly 2 KB each. And every time someone opens their usage dashboard, you need to answer "how much did I use last month" in under a second. That is the kind of query load this layer has to survive.
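The sizing arithmetic is worth running against your own event size, because the result scales linearly with it. The sketch below is a back-of-the-envelope calculation only; the function name and the two event sizes (200 bytes, as in the throughput example, and 2 KB) are assumptions for illustration.

```python
def yearly_storage_tb(events_per_day: int, bytes_per_event: int,
                      compression_ratio: float) -> float:
    """Compressed storage for one year of raw events, in decimal terabytes."""
    events_per_year = events_per_day * 365
    raw_bytes = events_per_year * bytes_per_event
    return raw_bytes / compression_ratio / 1e12


events_per_year = 50_000_000 * 365                          # 18,250,000,000
lean_events = yearly_storage_tb(50_000_000, 200, 5.0)       # about 0.73 TB
rich_events = yearly_storage_tb(50_000_000, 2_000, 5.0)     # about 7.3 TB
```

The tenfold gap between the two results is the reason to measure your real payload size before you commit to a storage budget.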
Three things matter here:
Storage engine:
Most billing queries look like "sum or count this property, filter by customer and time, group by something." That is a columnar workload. Postgres works fine until 5 to 10 million events per month, then it collapses. Move to ClickHouse, Druid, or Pinot well before you hit that ceiling.
Partitioning:
Partition by tenant and date, with a sort key on customer, event name, and timestamp. The standard rating query then becomes a contiguous read instead of a full table scan.
Hot and cold tiers:
Keep recent data in your OLAP store. Push older events to S3 or Parquet for audit and replay.
How Flexprice handles event storage
Flexprice runs ClickHouse under the hood, so you never have to touch the raw event store yourself. Instead, there is a usage analytics endpoint that hands you filtered, time-bucketed, grouped usage for any meter.
POST /v1/events/analytics
# Body: external_customer_id, feature_ids[], start_time, end_time, window_size
You get the numbers your dashboards and invoices need, without ever operating an OLAP cluster.
Metering and aggregation
This is the layer that turns raw events into the actual numbers you bill on. It looks simple from the outside, but the depth lies in picking the right aggregation function for the right meter.
Example
Say you bill on input tokens. A meter for that would filter events where event_name is "model.usage", sum the input_tokens field, and group by customer. Run that over a billing period, and you get the total tokens per customer, ready for pricing.
A meter has four parts: an event filter, an aggregation function, a property to aggregate, and group-by dimensions.
Here is why your choice of aggregation function matters
SUM and COUNT are easy; they add up across any time window cleanly.
COUNT UNIQUE at scale needs HyperLogLog sketches, which give about 1% error at roughly 12 KB per sketch. That is a fair trade.
MAX needs windowed tracking. It is useful for peak-concurrency billing, like the highest number of concurrent users in a month.
LATEST is a state snapshot, not a sum. It needs its own primitive and behaves differently from the others.
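The four parts of a meter and the aggregation functions above can be sketched in a few lines of Python. This is an illustrative toy, not Flexprice's meter engine: the Meter dataclass, run_meter, and the aggregation names are hypothetical, and COUNT_UNIQUE here uses an exact set where a production system would use a HyperLogLog sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Meter:
    event_name: str                           # event filter
    aggregation: str                          # SUM, COUNT, COUNT_UNIQUE, MAX, LATEST
    prop: Optional[str] = None                # property to aggregate (unused by COUNT)
    group_by: str = "external_customer_id"    # group-by dimension


def run_meter(meter: Meter, events: list) -> dict:
    # Apply the event filter and bucket matching events by the group-by key.
    groups = defaultdict(list)
    for e in events:
        if e["event_name"] == meter.event_name:
            groups[e[meter.group_by]].append(e)

    result = {}
    for key, rows in groups.items():
        if meter.aggregation == "COUNT":
            result[key] = len(rows)
            continue
        values = [r["properties"][meter.prop] for r in rows]
        if meter.aggregation == "SUM":
            result[key] = sum(values)
        elif meter.aggregation == "COUNT_UNIQUE":
            result[key] = len(set(values))    # exact count; a sketch replaces this at scale
        elif meter.aggregation == "MAX":
            result[key] = max(values)
        elif meter.aggregation == "LATEST":
            # State snapshot: take the value from the most recent event.
            # ISO-8601 UTC strings in one format sort chronologically.
            newest = max(rows, key=lambda r: r["timestamp"])
            result[key] = newest["properties"][meter.prop]
        else:
            raise ValueError(f"unknown aggregation: {meter.aggregation}")
    return result
```

Running Meter("model.usage", "SUM", "input_tokens") over a billing period's events gives the per-customer token totals that the rating layer prices next.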
How Flexprice handles metering and aggregation
Flexprice models meters as metered features. It supports eight aggregation functions out of the box: Count, Sum, Average, Count Unique, Latest, Sum with Multiplier, Max with time bucketing, and Weighted Sum for capacity billing. You configure the meter once, and Flexprice picks the right strategy underneath.
curl 'https://api-dev.cloud.flexprice.io/v1/features' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer <your_token>' \
  -H 'content-type: application/json' \
  --data-raw '{
    "name": "Demo Feat",
    "lookup_key": "feat-demo-feat",
    "type": "metered",
    "meter": {
      "name": "Demo Feat",
      "event_name": "event_abc",
      "aggregation": { "type": "SUM", "field": "input_tokens" },
      "reset_usage": "BILLING_PERIOD"
    },
    "unit_singular": "token",
    "unit_plural": "tokens"
  }'
Pricing models and the rating engine
This is the layer where usage finally turns into money. The rating engine takes a customer's usage, applies their plan's prices, and produces the line items on the invoice.
Example
A voice AI customer used 60,000 minutes in May. Your pricing has three tiers: the first 10k at $0.05 per minute, the next 40k at $0.04, and anything above 50k at $0.03. The rating engine walks the usage across the tiers and lands at (10k × $0.05) + (40k × $0.04) + (10k × $0.03) = $2,400 as a single line item on the invoice.
The four pricing models that matter
Pay as you go:
A flat rate per unit, like $0.01 per API call. The simplest model to explain to customers and the natural starting point for most usage-based products.
Tiered pricing:
The more a customer uses, the cheaper each unit gets. The voice AI example above is a tiered plan: 60k minutes ends up at $2,400.
Volume pricing:
Same setup as tiered, but the entire quantity gets priced at the rate of the highest tier reached. Say API calls cost $0.20 up to 2k, $0.10 from 2k to 4k, and $0.05 above 4k. A customer with 5,000 calls pays all 5,000 at $0.05 = $250. Crossing a tier drops the rate on everything.
Hybrid pricing:
A fixed fee plus usage on top. $99 a month includes 50k emails. Every email beyond that costs $0.001. A customer sending 80k emails pays $99 + (30k × $0.001) = $129.
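The arithmetic behind these models is small enough to sketch directly. The Python below is an illustration under assumptions, not Flexprice's rating engine: the function names are hypothetical, tiers are written as (up_to, unit_price) pairs with None marking the open-ended last tier, and Decimal is used so money never passes through binary floats.

```python
from decimal import Decimal
from typing import List, Optional, Tuple

Tiers = List[Tuple[Optional[int], str]]  # (up_to, unit_price); None = no upper bound


def rate_graduated(qty: int, tiers: Tiers) -> Decimal:
    """Tiered pricing: each slice of usage is charged at its own tier's rate."""
    total = Decimal("0")
    lower = 0
    for up_to, price in tiers:
        upper = qty if up_to is None else min(qty, up_to)
        if upper > lower:
            total += Decimal(upper - lower) * Decimal(price)
        if up_to is None or qty <= up_to:
            break
        lower = up_to
    return total


def rate_volume(qty: int, tiers: Tiers) -> Decimal:
    """Volume pricing: the whole quantity is charged at the rate of the tier reached."""
    for up_to, price in tiers:
        if up_to is None or qty <= up_to:
            return Decimal(qty) * Decimal(price)
    raise ValueError("tiers must end with an unbounded tier")


def rate_hybrid(qty: int, base_fee: str, included: int, overage_price: str) -> Decimal:
    """Hybrid pricing: fixed fee plus per-unit overage beyond the included quantity."""
    overage = max(0, qty - included)
    return Decimal(base_fee) + Decimal(overage) * Decimal(overage_price)


voice = rate_graduated(60_000, [(10_000, "0.05"), (50_000, "0.04"), (None, "0.03")])
calls = rate_volume(5_000, [(2_000, "0.20"), (4_000, "0.10"), (None, "0.05")])
email = rate_hybrid(80_000, "99", 50_000, "0.001")
```

These three calls reproduce the worked examples above: $2,400 for the voice AI customer, $250 for the 5,000 API calls, and $129 for the 80k emails. Pay as you go is the degenerate case of a single unbounded tier.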
How Flexprice does this
Flexprice composes every shape above with a small set of enums. billing_model is FLAT_FEE, PACKAGE, or TIERED. tier_mode is VOLUME or SLAB.
{
  "billing_cadence": "RECURRING",
  "billing_period": "MONTHLY",
  "billing_model": "TIERED",
  "tier_mode": "SLAB",
  "type": "USAGE",
  "currency": "usd",
  "entity_type": "PLAN",
  "entity_id": "plan_growth",
  "meter_id": "meter_token_usage",
  "invoice_cadence": "ARREAR",
  "tiers": [
    { "up_to": 10000, "unit_amount": "0.002" },
    { "up_to": 50000, "unit_amount": "0.001" },
    { "up_to": null, "unit_amount": "0.0005" }
  ]
}
You can preview the resulting invoice at any time via POST /v1/invoices/preview without finalizing it.
Real-time entitlements
Billing tells you what someone owes at the end of the month, but entitlements tell you what they are allowed to do right now. Two systems, with two very different latency budgets.
Example
A customer on your Pro plan tries to make their 1,001st API call this month. The entitlement service has to decide in under 10 milliseconds: do you reject the call (hard limit) or let it through and charge overage (soft limit)? That same check runs on every single API call your product makes, all day, every day.
How entitlements show up
Boolean:
Is feature X enabled for this customer? Like access to a premium model or a beta feature.
Usage limit:
1,000 API calls per month, after which you either reject the request (hard limit) or allow overage (soft limit).
Static value:
A config tied to the plan, like max seats or max projects.
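The hard versus soft limit decision is a small pure function once current usage is at hand. The sketch below is a hedged illustration: Entitlement, Decision, and check are hypothetical names, and the fast usage counter that a sub-10 ms hot path would need is assumed to exist elsewhere. A static value entitlement is just a plan lookup, so it is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entitlement:
    enabled: bool = True                 # boolean entitlement: is the feature on?
    usage_limit: Optional[int] = None    # None means unlimited
    is_soft_limit: bool = False          # soft: allow and bill overage; hard: reject


@dataclass(frozen=True)
class Decision:
    allowed: bool
    overage: bool = False
    reason: str = "ok"


def check(ent: Entitlement, current_usage: int, requested: int = 1) -> Decision:
    if not ent.enabled:
        return Decision(False, reason="feature_disabled")
    if ent.usage_limit is None or current_usage + requested <= ent.usage_limit:
        return Decision(True)
    if ent.is_soft_limit:
        # Over the limit, but the plan allows overage that is billed later.
        return Decision(True, overage=True, reason="over_limit_billed")
    return Decision(False, reason="limit_reached")
```

For the Pro plan example, check(Entitlement(usage_limit=1000), current_usage=1000) rejects the 1,001st call, while the same check with is_soft_limit=True lets it through and flags it as overage.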
How Flexprice handles entitlements
Flexprice exposes entitlements at the customer level. You can check them in the hot path on every request because each entitlement carries a usage_limit, an is_soft_limit flag, and a reset period.
# Check what a customer is allowed to do right now
GET /v1/customers/{id}/entitlements

curl --request GET \
  --url 'https://api.cloud.flexprice.io/v1/customers/{id}/entitlements' \
  --header 'Accept: application/json' \
  --header 'x-api-key: <your_api_key>'
The response returns every feature the customer is entitled to, their current usage, and whether they have hit a limit. For prepaid balances, Flexprice has a separate wallet system that handles real-time debit and FIFO credit consumption.
Invoice lifecycle
An invoice is not a static document; it usually moves through stages, and each transition matters. Get them wrong, and you end up with the "my invoice changed overnight" support tickets nobody wants.
Example
Imagine your customer's billing period closes on May 31 at 11:59 PM. The system freezes the event window, recalculates the total from raw events, snapshots the prices used, and locks the invoice. But on June 2, a late event arrives with a timestamp from May 28. You will either issue a credit note for the difference or roll the late event into June's bill.
Here is what happens when you finalize a draft
Stop accepting new events for that billing period.
Recalculate the total one last time from raw events, not from cached rollups. This version becomes the source of truth.
Save a snapshot of every price and meter used, so the invoice still makes sense even if you change pricing tomorrow.
Late events after finalization get credit notes or roll to the next cycle. Never silent edits.
Use a freeze window and stop accepting changes to a draft a few hours before it finalizes, so the customer's last view of the draft matches the final invoice.
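The finalization rules above amount to a small state machine. The Python sketch below models them under simplifying assumptions: one flat unit price, events that carry a quantity and an ISO-8601 UTC event_time, and a generic adjustments list standing in for credit notes or a rollover into the next cycle. All names are hypothetical and this is not Flexprice's implementation.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Invoice:
    period_start: str                    # inclusive, ISO-8601 UTC
    period_end: str                      # exclusive, ISO-8601 UTC
    status: str = "DRAFT"
    total: Decimal = Decimal("0")
    price_snapshot: dict = field(default_factory=dict)
    adjustments: list = field(default_factory=list)


def finalize(inv: Invoice, raw_events: list, unit_price: Decimal) -> None:
    if inv.status != "DRAFT":
        raise ValueError("only a draft can be finalized")
    # Recalculate from raw events, not cached rollups.
    # Same-format ISO-8601 strings compare chronologically.
    qty = sum(e["quantity"] for e in raw_events
              if inv.period_start <= e["event_time"] < inv.period_end)
    inv.total = Decimal(qty) * unit_price
    # Snapshot the price so the invoice survives tomorrow's pricing change.
    inv.price_snapshot = {"unit_price": unit_price}
    inv.status = "FINALIZED"


def handle_late_event(inv: Invoice, event: dict) -> dict:
    if inv.status != "FINALIZED":
        raise ValueError("late-event handling applies to finalized invoices")
    # Never edit the finalized total; record a separate, auditable adjustment.
    amount = Decimal(event["quantity"]) * inv.price_snapshot["unit_price"]
    adjustment = {"event_time": event["event_time"], "amount": amount}
    inv.adjustments.append(adjustment)
    return adjustment
```

The property to notice is that handle_late_event never touches inv.total. Whether the adjustment becomes a credit note or a line on the next invoice is a policy choice layered on top.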
How Flexprice does this
Flexprice gives you each step in the lifecycle as its own endpoint, plus a scheduled finalization that handles the freeze window automatically.
# Preview a draft total at any time
POST /v1/invoices/preview

# Finalize once the period closes
POST /v1/invoices/{invoice_id}/finalize

# Recalculate a draft after a late event
curl --request POST \
  --url https://us.api.flexprice.io/v1/invoices/{id}/recalculate-v2 \
  --header 'x-api-key: <api-key>'

# Issue a credit note instead of editing a finalized invoice
curl --request POST \
  --url https://us.api.flexprice.io/v1/creditnotes \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '{
    "invoice_id": "<string>",
    "reason": "DUPLICATE",
    "credit_note_number": "<string>",
    "idempotency_key": "<string>",
    "line_items": [
      {
        "amount": "<string>",
        "invoice_line_item_id": "<string>",
        "display_name": "<string>",
        "metadata": {}
      }
    ],
    "memo": "<string>",
    "metadata": {},
    "process_credit_note": true
  }'

# Schedule the draft to auto-finalize after a freeze window
curl --request POST \
  --url https://us.api.flexprice.io/v1/tasks/scheduled/schedule-draft-finalization \
  --header 'x-api-key: <api-key>'
Build vs buy
Honestly, this is the question that every team eventually runs into: do you build it yourself, or pay someone else to handle it?
There is no universal answer. It usually comes down to where your engineering time is best spent and what your billing actually has to do.
Build
If billing is a crucial part of your product, your contract math is unusual enough that no vendor covers it, and you already have engineers on the team who have shipped billing systems before and know where the bodies are buried, you should go with building in-house.
Buy
If engineering bandwidth is your bottleneck, you want pricing experiments live in days instead of quarters, or you need audit trails, multi-currency, tax, and revenue recognition working on day one, buying is the smarter call here.
Hear it from one of our customers, Shubhendu Shishir, Head of Engineering at Simplismart.
Simplismart scaled to 750+ pricing features on Flexprice without rewriting their billing infrastructure, and reclaimed roughly 30% of their daily engineering bandwidth that used to be stuck in billing work. That is the kind of leverage a good vendor gives you when the platform actually fits your model.
If you land on buy, Flexprice is an enterprise billing platform built for exactly this. The architecture you just read about is what powers it.
The bottom line on usage-based billing
If you take one thing away from this, let it be that usage-based billing is not one thing. It is six layers stacked together, and the choices you make on each one will shape what your billing system feels like a year from now.
The good news is that once you can see all six clearly, the rest of the calls get a lot easier. You know what your stack actually has to do. You know which decisions are reversible and which ones are not. And if you sit down with a vendor, you finally know which questions are worth asking.
Whatever path you pick, the goal is the same. A billing system that quietly does its job so you can get back to building your actual product.
If you want to poke around what this architecture looks like in code, Flexprice is open source. Fork it, run the APIs, break things, and see how it holds up.
Pricing models and the rating engine
This is the layer where usage finally turns into money, the rating engine takes a customer's usage, applies their plan's prices, and produces the line items on the invoice.
Example
A voice AI customer used 60,000 minutes in May. Your pricing has three tiers: the first 10k at $0.05 per minute, the next 40k at $0.04, and anything above 50k at $0.03. The rating engine walks the usage across the tiers and lands at (10k × $0.05) + (40k × $0.04) + (10k × $0.03) = $2,400 as a single line item on the invoice.
The four pricing models that matter
Pay as you go:
A flat rate per unit, like $0.01 per API call. The simplest model to explain to customers and the natural starting point for most usage-based products.
Tiered pricing:
The more a customer uses, the cheaper each unit gets. The voice AI example above is a tiered plan: 60k minutes ends up at $2,400.
Volume pricing:
Same setup as tiered, but the entire quantity gets priced at the rate of the highest tier reached. Say API calls cost $0.20 up to 2k, $0.10 from 2k to 4k, and $0.05 above 4k. A customer with 5,000 calls pays all 5,000 at $0.05 = $250. Crossing a tier drops the rate on everything.
Hybrid pricing:
A fixed fee plus usage on top. $99 a month includes 50k emails. Every email beyond that costs $0.001. A customer sending 80k emails pays $99 + (30k × $0.001) = $129.
How Flexprice does this
Flexprice composes every shape above with a small set of enums. billing_model is FLAT_FEE, PACKAGE, or TIERED. tier_mode is VOLUME or SLAB.
{ "billing_cadence": "RECURRING", "billing_period": "MONTHLY", "billing_model": "TIERED", "tier_mode": "SLAB", "type": "USAGE", "currency": "usd", "entity_type": "PLAN", "entity_id": "plan_growth", "meter_id": "meter_token_usage", "invoice_cadence": "ARREAR", "tiers": [ { "up_to": 10000, "unit_amount": "0.002" }, { "up_to": 50000, "unit_amount": "0.001" }, { "up_to": null, "unit_amount": "0.0005" } ] }
You can preview the resulting invoice at any time via POST /v1/invoices/preview without finalizing it.
Real-time entitlements
Billing tells you what someone owes at the end of the month, but entitlements tell you what they are allowed to do right now. Two systems, with two very different latency budgets.
Example
A customer on your Pro plan tries to make their 1,001st API call this month. The entitlement service has to decide in under 10 milliseconds: do you reject the call (hard limit) or let it through and charge overage (soft limit)? That same check runs on every single API call your product makes, all day, every day.
How entitlements show up
Boolean:
Is feature X enabled for this customer? Like access to a premium model or a beta feature.
Usage limit:
1,000 API calls per month, after which you either reject the request (hard limit) or allow overage (soft limit).
Static value:
A config tied to the plan, like max seats or max projects.
How Flexprice handles entitlements
Flexprice exposes entitlements at the customer level. You can check them in the hot path on every request because each entitlement carries a usage_limit, an is_soft_limit flag, and a reset period.
# Check what a customer is allowed to do right now GET /v1/customers/:id/entitlements or GET /v1/customers/{id}/entitlements or GET /v1/customers/{customer_id}/entitlements postman request 'https://api.cloud.flexprice.io/v1/customers/string/entitlements' \ --header 'Accept: application/json' \ --header 'x-api-key: <your_api_key>' \ --body ''
The response returns every feature the customer is entitled to, their current usage, and whether they have hit a limit. For prepaid balances, Flexprice has a separate wallet system that handles real-time debit and FIFO credit consumption.
Invoice lifecycle
An invoice is not a static document; it usually moves through stages, and each transition matters. Get them wrong, and you end up with the "my invoice changed overnight" support tickets nobody wants.
Example
Imagine on May 31, around 11:59 PM, your customer's billing period closes. The system freezes the event window, recalculates the total from raw events, snapshots the prices used, and locks the invoice. But on June 2, a late event arrives with a timestamp from May 28. You will either issue a credit note for the difference or roll the late event into June's bill.
Here is what happens when you finalize a draft
Stop accepting new events for that billing period.
Recalculate the total one last time from raw events, not from cached rollups. This version becomes the source of truth.
Save a snapshot of every price and meter used, so the invoice still makes sense even if you change pricing tomorrow.
Pricing models and the rating engine
This is the layer where usage finally turns into money. The rating engine takes a customer's usage, applies their plan's prices, and produces the line items on the invoice.
Example
A voice AI customer used 60,000 minutes in May. Your pricing has three tiers: the first 10k at $0.05 per minute, the next 40k at $0.04, and anything above 50k at $0.03. The rating engine walks the usage across the tiers and lands at (10k × $0.05) + (40k × $0.04) + (10k × $0.03) = $2,400 as a single line item on the invoice.
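The tier walk in this example can be sketched in a few lines. This is a minimal illustration, not Flexprice's rating engine: the `TIERS` table and the `rate_graduated` function are hypothetical names, and it assumes each tier's upper bound is inclusive.

```python
from decimal import Decimal

# Hypothetical tier table for the voice AI example:
# (upper bound in minutes, price per minute). None marks the open-ended last tier.
TIERS = [
    (10_000, Decimal("0.05")),
    (50_000, Decimal("0.04")),
    (None, Decimal("0.03")),
]

def rate_graduated(quantity, tiers):
    """Charge each slice of usage at the rate of the tier it falls into."""
    total = Decimal("0")
    lower = 0  # units already priced by earlier tiers
    for upper, unit_price in tiers:
        if quantity <= lower:
            break  # nothing left to price
        # Highest unit that still belongs to this tier.
        top = quantity if upper is None else min(quantity, upper)
        total += (top - lower) * unit_price
        lower = top
    return total

may_total = rate_graduated(60_000, TIERS)
print(may_total)  # 500 + 1,600 + 300
```

Using `Decimal` rather than floats keeps the per-tier products exact, which matters once these totals land on an invoice.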
The four pricing models that matter
Pay as you go:
A flat rate per unit, like $0.01 per API call. The simplest model to explain to customers and the natural starting point for most usage-based products.
Tiered pricing:
Each block of usage is charged at its own tier's rate, so the more a customer uses, the cheaper each additional unit gets. The voice AI example above is a tiered plan: 60k minutes ends up at $2,400.
Volume pricing:
Same setup as tiered, but the entire quantity gets priced at the rate of the highest tier reached. Say API calls cost $0.20 up to 2k, $0.10 from 2k to 4k, and $0.05 above 4k. A customer with 5,000 calls pays all 5,000 at $0.05 = $250. Crossing a tier drops the rate on everything.
Hybrid pricing:
A fixed fee plus usage on top. $99 a month includes 50k emails. Every email beyond that costs $0.001. A customer sending 80k emails pays $99 + (30k × $0.001) = $129.
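To make the difference between these models concrete, here is a small sketch of volume and hybrid rating using the numbers above. The function names and the tier tuple layout are assumptions for illustration, and tier upper bounds are treated as inclusive.

```python
from decimal import Decimal

def rate_volume(quantity, tiers):
    """Price the entire quantity at the rate of the tier the total lands in."""
    for upper, unit_price in tiers:
        if upper is None or quantity <= upper:
            return quantity * unit_price
    raise ValueError("tier table needs an open-ended last tier")

def rate_hybrid(quantity, base_fee, included_units, overage_price):
    """Fixed fee plus per-unit overage beyond the included allowance."""
    overage_units = max(0, quantity - included_units)
    return base_fee + overage_units * overage_price

# API call example: $0.20 up to 2k, $0.10 up to 4k, $0.05 above that.
API_TIERS = [
    (2_000, Decimal("0.20")),
    (4_000, Decimal("0.10")),
    (None, Decimal("0.05")),
]

volume_total = rate_volume(5_000, API_TIERS)
hybrid_total = rate_hybrid(80_000, Decimal("99"), 50_000, Decimal("0.001"))
print(volume_total, hybrid_total)
```

One consequence of volume pricing worth testing for: with this table, 2,000 calls cost $400 while 2,001 calls cost $200.10, because crossing the boundary reprices every unit.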
How Flexprice does this
Flexprice composes every shape above with a small set of enums. billing_model is FLAT_FEE, PACKAGE, or TIERED. tier_mode is VOLUME or SLAB.
```json
{
  "billing_cadence": "RECURRING",
  "billing_period": "MONTHLY",
  "billing_model": "TIERED",
  "tier_mode": "SLAB",
  "type": "USAGE",
  "currency": "usd",
  "entity_type": "PLAN",
  "entity_id": "plan_growth",
  "meter_id": "meter_token_usage",
  "invoice_cadence": "ARREAR",
  "tiers": [
    { "up_to": 10000, "unit_amount": "0.002" },
    { "up_to": 50000, "unit_amount": "0.001" },
    { "up_to": null, "unit_amount": "0.0005" }
  ]
}
```
You can preview the resulting invoice at any time via POST /v1/invoices/preview without finalizing it.
Real-time entitlements
Billing tells you what someone owes at the end of the month, but entitlements tell you what they are allowed to do right now. Two systems, with two very different latency budgets.
Example
A customer on your Pro plan, which includes 1,000 API calls a month, tries to make their 1,001st call. The entitlement service has to decide in under 10 milliseconds: do you reject the call (hard limit) or let it through and charge overage (soft limit)? That same check runs on every single API call your product serves, all day, every day.
How entitlements show up
Boolean:
Is feature X enabled for this customer? Like access to a premium model or a beta feature.
Usage limit:
1,000 API calls per month, after which you either reject the request (hard limit) or allow overage (soft limit).
Static value:
A config tied to the plan, like max seats or max projects.
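The three shapes above can be handled by one small check function. The sketch below is an in-memory illustration only: `usage_limit` and `is_soft_limit` mirror the field names mentioned in this article, while `Entitlement`, `Decision`, `check`, and the remaining fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entitlement:
    kind: str                           # "boolean", "usage_limit", or "static"
    enabled: bool = True                # boolean: is the feature switched on?
    usage_limit: Optional[int] = None   # usage_limit: units allowed per reset period
    is_soft_limit: bool = False         # usage_limit: allow overage instead of rejecting
    static_value: Optional[int] = None  # static: plan config such as max seats

@dataclass
class Decision:
    allowed: bool
    overage: bool = False  # True when the request is allowed but billed as overage

def check(ent: Entitlement, current_usage: int = 0, requested: int = 1) -> Decision:
    """Decide whether a request may proceed under a single entitlement."""
    if ent.kind == "boolean":
        return Decision(allowed=ent.enabled)
    if ent.kind == "usage_limit":
        if ent.usage_limit is None or current_usage + requested <= ent.usage_limit:
            return Decision(allowed=True)
        # Over the limit: a soft limit lets the call through as overage, a hard limit rejects it.
        return Decision(allowed=ent.is_soft_limit, overage=ent.is_soft_limit)
    if ent.kind == "static":
        within = ent.static_value is None or current_usage + requested <= ent.static_value
        return Decision(allowed=within)
    raise ValueError(f"unknown entitlement kind: {ent.kind}")

hard = Entitlement(kind="usage_limit", usage_limit=1000)
soft = Entitlement(kind="usage_limit", usage_limit=1000, is_soft_limit=True)
print(check(hard, current_usage=1000))  # the 1,001st call under a hard limit
print(check(soft, current_usage=1000))  # the same call under a soft limit
```

In production the expensive part is not this branch logic but reading `current_usage` fast enough to stay inside the latency budget, which is why the counter usually lives in a low-latency store rather than in the billing database.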
How Flexprice handles entitlements
Flexprice exposes entitlements at the customer level. You can check them in the hot path on every request because each entitlement carries a usage_limit, an is_soft_limit flag, and a reset period.
```
# Check what a customer is allowed to do right now
GET /v1/customers/{id}/entitlements

curl --request GET \
  --url 'https://api.cloud.flexprice.io/v1/customers/{id}/entitlements' \
  --header 'Accept: application/json' \
  --header 'x-api-key: <your_api_key>'
```
The response returns every feature the customer is entitled to, their current usage, and whether they have hit a limit. For prepaid balances, Flexprice has a separate wallet system that handles real-time debit and FIFO credit consumption.
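FIFO credit consumption means the oldest credit grant is drawn down first. The sketch below illustrates that idea in general terms; it is not Flexprice's wallet implementation, and `debit_fifo` and the grant layout are hypothetical.

```python
from collections import deque
from decimal import Decimal

def debit_fifo(grants, amount):
    """Consume `amount` from the oldest grants first.

    `grants` is a deque of [grant_id, remaining_balance] pairs, oldest on the left.
    Returns the (grant_id, amount_taken) pairs that were drawn down.
    """
    if sum(g[1] for g in grants) < amount:
        raise ValueError("insufficient wallet balance")
    consumed = []
    while amount > 0:
        grant = grants[0]
        take = min(grant[1], amount)
        grant[1] -= take
        amount -= take
        consumed.append((grant[0], take))
        if grant[1] == 0:
            grants.popleft()  # grant fully used, move on to the next oldest
    return consumed

wallet = deque([["grant_jan", Decimal("5")], ["grant_feb", Decimal("10")]])
print(debit_fifo(wallet, Decimal("7")))
print(list(wallet))
```

Checking the total balance before mutating anything keeps a failed debit from leaving the wallet half-consumed.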
Invoice lifecycle
An invoice is not a static document; it usually moves through stages, and each transition matters. Get them wrong, and you end up with the "my invoice changed overnight" support tickets nobody wants.
Example
Imagine your customer's billing period closes on May 31 at 11:59 PM. The system freezes the event window, recalculates the total from raw events, snapshots the prices used, and locks the invoice. But on June 2, a late event arrives with a timestamp from May 28. You will either issue a credit note for the difference or roll the late event into June's bill.
Here is what happens when you finalize a draft
Stop accepting new events for that billing period.
Recalculate the total one last time from raw events, not from cached rollups. This version becomes the source of truth.
Save a snapshot of every price and meter used, so the invoice still makes sense even if you change pricing tomorrow.
Late events after finalization get credit notes or roll to the next cycle. Never silent edits.
Use a freeze window: stop accepting changes to a draft a few hours before it finalizes, so the customer's last view of the draft matches the final invoice.
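The steps above can be sketched as a small state transition. Everything here is a hypothetical illustration with a single flat unit price: `Invoice`, `finalize`, `route_late_event`, `draft_is_frozen`, and the six-hour default window are assumptions, not Flexprice's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from decimal import Decimal

@dataclass
class Event:
    timestamp: datetime
    quantity: int

@dataclass
class Invoice:
    period_start: datetime
    period_end: datetime
    status: str = "draft"
    total: Decimal = Decimal("0")
    price_snapshot: dict = field(default_factory=dict)

def draft_is_frozen(now, finalize_at, freeze_window=timedelta(hours=6)):
    """Inside the freeze window the draft no longer accepts changes."""
    return now >= finalize_at - freeze_window

def finalize(invoice, raw_events, unit_price):
    """Recalculate from raw events, snapshot the price, and lock the invoice."""
    in_period = [e for e in raw_events
                 if invoice.period_start <= e.timestamp < invoice.period_end]
    invoice.total = sum(e.quantity for e in in_period) * unit_price
    invoice.price_snapshot = {"unit_price": unit_price}  # survives later price changes
    invoice.status = "finalized"
    return invoice

def route_late_event(invoice, event, policy="roll_forward"):
    """Decide what to do with an event that belongs to this invoice's period."""
    if invoice.status == "draft":
        return "recalculate_draft"
    # Finalized invoices are never edited: adjust through a separate document.
    return "roll_to_next_cycle" if policy == "roll_forward" else "issue_credit_note"

may = Invoice(datetime(2024, 5, 1), datetime(2024, 6, 1))
events = [Event(datetime(2024, 5, 10), 100), Event(datetime(2024, 5, 28), 50),
          Event(datetime(2024, 6, 2), 10)]
finalize(may, events, Decimal("0.01"))
print(may.status, may.total)
print(route_late_event(may, Event(datetime(2024, 5, 28), 5)))
```

The key property is that nothing in `route_late_event` touches `invoice.total` once the status is `finalized`, which is what rules out silent edits.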
How Flexprice does this
Flexprice gives you each step in the lifecycle as its own endpoint, plus a scheduled finalization that handles the freeze window automatically.
```
# Preview a draft total at any time
POST /v1/invoices/preview

# Finalize once the period closes
POST /v1/invoices/{invoice_id}/finalize

# Recalculate a draft after a late event
curl --request POST \
  --url https://us.api.flexprice.io/v1/invoices/{id}/recalculate-v2 \
  --header 'x-api-key: <api-key>'

# Issue a credit note instead of editing a finalized invoice
curl --request POST \
  --url https://us.api.flexprice.io/v1/creditnotes \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "invoice_id": "<string>",
  "reason": "DUPLICATE",
  "credit_note_number": "<string>",
  "idempotency_key": "<string>",
  "line_items": [
    {
      "amount": "<string>",
      "invoice_line_item_id": "<string>",
      "display_name": "<string>",
      "metadata": {}
    }
  ],
  "memo": "<string>",
  "metadata": {},
  "process_credit_note": true
}
'

# Schedule the draft to auto-finalize after a freeze window
curl --request POST \
  --url https://us.api.flexprice.io/v1/tasks/scheduled/schedule-draft-finalization \
  --header 'x-api-key: <api-key>'
```
Build vs buy
Honestly, this is the question that every team eventually runs into: do you build it yourself, or pay someone else to handle it?
There is no universal answer. It usually comes down to where your engineering time is best spent and what your billing actually has to do.
Build
Build in-house if billing is a core part of your product, your contract math is unusual enough that no vendor covers it, and you already have engineers on the team who have shipped billing systems before and know where the bodies are buried.
Buy
Buy if engineering bandwidth is your bottleneck, you want pricing experiments live in days instead of quarters, or you need audit trails, multi-currency, tax, and revenue recognition working on day one.
Consider one of our customers, Simplismart, where Shubhendu Shishir is Head of Engineering.
Simplismart scaled to 750+ pricing features on Flexprice without rewriting their billing infrastructure, and reclaimed roughly 30% of their daily engineering bandwidth that used to be stuck in billing work. That is the kind of leverage a good vendor gives you when the platform actually fits your model.
If you land on buy, Flexprice is an enterprise billing platform built for exactly this. The architecture you just read about is what powers it.
The bottom line on usage-based billing
If you take one thing away from this, let it be that usage-based billing is not one thing. It is six layers stacked together, and the choices you make at each one shape what your billing system will feel like a year from now.
The good news is that once you can see all six clearly, the rest of the calls get a lot easier. You know what your stack actually has to do. You know which decisions are reversible and which ones are not. And if you sit down with a vendor, you finally know which questions are worth asking.
Whatever path you pick, the goal is the same: a billing system that quietly does its job so you can get back to building your actual product.
If you want to poke around what this architecture looks like in code, Flexprice is open source. Fork it, run the APIs, break things, and see how it holds up.
Frequently Asked Questions
How do you handle late-arriving events?
What is the difference between metering and rating?
Why is HyperLogLog used in usage-based billing?
Can you run usage-based billing on Postgres alone?
What is the difference between volume pricing and tiered pricing?

Ayush Parchure
Ayush is part of the content team at Flexprice, with a strong interest in AI, SaaS, and pricing. He loves breaking down complex systems and spends his free time gaming and experimenting with new cooking lessons.