
Koshima Satija
Co-founder & COO, Flexprice
The downstream impact on orchestration platforms
For infrastructure providers (STT, TTS, LLM), per-minute inconsistency is a known cost of doing business. They set their own definitions, and their customers adapt.
But for orchestration platforms, the companies that stitch together multiple providers into a single Voice AI product, these inconsistencies create a much bigger problem.
An orchestration platform consuming Deepgram for STT, OpenAI for the LLM, and ElevenLabs for TTS is running three different meters simultaneously for every call:
Deepgram measures audio duration in seconds, with silence handling.
OpenAI meters LLM processing in tokens.
ElevenLabs meters synthesized speech in characters.
The platform then needs to present a single, unified per-minute price to its end customer. That per-minute rate is an abstraction over three fundamentally different billing units.
The translation layer between provider metering and customer billing is where orchestration platforms either build margin or lose it. If your internal cost calculation assumes Deepgram will meter 45 seconds for a call, but Deepgram actually meters 60 seconds because it rounds up, your margin on that call is lower than your model predicts.
Scale that across thousands of calls per day, and the metering assumptions in your billing system become one of your most important financial variables.
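As a rough sketch of that translation layer, the snippet below converts three different meters (seconds, tokens, characters) into one cost per call and shows how a rounding assumption moves the number. Every rate and the `call_cost` helper are hypothetical placeholders for illustration, not any provider's real pricing or API.

```python
import math

# All rates are hypothetical placeholders, not real provider pricing.
STT_RATE_PER_MIN = 0.006        # STT billed on audio duration
LLM_RATE_PER_1K_TOKENS = 0.002  # LLM billed on tokens
TTS_RATE_PER_1K_CHARS = 0.03    # TTS billed on characters


def call_cost(stt_seconds, llm_tokens, tts_chars, stt_round_to=1):
    """Provider cost of one call.

    stt_round_to is the STT rounding increment in seconds
    (1 = per-second billing, 60 = rounded up to the full minute).
    """
    billed_stt_seconds = math.ceil(stt_seconds / stt_round_to) * stt_round_to
    stt = billed_stt_seconds / 60 * STT_RATE_PER_MIN
    llm = llm_tokens / 1000 * LLM_RATE_PER_1K_TOKENS
    tts = tts_chars / 1000 * TTS_RATE_PER_1K_CHARS
    return stt + llm + tts


# Same call, two metering assumptions for the STT leg.
assumed = call_cost(45, 1500, 400, stt_round_to=1)   # model assumes 45 s metered
actual = call_cost(45, 1500, 400, stt_round_to=60)   # provider meters 60 s
margin_error_per_call = actual - assumed
```

The per-call error is small, which is exactly why it tends to go unnoticed until it is multiplied by daily call volume.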
How the definition gap affects customer trust
The per-minute inconsistency doesn't just create internal cost problems. It also creates customer trust problems.
When a customer runs their own stopwatch on a call and sees 2 minutes and 14 seconds, but your invoice shows 3 minutes, they feel overcharged. It doesn't matter that the difference is explained by rounding rules, connection setup time, and multichannel processing. From the customer's perspective, the numbers don't add up.
This is especially acute in Voice AI because the end users, businesses deploying AI agents, are often metering-aware. They're tracking their own usage internally. They have dashboards showing call durations. When those numbers don't match your invoice, their first instinct is to suspect your billing, not to question the definition of a minute.
The most transparent Voice AI platforms address this directly. They publish their metering methodology: how they define a billable minute, whether silence is included, what rounding rules they use, and how multi-channel calls are handled. This doesn't eliminate questions, but it gives support teams a clear reference to point to when customers ask.
Some platforms go even further and show both the raw duration and the billable duration on every invoice line item. This level of transparency requires more sophisticated billing infrastructure, but it dramatically reduces billing-related support tickets and builds long-term customer trust.
The provider switching dilemma
There's one more dimension to the per-minute problem that's worth addressing: what happens when you switch providers?
The Voice AI infrastructure market is moving fast:
New STT models launch every few months.
TTS quality keeps improving.
Pricing keeps changing.
Most platforms will switch at least one provider in their stack within the first 18 months.
When you switch from Provider A, which strips silence and bills per second, to Provider B, which includes silence and rounds to the nearest 6-second increment, your internal cost per call changes. It can even go up when Provider B's listed rate is lower.
If your billing system hardcodes assumptions about how the upstream provider meters, a provider switch becomes a billing migration. You need to re-model your margins, potentially update customer pricing, and definitely recalibrate your reconciliation logic.
The alternative is building a metering abstraction layer from the start, where your system captures raw events, the provider adapter translates them into your internal unit, and your pricing engine operates on the internal unit. Provider switches happen at the adapter layer without touching pricing or invoicing.
This is more work upfront, but it's the difference between a provider switch taking a day versus a month.
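One way to picture that abstraction layer is the minimal sketch below. `RawEvent`, `ProviderA`, `ProviderB`, and `provider_cost` are hypothetical names, the rates are invented, and Provider B is assumed to round up to the next 6-second increment. The pricing code only ever sees billable seconds, so swapping providers means swapping the adapter.

```python
import math
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RawEvent:
    call_id: str
    audio_seconds: float    # wall-clock audio sent to the provider
    silence_seconds: float  # portion of audio_seconds that was silence


class ProviderAdapter(Protocol):
    def billable_seconds(self, event: RawEvent) -> float: ...


class ProviderA:
    """Hypothetical provider: strips silence, bills per second."""

    def billable_seconds(self, event: RawEvent) -> float:
        return math.ceil(event.audio_seconds - event.silence_seconds)


class ProviderB:
    """Hypothetical provider: includes silence, rounds up to 6-second increments."""

    def billable_seconds(self, event: RawEvent) -> float:
        return math.ceil(event.audio_seconds / 6) * 6


def provider_cost(event: RawEvent, adapter: ProviderAdapter, rate_per_min: float) -> float:
    """Cost of one event under a given adapter; pricing logic never sees provider rules."""
    return adapter.billable_seconds(event) / 60 * rate_per_min


event = RawEvent(call_id="call-1", audio_seconds=134, silence_seconds=30)
cost_a = provider_cost(event, ProviderA(), rate_per_min=0.006)  # invented rate
cost_b = provider_cost(event, ProviderB(), rate_per_min=0.005)  # lower listed rate
```

With these made-up numbers, Provider B meters 138 seconds against Provider A's 104, so the call costs more on B despite the lower listed rate.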
What this means for your billing infrastructure
If you are building in the Voice AI space, whether at the infrastructure layer or the orchestration layer, the per-minute inconsistency creates concrete billing requirements:
Track raw usage independently of pricing
Your metering system should capture the actual duration, the provider-reported duration, and the billable duration as three separate fields. This gives you the audit trail you need when the numbers don't match.
Don't assume minutes are fungible
A minute of STT, a minute of telephony, and a minute of TTS are different cost events, even if they happen simultaneously during the same call. Your billing system needs to track them independently.
Model your rounding impact
Before committing to a provider, calculate the rounding cost at your expected volume. The difference between per-second and per-minute rounding at 100K calls/month can be tens of thousands of dollars annually.
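A back-of-the-envelope version of that calculation might look like the sketch below. The sample durations, the $0.08 rate, and the `annual_rounding_cost` helper are all invented for illustration; a real model would use your own call-length distribution.

```python
import math


def annual_rounding_cost(sample_durations_s, rate_per_min, calls_per_month):
    """Estimate the yearly cost of per-minute rounding versus per-second billing.

    sample_durations_s: representative call lengths in seconds.
    """
    per_second_total = sum(math.ceil(d) for d in sample_durations_s) / 60 * rate_per_min
    per_minute_total = sum(math.ceil(d / 60) for d in sample_durations_s) * rate_per_min
    extra_per_call = (per_minute_total - per_second_total) / len(sample_durations_s)
    return extra_per_call * calls_per_month * 12


# Invented sample of call lengths, in seconds.
sample = [45, 134, 200, 61, 95]
yearly_extra = annual_rounding_cost(sample, rate_per_min=0.08, calls_per_month=100_000)
```

With this invented sample, the gap works out to roughly $59,000 a year, the same order of magnitude as the figure above. Short calls dominate the effect, because a 61-second call billed as two minutes nearly doubles in cost.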
Build provider-agnostic metering
When you switch from one STT provider to another, and you eventually will because the market is moving fast, your customer-facing billing shouldn't change. The metering translation layer needs to normalize provider-specific units into your own billing metric.
Reconcile provider bills against internal metering
If your system says you consumed 50,000 minutes of Deepgram, and Deepgram's invoice says 54,000, you need to understand the gap. Most of the time, it's rounding. Sometimes, it's a metering bug. Either way, your billing system should flag the discrepancy automatically.
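The automatic flag can be as simple as a tolerance check. The sketch below uses a hypothetical `reconcile` function and an arbitrary 2% tolerance; the right threshold depends on how much rounding you expect from a given provider.

```python
def reconcile(internal_minutes, invoiced_minutes, tolerance=0.02):
    """Compare internal metering with a provider invoice.

    tolerance is the relative gap accepted before flagging (2% is an
    arbitrary illustrative default).
    """
    gap = invoiced_minutes - internal_minutes
    if internal_minutes == 0:
        gap_ratio = 0.0 if gap == 0 else float("inf")
    else:
        gap_ratio = gap / internal_minutes
    return {
        "gap_minutes": gap,
        "gap_ratio": gap_ratio,
        "flagged": abs(gap_ratio) > tolerance,
    }


result = reconcile(internal_minutes=50_000, invoiced_minutes=54_000)
```

For the 50,000 versus 54,000 example, the gap is 4,000 minutes, or 8%, which exceeds the tolerance and gets flagged for review.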
The real takeaway
The per-minute rate is a number on a pricing page. The per-minute definition is a billing architecture decision that affects every invoice, every margin calculation, and every provider reconciliation.
When two Voice AI companies both charge $0.08 per minute, they might be measuring completely different things. One strips silence, bills per second, and counts mono audio. The other includes silence, rounds up per minute, and bills stereo channels separately. The cost difference on the same call could be 2x or more.
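To make the comparison concrete, here is the arithmetic for one invented call (134 seconds, 40 of them silent, recorded in stereo) under the two definitions. The call parameters are assumptions chosen for illustration.

```python
import math

RATE_PER_MIN = 0.08  # same listed rate for both companies

# Invented example call.
total_seconds = 134
silence_seconds = 40
channels = 2

# Company 1: strips silence, bills per second, counts mono audio.
cost_1 = (total_seconds - silence_seconds) / 60 * RATE_PER_MIN

# Company 2: includes silence, rounds up per minute, bills each channel.
cost_2 = math.ceil(total_seconds / 60) * channels * RATE_PER_MIN

ratio = cost_2 / cost_1
```

On this call the second definition costs well over twice as much as the first, even though both pricing pages show the same number.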
The minute isn't the problem. The problem is that your billing system assumes all minutes are equal, which they're not.
And the companies that win here are the ones whose billing systems can track real usage, normalize it into a consistent internal unit, and catch mismatches so their margins don't drift as they grow.





























