Why Pass Through Pricing Leaves Voice AI Platforms With Margins They Can't Control

• 9 min read

Koshima Satija

Co-founder & COO, Flexprice

Building a Voice AI platform today means assembling a stack of specialized providers, each handling a different layer of the call: Deepgram or AssemblyAI for speech recognition, OpenAI or Anthropic for the language model, ElevenLabs or PlayHT for voice synthesis, and Twilio or Telnyx for telephony.

The technology side of this has become remarkably approachable, but the pricing side has quietly become one of the more consequential decisions a Voice AI founder can make.

Most platforms start by charging customers based on what those upstream providers charge them, adding a margin on top and treating it as a solved problem. It is a reasonable place to begin. 

What is less obvious at the start is how much of your margin control you are handing away in the process, and how much that decision compounds as your call volume grows.

In this piece, you will learn why that pricing structure creates problems that only become visible at scale, what it looks like when those problems hit, and how the platforms with the strongest margins have rethought the model from the ground up.

What is a pass-through pricing strategy?

Pass-through pricing is a strategy where you don't set your own price for the product. Instead, you let your input costs define your output price. You take what your vendors charge you, stack it up, add a margin on top, and that becomes what the customer pays.

The strategy behind this choice is risk avoidance. You never lose money on a single transaction because your price always floats above your cost. If Deepgram charges you more this month, your customer pays more. If OpenAI drops its token pricing, your customer's bill drops too. The margin stays constant in percentage terms, and you never get caught holding the bag on a cost spike you didn't see coming.

This is why it's the default for early-stage platforms. Pass-through lets you launch without doing the hard work of pricing.

But here's what makes it a strategy and not just a billing method: by choosing pass-through, you are making a deliberate decision about where margin control lives. And the answer is, it doesn't live with you. It lives with your providers.

Every pricing strategy involves a tradeoff between risk and control:

  • Fixed pricing means you eat cost fluctuations, but you control your revenue per unit.

  • Value-based pricing decouples your price from your costs entirely, but requires deep customer understanding.

  • Pass-through pricing carries almost zero cost risk, but hands control of your unit economics to your providers.

For a platform passing through one vendor's cost, that tradeoff might be fine. But in Voice AI, you're passing through four independent cost structures with different units, different billing cycles, and different pricing trajectories, all compounding against each other every time any single provider makes a change.

Pass-through isn't a neutral starting point. It's a strategic position that quietly locks your margins to decisions other companies are making.

What pass-through pricing actually looks like in Voice AI

A single Voice AI call typically touches four or more upstream providers. Think about what happens when a customer's AI agent makes a phone call:

  • Telephony (Twilio, Telnyx, or Vonage) charges per minute for the phone connection.

  • Speech-to-Text (Deepgram, AssemblyAI, or Google Cloud Speech) charges per second or per minute of audio processed.

  • Large Language Model (OpenAI, Anthropic, or a fine-tuned model) charges per token for generating the response.

  • Text-to-Speech (ElevenLabs, PlayHT, or Amazon Polly) charges per character or per second of audio generated.

In a pass-through model, the orchestration platform adds up these four costs for every call and passes the total to the customer, usually with a markup.
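As a rough sketch of that arithmetic, the pass-through price is just the summed provider costs times one plus a markup. The per-minute costs and the 25% markup below are illustrative assumptions, not real provider rates, and `pass_through_price` is a hypothetical helper name.

```python
from decimal import Decimal

# Hypothetical per-minute provider costs in USD (placeholders, not real rates).
PROVIDER_COSTS = {
    "telephony": Decimal("0.013"),
    "speech_to_text": Decimal("0.006"),
    "llm": Decimal("0.015"),
    "text_to_speech": Decimal("0.010"),
}

MARKUP = Decimal("0.25")  # assumed 25% markup, purely illustrative


def pass_through_price(costs: dict[str, Decimal], markup: Decimal) -> Decimal:
    """Customer price per minute: summed upstream costs plus a percentage markup."""
    total_cost = sum(costs.values(), Decimal("0"))
    return total_cost * (Decimal("1") + markup)


price_before = pass_through_price(PROVIDER_COSTS, MARKUP)

# If one provider halves its rate, the customer price follows automatically.
cheaper_llm = {**PROVIDER_COSTS, "llm": PROVIDER_COSTS["llm"] / 2}
price_after = pass_through_price(cheaper_llm, MARKUP)

print(price_before, price_after)
```

Notice that nothing in the function lets you hold the customer price steady: the output moves whenever any input moves.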

On the surface, this feels clean and smooth because the customer sees exactly what they're paying for.

But the problem starts when you look underneath the invoice.

Why the billing complexity compounds

Each of those four providers bills differently. Twilio charges per minute with 60-second increments, Deepgram charges per second with rounding rules that depend on the plan, OpenAI charges per token with different prices for input and output tokens, and ElevenLabs charges per character with pricing that varies by voice model.

Your billing system now needs to:

  • Ingest four separate usage streams in real time, each with different units like minutes, seconds, tokens, and characters.

  • Normalize those units into a single customer-facing metric, which is usually per minute of conversation.

  • Apply provider-specific rounding rules; some round up, some truncate, some bill in fixed increments.

  • Reconcile your own records against four different provider invoices at the end of the month.

  • Handle mid-month price changes from any provider without breaking active customer contracts.

That last point is the painful one. If OpenAI drops the price of GPT-4o by 50%, your pass-through customers see an immediate reduction in their bill. Your revenue drops and your margin in dollar terms shrinks with it, even though the decision was not yours.

When Deepgram introduces a new pricing tier, you need to figure out whether your existing customers automatically move to the new tier or stay on the old one. Your billing system has to support both scenarios simultaneously.
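To make the normalization and rounding steps from the list above concrete, here is a minimal sketch that turns one call's usage (seconds, tokens, characters) into a cost per conversation minute. Every rate, the billing increments, and the names `CallUsage`, `billable_seconds`, and `call_costs` are assumptions for illustration, not any provider's actual pricing or API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CallUsage:
    duration_s: int       # length of the call in seconds
    input_tokens: int
    output_tokens: int
    tts_characters: int


# Hypothetical rates in USD; real provider pricing differs and changes over time.
TELEPHONY_PER_MIN = Decimal("0.013")   # assumed to bill in 60-second increments
STT_PER_MIN = Decimal("0.006")         # assumed to bill per second
LLM_INPUT_PER_1K = Decimal("0.0025")
LLM_OUTPUT_PER_1K = Decimal("0.0100")
TTS_PER_1K_CHARS = Decimal("0.015")


def billable_seconds(duration_s: int, increment_s: int) -> int:
    """Round a duration up to the next multiple of the billing increment."""
    return -(-duration_s // increment_s) * increment_s


def call_costs(usage: CallUsage) -> dict[str, Decimal]:
    """Convert four usage streams with different units into USD for one call."""
    telephony_s = billable_seconds(usage.duration_s, 60)
    stt_s = billable_seconds(usage.duration_s, 1)
    return {
        "telephony": Decimal(telephony_s) * TELEPHONY_PER_MIN / 60,
        "speech_to_text": Decimal(stt_s) * STT_PER_MIN / 60,
        "llm": (Decimal(usage.input_tokens) * LLM_INPUT_PER_1K
                + Decimal(usage.output_tokens) * LLM_OUTPUT_PER_1K) / 1000,
        "text_to_speech": Decimal(usage.tts_characters) * TTS_PER_1K_CHARS / 1000,
    }


usage = CallUsage(duration_s=95, input_tokens=1200, output_tokens=300, tts_characters=800)
costs = call_costs(usage)
total = sum(costs.values(), Decimal("0"))

# Normalize to the single customer-facing metric: cost per conversation minute.
cost_per_minute = total * 60 / usage.duration_s
print(costs, total, cost_per_minute)
```

The 95-second call is billed as 120 seconds by the telephony rule and 95 seconds by the speech-to-text rule, which is exactly the kind of mismatch your system has to track per provider.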

The margin problem nobody talks about

The hidden cost of pass-through that many companies miss isn't just billing complexity. It is margin volatility.

Consider a platform that charges $0.08 per minute of Voice AI conversation. This is how the pass-through structure looks in practice:

  • Telephony: $0.013/min

  • Speech to text: $0.006/min

  • Large language model: $0.015/min

  • Text to speech: $0.010/min

  • Platform margin: $0.036/min

This looks healthy: a gross margin of around 45%. But here's the bitter truth: none of those input costs is fixed, and you cannot control any of them.

For example, if OpenAI cuts prices again, as it has done three times in 18 months, your pass-through revenue drops with it.

This creates a strange dynamic because good news for the ecosystem (falling LLM costs) becomes bad news for your revenue. A pass-through platform's top line is directly tied to upstream pricing decisions it had no part in making.

A bundled pricing platform, by contrast, sets its own price for a "minute of conversation." It absorbs provider cost fluctuations internally and adjusts margins at its own pace.

When GPT-4o-mini launched at roughly 90% cheaper than GPT-4, bundled platforms quietly pocketed the margin improvement while pass-through platforms watched their invoices shrink.
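Here is a small sketch of that contrast using the $0.08 per-minute breakdown above. It assumes a hypothetical 50% cut in the LLM cost and a pass-through markup held constant in percentage terms; the function names are illustrative.

```python
from decimal import Decimal

# Per-minute costs from the example breakdown above, in USD.
COSTS = {
    "telephony": Decimal("0.013"),
    "speech_to_text": Decimal("0.006"),
    "llm": Decimal("0.015"),
    "text_to_speech": Decimal("0.010"),
}
LIST_PRICE = Decimal("0.08")


def total(costs: dict[str, Decimal]) -> Decimal:
    return sum(costs.values(), Decimal("0"))


# Pass-through: the price is always cost times a fixed multiplier.
MULTIPLIER = LIST_PRICE / total(COSTS)


def pass_through(costs: dict[str, Decimal]) -> tuple[Decimal, Decimal]:
    """Return (price, margin) when the price floats with cost."""
    price = total(costs) * MULTIPLIER
    return price, price - total(costs)


def bundled(costs: dict[str, Decimal]) -> tuple[Decimal, Decimal]:
    """Return (price, margin) when the platform holds its own price."""
    return LIST_PRICE, LIST_PRICE - total(costs)


# Assumed scenario: the LLM provider halves its price.
after_cut = {**COSTS, "llm": COSTS["llm"] / 2}

pt_price, pt_margin = pass_through(after_cut)
b_price, b_margin = bundled(after_cut)

print(round(pt_price, 4), round(pt_margin, 4))
print(b_price, b_margin)
```

Under these assumptions the pass-through price falls from $0.08 to about $0.066 and the margin from $0.036 to about $0.030 per minute, while the bundled price stays at $0.08 and the margin rises to $0.0435.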

The compounding effect at scale

At 1,000 calls per month, pass-through billing seems manageable: reconciliation takes an afternoon, and small errors do not hurt much.

But when you deal with 100,000 calls per month, every upstream pricing quirk is amplified. A 2-cent rounding difference per call becomes $2,000 per month. A provider that switches from 6-second to 60-second billing increments mid-quarter reshapes your entire margin profile.

And because pass-through requires real-time cost aggregation across multiple providers, every scale milestone introduces a new failure mode:

  • Provider API latency affects cost calculation speed.

  • Rate limit changes affect metering accuracy.

  • Outages at one provider create billing gaps that need to be backfilled.

The billing infrastructure for pass-through at scale isn't just metering plus markup. It's a real-time multi-source data pipeline with financial accuracy requirements. Most teams underestimate this until they're already at scale, and by then it has become technical debt in their billing system.
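The scale arithmetic above is easy to check. This sketch assumes 100,000 calls per month, every call lasting exactly 95 seconds, and a hypothetical telephony rate of $0.013 per minute, then compares 6-second and 60-second billing increments.

```python
from decimal import Decimal

TELEPHONY_PER_MIN = Decimal("0.013")  # hypothetical rate in USD
CALLS_PER_MONTH = 100_000
CALL_LENGTH_S = 95  # simplifying assumption: every call lasts 95 seconds


def billable_seconds(duration_s: int, increment_s: int) -> int:
    """Round a duration up to the next multiple of the billing increment."""
    return -(-duration_s // increment_s) * increment_s


def monthly_telephony_cost(increment_s: int) -> Decimal:
    """Monthly telephony cost if every call is rounded up to increment_s."""
    seconds = billable_seconds(CALL_LENGTH_S, increment_s) * CALLS_PER_MONTH
    return Decimal(seconds) * TELEPHONY_PER_MIN / 60


cost_6s = monthly_telephony_cost(6)
cost_60s = monthly_telephony_cost(60)
increment_delta = cost_60s - cost_6s

# The 2-cent rounding difference per call from the text, at the same volume.
rounding_drift = Decimal("0.02") * CALLS_PER_MONTH

print(cost_6s, cost_60s, increment_delta, rounding_drift)
```

With these inputs, the increment switch alone raises the monthly telephony cost by 25%, from $2,080 to $2,600, without a single change in customer behavior.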

What happens when customers start asking questions

When your customer's invoice shows a line item like "Deepgram STT: 4,231 minutes at roughly $0.0077/min," they start asking questions. Why Deepgram? Could we switch to AssemblyAI? We saw that Whisper is free. Can we use that instead?

Suddenly, your billing system isn't just tracking costs. It's exposing your supply chain. Your customers know which providers you use and what those providers charge, and they can easily comparison-shop around you.

In a bundled model, the customer would see "4,231 minutes of Voice AI at $0.08/min." They don't know, and don't need to know, which STT provider sits underneath. Your provider choices should be an internal optimization, not a customer negotiation.

This matters because the Voice AI infrastructure market is changing fast. Your billing system needs to handle provider switches without affecting customer invoices.

  • In a pass-through model, every provider swap is a customer-facing billing event. 

  • In a bundled model, it is invisible.

The reconciliation nightmare

At the end of every month, pass-through platforms face a reconciliation challenge that most billing systems aren't designed to handle.

You billed your customer based on your internal metering of their usage. But each provider has its own metering, its own rounding, and its own invoice. The numbers almost never match exactly.

Deepgram might report 4,238 minutes where your system tracked 4,231. The difference looks small, but multiplied across hundreds of customers and four providers, it adds up to meaningful discrepancies. Someone on your team has to figure out whether the gap is a rounding issue, a timezone boundary issue, a metering lag, or an actual billing error.

This reconciliation problem is unique to pass-through. Bundled platforms only reconcile their own metering against their own invoices: one system of record, not five.
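A first pass at that reconciliation can be automated. The sketch below compares internally metered minutes with provider-reported minutes and flags any gap above a tolerance. The 0.1% tolerance, the dictionary layout, and the `reconcile` function are assumptions for illustration.

```python
from decimal import Decimal


def reconcile(
    internal: dict[str, Decimal],
    reported: dict[str, Decimal],
    tolerance: Decimal,
) -> list[dict]:
    """Flag providers whose reported usage deviates from internal metering
    by more than `tolerance`, expressed as a fraction of internal usage."""
    flagged = []
    for provider, ours in internal.items():
        theirs = reported[provider]
        delta = theirs - ours
        if abs(delta) > ours * tolerance:
            flagged.append(
                {"provider": provider, "internal": ours, "reported": theirs, "delta": delta}
            )
    return flagged


# Minutes for one customer in one month, using the numbers from the text.
internal_minutes = {"speech_to_text": Decimal("4231"), "telephony": Decimal("4231")}
reported_minutes = {"speech_to_text": Decimal("4238"), "telephony": Decimal("4231")}

flagged = reconcile(internal_minutes, reported_minutes, tolerance=Decimal("0.001"))
for row in flagged:
    print(row)
```

A check like this only tells you where the numbers diverge. Deciding whether the 7-minute gap is rounding, a timezone boundary, or a real error still takes a person.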

How the smartest platforms are moving past pass-through

The orchestration platforms that maintain the healthiest margins tend to follow a pattern: they start with pass-through due to its transparency and speed-to-market, then migrate to bundled pricing as they scale.

The transition usually looks like this:

  • Phase 1: full pass-through

Every provider cost is visible on the customer invoice. This works when you have a small number of customers, and you're still optimizing your provider stack.

  • Phase 2: simplified pass-through

Here, you collapse four line items into one or two. Instead of showing Deepgram, OpenAI, ElevenLabs, and Twilio separately, you show categories such as "AI Processing" and "Telephony." This allows you to absorb small fluctuations internally.

  • Phase 3: bundled pricing

This is the phase where you set your own per-minute rate that includes all provider costs. Cost optimization is handled internally, and the customer sees one metric, one price, and one invoice.

Each phase requires a different billing architecture:

  • Phase 1 needs multi-provider metering and cost aggregation. 

  • Phase 2 needs cost grouping logic and margin buffers. 

  • Phase 3 needs a decoupled billing system that separates customer-facing pricing from internal cost tracking.

The billing infrastructure that can handle all three phases without a rebuild is what separates teams that actually scale from the ones that hit a wall.
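One way to picture a system that covers all three phases is a single invoice renderer that takes the same underlying charges and changes only what the customer sees. This is a sketch under assumptions: the charge amounts, the group mapping, and `render_invoice` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping used in Phase 2 to collapse providers into categories.
LINE_ITEM_GROUPS = {
    "Deepgram": "AI Processing",
    "OpenAI": "AI Processing",
    "ElevenLabs": "AI Processing",
    "Twilio": "Telephony",
}


def render_invoice(
    charges: dict[str, Decimal],
    phase: int,
    minutes: Decimal = Decimal("0"),
    bundled_rate: Decimal = Decimal("0"),
) -> dict[str, Decimal]:
    """Return customer-facing line items for the given pricing phase."""
    if phase == 1:
        # Full pass-through: one line per provider.
        return dict(charges)
    if phase == 2:
        # Simplified pass-through: providers collapsed into categories.
        grouped: dict[str, Decimal] = defaultdict(Decimal)
        for provider, amount in charges.items():
            grouped[LINE_ITEM_GROUPS[provider]] += amount
        return dict(grouped)
    if phase == 3:
        # Bundled: one metric, one price, independent of provider charges.
        return {"Voice AI minutes": minutes * bundled_rate}
    raise ValueError(f"unknown phase: {phase}")


# Illustrative charges for 1,000 minutes of conversation, in USD.
charges = {
    "Twilio": Decimal("20.00"),
    "Deepgram": Decimal("10.00"),
    "OpenAI": Decimal("30.00"),
    "ElevenLabs": Decimal("20.00"),
}

phase1 = render_invoice(charges, phase=1)
phase2 = render_invoice(charges, phase=2)
phase3 = render_invoice(charges, phase=3, minutes=Decimal("1000"), bundled_rate=Decimal("0.08"))
print(phase1, phase2, phase3, sep="\n")
```

The point of the design is that metering stays per provider in every phase. Only the presentation layer changes, which is what lets you move between phases without a rebuild.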

What this means for your billing architecture

If you're building a Voice AI platform today, the billing decision isn't really about pass-through versus bundled. It's about whether your billing system can support the transition from one to the other.

The requirements look like this:

  • Multi-dimensional metering

Track minutes, tokens, characters, and API calls independently, even if you only show one metric to customers.

  • Provider-agnostic cost tracking

Switching from Deepgram to Whisper, or from ElevenLabs to PlayHT, should not change your customer invoices.

  • Decoupled pricing

Separate what you pay providers from what you charge customers. These should be independent configuration layers, not hardcoded formulas.

  • Margin monitoring

Get real-time visibility into per-customer, per-call margins, not just revenue.

  • Price change isolation

When a provider changes rates, your system should flag the margin impact without automatically changing customer prices.
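The last three requirements can be sketched together: customer pricing and provider costs live in separate structures, and a provider rate change triggers a margin alert while the customer rate stays untouched. The 40% alert threshold, the doubled text-to-speech rate, and the names `CustomerPricing` and `margin_check` are assumptions for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CustomerPricing:
    rate_per_min: Decimal    # what the customer pays; only you change this
    min_margin_pct: Decimal  # alert threshold, as a fraction of the rate


def margin_check(provider_costs: dict[str, Decimal], pricing: CustomerPricing) -> dict:
    """Compute the per-minute margin and flag it if it falls below the threshold."""
    cost = sum(provider_costs.values(), Decimal("0"))
    margin = pricing.rate_per_min - cost
    margin_pct = margin / pricing.rate_per_min
    return {
        "rate_per_min": pricing.rate_per_min,
        "cost_per_min": cost,
        "margin_pct": margin_pct,
        "alert": margin_pct < pricing.min_margin_pct,
    }


pricing = CustomerPricing(rate_per_min=Decimal("0.08"), min_margin_pct=Decimal("0.40"))

# Internal cost layer, kept separate from the customer-facing rate.
provider_costs = {
    "telephony": Decimal("0.013"),
    "speech_to_text": Decimal("0.006"),
    "llm": Decimal("0.015"),
    "text_to_speech": Decimal("0.010"),
}

before = margin_check(provider_costs, pricing)

# Assumed scenario: the text-to-speech provider doubles its rate.
provider_costs["text_to_speech"] = Decimal("0.020")
after = margin_check(provider_costs, pricing)

print(before)
print(after)
```

In this scenario the margin drops from 45% to 32.5% and the alert fires, but the customer still pays $0.08 per minute until you decide otherwise.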

Most off-the-shelf billing tools handle simple usage-based pricing well, but multi-provider, multi-unit, margin-aware billing for Voice AI platforms is a different problem entirely.

Wrapping up

Pass-through pricing is the fastest way to launch. It's transparent, easy to explain, and low-risk when you're small.

But it trades short-term simplicity for long-term fragility because your margins depend on decisions you don't make, your invoices expose your supply chain, and your reconciliation becomes a monthly headache that grows with scale.

Your billing infrastructure should support both pass-through and bundled pricing, so you can move between the two smoothly while keeping your margins under control as you scale.

Billing isn’t where you send invoices. It’s where you either protect your margins or slowly lose them without noticing.

Koshima Satija is the Co-founder of Flexprice, an open-source metering and billing platform built for the AI era. She’s deeply passionate about building products that simplify complex systems and empower teams to move faster with clarity and confidence.