AI model pricing in 2026 is simultaneously cheaper than ever and more complex than ever. Prices have dropped dramatically since 2023 โ GPT-4-level capability now costs what GPT-3.5 used to cost โ but the number of models, tiers, and pricing dimensions has exploded. Context caching, batch discounts, input vs. output asymmetry, and provider-specific rate limits all factor into your actual bill.
This is the guide I wish existed when I was trying to optimize AACFlow's infrastructure costs. The numbers below reflect pricing as of mid-2026.
The pricing table
Prices are per 1 million tokens (MTok). Output tokens cost more than input tokens across all providers โ often 3โ5ร.
Anthropic Claude
| Model | Input ($/MTok) | Output ($/MTok) | Context Cache Write | Context Cache Read |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 4 | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Haiku 4 | $0.80 | $4.00 | $1.00 | $0.08 |
Claude's context caching is its defining cost advantage. For workflows that repeatedly inject the same large system prompt (tool definitions, knowledge base chunks, persona instructions), cache read costs are 10ร cheaper than fresh input tokens. A workflow with a 10K-token system prompt run 1,000 times saves ~$27 in input costs with caching enabled.



