LLM Token Costs in 2026: Complete Provider Comparison

AI model pricing in 2026 is simultaneously cheaper than ever and more complex than ever. Prices have dropped dramatically since 2023 — GPT-4-level capability now costs what GPT-3.5 used to cost — but the number of models, tiers, and pricing dimensions has exploded. Context caching, batch discounts, input vs. output asymmetry, and provider-specific rate limits all factor into your actual bill.

This is the guide I wish existed when I was trying to optimize AACFlow's infrastructure costs. The numbers below reflect pricing as of mid-2026.

The pricing table

Prices are per 1 million tokens (MTok). Output tokens cost more than input tokens for every model listed here, typically 3–5×, though the ratio runs from about 1.3× (Llama 3.3 70B) to over 20× (Gemini 2.5 Flash).
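
To make the per-MTok arithmetic concrete, here is a minimal sketch of how a single request's cost falls out of the two rates. The function name `request_cost` and the token counts are illustrative assumptions; the rates used are the Claude Sonnet 4 figures from the table in this post.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Return the dollar cost of one request given per-MTok rates."""
    total = input_tokens * input_per_mtok + output_tokens * output_per_mtok
    return total / 1_000_000  # rates are quoted per one million tokens


# Hypothetical request: 2,000 input tokens and 500 output tokens
# at $3.00 (input) and $15.00 (output) per MTok.
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # prints "$0.0135"
```

Note that the 500 output tokens cost more than the 2,000 input tokens here, which is why output length is usually the first thing worth controlling.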

Anthropic Claude

Model	Input ($/MTok)	Output ($/MTok)	Cache Write ($/MTok)	Cache Read ($/MTok)
Claude Opus 4	$15.00	$75.00	$18.75	$1.50
Claude Sonnet 4	$3.00	$15.00	$3.75	$0.30
Claude Haiku 4	$0.80	$4.00	$1.00	$0.08

Claude's context caching is its defining cost advantage. For workflows that repeatedly inject the same large system prompt (tool definitions, knowledge base chunks, persona instructions), cache reads cost one-tenth as much as fresh input tokens. For example, a 10K-token system prompt run 1,000 times on Claude Sonnet 4 adds up to 10M input tokens: about $30 uncached versus about $3 as cache reads, a saving of roughly $27. The one-time cache write adds only a few cents, assuming the cache stays warm between runs.
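
The savings figure can be reproduced with a short calculation. This is a sketch under a simplifying assumption: the prompt is written to the cache once and every later run is a cache hit (no expiry, no re-writes). The function name `caching_savings` is hypothetical; the rates are the Claude Sonnet 4 row from the table above.

```python
def caching_savings(prompt_tokens: int, runs: int,
                    input_per_mtok: float,
                    cache_write_per_mtok: float,
                    cache_read_per_mtok: float) -> float:
    """Dollars saved on a repeated prompt prefix by caching it.

    Assumes one cache write on the first run and a cache hit on
    every later run (i.e. the cache never expires in between).
    """
    mtok = prompt_tokens / 1_000_000
    uncached = runs * mtok * input_per_mtok
    cached = mtok * cache_write_per_mtok + (runs - 1) * mtok * cache_read_per_mtok
    return uncached - cached


# 10K-token system prompt, 1,000 runs, Claude Sonnet 4 rates.
savings = caching_savings(10_000, 1_000, 3.00, 3.75, 0.30)
print(f"${savings:.2f}")  # prints "$26.97"
```

If the cache expires between runs, each expiry costs another cache write at the higher write rate, so real savings on a low-traffic workflow will be smaller than this upper bound.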

OpenAI

Model	Input ($/MTok)	Output ($/MTok)	Cached Input ($/MTok)
GPT-4o	$2.50	$10.00	$1.25
GPT-4.1	$2.00	$8.00	$1.00
GPT-4.1 mini	$0.40	$1.60	$0.20
o3	$10.00	$40.00	$5.00
o4-mini	$1.10	$4.40	$0.55
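
One way to read the cached-input column is as a blended rate: the more of your input that hits the cache, the closer your effective input price gets to the cached price. The sketch below assumes you can estimate a cache hit ratio for your own traffic; `effective_input_rate` and the 70% figure are illustrative, not measured values.

```python
def effective_input_rate(input_per_mtok: float, cached_per_mtok: float,
                         cache_hit_ratio: float) -> float:
    """Blend fresh and cached input rates by the share of cached tokens."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return (cache_hit_ratio * cached_per_mtok
            + (1.0 - cache_hit_ratio) * input_per_mtok)


# GPT-4.1 rates from the table, with a hypothetical 70% of input tokens cached.
rate = effective_input_rate(2.00, 1.00, 0.70)
print(f"${rate:.2f} per MTok")  # prints "$1.30 per MTok"
```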

Google Gemini

Model	Input up to 128K ($/MTok)	Input over 128K ($/MTok)	Output ($/MTok)
Gemini 2.5 Pro	$1.25	$2.50	$10.00
Gemini 2.5 Flash	$0.15	$0.30	$3.50
Gemini 2.0 Flash	$0.10	—	$0.40
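
The two input columns mean the input rate depends on prompt length. The sketch below shows one possible reading of that tiering, and it rests on two assumptions that should be checked against the provider's pricing page: that a prompt over the threshold is billed entirely at the higher rate (rather than only the tokens past the threshold), and that "128K" means 128,000 tokens. The function name `tiered_input_cost` is hypothetical.

```python
def tiered_input_cost(prompt_tokens: int,
                      rate_up_to_threshold: float,
                      rate_over_threshold: float,
                      threshold: int = 128_000) -> float:
    """Input cost in dollars under a two-tier, prompt-length-based rate.

    Assumption: the whole prompt is billed at the higher rate once it
    exceeds the threshold. Verify this against the provider's docs.
    """
    if prompt_tokens > threshold:
        rate = rate_over_threshold
    else:
        rate = rate_up_to_threshold
    return prompt_tokens * rate / 1_000_000


# Gemini 2.5 Pro input rates from the table above.
short_prompt = tiered_input_cost(100_000, 1.25, 2.50)
long_prompt = tiered_input_cost(200_000, 1.25, 2.50)
print(f"${short_prompt:.3f} vs ${long_prompt:.3f}")  # prints "$0.125 vs $0.500"
```

Under this reading, doubling the prompt from 100K to 200K tokens quadruples the input cost, which is a strong argument for trimming context to stay under the threshold.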

DeepSeek

Model	Input ($/MTok)	Output ($/MTok)	Cache Hit ($/MTok)
DeepSeek-V3	$0.27	$1.10	$0.014
DeepSeek-R1	$0.55	$2.19	$0.14

Groq (Llama 4)

Model	Input ($/MTok)	Output ($/MTok)
Llama 4 Maverick	$0.20	$0.60
Llama 4 Scout	$0.11	$0.34
Llama 3.3 70B	$0.59	$0.79

Cerebras

Model	Input ($/MTok)	Output ($/MTok)
Llama 4 Maverick	$0.25	$0.75
Llama 4 Scout	$0.10	$0.45

Cost optimization strategies

1. Use context caching for repeated system prompts

2. Route tasks to the right model tier

3. Trim context aggressively

4. Batch non-real-time work

5. Cache at the application layer

BYOK in AACFlow