- Tracks pricing for 121+ models across 16 providers
- Covers input, output, cached, and batch pricing tiers
- Updated weekly with automated price verification
- Free API: 5 requests/min, no authentication required
How AI API Pricing Works
AI API pricing is based on the number of tokens processed — the basic units that large language models use to read and write text. A token is roughly four characters, or about three-quarters of a word. For example, the sentence “Hello, world!” contains approximately 4 tokens.
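As a rough illustration of the four-characters-per-token rule of thumb, the sketch below estimates a token count from character length. The function name and the default ratio are illustrative assumptions. This is not a real tokenizer, and actual counts vary by model family.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)

# "Hello, world!" is 13 characters, so the heuristic rounds up to 4 tokens.
print(estimate_tokens("Hello, world!"))  # 4
```

For billing-grade numbers, use the token counts your provider reports with each response rather than this estimate.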
Most providers charge separately for input tokens (the prompt you send, including system instructions and conversation history) and output tokens (the response the model generates). Output tokens typically cost 2–10x more than input tokens because generating text requires more compute than reading it.
Pricing is quoted per million tokens ($/M tokens). For example, a model priced at $3.00/M input tokens and $15.00/M output tokens costs $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens. At 100,000 requests per month with 1,000 input and 500 output tokens each, that model would cost $1,050/mo ($300 for input plus $750 for output).
Providers also offer cost-saving features: prompt caching stores repeated context (like system prompts) so subsequent requests pay a fraction of the original input price. Batch APIs process requests asynchronously and typically offer 50% discounts. Our calculator lets you model all of these scenarios side by side so you can find the most cost-effective model for your exact workload.
AI API Pricing by Provider
- OpenAI (San Francisco, CA): 23 models, from $0.05/M
- Anthropic (San Francisco, CA): 12 models, from $0.25/M
- Google (Mountain View, CA): 15 models, from $0.04/M
- Mistral AI (Paris, France): 7 models, from $0.10/M
- DeepSeek (Hangzhou, China): 2 models, from $0.28/M
- Cohere (Toronto, Canada): 4 models, from $0.10/M
- Together AI (San Francisco, CA): 7 models, from $0.18/M
- Groq (Mountain View, CA): 9 models, from $0.05/M
- OpenRouter (San Francisco, CA): 9 models, from $0.09/M
- Fireworks AI (San Francisco, CA): 8 models, from $0.07/M
- Perplexity (San Francisco, CA): 4 models, from $1.00/M
- Cerebras (Sunnyvale, CA): 3 models, from $0.10/M
- AWS Bedrock (Seattle, WA): 10 models, from $0.04/M
- Azure OpenAI (Redmond, WA): 6 models, from $0.15/M
- SambaNova (Palo Alto, CA): 5 models, from $0.10/M
- Nvidia NIM (Santa Clara, CA): 5 models, from $0.04/M
Find the Best Model for Your Use Case
Customer Support Bot
AI-powered customer support chatbots handle common inquiries, route escalations, and provide 24/7 assistance. These workloads typically involve short-to-medium user messages and moderately detailed responses from a knowledge base.
11 recommended models →
Code Generation
Code generation workloads involve large prompts containing existing code context and detailed instructions, with lengthy generated code responses. These tasks benefit from models with strong coding benchmarks and large context windows.
10 recommended models →
Content & Copywriting
Content and copywriting tasks involve brief prompts or outlines as input, with long-form generated content as output. Blog posts, marketing copy, and product descriptions are common examples where output tokens dominate costs.
9 recommended models →
Data Extraction
Data extraction involves sending large documents or structured data as input and receiving concise structured output (JSON, CSV, key-value pairs). Input costs dominate due to long context and short structured responses.
11 recommended models →
RAG / Semantic Search
Retrieval-Augmented Generation (RAG) pipelines retrieve relevant document chunks and include them with user queries. Input costs are driven by retrieved context, with moderate-length generated responses.
9 recommended models →
Document Summarization
Document summarization processes long documents (contracts, research papers, reports, transcripts) and produces concise summaries. Costs are dominated by high input token counts, with moderate output length.
10 recommended models →
General Chatbot
General chatbots provide conversational AI for consumer and enterprise applications. Input and output token usage is balanced, and the conversational back-and-forth requires context retention across turns.
10 recommended models →
Text Classification
Text classification tasks involve short-to-medium input text and very brief output (a category label, sentiment score, or JSON object). Extremely cost-efficient at scale — ideal for high-volume automated pipelines.
9 recommended models →
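To make the "input costs dominate" and "output tokens dominate" distinctions above concrete, the following sketch splits a single request's cost into its input and output parts. The per-request token counts are hypothetical placeholders, and the $3.00/M and $15.00/M prices are simply the example rates used elsewhere on this page.

```python
def cost_split(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost, output_share) for one request."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    total = input_cost + output_cost
    output_share = output_cost / total if total else 0.0
    return input_cost, output_cost, output_share

# Hypothetical (input_tokens, output_tokens) per request for three workloads.
profiles = {
    "content writing": (200, 1500),
    "data extraction": (8000, 200),
    "text classification": (300, 10),
}

for name, (tokens_in, tokens_out) in profiles.items():
    _, _, share = cost_split(tokens_in, tokens_out, 3.00, 15.00)
    print(f"{name}: output is {share:.0%} of the per-request cost")
```

With these assumed profiles, output accounts for most of the content-writing cost and only a small share of the extraction and classification cost, which is why the input/output ratio matters as much as the headline price.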
AI API Cost Optimization Tips
1. Use Prompt Caching
If your system prompt or knowledge base context is repeated across requests, enable prompt caching. Anthropic and OpenAI offer cached input at 50–90% off the standard input price. At 100K requests/month with a 2,000-token system prompt, caching alone can save hundreds of dollars.
2. Leverage the Batch API
For non-real-time workloads — document processing, bulk classification, overnight report generation — use the Batch API. OpenAI, Anthropic, and Google all offer 50% discounts for asynchronous batch jobs. Results are returned within 24 hours.
3. Right-size Your Model
Premium flagship models cost 10–100x more than budget models. For tasks like text classification, sentiment analysis, or simple Q&A, a budget or mid-tier model often performs just as well at a fraction of the cost. Use our calculator to compare costs across tiers.
4. Optimize Your Prompts
Every token in your system prompt and few-shot examples costs money on every request. Audit your prompts for redundant instructions, overly verbose examples, and unnecessary context. Reducing a 500-token system prompt to 200 tokens cuts the system-prompt share of input costs by 60%; the overall saving depends on how many other input tokens each request carries.
5. Monitor Output Length
Output tokens typically cost 2–10x more than input tokens. Set max_tokens limits to prevent runaway outputs, and instruct the model to be concise for tasks that don't need long responses. For structured tasks, JSON output is often shorter than natural language.
6. Compare Across Providers Regularly
AI API pricing changes frequently as providers compete on cost. A model that was cheapest six months ago may have been undercut by new entrants. Re-run your cost comparison quarterly — newer providers regularly offer dramatically lower prices on comparable capability.
7. Use Volume Discounts and Commitments
At high monthly spend, contact providers directly about enterprise pricing, committed-use discounts, or prepaid token bundles. Most providers offer 10–30% discounts for annual commitments or minimum monthly spend agreements above $10,000.
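The prompt-trimming and output-length tips above can be sketched numerically. The example below assumes a hypothetical 300-token user message on every request and a hypothetical 500-token output cap, at the $3.00/M input and $15.00/M output example rates. The function names are illustrative and do not come from any provider SDK.

```python
def input_cost_per_month(system_tokens, user_tokens, requests, input_price_per_m):
    """Monthly input cost when every request carries the system prompt plus a user message."""
    return (system_tokens + user_tokens) * requests * input_price_per_m / 1_000_000

def worst_case_output_cost(max_tokens, requests, output_price_per_m):
    """Upper bound on monthly output cost if every response hits the max_tokens cap."""
    return max_tokens * requests * output_price_per_m / 1_000_000

# Trim the system prompt from 500 to 200 tokens; the 300-token user message is an assumption.
before = input_cost_per_month(500, 300, 100_000, 3.00)
after = input_cost_per_month(200, 300, 100_000, 3.00)
saving = 1 - after / before

print(f"input cost: ${before:,.2f} -> ${after:,.2f} ({saving:.1%} lower)")
print(f"output ceiling at max_tokens=500: ${worst_case_output_cost(500, 100_000, 15.00):,.2f}")
```

Under these assumptions the system prompt shrinks by 60%, but total input spend falls by 37.5% because the user message is unchanged. The output ceiling shows how a cap bounds the worst case regardless of how verbose the model tries to be.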
Frequently Asked Questions
What is a token in AI API pricing?
A token is the basic unit of text that language models process. One token is roughly 4 characters of English text, or about 0.75 words. The word "calculator" is 2 tokens; "AI" is 1 token. Most providers count tokens using the same BPE (Byte Pair Encoding) tokenizer approach, though the exact tokenization varies by model family. When you send a request, every token in your prompt (input) and every token the model generates (output) is counted and billed.
Why are output tokens more expensive than input tokens?
Generating output requires the model to run inference one token at a time. The process is sequential and computationally intensive because each token depends on the ones before it. Input tokens, by contrast, can be processed in parallel. As a result, providers typically charge 2–10x more for output tokens than for input tokens. This means workloads with long outputs, such as content generation, cost proportionally more than workloads with short outputs, such as data extraction.
What is prompt caching and how much does it save?
Prompt caching lets you reuse expensive input context (long system prompts, documents, or conversation history) across multiple requests without paying the full input price each time. Once a prompt prefix is cached, subsequent requests using that same prefix pay a fraction of the standard price, typically 10–50% of the normal input rate. For use cases with large, repeated context like RAG systems or document analysis, prompt caching can reduce the cost of the cached portion of the input by 50–90%.
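As a rough sketch of the savings, the function below applies a cached rate to the repeated part of the prompt. It uses the 100K requests/month and 2,000-token system prompt scenario from the optimization tips, with an assumed $3.00/M input price. It deliberately ignores cache misses and any surcharge a provider may apply when the cache is first written, so treat the result as an upper bound.

```python
def caching_savings(cached_tokens, requests, input_price_per_m, cached_rate):
    """Monthly saving when `cached_tokens` per request bill at `cached_rate` times the input price.

    Simplification: assumes every request is a cache hit and ignores cache-write surcharges.
    """
    full_price_cost = cached_tokens * requests * input_price_per_m / 1_000_000
    return full_price_cost * (1 - cached_rate)

# Cached input billed at 10% and at 50% of the normal input rate.
for rate in (0.10, 0.50):
    saving = caching_savings(2000, 100_000, 3.00, rate)
    print(f"cached rate {rate:.0%}: saves about ${saving:,.0f}/month")
```

At this assumed price the repeated prompt costs $600/month uncached, so the saving lands between roughly $300 and $540 depending on the cached rate.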
What is the Batch API and when should I use it?
The Batch API processes requests asynchronously and returns results within 24 hours, rather than responding in real time. In exchange for accepting this delay, providers like OpenAI, Anthropic, and Google offer a 50% discount on both input and output costs. Use the Batch API for offline workloads: bulk document processing, nightly report generation, large-scale classification pipelines, or any task where users don't need an immediate response.
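A minimal sketch of the effect on a monthly bill, assuming the 50% discount applies uniformly to whatever share of spend can tolerate the delay. The function name and the 40% batch share are hypothetical.

```python
def cost_with_batch(realtime_cost, batch_fraction, discount=0.50):
    """Monthly cost after moving `batch_fraction` of spend to a batch API at `discount` off."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    return realtime_cost * (1 - batch_fraction * discount)

# Move a hypothetical 40% of a $1,050/month workload to batch processing.
print(f"${cost_with_batch(1050.00, 0.40):,.2f}")  # $840.00
```

The saving scales linearly with the share of traffic that can wait, so the first step is usually to identify which requests have no user waiting on the response.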
How do I calculate my monthly AI API cost?
Your monthly cost = (average input tokens per request × input price per million ÷ 1,000,000 + average output tokens per request × output price per million ÷ 1,000,000) × monthly request volume. For example, at 1,000 input tokens and 500 output tokens per request, 100,000 requests/month, with a model at $3/M input and $15/M output: ($3 × 1,000/1,000,000 + $15 × 500/1,000,000) × 100,000 = ($0.003 + $0.0075) × 100,000 = $1,050/month. Our calculator handles this math automatically for any combination of models.
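The formula above translates directly into a few lines of code. This is a minimal sketch with illustrative parameter names. It covers standard input and output pricing only, not caching or batch discounts.

```python
def monthly_cost(input_tokens, output_tokens, requests,
                 input_price_per_m, output_price_per_m):
    """Monthly cost in dollars for a fixed per-request token profile."""
    dollars_per_request_times_1m = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    )
    return dollars_per_request_times_1m * requests / 1_000_000

# 1,000 input + 500 output tokens, 100,000 requests, $3/M input, $15/M output.
print(f"${monthly_cost(1000, 500, 100_000, 3.00, 15.00):,.2f}")  # $1,050.00
```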
Which AI API is cheapest?
It depends on your use case. The cheapest models by raw price per token change frequently as providers compete — use our calculator to compare real costs for your specific input/output ratio, volume, and quality requirements. Budget-tier models typically start under $0.50/M input tokens, while premium flagship models range from $2–15/M input. Mid-tier models offer the best price-performance ratio for most production workloads.
Do all providers charge for context window tokens?
Yes — you are charged for every token in the context window that the model processes on each request, including previous conversation turns, retrieved document chunks, and system prompts. This means multi-turn conversations grow more expensive over time as the history accumulates. To manage this, you can truncate conversation history, summarize older turns, or use prompt caching to reduce the per-request cost of repeated context.
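To see how quickly history accumulates, the sketch below totals the input tokens billed over a conversation in which the full history is resent on every turn. The token counts per message are hypothetical, and the model assumes no truncation, summarization, or caching.

```python
def conversation_input_tokens(turns, system_tokens, user_tokens, reply_tokens):
    """Total input tokens billed across a conversation that resends full history each turn."""
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens    # the new user message joins the context
        total += history          # the whole context is billed as input
        history += reply_tokens   # the model's reply becomes history for the next turn
    return total

# Hypothetical chat: 200-token system prompt, 50-token messages, 150-token replies.
with_history = conversation_input_tokens(10, 200, 50, 150)
without_history = 10 * (200 + 50)  # same 10 messages sent as independent requests
print(with_history, without_history)  # 11500 2500
```

In this example a 10-turn conversation bills 4.6 times the input tokens of ten independent requests, and the gap widens with every additional turn.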
How often do AI API prices change?
AI API pricing changes frequently — sometimes multiple times per year. Competition between providers has driven prices down dramatically. GPT-4-class capability that cost $60/M output tokens in 2023 now costs under $5/M with current-gen models. We verify all pricing data weekly against official provider documentation. We recommend re-running your cost comparison quarterly to catch price drops and new model releases.