Skip to main content

Google API Pricing (2026)

Google offers the Gemini family of multimodal models via Google AI Studio and Vertex AI, featuring some of the largest context windows and competitive pricing. Founded in 1998 and headquartered in Mountain View, CA, Google offers 12 active models through their API. All pricing data below is sourced directly from the Google pricing page and is updated regularly.

Prices verified Apr 5, 2026

Google Model Pricing — All Models

All prices are in USD per 1 million tokens ($/M tokens). Input tokens are the text you send to the model; output tokens are the generated response.

ModelTierInputOutputCached InputBatch InputBatch Output
Gemini 3.1 Pro
Google's newest and most capable model in preview, with a 2M token context window and advanced reasoning.
Premium$2.00/M$12.00/M$0.20/M$1.00/M$6.00/M
Gemini 3 Flash
Google's latest Flash model in preview, offering strong capabilities at fast speed and low cost.
Mid$0.50/M$3.00/M$0.05/M$0.25/M$1.50/M
Gemini 2.5 Pro
Google's most capable model with a 1 million token context window and strong performance on complex reasoning and coding tasks.
Premium$1.25/M$10.00/M$0.13/M$0.63/M$5.00/M
Gemini 2.5 Flash
Google's most efficient model with hybrid reasoning capabilities, balancing high speed with strong intelligence at low cost.
Mid$0.30/M$2.50/M$0.03/M$0.15/M$1.25/M
Gemini 2.5 Flash-Lite
Google's most cost-efficient 2.5 series model, optimized for high-volume low-latency tasks.
Budget$0.10/M$0.40/M$0.01/M$0.05/M$0.20/M
Gemini 2.0 Flash
Google's next-generation multimodal model with native tool use, code execution, and real-time streaming support.
Mid$0.10/M$0.40/M$0.03/M$0.05/M$0.20/M
Gemini 2.0 Flash-Lite
The most cost-efficient Gemini model, optimized for high-volume low-latency tasks with a 1M context window.
Budget$0.08/M$0.30/M$0.04/M$0.15/M
Gemini 1.5 Pro(deprecated)
Google's previous-generation Pro model with a 2 million token context window, ideal for large document analysis.
Premium$1.25/M$5.00/M$0.31/M
Gemini 1.5 Flash(deprecated)
A fast, versatile model optimized for high-frequency tasks with a generous 1M token context window.
Budget$0.08/M$0.30/M$0.02/M
Gemini 1.5 Flash-8B(deprecated)
Google's smallest model, an 8B parameter variant of Flash optimized for high-volume low-complexity tasks.
Budget$0.04/M$0.15/M$0.01/M
Gemini 3.1 Flash-Lite
Google's most cost-efficient Gemini 3.1 model, optimized for high-volume lightweight tasks at extremely low cost.
Budget$0.25/M$1.50/M$0.03/M$0.13/M$0.75/M
Gemini 3.1 Flash Image Preview
Gemini 3.1 Flash Image Preview by Google.
Budget$0.50/M$3.00/M
Gemini 3.1 Pro Preview Custom Tools
Gemini 3.1 Pro Preview Custom Tools by Google.
Mid$2.00/M$12.00/M
Gemma 4 31B
Google DeepMind's 30.7B dense multimodal model supporting text, image, and video input. Features 256K context window, configurable reasoning mode, native function calling, and multilingual support across 140+ languages. Open-weight (Apache 2.0).
Budget$0.14/M$0.40/M
Gemma 4 26B A4B
Gemma 4 26B A4B by Google.
Budget$0.13/M$0.40/M

Prices last verified: 2026-04-05. Always confirm on the official pricing page before production use.

Google Model Capabilities

Key technical capabilities across all Google models. Use this table to identify which models support the features your application requires.

ModelContextMax OutputVisionFunctionsJSON ModeCachingBatch APIReasoning
Gemini 3.1 Pro2,097,15265,536
Gemini 3 Flash1,048,57665,536
Gemini 2.5 Pro1,048,57665,536
Gemini 2.5 Flash1,048,57665,536
Gemini 2.5 Flash-Lite1,048,5768,192
Gemini 2.0 Flash1,048,5768,192
Gemini 2.0 Flash-Lite1,048,5768,192
Gemini 3.1 Flash-Lite1,048,57665,536
Gemini 3.1 Flash Image Preview65,53665,536
Gemini 3.1 Pro Preview Custom Tools1,048,57665,536
Gemma 4 31B262,144131,072
Gemma 4 26B A4B 262,144262,144

When to Choose Google

Cost-sensitive workloads

Use Gemini 2.0 Flash-Lite Google's most affordable model at $0.08/M input / $0.30/M output. Estimated cost at 100K requests/month: $22.50.

Complex or high-accuracy tasks

Use Gemini 3.1 Pro Google's most capable model (premium tier) with a 2,097,152-token context window. Priced at $2.00/M input / $12.00/M output.

High-volume pipelines

Google models are well-suited for high-throughput workloads. Consider batch API (where supported) for an additional 50% cost reduction on asynchronous pipelines. Check the pricing table above for batch pricing availability.

See detailed cost comparisons between Google models and models from other providers, including side-by-side pricing tables and monthly cost projections.

Browse use case guides where Google models are recommended, including cost-effective model rankings and monthly cost estimates for each workload.

Google Free Tier Details

The following Google models include a free tier for development and prototyping. Free tier limits apply to unpaid usage only.

ModelRequests/minRequests/dayExpires
Gemini 3.1 Pro525No expiry
Gemini 3 Flash10500No expiry
Gemini 2.5 Pro525No expiry
Gemini 2.5 Flash10500No expiry
Gemini 2.5 Flash-Lite301,500No expiry
Gemini 2.0 Flash151,500No expiry
Gemini 2.0 Flash-Lite301,500No expiry
Gemini 3.1 Flash-Lite301,500No expiry

Deprecated Google Models

The following Google models are deprecated and should not be used in new projects. Historical pricing is shown for reference only.

  • Gemini 1.5 Pro — deprecated 2026-03-01
  • Gemini 1.5 Flash — deprecated 2026-03-01
  • Gemini 1.5 Flash-8B — deprecated 2026-03-01

Frequently Asked Questions: Google API Pricing

What is the cheapest Google model?

The cheapest Google model by input token price is Gemini 2.0 Flash-Lite, priced at $0.08/M for input tokens and $0.30/M for output tokens. It is best suited for high-volume, cost-sensitive workloads where speed and price matter most.

What is the most capable Google model?

Gemini 3.1 Pro is Google's most capable model, classified as a premium-tier model. It supports a context window of 2,097,152 tokens and is priced at $2.00/M input / $12.00/M output per million tokens.

How does Google API pricing work?

Google charges per token consumed — separately for input tokens (the text you send) and output tokens (the text the model generates). Prices are listed in USD per 1 million tokens ($/M). Google was founded in 1998 and is headquartered in Mountain View, CA. All pricing shown here is sourced from the official Google pricing page.

Does Google offer batch API pricing?

Yes — 8 Google models support batch processing at reduced rates: Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro, and others. Batch API is ideal for asynchronous workloads like bulk data extraction, document processing, or offline classification pipelines. Batch pricing is shown in the pricing table above.

Which Google models support prompt caching?

7 Google models support prompt caching: Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, Gemini 2.0 Flash, Gemini 3.1 Flash-Lite. Prompt caching stores repeated context (system prompts, knowledge base content) in memory and applies a discount to cached input tokens, making it highly effective for applications that reuse the same context across many requests.

Which Google models support vision / image inputs?

12 Google models support vision (image input): Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, Gemini 3.1 Flash-Lite, Gemini 3.1 Flash Image Preview, Gemini 3.1 Pro Preview Custom Tools, Gemma 4 31B, Gemma 4 26B A4B . These models can analyze images, charts, screenshots, and documents alongside text prompts.

Google pricing data last verified: 2026-04-05. Prices may change — verify on the official Google pricing page before production use.