Google AI Blog

New ways to balance cost and reliability in the Gemini API

Google is launching two new inference tiers for the Gemini API: Flex and Priority. Flex offers lower-cost processing in exchange for variable latency, which suits non-urgent tasks. Priority is designed for low latency in time-sensitive applications. With both tiers available, developers can trade cost against responsiveness per workload, making the API a better fit for a wider range of use cases and applications.