Google ships Gemini 1.5 Flash-8B AI model

Smaller and faster variant of 1.5 Flash features half the price, twice the rate limits, and lower latency on small prompts compared to its forerunner.


Google’s Gemini 1.5 Flash-8B AI model is now production-ready. The company said the stable release of Gemini 1.5 Flash-8B has the lowest cost per intelligence of any Gemini model.

Availability was announced October 3. Developers can access gemini-1.5-flash-8B for free via Google AI Studio and the Gemini API. Compared to 1.5 Flash, Gemini 1.5 Flash-8B offers a 50% lower price, twice the rate limits, and lower latency on small prompts.

An experimental version of Gemini 1.5 Flash-8B had been released in September as a smaller, faster variant of 1.5 Flash. Flash-8B nearly matches the performance of the 1.5 Flash model launched in May across multiple benchmarks and performs well on tasks such as chat, transcription, and long-context language translation, Google said.

The stable release of Gemini 1.5 Flash-8B is priced at the following rates:

$0.0375 per 1 million input tokens on prompts < 128K
$0.15 per 1 million output tokens on prompts < 128K
$0.01 per 1 million tokens on cached prompts < 128K
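To make the rates above concrete, here is a minimal Python sketch that estimates the cost of a single request. It is an illustration, not an official calculator: the function name estimate_cost, the reading of "128K" as 128,000 tokens, and the choice to count cached tokens toward the prompt size are all assumptions made here, not details from Google's announcement.

```python
from decimal import Decimal

# Rates in US dollars per 1 million tokens, as quoted in the article
# for prompts under 128K tokens.
RATES_PER_MILLION = {
    "input": Decimal("0.0375"),
    "output": Decimal("0.15"),
    "cached": Decimal("0.01"),
}

# Assumption: "128K" is read as 128,000 tokens; the article does not define it.
PROMPT_LIMIT = 128_000
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated cost in US dollars for one request (hypothetical helper)."""
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: cached tokens count toward the prompt size.
    if input_tokens + cached_tokens >= PROMPT_LIMIT:
        raise ValueError("the article lists rates only for prompts under 128K")
    total = (
        input_tokens * RATES_PER_MILLION["input"]
        + output_tokens * RATES_PER_MILLION["output"]
        + cached_tokens * RATES_PER_MILLION["cached"]
    )
    return total / MILLION


if __name__ == "__main__":
    # 100,000 input tokens and 10,000 output tokens, nothing cached.
    print(estimate_cost(100_000, 10_000))
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens works out to $0.00525. Decimal is used instead of float so that the small per-token rates do not pick up binary rounding error.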

Developers on the paid tier will be billed beginning October 14. The new price, along with work Google has done to drive down developer costs with the 1.5 Flash and 1.5 Pro models, shows the company’s commitment to ensuring that developers have the freedom to build products and services that push the world forward, Google said.
