Google Launches Gemini 3.1 Flash TTS, an AI for Creating Natural Voices That Can Speak 70 Languages

Thu, 16 Apr 2026, 17:07 WIB | By Yudha Pratomo | Source: KOMPAS | Technology

Google launched its latest AI text-to-speech (TTS) model, Gemini 3.1 Flash TTS, on Wednesday (15/4/2026). The model is claimed to deliver more natural and expressive voices.

The model is part of the Gemini 3.1 family and is designed to generate AI voices that sound more human.

One of its main advantages is support for more than 70 languages, as well as the ability to handle conversations with more than one speaker (multi-speaker).

One notable feature of Gemini 3.1 Flash TTS is Audio Tags, which gives users more flexible control over how the AI speaks.

Users can also instruct the AI to speak in an “enthusiastic”, “happy”, or “serious and informative” tone.

Google also provides a range of voice style and accent options.

Users can customise the voice according to their needs, from casual styles such as podcasts and audiobook narration to formal styles such as news anchors. Available accents include British and American.

With these features, the generated voice can be adapted to various needs, from casual narration to formal dialogue.

As mentioned earlier, Gemini 3.1 Flash TTS supports more than 70 languages, including various regional variations. The model can already speak languages such as Indonesian, Japanese, German, and Hindi fluently.

In testing by Artificial Analysis, the model recorded an Elo score of 1,211 and was rated superior in quality-to-cost ratio. Gemini 3.1 Flash TTS is said to surpass ElevenLabs v3 in quality while ranking slightly below Inworld 1.5 Max.

Google offers a free version of the model. However, data from free users will be used for product development.

The paid version costs US$1 per million tokens for text input and US$20 per million tokens for audio output.
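
To make the quoted rates concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration based on the two prices above: the function name and the example token counts are hypothetical, and the article does not describe how tokens are counted or whether other charges apply, so an actual bill may differ.

```python
from decimal import Decimal

# Rates quoted in the article (USD per one million tokens).
TEXT_INPUT_USD_PER_MILLION = Decimal("1")
AUDIO_OUTPUT_USD_PER_MILLION = Decimal("20")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(text_input_tokens: int, audio_output_tokens: int) -> Decimal:
    """Estimate the charge in USD from token counts (hypothetical helper)."""
    if text_input_tokens < 0 or audio_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each side is billed proportionally to its own per-million rate.
    input_cost = Decimal(text_input_tokens) * TEXT_INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(audio_output_tokens) * AUDIO_OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed workload for illustration: 2,000 text tokens in, 50,000 audio tokens out.
    print(estimate_cost_usd(2_000, 50_000))
```

Because the audio output rate is twenty times the text input rate, the output side dominates the total in this example.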

Gemini 3.1 Flash TTS is currently available in preview through the Gemini API, Vertex AI for enterprise users, and Google Vids for Workspace users. General users can also try it for free through Google AI Studio.

To ensure transparency, audio generated by this model carries a digital watermark created with Google’s SynthID technology. The watermark indicates that the content was created by artificial intelligence.

The watermark is embedded directly in the audio file and is inaudible to humans. Computer systems can still detect it and identify the content as AI-generated rather than a genuine human voice.
