{
    "success": true,
    "data": {
        "id": 1681736,
        "msgid": "google-launches-gemini-3-1-flash-tts-an-ai-for-creating-natural-voices-that-can-speak-70-languages-1776336725",
        "date": "2026-04-16 17:07:00",
        "title": "Google Launches Gemini 3.1 Flash TTS, an AI for Creating Natural Voices That Can Speak 70 Languages",
        "author": "Yudha Pratomo",
        "source": "KOMPAS",
        "tags": "",
        "topic": "Technology",
        "summary": "Google has unveiled Gemini 3.1 Flash TTS, its latest AI text-to-speech model, which produces more natural and expressive voices supporting over 70 languages and multi-speaker conversations. Key features include Audio Tags for customising tone and style, such as enthusiastic or formal delivery, alongside various accents for applications like podcasts and news broadcasts. Available in preview via APIs and free for general users, the model incorporates SynthID watermarking for transparency, positioning it as a cost-effective leader in AI audio generation.",
        "content": "<p>Google launched its latest AI text-to-speech (TTS) model, Gemini\n3.1 Flash TTS, on Wednesday (15\/4\/2026). The model is claimed to\ndeliver more natural and expressive voices.<\/p>\n<p>The model is part of the Gemini 3.1 family and is\ndesigned to generate AI voices that sound more human.<\/p>\n<p>One of its main advantages is support for more than 70 languages, as\nwell as the ability to handle conversations with more than one speaker\n(multi-speaker).<\/p>\n<p>One notable feature of Gemini 3.1 Flash TTS is Audio\nTags, which lets users adjust more flexibly how the AI\nspeaks.<\/p>\n<p>Users can give instructions, such as asking the AI to speak in an\n\u201centhusiastic\u201d, \u201chappy\u201d, or \u201cserious and informative\u201d tone.<\/p>\n<p>Google also provides various voice style and accent\noptions.<\/p>\n<p>Users can customise the voice according to their needs, from casual\nstyles such as podcasts and audiobook narration to formal styles such as news\nanchors. The available accents include British and\nAmerican.<\/p>\n<p>With these features, the generated voice can be adapted to various\nneeds, from casual narration to formal dialogue.<\/p>\n<p>As mentioned earlier, Gemini 3.1 Flash TTS supports\nmore than 70 languages, including regional variants. Languages\nsuch as Indonesian, Japanese, German, and Hindi can already be spoken\nfluently by this AI.<\/p>\n<p>In testing by Artificial Analysis, the model recorded an Elo score of\n1,211 and was rated superior in quality-to-cost ratio. Gemini 3.1\nFlash TTS is said to surpass the quality of ElevenLabs v3 and to rank\nslightly below Inworld 1.5 Max.<\/p>\n<p>Google provides a free version of this model. However,\ndata from free users will be used for product development.<\/p>\n<p>For the paid version, the rate is US$1 per million\ntokens for text input and US$20 per million tokens for audio output.<\/p>\n<p>Gemini 3.1 Flash TTS is currently available in preview through\nthe Gemini API, Vertex AI for enterprise users, and Google Vids for\nWorkspace users. General users can also try the model\nfor free through Google AI Studio.<\/p>\n<p>To ensure transparency, audio generated by this model will be given a\ndigital watermark using Google\u2019s SynthID technology. This watermark\nindicates that the content was created by artificial intelligence.<\/p>\n<p>The watermark is embedded directly in the audio file but cannot be heard\nby humans. Computer systems can still detect it and identify the audio as\nAI-generated rather than a genuine human voice.<\/p>",
        "url": "https:\/\/jawawa.id\/newsitem\/google-launches-gemini-3-1-flash-tts-an-ai-for-creating-natural-voices-that-can-speak-70-languages-1776336725",
        "image": ""
    },
    "sponsor": "Okusi Associates",
    "sponsor_url": "https:\/\/okusiassociates.com"
}