Alibaba Releases Qwen 3.5 Omni, an AI Model Capable of Cloning Users' Voices

Wed, 01 Apr 2026, 08:03 WIB | By Wahyunanda Kusuma Pertiwi | Source: KOMPAS | Technology

Alibaba’s Qwen team released its latest artificial intelligence (AI) model, Qwen 3.5 Omni, on Sunday (29/3/2026). The model is one of Alibaba’s most ambitious updates: a new version of its “omnimodal” AI, which can process several types of input simultaneously, including text, images, audio, and video.

One of the most prominent features of Qwen 3.5 Omni is voice cloning. Users can upload a voice sample, and the AI will then respond in that voice.

For voice interaction, Qwen 3.5 Omni also offers other capabilities designed for real-time spoken conversation. One of these is “semantic interruption”, which lets the AI recognise when a user genuinely wants to interrupt. In other words, the AI does not stop talking because of minor disturbances such as background noise or short interjections like “yes”, “uh”, or “hmm”, which makes the conversation flow more naturally.

The model also uses a technology called ARIA, which is claimed to reduce pronunciation errors, especially for numbers and uncommon words. ARIA also synchronises text and voice dynamically, making the output more natural and accurate. This approach is said to deliver faster processing and more consistent results than multimodal methods that combine separate models, such as ChatGPT.

Qwen 3.5 Omni was trained on more than 100 million hours of audiovisual data, a training scale that is presented as putting its capabilities in a different class from competitors’ AI models. To demonstrate its abilities, Qwen 3.5 Omni was compared with ChatGPT 5.4 (thinking mode), with both models tested on the same YouTube Shorts video.
