OpenAI Releases Three New Voice AI Models Capable of Real-Time Conversation Translation
OpenAI, the creator of the ChatGPT chatbot, has officially launched three new audio models to strengthen its voice-based artificial intelligence (AI) services. The three models are named GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.

One of the most intriguing is GPT-Realtime-Translate, which can translate conversations in both directions as they happen. The model supports more than 70 input languages and 13 output languages, so two people who speak different languages can each communicate in their native language while the model translates for the other party in real time. OpenAI positions the technology for customer service, education, international events, media, and creator platforms.

The Indian AI startup BolnaAI claims that this model has a 12.5 per cent lower Word Error Rate (WER) than the other models it tested, particularly for languages such as Hindi, Tamil, and Telugu.

GPT-Realtime-Whisper is a speech-to-text model that is said to transcribe speech in real time with low latency. OpenAI states that it can generate text as someone speaks, making it suitable for meeting captions, online classes, live broadcasts, and automatic note-taking. It can also be used to create automatic meeting summaries and to support customer service, healthcare workers, recruitment, and voice-based AI agents.

The third model, GPT-Realtime-2, is designed to handle complex conversations, understand longer contexts, and perform various tasks while keeping conversations natural. OpenAI has also increased its context window from 32K to 128K tokens, allowing the AI to remember longer conversations and handle more complicated tasks.
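
For readers unfamiliar with the Word Error Rate figure cited above, the following is a minimal Python sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function names and sample sentences are illustrative assumptions, not BolnaAI's or OpenAI's evaluation code. The article also does not say whether "12.5 per cent lower" is a relative or an absolute reduction; the helper below assumes a relative one.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (assumed reading)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "please translate this sentence into hindi"
    hypothesis = "please translate the sentence into hindi"
    print(word_error_rate(reference, hypothesis))
    # Under the relative reading, a baseline WER of 0.20 falling to 0.175
    # is a reduction of about 12.5 per cent.
    print(relative_reduction(0.20, 0.175))
```

Under this reading, a lower WER means fewer substituted, dropped, or inserted words per reference word, so the comparison depends on the test transcripts and languages used.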