OpenAI Releases ChatGPT Images 2.0: The AI Revolution That Masters Spelling
The era of AI-generated images riddled with comical spelling errors appears to be over. OpenAI has just introduced ChatGPT Images 2.0, its latest image generation model, which brings a major leap in text accuracy and visual detail.
In the two years since DALL-E 3 dominated the market, generative AI has evolved from merely mimicking visual patterns into systems capable of understanding language in the context of an image. Where previous models struggled to spell the items on a Mexican restaurant menu, ChatGPT Images 2.0 can now produce print-ready menu designs without glaring typographical errors.
Historically, AI image generators have relied on diffusion models, which work by reconstructing an image from random visual noise. According to Asmelash Teka Hadgu, CEO of Lesan AI, text in an image occupies only a very small share of its pixels, so models tend to overlook letter details in favour of larger visual patterns.
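To make the idea of reconstructing from noise concrete, here is a minimal Python sketch of the standard mixing step used in diffusion models, applied to a toy row of six "pixels". It is an illustration under stated assumptions, not a description of any OpenAI system: the pixel values and the `alpha_bar` setting are hypothetical, and the true noise stands in for the trained network that would normally have to estimate it.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Mix a clean signal with Gaussian noise:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


def reconstruct(xt, eps_hat, alpha_bar):
    """Invert the mixing step, given an estimate of the noise."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, eps_hat)
    ]


rng = random.Random(0)
x0 = [0.0, 1.0, 1.0, 0.0, 0.2, 0.0]  # hypothetical toy "pixel row"
alpha_bar = 0.1                      # hypothetical: only 10% signal variance kept

xt, eps = forward_noise(x0, alpha_bar, rng)

# Assumption: we pass the true noise as a stand-in for a trained
# noise-prediction network, so the recovery here is exact.
recovered = reconstruct(xt, eps, alpha_bar)
```

With the true noise the original row comes back exactly. With an imperfect estimate, each recovered pixel is off by the estimation error multiplied by sqrt((1 - alpha_bar) / alpha_bar), which is 3 in this toy setting.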
However, researchers are now shifting to autoregressive models. Unlike diffusion models, these function more like Large Language Models (LLMs), making predictions about how an image should appear, including the arrangement of text characters within it.
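The autoregressive idea can be sketched with something far simpler than an image model. The toy below is a character-level bigram counter, not a neural network and not anything OpenAI has described. It shares only the core loop: predict the next element from what has been produced so far, append it, and repeat. The three-word "menu" corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(words):
    """Count which character follows which; ^ and $ mark start and end."""
    counts = defaultdict(Counter)
    for word in words:
        chars = ["^"] + list(word) + ["$"]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, max_len=20):
    """Greedy decoding: repeatedly emit the most frequent next character."""
    out, prev = [], "^"
    for _ in range(max_len):
        if not counts[prev]:  # no known continuation for this character
            break
        nxt = counts[prev].most_common(1)[0][0]
        if nxt == "$":        # end-of-word marker reached
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Hypothetical toy corpus of menu words.
counts = train_bigrams(["taco", "taco", "tamale"])
word = generate(counts)
print(word)  # prints "taco"
```

Because each character is chosen given the one before it, the spelling is built up one step at a time rather than emerging from a whole-image denoising pass.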
OpenAI has revealed that the new model is equipped with thinking capabilities. This feature enables the AI to:
In addition, Images 2.0 boasts a much stronger understanding of non-Latin text, including Japanese, Korean, Hindi, and Bengali. With support for resolutions up to 2K, the model can handle subtle elements such as iconography, user interface (UI) components, and dense compositions that typically caused glitches in older AI models.
For developers, OpenAI has also released the gpt-image-2 API, with pricing that depends on the quality and resolution of the generated images. The model’s knowledge cutoff is December 2025, which may affect the accuracy of images related to very recent news events.
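For readers who want a feel for what a call might look like, the Python sketch below builds, but does not send, an HTTP request for the model. It rests on assumptions the announcement does not confirm: that gpt-image-2 is served through the same `/v1/images/generations` endpoint as OpenAI's existing image models, and that it accepts the same `size` and `quality` fields. The helper name, the prompt and the placeholder key are hypothetical.

```python
import json
import urllib.request

# Assumption: same endpoint as OpenAI's existing Images API.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, size="1024x1024", quality="high"):
    """Build (without sending) a POST request for an image generation."""
    payload = {
        "model": "gpt-image-2",  # model name as given in the announcement
        "prompt": prompt,
        "size": size,            # assumed field and value, from the existing API
        "quality": quality,      # assumed field and value, from the existing API
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_image_request(
    "A print-ready menu for a Mexican restaurant",
    api_key="sk-PLACEHOLDER",  # hypothetical placeholder, not a real key
)
body = json.loads(req.data)
```

Since pricing is said to depend on quality and resolution, `size` and `quality` are the two fields most likely to drive cost. Check the accepted values against OpenAI's documentation before sending anything.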