ChatGPT gains voice chat and the ability to discuss images in a significant OpenAI upgrade aimed at dominating the industry
OpenAI has released long-awaited enhancements that will allow popular chatbot ChatGPT to interact with images and voices. This launch represents an important step towards OpenAI's vision of artificial general intelligence that can perceive and process information from a variety of modalities, not just text.
"We are starting to introduce new voice and image capabilities to ChatGPT. They offer a new, more intuitive type of interface, letting you have a voice conversation or show ChatGPT what you're talking about," OpenAI said in its official blog post.
OpenAI said the new ChatGPT Plus will include voice chat powered by a new text-to-speech model capable of mimicking human voices, and the ability to discuss images thanks to integration with the company's image generation models. The new features appear to be part of what is known as GPT Vision (or GPT-V, which is often confused with the theoretical GPT-5) and represent key components of the enhanced multimodal version of GPT-4 that OpenAI announced earlier this year.
This enhancement comes just after OpenAI introduced DALL-E 3, its most advanced text-to-image generator. Declared "crazy" by early testers for its quality and accuracy, DALL-E 3 can create high-quality images from text prompts by understanding complex context and concepts expressed in natural language. It will be built into ChatGPT Plus, a subscription service that offers ChatGPT powered by GPT-4.
The integration of DALL-E 3 and conversational voice chat signals OpenAI's pursuit of artificial intelligence assistants that can perceive the world more like humans do, with multiple senses. According to the company, "Voice and image give you more ways to use ChatGPT in your life. Take a picture of a landmark while you're traveling and have a live conversation about what's interesting about it."
Microsoft fuels AI race with OpenAI integration
OpenAI's biggest backer, Microsoft, is also looking to integrate OpenAI's advanced generative artificial intelligence capabilities into its own consumer products. At its recent fall event, Microsoft announced updates to Windows 11, Office, and the Bing search engine that use OpenAI technology, including DALL-E 3 (in image-editing programs like Microsoft's revamped Paint) and Copilot, Microsoft's AI assistant built on OpenAI models.
This is in line with Microsoft's $10 billion investment in OpenAI, as it aims to become a leader in the race for artificial intelligence assistants. Copilot's debut on Windows 11 on September 26 promises to make the AI assistant available on all Microsoft platforms and devices. Meanwhile, Microsoft 365 Chat uses OpenAI's natural language capabilities to automate complex work tasks.
As previously reported by Decrypt, Microsoft said that "Microsoft 365 Chat searches your entire universe of data at work, including emails, meetings, chats, documents and more, as well as the web."
Cautious steps towards responsible artificial intelligence
However, OpenAI is aware of the potential risks of more powerful multimodal AI systems that involve vision and voice generation. The main concerns relate to impersonation, bias, and reliance on visual interpretation.
"OpenAI's goal is to build AGI that is safe and useful," the company wrote in its announcement. "We believe in delivering our tools incrementally, allowing us to make improvements and refine risk mitigation over time while preparing everyone for more powerful systems in the future."