OpenAI’s big announcement Monday morning is GPT-4o, a faster, multimodal successor to GPT-4 that will power its famed ChatGPT. The new model builds on existing large language model (LLM) infrastructure to respond to audio, visuals, and text in real time, both verbally and on-screen. While ChatGPT Plus and Enterprise users will have full access to GPT-4o, free users are beginning to receive limited access to the model this week.
Unlike ChatGPT, which processes text and voice queries with roughly 3 seconds of latency, GPT-4o can reportedly handle queries of all types (text, voice, image, and video) at the speed of a typical human conversation. During Monday’s livestreamed announcement, OpenAI said users would be able to snap a photo of a landmark and chat verbally with GPT-4o about its history in real time—a capability that could give Meta’s Ray-Ban smart glasses a run for their money—or capture whatever’s in the refrigerator to get dinner recipe suggestions. Users can also reportedly bring GPT-4o into a group conversation to settle a debate and prepare for job interviews by speaking with GPT-4o “face-to-face.”
OpenAI claims GPT-4o matches GPT-4 Turbo’s performance in half the time. When processing audio, it reportedly makes fewer errors than Whisper v3, OpenAI’s own pre-trained model for automatic speech recognition. It also translates audio faster and more accurately than Google’s AudioPaLM-2 and Gemini while “understanding” visuals more easily than GPT-4 Turbo, Gemini 1.0 Ultra, Gemini 1.5 Pro, and Anthropic’s Claude Opus.
Despite fairly widespread concern about the environmental impact of AI chatbots, OpenAI didn’t share any data comparing GPT-4o’s sustainability (or lack thereof) with that of its existing models. Last year, research revealed that generating just a few images with AI consumes as much energy as charging a smartphone. Researchers also found that for every five queries it answers, ChatGPT uses the equivalent of a 16-ounce bottle of water.
OpenAI also denied rumors that it would unveil a new web search function this week. While all signs point to the eventual release of a GPT-powered search feature with source links, OpenAI might not be ready to show off its attempt at that particular corner of the internet.
© 2001-2024 Ziff Davis, LLC., a Ziff Davis company. All Rights Reserved.