At its I/O 2025 event on Tuesday, Google introduced its latest artificial intelligence (AI) models for image and video generation, Imagen 4 and Veo 3. Both models improve on their predecessors. Imagen 4 offers faster generation and better text rendering, while Veo 3 adds native audio generation, allowing it to include background sounds and dialogue in the videos it produces. The tech giant also unveiled Flow, a new AI-driven filmmaking application.
What’s New With Imagen 4 and Veo 3?
In a blog post, Google detailed the features of the new AI models. Imagen 4 arrives nearly a year after its predecessor launched, and it follows the December 2024 release of Veo 2 and an updated Imagen 3.
Imagen 4 focuses on generation speed and accuracy. Like its predecessor, it supports both text and image inputs. The new model is better at rendering fine details such as intricate patterns, water droplets, and animal fur, and it generates images faster than earlier models.
Google says Imagen 4 excels in both photorealistic and abstract styles and can output images in a range of aspect ratios at resolutions up to 2K. Text rendering has also improved, with more accurate spelling and typography and better contextual awareness of text placement, font size, and creative font styling.
Imagen 4 is currently available in the Gemini app, Whisk, and Vertex AI (for enterprises), as well as in Workspace applications such as Docs and Slides. It remains unclear whether Google will make it available to all Gemini users or restrict it to paid subscribers. A more advanced version that generates images up to ten times faster than Imagen 3 is expected to launch later this year.
Veo 3, the latest iteration of Google’s video generation model, introduces native audio generation, enabling it to add ambient sounds, background noise, and dialogue to its output. A demonstration at the I/O 2025 event showed two animated characters conversing in natural-sounding dialogue.
Veo 3 also improves prompt adherence, real-world physics simulation, and lip-syncing accuracy. It is currently available to Google AI Ultra subscribers in the United States through the Gemini app and the newly launched Flow app. Enterprises can use it through the Vertex AI platform.
Flow is an AI-driven filmmaking tool built on the Gemini, Imagen, and Veo models. Users describe a clip in a natural language prompt, and the app generates an eight-second video from it. According to Google, Flow adheres closely to prompts and keeps visuals consistent across clips, including cast, locations, objects, and styles. It is available to subscribers of Google’s AI Pro and Ultra plans in the US.