Google has announced enhancements to its video AI model, Veo 2, aimed at streamlining the creation of cinematic content and the editing of existing footage. These upgraded features are now available in preview on Google Cloud’s Vertex AI platform, alongside improvements to its text-to-image generator, Imagen 3, and other audio AI tools.
Among the new functionalities for Veo 2 is inpainting, which allows users to automatically eliminate “unwanted background elements, logos, or distractions” from videos. Another feature, outpainting, enables users to extend the dimensions of a video, generating surrounding footage that seamlessly integrates with the original clip, similar to Adobe’s Generative Expand tool for images.
Additionally, users can now apply cinematic technique presets during video creation. These presets assist with directing elements such as shot composition, camera angles, and pacing, and include options like timelapse effects, drone-style perspectives, and simulated camera movement.
A newly introduced interpolation feature allows for the creation of transitions between two still images, effectively generating the frames needed to link the start and end points.
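Conceptually, the simplest form of interpolation between two still images is a linear crossfade. Veo 2 instead uses a generative model to synthesize plausible motion rather than blending pixels, so the sketch below is only a conceptual illustration; all function names, frame shapes, and counts here are illustrative choices, not part of Google’s API.

```python
# Naive linear crossfade between two "frames" (small NumPy arrays standing
# in for images). This illustrates the idea of generating the frames that
# link a start image to an end image; a real model like Veo 2 would
# synthesize new content rather than blend pixel values.
import numpy as np

def crossfade(start: np.ndarray, end: np.ndarray, n_frames: int) -> list:
    """Generate n_frames intermediate frames blending start into end."""
    frames = []
    for i in range(1, n_frames + 1):
        t = i / (n_frames + 1)  # interpolation weight in (0, 1)
        frames.append((1 - t) * start + t * end)
    return frames

# Two 2x2 "images": all-black (0.0) and all-white (1.0)
start = np.zeros((2, 2))
end = np.ones((2, 2))
mid = crossfade(start, end, 3)
print(np.allclose(mid[1], 0.5))  # the middle frame is the 50% blend → True
```

The weight `t` sweeps evenly from the start frame to the end frame, which is why a pure crossfade looks like a dissolve; a generative interpolator replaces this blend with frames depicting actual intermediate motion.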
Adobe’s own Firefly video model offers comparable capabilities, including a generative AI video extension option recently introduced in Premiere Pro. Additionally, Google now incorporates SynthID watermarks for digital attribution in its AI-generated media, akin to Adobe’s Content Credentials. However, Adobe claims its tools are commercially safe, trained exclusively on licensed and public domain materials, a claim Google cannot match, as its models are trained on diverse web sources.
Updates to Google’s Imagen 3 model have also been highlighted, with enhancements to automatic object removal processes that aim to produce more realistic outcomes when distracting elements are eliminated. Both Veo 2 and Imagen 3 are currently being utilized by brands such as L’Oreal and Kraft Heinz, with Kraft Heinz’s digital experience leader, Justin Thomas, stating that processes which once took eight weeks can now be completed in just eight hours.
On the audio side, Google has introduced its text-to-music model, Lyria, in a private preview, and launched a feature called “Instant Custom Voice” for its synthetic speech model, Chirp 3. This new capability allows Chirp 3 to create “realistic custom voices” from just 10 seconds of audio input. Moreover, a new transcription tool is in preview, designed to clarify conversations by identifying and isolating individual speakers.
These announcements are part of a broader series of updates from Google. Among them is Gemini 2.5 Flash, a more efficient version of the company’s Gemini model, which will soon be accessible on Vertex AI. Google states that Gemini 2.5 Flash can automatically adjust processing time according to task complexity, ensuring quicker results for simpler requests.
Additionally, Google is enhancing its enterprise-focused agentic AI tools this week, enabling AI agents to communicate across different platforms such as PayPal and Salesforce. A new section on Google’s Cloud Marketplace will allow businesses to explore and purchase AI agents developed by third-party partners.