At first glance, the scene is unsettling. A body lies in the street under a disturbingly pristine white sheet while officers move about with an eerie lack of urgency. "We need to clear the street," one officer says, though her lips never move. This is the work of an AI, specifically Google's new video generation model, Veo 3. Intriguingly, my original prompt included no dialogue at all.
Veo 3 generated that line on its own. Over the past day, I have produced a dozen clips featuring news reports, disaster scenarios, and whimsical cartoon cats, all accompanied by realistic audio, some of it written by the AI itself. That unexpected sophistication is unsettling. I don't expect it to plunge us into a misinformation crisis overnight, but Veo 3 is a remarkably powerful content generator.
Google launched Veo 3 at its recent I/O event, showcasing its standout feature: the ability to generate accompanying sound for AI-created videos. Josh Woodward, the Vice President of Gemini, declared the development a step into a "new era of creation" during the keynote. I was skeptical at first, but my doubts diminished after I asked Veo 3 to generate a clip of a news anchor reporting a fire at the Space Needle. All it took was a simple text prompt and a subscription to Google's AI Ultra plan, and I witnessed the realism firsthand.
My interest was piqued after seeing the capabilities demonstrated by Alejandra Caraballo, an instructor at Harvard Law School's Cyberlaw Clinic. In one of her striking examples, a news anchor reported the fictional death of US Secretary of Defense Pete Hegseth, who is very much alive. A related post showcased a series of AI-generated videos, including disaster scenes and a patient in a hospital, that drew significant attention on Reddit. These disturbing clips featured realistic dialogue and ambient sound effects.
After experimenting with Veo 3, my initial fears were somewhat allayed. The system has clear guardrails: it refused prompts for extreme or violent scenarios, such as announcing a political figure's death or staging mock scenarios involving them. That is a promising start for responsible usage.
Still, distressing scenarios can slip through. I successfully prompted Veo 3 to create a video of the Space Needle engulfed in flames, and, using my own photograph of Mount Rainier, generated a scene with smoke and lava. Pair either clip with a news anchor's commentary and the result could easily mislead.
On a positive note, Veo 3 does not appear to be a straightforward deepfake tool. I attempted to generate a video from my own photographs, asking it to add specific lines of dialogue, and it refused. I also tried animating a pair of boots to walk out of a photo; it managed to animate one boot, complete with amusing sound effects.
Generating videos proved easier with less specific prompts. My colleague Andrew Marino noted that Veo 3 excels at producing basic, formulaic YouTube content aimed at children.
For those unfamiliar with children's content on YouTube, it often consists of repetitive, simplistic animations designed to capture young viewers: picture poorly rendered monster trucks crashing into different-colored pools of paint over and over, stretched into hours of near-identical footage that entertains toddlers without teaching them anything. In about 10 minutes, I created a video following this formula, complete with cheerful music. Even more concerning was a clip featuring two cartoon cats on a pier.
I set up a humorous scene in which the two cats lamented that the fish weren't biting. Within minutes, I had a 10-second clip, with the AI writing the dialogue for me. If a short clip is this effortless, producing a longer video would be straightforward. For now, longer clips fall back on Veo 2, which strips out the audio features, but given Google's pace of development, full-length video generation may not be far off.
All of this raises the question of whether such content generation is a feature or a flaw. Google showed off impressive AI-generated clips from established filmmakers, including Eliza McNitt, who is collaborating with Darren Aronofsky on a project that incorporates AI. In skilled hands, AI-generated video could be a useful tool, but the predominant outcome may well be a surge of insipid content that AI churns out efficiently, now in stereo.