On Monday, a video appeared to show OpenAI CEO Sam Altman sipping from an oversized mango-flavored juice box, joking that it was half his height. But it was not the real Altman speaking; it was a sophisticated deepfake created with artificial intelligence.
What was particularly unsettling was that there was no reliable way to tell.
On Tuesday, OpenAI officially introduced Sora 2, its new system for generating AI video and audio. During a briefing with journalists the day prior, company representatives described it as potentially a “ChatGPT moment for video generation.” Like ChatGPT, Sora 2 is meant to give users a platform to experiment with the new technology, and it ships with a social media app that enables the creation of realistic videos of real people saying things they never said. Essentially, it is a deliberate collection of deepfakes.
Concerns about deepfakes are warranted
OpenAI believes Sora, first announced in February 2024 and launched in December, has now become dependable. Bill Peebles, the head of Sora, likened the earlier version of the video generation system to a “slot machine”: users would input a prompt and hope for relevant results. “The new model,” he stated, “is much more accurate in following user prompts.”
During the briefing, the Sora 2 development team revealed they had been refining the system for over 20 months. A significant advancement is its capability to produce audio that is synchronized with video, encompassing not just background sounds, but also dialogue in multiple languages. Users can access it through Sora.com, with Sora 2 Pro becoming available for ChatGPT Pro members, and developers will soon get API access.
The accompanying social media app, also named “Sora,” is currently available on iOS in the US and Canada on an invite-only basis. Other countries will follow, and each participant receives four additional invites to share.
In its release announcement, OpenAI described Sora 2 as a step toward more accurate simulation of the real world, and employees emphasized the new system’s improved grasp of physics. Peebles pointed to its ability to realistically depict actions like a backflip on a paddleboard, correctly modeling fluid dynamics and buoyancy, as a significant advance in the underlying physics comprehension.
However, this enhancement raises serious concerns surrounding deepfakes, which have become increasingly common.
The Sora social app mirrors features found in TikTok, such as a “For You” feed and a vertical scrolling interface. A unique “Cameos” feature allows users to authorize the app to create videos using their likenesses. To do this, users must record specific movements and vocalizations within the app. Once uploaded, their likeness can be remixed with other users’ likenesses on request.
OpenAI staff mentioned during the briefing that they have transitioned to using Sora as their primary communication tool, replacing text messages and voice notes. They showcased various fake advertisements, simulated conversations, and fabricated news clips produced with Sora 2 in a scrollable format on the app.
The Sora app is designed to drive social media trends
Some clips were generated live during the demonstration, displaying a striking level of realism, with no noticeable glitches. Unless the content contained overtly fantastical elements, like the oversized juice box, videos appeared convincingly genuine to an untrained eye, often leaving viewers with only an inexplicable sense that something was amiss.
The Sora app provides users with options to control who can create Cameos featuring their likeness, allowing settings to limit access to themselves, approved contacts, mutual friends, or everyone. OpenAI team members stated that users are considered “co-owners” of the created Cameos, with the ability to revoke access or delete their likeness from a video whenever desired. Users can also block others and are given the capability to review drafts of Cameos featuring them prior to publication.
OpenAI also introduced new parental controls, including a non-personalized content feed, limits on teens’ access to direct messages, and an option to turn off endless scrolling.
Much like TikTok, the Sora app aims to cultivate viral social media trends with a “Remix” function for other users’ videos. Currently, users can create 10-second clips; Pro subscribers will soon be able to make 15-second clips on the web, with mobile to follow. Longer videos are technically feasible, but the company is still determining how to handle such a resource-intensive task.
For the wider audience, the crucial challenge with Sora 2 and the accompanying app may lie in differentiating between what is genuine and what is artificial. OpenAI emphasized that each video created with Sora includes indicators of its AI-generated nature, such as metadata and a visible watermark on downloaded videos, alongside unspecified internal tools for detection. Nonetheless, history suggests such safeguards can be circumvented, raising concerns over the rapid spread of misinformation.
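OpenAI has not detailed what the embedded metadata looks like, though the company has previously adopted the C2PA provenance standard for its image tools. Assuming Sora’s metadata follows that standard, a very crude first check on a downloaded file is to scan its raw bytes for the C2PA marker. This is only a heuristic sketch, not verification: it proves nothing about authenticity, and real validation requires a C2PA-aware tool.

```python
# Heuristic: look for the raw "c2pa" marker bytes that a C2PA provenance
# manifest embeds in a media file. A hit only suggests a manifest may be
# present; a miss does not prove the file is untouched or human-made.

def has_c2pa_marker(path: str) -> bool:
    """Return True if the file contains the bytes b'c2pa' anywhere."""
    with open(path, "rb") as f:
        return b"c2pa" in f.read()
```

A stripped watermark or re-encoded upload would defeat this check entirely, which is precisely why metadata alone is a weak defense against deliberate misuse.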
Regarding deepfakes of celebrities and other public figures, OpenAI clarified that such videos can only be generated if those individuals have uploaded their own cameos and consented to their use. The same rule applies to everyone: a person’s likeness cannot be used unless they have willingly contributed it. OpenAI representatives said it is “impossible” to generate explicit or “extreme” content on the platform, and that ongoing moderation exists to address policy breaches and copyright concerns.
Past experience shows that people find ways around such restrictions. A Microsoft engineer previously warned that AI image generators had produced inappropriate content through loopholes, and xAI’s Grok generated inappropriate deepfakes with minimal prompting. OpenAI staff noted that the current limitations on public figure representations are specific to “this rollout,” suggesting that future capabilities may not be similarly restricted.
On Monday, The Wall Street Journal reported that the Sora system would include copyrighted material by default, unless rights holders proactively choose to opt out. When asked about this during the briefing, OpenAI representatives redirected to existing image-generation policies and indicated that the Sora approach would similarly follow those guidelines. They mentioned that some opt-out provisions would carry over and that additional controls are forthcoming.