This week, Amazon unveiled a new conversational voice model meant to compete with offerings such as Google’s Gemini Live and OpenAI’s Advanced Voice Mode. The company also introduced updates to its video generation capabilities.
The newly launched Nova Sonic voice model is designed for real-time speech processing and AI-driven voice generation in conversational contexts, according to Amazon. The company claims that Nova Sonic employs a “unified model architecture,” in contrast to traditional approaches that chain together separate models for tasks such as speech recognition and text-to-speech conversion. According to Amazon, this design lets Nova Sonic detect nuances in tone more effectively and produce more natural-sounding responses.
Developers can experiment with Nova Sonic on Amazon’s Bedrock platform, which facilitates the creation of applications such as customer service chatbots and AI agents across diverse fields, including travel, education, and healthcare. Parts of the Nova Sonic technology are already integrated into Amazon’s Alexa Plus assistant, as noted by Rohit Prasad, Senior Vice President and Head Scientist of AGI at Amazon, in a recent interview with TechCrunch.
On the video front, Amazon has launched Nova Reel 1.1. This update reportedly improves quality and reduces latency compared to its predecessor, Nova Reel 1.0. The new version can maintain stylistic consistency across multiple six-second clips, which combine into a single cohesive video of up to two minutes.
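
To put the clip figures in perspective, here is a minimal arithmetic sketch in Python. It assumes a video is simply a sequence of fixed-length six-second clips, which is an inference from the description above rather than a documented detail of Nova Reel. The function name `clips_needed` and the constants are hypothetical, chosen only for illustration.

```python
import math

# Figures taken from the article; the fixed-length-clip model is an assumption.
CLIP_SECONDS = 6          # length of each generated clip
MAX_VIDEO_SECONDS = 120   # "up to two minutes"


def clips_needed(video_seconds: int, clip_seconds: int = CLIP_SECONDS) -> int:
    """Return how many fixed-length clips are needed to cover a video."""
    if video_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Round up so a partial final clip still counts as one clip.
    return math.ceil(video_seconds / clip_seconds)


if __name__ == "__main__":
    print(clips_needed(MAX_VIDEO_SECONDS))
```

Under that assumption, a full two-minute video corresponds to 20 six-second clips, which is the span over which the model would need to keep its style consistent.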