WhatsApp is reportedly developing a new artificial intelligence (AI) feature that will let users hold hands-free verbal conversations with Meta AI, the chatbot integrated into the app. Earlier reports suggested WhatsApp was only working on letting users send voice messages to Meta AI, a one-way form of communication; recent findings, however, indicate that the AI will also be able to respond verbally.
The anticipated voice mode feature could come equipped with various voice options, although specific details on those variations remain unclear.
According to a report by WABetaInfo, a tracker dedicated to WhatsApp features, the voice mode capability for Meta AI was discovered in the beta version for Android, specifically version 2.24.17.16. A similar observation was made for the iOS platform, where the feature appeared in WhatsApp beta version 24.16.10.70, as noted in a separate post.
Although traces of the feature were found in the app code, it is not yet visible or functional in either beta version, which suggests that the company is still developing it. Consequently, users enrolled in the Google Play Beta program cannot test the Meta AI voice mode just yet.
Screenshots shared by WABetaInfo depict a new voice icon, represented by an audio waveform, located next to the text input field in the Meta AI chat interface. Tapping this icon appears to open a bottom sheet with “Meta AI” at the top, a circular arrangement of bubbles in the centre, and the text “Hi, how can I help” at the bottom, accompanied by a visual indicator suggesting that the AI is actively listening.
Additional screenshots indicate that users might be offered up to ten different voice options within the Meta AI voice mode. While the specific attributes of these voices are not yet known, they may differ in accent, energy level, or tonality. It is unlikely that these voices will support multiple languages.
Moreover, there seems to be an option for enabling captions and transcriptions through speech-to-text functionality. This feature is likely designed to document the entire verbal interaction, converting it into text for later reference. The timeline for the public rollout of this new feature remains unknown.