On Monday, Hume, a New York-based artificial intelligence (AI) company, introduced a new tool that lets users customize AI voices. Called Voice Control, the feature is aimed at developers who want to add tailored voices to chatbots and other AI-powered applications. Rather than offering a large catalogue of preset voices, Hume provides fine-grained control across ten voice dimensions, letting users create vocal profiles suited to their applications.
In a blog post, the company explained what the tool is meant to solve: organizations often struggle to find an AI voice that matches their brand identity. Voice Control lets users adjust how a voice is perceived, so developers can make it sound more assertive, relaxed, or upbeat to suit their AI applications.
The Voice Control tool is currently in beta and is available to all registered users of the platform. Members of the Gadgets 360 team were able to access and test the feature. Developers can adjust ten dimensions: gender, assertiveness, buoyancy, confidence, enthusiasm, nasality, relaxedness, smoothness, tepidity, and tightness.
Instead of a prompt-based customization method, Hume has implemented a slider system that runs from -100 to +100 for each dimension. The company said it chose sliders to avoid the ambiguity of describing a voice in text and to give developers more precise control over a voice's characteristics.
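To make the mechanics concrete, a custom voice can be thought of as a set of per-dimension slider values, each clamped to the documented -100 to +100 range. The Python sketch below is illustrative only; Hume has not published a schema for Voice Control, so the class and method names here are assumptions:

```python
from dataclasses import dataclass, field

# The ten dimensions exposed by Voice Control, each a slider from -100 to +100.
# This representation is a hypothetical sketch, not Hume's published schema.
DIMENSIONS = (
    "gender", "assertiveness", "buoyancy", "confidence", "enthusiasm",
    "nasality", "relaxedness", "smoothness", "tepidity", "tightness",
)

def clamp(value: int, low: int = -100, high: int = 100) -> int:
    """Keep a slider value inside the documented -100 to +100 range."""
    return max(low, min(high, value))

@dataclass
class VoiceProfile:
    """A custom voice expressed as per-dimension slider offsets."""
    sliders: dict = field(default_factory=dict)

    def set(self, dimension: str, value: int) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        self.sliders[dimension] = clamp(value)

# Example: a calm, confident assistant voice.
profile = VoiceProfile()
profile.set("assertiveness", 40)
profile.set("relaxedness", 70)
profile.set("enthusiasm", -20)
```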
In our testing, we found that moving any of the ten sliders produced a noticeable change in the AI voice, and the tool kept the different characteristics distinct from one another. Hume attributed this to a new "unsupervised approach" that preserves the key traits of the base voice while specific parameters are adjusted. However, the company did not disclose the sources of the data used to build the feature.
Once a custom AI voice has been created, developers can deploy it in their applications by configuring Hume's Empathic Voice Interface (EVI) AI model. Although the company has not said so explicitly, the experimental feature likely runs on the EVI-2 model.
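As a rough illustration of that final step, the sketch below assembles a hypothetical EVI configuration carrying the slider values from the earlier sketch. The payload shape, field names, and base-voice identifier are assumptions for illustration, not Hume's documented API; developers should consult the EVI documentation for the actual fields:

```python
import json

# Hypothetical sketch of attaching a custom voice to an EVI configuration.
# Every field name below is an illustrative assumption, not Hume's real API.
voice_controls = {"assertiveness": 40, "relaxedness": 70, "enthusiasm": -20}

evi_config = {
    "evi_version": "2",                    # assumption: EVI-2, as inferred above
    "voice": {
        "base_voice": "DEFAULT",           # hypothetical base-voice identifier
        "voice_controls": voice_controls,  # per-dimension slider values
    },
}

# In practice, a configuration like this would be registered with Hume's
# platform when creating or updating an EVI config for the application.
print(json.dumps(evi_config, indent=2))
```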
Looking ahead, Hume intends to broaden the variety of base voices available, introduce new interpretable voice dimensions, improve the retention of voice characteristics during significant alterations, and develop sophisticated tools for analyzing and visualizing voice attributes.