On Sunday, Meta introduced a new open-source artificial intelligence (AI) tool aimed at competing with Google’s NotebookLM. Named NotebookLlama, the application functions as an AI-driven podcast generator that converts PDF files into audio podcasts featuring two AI characters. The tool uses three different Llama models, drawn from the Llama 3.1 and Llama 3.2 families, to carry out the conversion. As with Google’s offering, NotebookLlama’s podcasts take the form of a conversation between two AI hosts.
NotebookLlama uses three large language models to transform blocks of text into audio podcasts. At present, users can only upload PDF files, so text in any other format must first be converted to PDF.
Meta NotebookLlama workflow (Photo Credit: Meta)
NotebookLlama begins with the Llama 3.2 1B instruct model, which preprocesses the PDF file and converts it into a ‘.txt’ format. Next, the Llama 3.1 70B instruct model generates the podcast transcript from the preprocessed text. The transcript is then refined by a rewriter that uses the Llama 3.1 8B instruct model. Finally, a custom tool passes the transcript into a text-to-speech workflow built on the Parler TTS model. Interested users can find the resources needed for podcast creation on GitHub.
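The four steps above can be pictured as a simple chain in which each stage consumes the previous stage's output. The Python sketch below is only an illustration of that ordering: the `Stage` class and every function in it are hypothetical placeholders that manipulate strings, not Meta's code, and no real model or TTS engine is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    name: str                  # what the step does
    model: str                 # model the article names for this step
    run: Callable[[str], str]  # placeholder standing in for a real model call


# Hypothetical stand-ins: none of these functions call a real model.
def clean_text(raw: str) -> str:
    # Collapse stray whitespace, roughly what a text clean-up pass might do.
    return " ".join(raw.split())


def draft_transcript(text: str) -> str:
    return (
        f"Speaker 1: Let's talk about this document. {text}\n"
        "Speaker 2: Interesting, go on."
    )


def rewrite_transcript(transcript: str) -> str:
    # Make the second speaker's line sound more conversational.
    return transcript.replace("Interesting, go on.", "Hmm, interesting! Tell me more.")


def synthesize(transcript: str) -> str:
    # A real TTS step would return audio; this only reports what would be voiced.
    return f"<audio: {len(transcript.splitlines())} spoken turns>"


PIPELINE: Tuple[Stage, ...] = (
    Stage("preprocess PDF text", "Llama 3.2 1B Instruct", clean_text),
    Stage("write transcript", "Llama 3.1 70B Instruct", draft_transcript),
    Stage("rewrite transcript", "Llama 3.1 8B Instruct", rewrite_transcript),
    Stage("text-to-speech", "Parler TTS", synthesize),
)


def run_pipeline(raw_text: str, stages: Sequence[Stage] = PIPELINE) -> Tuple[str, List[str]]:
    """Feed each stage's output into the next and record the order of steps."""
    data = raw_text
    log: List[str] = []
    for stage in stages:
        data = stage.run(data)
        log.append(f"{stage.name} ({stage.model})")
    return data, log


if __name__ == "__main__":
    result, steps = run_pipeline("NotebookLlama   turns PDFs\ninto podcasts.")
    print(result)
    for step in steps:
        print(step)
```

Because each stage is just a named callable, swapping in a smaller model for one step would change a single entry in the list and leave the rest of the chain alone.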
The models listed above are the developers’ recommendations. Users can substitute smaller models at each step, although the results may differ significantly. Meta indicates that the recommended setup requires GPU hardware with about 140GB of aggregate memory.
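One plausible way to arrive at a figure near 140GB is to count only the weights of the largest model at 16-bit precision, which is 2 bytes per parameter. That precision is an assumption here, not something the article states, and the estimate ignores activations and other runtime overhead. The helper name below is hypothetical.

```python
def weights_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 10**9 bytes).

    Assumes every parameter is stored at the same width; 2 bytes corresponds
    to 16-bit weights. Runtime overhead is deliberately left out.
    """
    # (params_billions * 10**9 params) * bytes_per_param / 10**9 bytes per GB
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for label, size in [("1B", 1), ("8B", 8), ("70B", 70)]:
        print(f"{label}: ~{weights_memory_gb(size)} GB for weights at 16-bit")
```

Under these assumptions the 70B model alone accounts for roughly 140GB, which would explain why swapping it for a smaller model reduces the hardware requirement so sharply.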
An X (formerly known as Twitter) user shared a sample of a generated podcast, revealing that the audio quality does not quite measure up to that of Google NotebookLM. The output exhibited a shrill and robotic tone, with noticeable instances of audio being skipped and AI hosts overlapping in conversation.
Meta has acknowledged these shortcomings and is committed to enhancing the tool’s performance in future versions. The company stated, “The TTS model is the limitation of how natural this will sound. This will likely improve with a better pipeline and support from someone more knowledgeable.”
Looking ahead, the tech giant plans to implement two different LLMs for scriptwriting, allowing each model to present arguments that create a more engaging conversational style for the podcasts. Additionally, Meta is exploring the Llama 405B AI model for transcript writing and aims to expand compatibility with various input and output formats.