On Thursday, Mistral unveiled Mistral OCR, an optical character recognition (OCR) application programming interface (API). The artificial intelligence (AI) model can parse PDF documents and convert them into formats that AI systems can readily consume, such as Markdown or raw text files, so that information locked in PDFs becomes accessible to AI models. According to the Paris-based company, the Mistral OCR API is designed to let developers build AI applications that work with PDF files and generate datasets for training new AI models.
Mistral OCR API Introduced
PDFs have long presented challenges for AI initiatives. The contents of these files are not readily accessible to large language models (LLMs) using conventional Retrieval-Augmented Generation (RAG) methods. As a result, AI applications often struggle to scan PDF documents for specific information.
This limitation can hinder developers from providing PDF analysis features in their AI applications. Although tools like Google’s NotebookLM and Adobe’s AI assistant utilize specialized OCR technology to tackle this issue, many developers in the open-source community lack access to a high-efficiency solution.
The Mistral OCR API addresses this gap by facilitating the extraction of data from PDFs into formats that are ready for AI processing. A recent newsroom announcement claims that the tool can accurately comprehend various document elements, such as text, media, tables, and equations. Once processed, this information can be presented in either Markdown or raw text file formats.
This extracted text serves as input for AI models, allowing RAG systems to access the information and respond to queries effectively. The company’s announcement highlighted that “Mistral OCR excels in understanding complex document elements, including interleaved imagery, mathematical expressions, tables, and advanced layouts such as LaTeX formatting. The model enables deeper understanding of rich documents such as scientific papers with charts, graphs, equations, and figures.”
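To make the idea of feeding extracted text to a RAG system concrete, here is a minimal sketch in Python. It does not call the Mistral API. The `pages` list is hypothetical sample data standing in for per-page Markdown returned by an OCR step, and the helper names (`split_into_chunks`, `tokenize`, `retrieve`) are illustrative. Retrieval uses simple keyword overlap as a stand-in for the embedding search a real RAG system would more likely use.

```python
import re
from collections import Counter

# Hypothetical OCR output: one Markdown string per page.
# This is made-up sample data, not real Mistral OCR output.
pages = [
    "# Quarterly Report\n\nRevenue grew 12% year over year.",
    "## Methods\n\nWe measured latency across three regions.\n\n"
    "| Region | Latency |\n|---|---|\n| EU | 40 ms |",
]


def split_into_chunks(markdown_pages):
    """Split each page on blank lines and remember the source page number."""
    chunks = []
    for page_number, text in enumerate(markdown_pages, start=1):
        for block in re.split(r"\n\s*\n", text):
            block = block.strip()
            if block:
                chunks.append({"page": page_number, "text": block})
    return chunks


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, chunks, top_k=1):
    """Rank chunks by how often they contain the query's tokens.

    Keyword overlap is a toy substitute for embedding similarity.
    Chunks with no overlap at all are dropped.
    """
    query_tokens = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk["text"]))
        score = sum(counts[token] for token in query_tokens)
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


if __name__ == "__main__":
    chunks = split_into_chunks(pages)
    for hit in retrieve("How much did revenue grow?", chunks):
        print(f"page {hit['page']}: {hit['text']}")
```

The retrieved chunk, along with its page number, would then be placed in the prompt of a language model so that it can answer the question with reference to the source document.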
Mistral asserts that its OCR technology can process as many as 2,000 pages per minute on a single node. Additionally, the API empowers developers to use documents as prompts and combine outputs to create functional tools and AI agents.
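For a rough sense of what that throughput figure implies, the short Python sketch below estimates processing time for a document collection. The 2,000 pages-per-minute rate is Mistral's claimed peak for a single node. The function name `estimated_minutes` and the assumption that throughput scales linearly with the number of nodes are illustrative and not taken from the announcement.

```python
# Peak single-node throughput claimed by Mistral (pages per minute).
PAGES_PER_MINUTE = 2000


def estimated_minutes(total_pages, nodes=1, pages_per_minute=PAGES_PER_MINUTE):
    """Estimate the minutes needed to process total_pages.

    Assumes every node sustains the claimed peak rate and that
    throughput scales linearly with node count (an idealization).
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return total_pages / (pages_per_minute * nodes)


if __name__ == "__main__":
    # One million pages on one node at the claimed peak rate.
    print(estimated_minutes(1_000_000))
```

At the claimed rate, a million-page archive would take a little over eight hours on one node; real-world throughput will depend on document complexity and on whether the peak rate can be sustained.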
In Mistral’s internal evaluations on “text-only” documents, Mistral OCR outperformed competitors including Google Document AI, Azure OCR, and the 2024-11-20 version of GPT-4o. The company also reported stronger multilingual performance than the Google and Azure offerings.
Those who want to try the model can find it on Mistral’s Le Chat platform, and the API is available through la Plateforme.