Mistral OCR 4 adds bounding boxes, block classification for RAG
The compact model runs in a single container, supports 170 languages, and integrates with Mistral's new Search Toolkit for enterprise document pipelines.
Mistral ships structured document extraction for enterprise AI
Mistral AI released OCR 4 on June 23, 2026, adding structured output capabilities that go beyond text extraction to support retrieval-augmented generation (RAG) and agentic workflows. The model returns bounding boxes, block-type classification, and per-word confidence scores alongside the extracted text, according to Mistral AI's announcement.
OCR 4 supports 170 languages across 10 language groups and runs in a single container for fully self-hosted deployments. The model is priced at $4 per 1,000 pages via API, or $2 per 1,000 pages through the Batch API.
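As a rough illustration of what those rates mean for a given workload, the sketch below estimates spend from a page count. It assumes strictly linear per-page pricing with no minimums or volume discounts, which the announcement does not confirm; the function name and the 250,000-page workload are hypothetical.

```python
from decimal import Decimal

# Prices in USD per 1,000 pages, as quoted in the article.
PRICE_PER_1000_PAGES = {
    "api": Decimal("4"),
    "batch": Decimal("2"),
}


def estimate_cost(pages: int, mode: str = "api") -> Decimal:
    """Estimate USD cost for `pages` pages, assuming linear pricing (an assumption)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1000_PAGES[mode] * Decimal(pages) / Decimal(1000)


monthly_pages = 250_000  # hypothetical workload
api_cost = estimate_cost(monthly_pages, "api")
batch_cost = estimate_cost(monthly_pages, "batch")
print(f"API:   ${api_cost}")
print(f"Batch: ${batch_cost}")
```

Under these assumptions the Batch API halves the bill for any workload that can tolerate asynchronous processing.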
Why it matters
Document understanding has long been treated as a text-extraction problem; OCR 4 reframes it as a structured-data problem. By returning not just what a document says but where each element sits and what role it plays, the model enables citation-grounded RAG systems, compliance workflows that need human verification loops, and agents that can act on documents rather than simply read them. For enterprises processing high volumes of invoices, contracts, or technical reports, the combination of structured output and self-hosting addresses both accuracy and data-sovereignty requirements.
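A human verification loop of the kind described above could be driven by the per-word confidence scores. The following is a minimal sketch, not Mistral's response format: the `Word` class, the 0.0 to 1.0 confidence range, and the 0.90 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # assumed range 0.0 to 1.0; not confirmed by the source


def words_needing_review(words, threshold=0.90):
    """Return the words whose confidence falls below the (hypothetical) threshold."""
    return [w for w in words if w.confidence < threshold]


sample = [
    Word("Invoice", 0.99),
    Word("total:", 0.97),
    Word("1,284.50", 0.71),  # a low-confidence amount a reviewer should check
]
flagged = words_needing_review(sample)
print([w.text for w in flagged])
```

In practice the threshold would be tuned per document type, since a misread digit in an invoice total costs more than a misread word in body text.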
Performance and benchmark results
In head-to-head human evaluations, independent annotators preferred OCR 4 over competing OCR and document-AI systems in 72% of comparisons on average. The evaluation used more than 600 documents across 12+ languages sourced from third-party vendors.
OCR 4 achieved the top score among tested models on OlmOCRBench at 85.20 and scored 93.07 on OmniDocBench. Mistral noted that both benchmarks have known scoring limitations, including ground-truth errors, equivalent math notation counted as mismatches, and multi-column reading-order artifacts that penalize correct output.
On Mistral's internal Crawl Multilingual evaluation, OCR 4 scored 0.98 and led across all eight language groups tested, with the widest performance gap on rare and low-resource languages where competing systems degrade.
Aidan Donohue, an AI engineer at Rogo, reported that OCR 4 matched the accuracy of leading agentic document parsers on a chart-dense financial QA dataset at roughly 8x lower cost and 17x lower latency.
Integration with Search Toolkit
OCR 4 serves as an ingestion component for Mistral Search Toolkit, an open-source, composable search framework announced at the AI Now Summit. The model's structured output supplies citation-ready inputs to the toolkit's ingestion, retrieval, and evaluation workflow for RAG and enterprise search.
The model accepts PDF, DOC, PPT, and OpenDocument formats. Each extracted block includes a bounding box, type classification (titles, tables, equations, signatures), and confidence scores that enable downstream systems to perform semantic chunking, form filling, invoice processing, and compliance checks.
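To make the semantic-chunking use case concrete, the sketch below groups blocks under the most recent title and keeps each block's page and bounding box as a citation. The `Block` and `Chunk` classes, their field names, and the `(x0, y0, x1, y1)` coordinate convention are hypothetical; the actual response schema is not described in the source.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    page: int
    block_type: str    # e.g. "title", "text", "table" (assumed labels)
    text: str
    bbox: tuple        # (x0, y0, x1, y1); hypothetical coordinate convention
    confidence: float


@dataclass
class Chunk:
    heading: str
    texts: list = field(default_factory=list)
    citations: list = field(default_factory=list)  # (page, bbox) pairs


def chunk_by_title(blocks):
    """Start a new chunk at every title block; titles with no body are dropped."""
    chunks = []
    current = Chunk(heading="")
    for b in blocks:
        if b.block_type == "title":
            if current.texts:
                chunks.append(current)
            current = Chunk(heading=b.text)
        else:
            current.texts.append(b.text)
            current.citations.append((b.page, b.bbox))
    if current.texts:
        chunks.append(current)
    return chunks


blocks = [
    Block(1, "title", "Payment terms", (50, 40, 400, 70), 0.99),
    Block(1, "text", "Net 30 days.", (50, 80, 400, 120), 0.98),
    Block(2, "title", "Totals", (50, 40, 400, 70), 0.99),
    Block(2, "table", "Subtotal | 100", (50, 80, 400, 200), 0.95),
]
chunks = chunk_by_title(blocks)
for c in chunks:
    print(c.heading, c.citations)
```

Because each chunk carries its source coordinates, a RAG answer built from it can point a reader back to the exact region of the page it came from.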
Document AI layer
Mistral offers Document AI capabilities through the same API endpoint. By passing a JSON schema alongside a document, users can reshape OCR output into structured formats or annotate detected images with domain-specific fields. Document AI is priced at $5 per 1,000 pages.
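The schema-driven reshaping might look something like the sketch below, which defines a JSON schema for an invoice and checks a sample record against it. The field names and sample values are invented for illustration, the request format for sending a schema to the API is not shown because the source does not specify it, and `conforms` is a deliberately minimal check rather than a full JSON Schema validator.

```python
import json

# Hypothetical schema a user might pass alongside an invoice document.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string"},
        "total_amount": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Maps the two JSON Schema types used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float)}


def conforms(record: dict, schema: dict) -> bool:
    """Check required keys and basic types only; not a full JSON Schema validator."""
    for key in schema.get("required", []):
        if key not in record:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key in record and not isinstance(record[key], _PY_TYPES[spec["type"]]):
            return False
    return True


# Invented example of what structured output could look like.
sample_output = {"invoice_number": "INV-0042", "total_amount": 1284.5, "currency": "EUR"}

print(json.dumps(invoice_schema, indent=2))
print(conforms(sample_output, invoice_schema))
```

Validating returned records against the same schema on the client side is a cheap guard before the data reaches downstream systems.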
OCR 4 and Document AI are available through Mistral Studio, Amazon SageMaker, and Microsoft Foundry, and will be available soon on Snowflake Parse Document. Enterprise customers can also deploy OCR 4 on their own infrastructure to meet data-residency and compliance requirements.
Mistral AI first reported these details in a June 23, 2026 announcement.
This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
