kreuzberg-paddle-ocr
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.
- Latest release: 15h ago
- Releases: 64
- Known CVEs: 0
- First release: Feb 12, 2026
- License: MIT
Repository
Source
- Stars: 8.5k
- Forks: 497
- Open issues: 8
- Language: Rust
- text-extraction
- document-intelligence
- metadata-extraction
- pdf-extraction
- pdfium
- python
- rag
- table-extraction
Security score
No OpenSSF Scorecard available for this repository.
Insights
Activity
- Total releases: 64
- Last 12 months: 64
- Cadence: ~daily
- Dependencies: 8
Releases per month (last 12 months): chart not preserved
Release mix (64 releases)
- minor: 6
- patch: 40
- pre: 17
Releases
| Version | Type | Released |
|---|---|---|
| 5.0.0-rc.10 | pre | |
| 5.0.0-rc.8 | pre | |
| 5.0.0-rc.7 | pre | |
| 5.0.0-rc.5 | pre | |
| 5.0.0-rc.4 | pre | |
| 4.9.9 | patch | |
| 5.0.0-rc.3 | pre | |
| 5.0.0-rc.2 | pre | |
| 5.0.0-rc.1 | pre | |
| 4.9.8 | patch | |
| 4.9.7 | patch | |
| 4.9.6 | patch | |
| 4.10.0-rc.15 | pre | |
| 4.10.0-rc.14 | pre | |
| 4.10.0-rc.12 | pre | |
| 4.10.0-rc.11 | pre | |
| 4.10.0-rc.9 | pre | |
| 4.10.0-rc.8 | pre | |
| 4.10.0-rc.7 | pre | |
| 4.10.0-rc.6 | pre | |
1–20 of 64