doc-extraction
Multi-format document extraction library for EPUB, PDF, HTML, Markdown, and JSON documents
- Latest release: Jan 27, 2026
- Releases: 2
- Known CVEs: 0
- First release: Jan 13, 2026
- License: MIT
Repository
Source
- Stars: not available
- Forks: not available
- Open issues: not available
Security score
No OpenSSF Scorecard available for this repository.
Packages from this repo
No other tracked packages from this repository.
Insights
Activity
- Total releases: 2
- Last 12 months: 2
- Cadence: ~14 days
- Dependencies: 23
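The ~14-day cadence is consistent with the two release dates listed on this page (Jan 13 and Jan 27, 2026). A minimal sketch of that arithmetic follows, assuming cadence means the average gap between consecutive releases; the page does not state how it computes the figure, and `mean_cadence_days` is a hypothetical helper name.

```python
from datetime import date

# First and latest release dates as listed on this page.
release_dates = [date(2026, 1, 13), date(2026, 1, 27)]


def mean_cadence_days(dates):
    """Average gap in days between consecutive releases.

    Assumed definition of "cadence"; the registry's own formula is not documented here.
    """
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


print(mean_cadence_days(release_dates))  # 14.0
```

With only two releases there is a single gap, so the figure will shift as soon as a third release lands.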
Releases per month (last 12 months)
Release mix (2 releases)
- major 1
Dependencies
Depends on
2.5.0
- beautifulsoup4 >=4.12.0
- boto3 >=1.34.0
- botocore >=1.31.0
- ebooklib >=0.18
- extraction
- lightgbm >=4.0.0
- lxml >=4.9.0
- numpy >=1.26.0
- pdfplumber >=0.10.0
- pillow >=10.0.0
1–10 of 23
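To work with the constraints above programmatically, the ten visible entries can be split into a package name and a minimum version. This is a minimal sketch using only the standard library; `split_requirement` is a hypothetical helper, and it handles only the `>=` form that appears in this listing, not the full range of version specifiers.

```python
# The ten dependency entries visible on this page (10 of 23), copied verbatim.
LISTED = [
    "beautifulsoup4 >=4.12.0",
    "boto3 >=1.34.0",
    "botocore >=1.31.0",
    "ebooklib >=0.18",
    "extraction",
    "lightgbm >=4.0.0",
    "lxml >=4.9.0",
    "numpy >=1.26.0",
    "pdfplumber >=0.10.0",
    "pillow >=10.0.0",
]


def split_requirement(line):
    """Split 'name >=version' into (name, minimum_version).

    Returns None as the version when the entry carries no constraint.
    """
    name, sep, version = line.partition(">=")
    return name.strip(), (version.strip() if sep else None)


minimums = dict(split_requirement(line) for line in LISTED)
unconstrained = [name for name, version in minimums.items() if version is None]
print(unconstrained)  # ['extraction']
```

Of the entries shown, only `extraction` is listed without a minimum version.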
Used by
Nothing tracked depends on this yet.
Releases