dolma
Toolkit for pre-processing LLM training data.
- Latest release
- Jul 07, 2025
- Releases
- 41
- Known CVEs
- 0
- First release
- Jul 09, 2023
- License
- Apache-2.0
Security score
No OpenSSF Scorecard available for this repository.
Insights
Activity
- Total releases
- 41
- Last 12 months
- 1
- Cadence
- ~4 days
- Dependencies
- 52
Release mix
- major 1
- minor 5
- patch 25
- pre 9
Dependencies
Depends on (version 1.2.1), showing 1–10 of 52
- anyascii >=0.3.2
- beautifulsoup4 >=4
- black >=22.6.0
- blingfire ==0.1.8
- boto3
- brotli
- cchardet >=2.1.7
- charset-normalizer >=3.2.0
- detect-secrets ==1.4.0
- dolma
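The constraints in the dependency list use two operators, `>=` (minimum version) and `==` (exact pin), while names such as boto3 carry no constraint. As a minimal sketch of how such a line could be checked against an installed version, assuming purely numeric dotted versions, the hypothetical helper `satisfies` below handles only those two operators. It is an illustration, not how dolma or its installer resolves dependencies.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '0.3.2' into (0, 3, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str, requirement: str) -> bool:
    """Check one 'name >=X' or 'name ==X' line; a bare name always passes.

    Hypothetical helper for illustration: only '>=' and '==' are supported.
    """
    match = re.fullmatch(r"\s*[\w.-]+\s*(?:(>=|==)\s*([\d.]+))?\s*", requirement)
    if match is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    operator, wanted = match.groups()
    if operator is None:  # e.g. "boto3": no constraint listed
        return True
    have, want = parse_version(installed), parse_version(wanted)
    # Pad with zeros so that "4" and "4.0.0" compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want if operator == ">=" else have == want


print(satisfies("4.12.3", "beautifulsoup4 >=4"))
print(satisfies("0.1.9", "blingfire ==0.1.8"))
```

Real tooling generally relies on the `packaging` library (`packaging.specifiers.SpecifierSet`), which also covers pre-releases, post-releases, and the remaining operators that this sketch ignores.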
Used by
- 1
Releases
| Version | Type | Released |
|---|---|---|
| 1.2.1 | patch | |
| 1.2.0 | minor | |
| 1.2.0.dev7 | pre | |
| 1.1.2 | patch | |
| 1.1.1.post3 | pre | |
| 1.2.0.dev6 | pre | |
| 1.2.0.dev5 | pre | |
| 1.2.0.dev3 | pre | |
| 1.1.1 | patch | |
| 1.2.0.dev2 | pre | |
| 1.2.0.dev1 | pre | |
| 1.2.0.dev0 | pre | |
| 1.1.0 | minor | |
| 1.0.14.post1 | pre | |
| 1.0.13 | patch | |
| 1.0.12 | patch | |
| 1.0.11 | patch | |
| 1.0.10 | patch | |
| 1.0.9 | patch | |
| 1.0.8 | patch | |
1–20 of 41
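The page does not say how it assigns the major, minor, patch, and pre labels. One simple rule that reproduces the labels on the 20 releases listed here is sketched below: a plain `X.Y.Z` version is a patch when `Z` is non-zero, a minor when only `Y` is non-zero, and a major otherwise, and anything with a suffix is grouped under "pre". The function name `release_type` and the rule itself are assumptions inferred from the listing, not the site's documented behavior.

```python
import re


def release_type(version: str) -> str:
    """Label a version string using a rule inferred from the release list.

    Assumption: anything that is not exactly 'X.Y.Z' (for example '.dev7'
    or '.post3' suffixes) is grouped under 'pre', as the listing does.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return "pre"
    _major, minor, patch = (int(group) for group in match.groups())
    if patch != 0:
        return "patch"
    if minor != 0:
        return "minor"
    return "major"


for example in ["1.2.1", "1.2.0", "1.2.0.dev7", "1.1.1.post3"]:
    print(example, release_type(example))
```

Under PEP 440 a `.postN` suffix marks a post-release rather than a pre-release, so the "pre" label on versions such as 1.1.1.post3 reflects the listing's grouping, not the packaging standard.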