rgb-99/nlp-dataset-engine
61
High-performance streaming engine for processing terabyte-scale NLP datasets: lazy loading, auto-validation, language filtering, sharding, compression, and benchmarking.
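This listing does not show the repository's actual API. As an illustration of how the lazy loading, validation, language filtering, sharding, and compression steps named above could fit together, here is a minimal sketch using only the Python standard library. It assumes gzip-compressed JSON Lines input; the function names (`stream_records`, `keep_language`, `write_shards`) and the `text` and `lang` fields are hypothetical and not taken from the project.

```python
import gzip
import json
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def stream_records(path: Path) -> Iterator[dict]:
    """Lazily yield one JSON record per line from a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield json.loads(line)


def keep_language(records: Iterable[dict], lang: str) -> Iterator[dict]:
    """Yield only records with non-empty string 'text' and a matching 'lang' (assumed fields)."""
    for record in records:
        text = record.get("text")
        if isinstance(text, str) and text and record.get("lang") == lang:
            yield record


def write_shards(records: Iterable[dict], out_dir: Path, shard_size: int) -> List[Path]:
    """Write records into gzip-compressed shards holding at most shard_size records each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    iterator = iter(records)
    paths: List[Path] = []
    while True:
        # Only one shard's worth of records is held in memory at a time.
        batch = list(islice(iterator, shard_size))
        if not batch:
            break
        shard_path = out_dir / f"shard-{len(paths):05d}.jsonl.gz"
        with gzip.open(shard_path, "wt", encoding="utf-8") as handle:
            for record in batch:
                handle.write(json.dumps(record, ensure_ascii=False) + "\n")
        paths.append(shard_path)
    return paths
```

Because each stage is a generator, chaining them as `write_shards(keep_language(stream_records(path), "en"), out_dir, 10_000)` reads the input one line at a time and never loads the whole dataset into memory.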
Score Breakdown
Innovation
3 (25%)
Craft
65 (35%)
Traction
6 (15%)
Scope
58 (25%)
Signal breakdown
Innovation
Not Fork: +1
Code Novelty: +0
Unique Niche: +1
Concept Novelty: +1
Craft
CI: +5
Tests: +8
Polish: +1
Releases: +0
Has License: +5
Code Quality: +12
Readme Quality: +15
Recent Activity: +7
Structure Quality: +5
Commit Consistency: +2
Has Dependency Mgmt: +5
Traction
Forks: +0
Stars: +6
HN Points: +0
Watchers: +0
Early Traction: +0
Dev.to Reactions: +0
Community Contribs: +0
Scope
Commits: +7
Languages: +3
Subsystems: +10
Bloat Penalty: +0
Completeness: +7
Contributors: +5
Authored Files: +8
Readme Code Match: +3
Architecture Depth: +7
Implementation Depth: +8
Evidence
Commits
32
Contributors
1
Files
38
Active weeks
3
Tests, CI/CD, README, License, Contributing
Repository
Language
Python
Stars
1
Forks
0
License
MIT