selimozten/lale
Turkish instruct model distilled from Claude Opus 4.6 outputs, fine-tuned with Unsloth
Code Analysis
10 files read · 4 rounds
A complete MLOps pipeline that generates Turkish instruction data via AWS Bedrock (Claude Opus), filters it for quality, deduplicates it, fine-tunes base models with Unsloth QLoRA, and evaluates them on the Terazi benchmark.
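The deduplication stage of such a pipeline can be illustrated with a minimal sketch. It is not taken from the repository: the function names, the 5-character shingle size, and the 0.8 threshold are all assumptions chosen for illustration. It compares character shingles by Jaccard similarity and keeps the first occurrence of each near-duplicate group.

```python
from typing import Iterable, List, Set


def shingles(text: str, n: int = 5) -> Set[str]:
    """Return the set of character n-grams of a whitespace-normalized, lowercased text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= n:
        return {normalized}
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_dedup(texts: Iterable[str], threshold: float = 0.8) -> List[str]:
    """Keep each text unless it is a near-duplicate of one already kept.

    Hypothetical helper: compares every candidate against all kept texts,
    so time grows quadratically and every shingle set stays in memory.
    """
    kept: List[str] = []
    kept_shingles: List[Set[str]] = []
    for text in texts:
        candidate = shingles(text)
        if any(jaccard(candidate, seen) >= threshold for seen in kept_shingles):
            continue  # near-duplicate of something already kept
        kept.append(text)
        kept_shingles.append(candidate)
    return kept


samples = [
    "Merhaba dünya, bugün hava çok güzel.",
    "Merhaba dünya, bugün hava çok güzel!",
    "Python ile veri işleme örneği.",
]
unique = fuzzy_dedup(samples)
```

Because this pairwise version stores a shingle set per kept text, it is only suitable for small datasets; fixed-size signatures such as MinHash are a common way to bound memory at larger scale.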
Strengths
Well-organized CLI structure with clean separation of concerns, a solid filtering pipeline with multi-stage quality checks, proper Unsloth/QLoRA integration, and a comprehensive README that accurately reflects the implementation. Makes effective use of modern tools (Pydantic, Click, WandB).
Weaknesses
Performance issues from boto3 client creation in data generation, memory concerns in fuzzy deduplication at scale, incomplete test coverage (integration tests are missing), and no data preprocessing pipeline before training.
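One common remedy for the client-creation cost is to build the client once and reuse it. The sketch below assumes the issue is that a new client is constructed for each request; the review does not spell this out. The function name `get_bedrock_client` and the default region are hypothetical and do not come from the repository.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_bedrock_client(region_name: str = "us-east-1"):
    """Return a cached Bedrock runtime client, created once per region.

    Hypothetical helper: boto3 is imported lazily so that importing this
    module does not require AWS dependencies or credentials.
    """
    import boto3

    return boto3.client("bedrock-runtime", region_name=region_name)
```

Callers would then use `get_bedrock_client()` inside the generation loop instead of constructing a client per call. The boto3 documentation describes low-level clients as generally thread-safe, so a single cached client can typically be shared across worker threads.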
Score Breakdown
Signal breakdown: Innovation, Craft, Traction, Scope
Evidence
Commits: 26
Contributors: 1
Files: 35
Active weeks: 1
Repository
Language: Python
Stars: 1
Forks: 0
License: Apache-2.0