masa-57/PIC
Hierarchical image clustering API for product catalog images. Two-level clustering automatically organizes thousands of product images into meaningful groups.
Code Analysis
9 files read · 3 rounds
A production-grade hierarchical image clustering API that uses DINOv2 embeddings and HDBSCAN to group product catalog images into near-duplicate (L1) and semantically similar (L2) clusters, with pluggable storage backends, GPU-accelerated Modal workers, and full pipeline automation.
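The two-level idea (tight near-duplicate groups first, then looser semantic groups built on top of them) can be sketched without any dependencies. This is a minimal illustration, not the repository's code: it assumes the image embeddings (for example DINOv2 vectors) are already computed, and it plainly substitutes cosine-similarity thresholding with connected components for HDBSCAN. All function names and the thresholds 0.95 and 0.8 are hypothetical. Like HDBSCAN, it labels unclustered points -1.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_clusters(vectors: List[Vector], threshold: float, min_size: int = 2) -> List[int]:
    """Link every pair with similarity >= threshold; return connected-component labels.

    Stand-in for HDBSCAN (an assumption of this sketch). Components smaller than
    min_size get the label -1, mirroring HDBSCAN's noise label.
    """
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)

    labels = [-1] * len(vectors)
    next_label = 0
    for members in sorted(groups.values()):  # deterministic order by first member
        if len(members) >= min_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_level_clusters(
    embeddings: List[Vector], l1_threshold: float = 0.95, l2_threshold: float = 0.8
) -> Tuple[List[int], List[int]]:
    """Return (l1_labels, l2_labels), one label per image; -1 means noise."""
    # Level 1: near-duplicates require a very high similarity.
    l1 = threshold_clusters(embeddings, l1_threshold)

    # Each L1 cluster becomes one unit; each L1-noise image is a unit on its own.
    units: List[List[int]] = [[i] for i, lab in enumerate(l1) if lab == -1]
    by_label: Dict[int, List[int]] = {}
    for i, lab in enumerate(l1):
        if lab != -1:
            by_label.setdefault(lab, []).append(i)
    units.extend(by_label.values())

    # Level 2: cluster unit centroids with a looser threshold, then map back to images.
    reps = [centroid([embeddings[i] for i in unit]) for unit in units]
    unit_labels = threshold_clusters(reps, l2_threshold)
    l2 = [-1] * len(embeddings)
    for unit, lab in zip(units, unit_labels):
        for i in unit:
            l2[i] = lab
    return l1, l2
```

The pairwise loop is quadratic, which is acceptable for a demo. A real HDBSCAN step (for example scikit-learn's `HDBSCAN` or the `hdbscan` package) replaces the fixed thresholds with density-based cluster extraction controlled mainly by a minimum cluster size.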
Strengths
Excellent separation of concerns with clear layering (api/core/models/services/worker), comprehensive test coverage including edge cases like empty datasets and noise handling, robust error handling with proper rollback strategies, well-documented architecture matching the README exactly, and thoughtful use of batching to avoid database bloat.
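The review does not say how the repository batches its database writes. One common pattern that pairs batching with rollback is to insert results in fixed-size chunks, each in its own transaction, so a failure discards only the current chunk. The sketch below assumes SQLite via the standard-library `sqlite3` module; the `assignments` table, its columns, and the batch size of 500 are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape: (image_id, l1_label, l2_label)
Row = Tuple[str, int, int]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def save_assignments(conn: sqlite3.Connection, rows: Iterable[Row], batch_size: int = 500) -> int:
    """Insert rows batch by batch; return how many rows were committed."""
    written = 0
    for chunk in batched(rows, batch_size):
        # The connection context manager commits this batch on success
        # and rolls it back if an exception is raised.
        with conn:
            conn.executemany(
                "INSERT INTO assignments (image_id, l1_label, l2_label) VALUES (?, ?, ?)",
                chunk,
            )
        written += len(chunk)
    return written
```

Each batch is one `executemany` call and one commit, so memory use and transaction size stay bounded regardless of how many images are processed.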
Weaknesses
Some type hints use Any where more specific types would help, db.py was not accessible for review, and the Modal integration adds deployment complexity that may be a hurdle for some users.
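As an illustration of the type-hint point, a parameter annotated as Any can often be narrowed to a structural interface with `typing.Protocol`, which also suits pluggable storage backends. The `ImageStore` protocol, its `put`/`get` methods, and `InMemoryStore` are hypothetical names for this sketch, not the repository's actual interfaces.

```python
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class ImageStore(Protocol):
    """Hypothetical storage interface: any class with these methods qualifies."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Hypothetical backend; satisfies ImageStore without inheriting from it."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def copy_image(src: ImageStore, dst: ImageStore, key: str) -> None:
    """Typed as ImageStore instead of Any, so a type checker can verify callers."""
    dst.put(key, src.get(key))
```

A static type checker will now flag a backend that lacks `put` or `get`, which an Any annotation would silently accept.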
Score Breakdown: Innovation, Craft, Traction, Scope
Evidence
Commits: 19
Contributors: 2
Files: 179
Active weeks: 3
Repository
Language: Python
Stars: 1
Forks: 0
License: MIT