FabioLeitao/python3-lgpd-crawler
Dashboard capable of probing, assessing, and mapping personal and/or sensitive data
What's novel
Dashboard capable of probing, assessing, and mapping personal and/or sensitive data
Code Analysis
5 files read · 2 rounds
A comprehensive data privacy compliance auditing tool that scans SQL databases, filesystems, APIs, and cloud services to detect PII/sensitive data using regex, ML (TF-IDF + RandomForest), and optional DL (sentence embeddings), storing findings in SQLite with tenant/operator metadata and generating E…
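The regex detection and SQLite storage steps described above could look roughly like the following. This is a minimal sketch using only the standard library, not the project's actual code: the pattern set, the `detect`, `mask`, and `store_findings` helpers, and the `findings` table layout are all hypothetical names chosen for illustration, and the ML/DL stages are omitted.

```python
import re
import sqlite3

# Hypothetical patterns; the project's real regex set is not shown in this review.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "cpf": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),  # Brazilian CPF, formatted
}

def detect(text):
    """Return (kind, matched_text) pairs for every pattern hit in text."""
    return [(kind, m.group(0))
            for kind, rx in PATTERNS.items()
            for m in rx.finditer(text)]

def mask(value):
    """Keep the first two characters so the finding itself does not store raw PII."""
    return value[:2] + "*" * (len(value) - 2)

def store_findings(conn, tenant, operator, source, findings):
    """Persist masked findings with tenant/operator metadata (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS findings ("
        "tenant TEXT, operator TEXT, source TEXT, kind TEXT, sample TEXT)"
    )
    conn.executemany(
        "INSERT INTO findings VALUES (?, ?, ?, ?, ?)",
        [(tenant, operator, source, kind, mask(value)) for kind, value in findings],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
hits = detect("Contact: maria@example.com, CPF 123.456.789-09")
store_findings(conn, "acme", "auditor-1", "crm.customers.notes", hits)
rows = conn.execute("SELECT kind, sample FROM findings ORDER BY kind").fetchall()
```

Masking the sample before it is written is a design choice of this sketch: a findings table that stored matched values verbatim would itself become a store of personal data.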
Strengths
Excellent architecture with clear separation of concerns: a connector pattern for multi-source scanning, a unified detector that combines regex, ML, and DL approaches, robust config normalization, comprehensive security headers, and well-documented code. The project handles edge cases such as lyrics/tabs false positives intelligently and supports several compliance frameworks (LGPD, GDPR, CCPA, HIPAA).
Weaknesses
Error handling is not always explicit: parallel scans, for example, swallow exceptions silently. Tests exist but may not cover all edge cases in the ML/DL components, and the optional dependencies add complexity for users who do not need the advanced features.
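One way to make parallel scan failures explicit is to collect them per target instead of discarding them. The sketch below assumes a thread pool and uses hypothetical names (`scan_all`, `fake_scan`, the `"scan"` logger); it does not reflect how the project actually schedules its scans.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger("scan")  # hypothetical logger name

def scan_all(targets, scan_one, max_workers=4):
    """Run scan_one over targets in parallel; return (results, errors) keyed by target."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scan_one, t): t for t in targets}
        for fut in as_completed(futures):
            target = futures[fut]
            try:
                results[target] = fut.result()
            except Exception as exc:
                # Log and record the failure rather than swallowing it.
                logger.warning("scan failed for %s: %r", target, exc)
                errors[target] = exc
    return results, errors

def fake_scan(target):
    """Stand-in for a real connector scan; fails for one target."""
    if target == "bad-db":
        raise ConnectionError("unreachable")
    return len(target)

results, errors = scan_all(["db1", "bad-db", "share"], fake_scan)
```

Returning the errors alongside the results lets a caller report a partially failed audit instead of presenting an incomplete scan as a clean one.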
Score Breakdown
Signal breakdown: Innovation, Craft, Traction, Scope
Evidence
Commits: 234
Contributors: 3
Files: 186
Active weeks: 5
Repository
Language: Python
Stars: 1
Forks: 1
License: BSD-3-Clause