SolomonB14D3/confidence-cartography-toolkit

Teacher-forced confidence analysis for language models.

Install: pip install confidence-cartography

Code Analysis

11 files read · 3 rounds

A research toolkit that measures language model confidence at the token level using teacher-forced analysis to detect false beliefs and hallucinations in generated text.
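As an illustration of what token-level, teacher-forced confidence means, here is a minimal sketch in plain Python. It is not the toolkit's API: the function names (`teacher_forced_confidence`, `mean_log_confidence`) and the toy logits are hypothetical, and a real backend would obtain the per-position scores from a model's forward pass over the known text. The idea is that the true tokens are fed in as the prefix, and for each position we read off the probability the model assigned to the token that actually came next.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_forced_confidence(step_logits, target_ids):
    """Return the probability assigned to each actual token.

    step_logits[i] is assumed to hold the vocabulary scores for position i,
    computed with the ground-truth tokens before i as the prefix
    (teacher forcing), rather than with the model's own samples.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need one row of logits per target token")
    return [softmax(row)[tok] for row, tok in zip(step_logits, target_ids)]


def mean_log_confidence(confidences):
    """Average log-probability per token; lower values mean less confidence."""
    return sum(math.log(p) for p in confidences) / len(confidences)


# Toy data (hypothetical): a 3-token vocabulary and a 2-token text.
toy_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_targets = [0, 2]

confs = teacher_forced_confidence(toy_logits, toy_targets)
score = mean_log_confidence(confs)
```

In this toy case the first token gets a high probability and the second gets exactly one third, so a per-token view exposes the low-confidence position that a single sequence-level score would average away.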

Strengths

Clean separation of concerns between the core library and the application layer. Implements a novel teacher-forced confidence scoring approach with multiple backends (Hugging Face, Ollama). Includes comprehensive benchmarking infrastructure for evaluating model honesty against human false-belief data.

Weaknesses

Test coverage is limited: only basic unit tests exist, with no integration or stress tests. Some edge cases in the chunking logic could be handled more robustly. The Ollama backend relies on approximate methods that may not fully capture the intended confidence metrics.

Score Breakdown

Innovation: 7 (25%)

Craft: 79 (35%)

Traction: 6 (15%)

Scope: 75 (25%)

Signal breakdown

Innovation

Not Fork: +1
Code Novelty: +2
Concept Novelty: +2

Craft

CI: +5
Tests: +8
Polish: +0
Releases: +3
Has License: +5
Code Quality: +26
Readme Quality: +15
Recent Activity: +7
Structure Quality: +5
Commit Consistency: +0
Has Dependency Mgmt: +5

Traction

Forks: +0
Stars: +6
HN Points: +0
Watchers: +0
Early Traction: +0
Dev.to Reactions: +0
Community Contribs: +0

Scope

Commits: +3
Languages: +5
Subsystems: +13
Bloat Penalty: +0
Completeness: +7
Contributors: +0
Authored Files: +12
Readme Code Match: +3
Architecture Depth: +7
Implementation Depth: +8

Evidence

Commits

Contributors

Files

Active weeks

Tests · CI/CD · README · License · Contributing

Repository

Language: Python

Stars

Forks

License: MIT