Part of CNS 8.0 / Grounded Dialectical Orthesis

13 — Metrics and Acceptance Criteria

Core metrics

SNO validity rate

Fraction of outputs that parse into valid SNO-8 schema.

Target MVP: ≥ 95%.

Citation validity

Fraction of cited evidence references that resolve to known evidence atoms.

Target strict claims: 100%.
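As a minimal sketch of the definition above, citation validity can be computed as a set-membership check. The function name, the use of string identifiers for evidence atoms, and the handling of an empty citation list are assumptions made for illustration; they are not part of the CNS specification.

```python
from typing import Iterable, Set


def citation_validity(cited_refs: Iterable[str], known_atoms: Set[str]) -> float:
    """Fraction of cited evidence references that resolve to known evidence atoms.

    Hypothetical helper: evidence atoms are assumed to be identified by strings.
    """
    refs = list(cited_refs)
    if not refs:
        # Assumption: with no citations, nothing can fail to resolve.
        return 1.0
    resolved = sum(1 for ref in refs if ref in known_atoms)
    return resolved / len(refs)
```

Under this sketch, a strict claim set meets its target only when the function returns exactly 1.0.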

Mean entailment

Mean NLI/evidence support score for strict and likely claims.

Target strict claims: ≥ 0.75 in MVP, domain-adjusted later.

Zero-Temperature Hallucination Rate

$$ \mathrm{ZTHR} = \frac{ \#\text{strict promoted claims without valid proof trace} }{ \#\text{strict promoted claims} } $$

Target: 0.
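The ratio above can be sketched in a few lines. The `PromotedClaim` record and its field names are hypothetical, and returning 0 when there are no strict claims is an assumed convention rather than something the specification states.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PromotedClaim:
    """Hypothetical record for a promoted claim; field names are assumptions."""
    claim_id: str
    tier: str                    # e.g. "strict" or "likely"
    has_valid_proof_trace: bool


def zthr(claims: Iterable[PromotedClaim]) -> float:
    """Strict promoted claims without a valid proof trace, over all strict promoted claims."""
    strict = [c for c in claims if c.tier == "strict"]
    if not strict:
        # Assumption: an empty strict set counts as zero hallucinations.
        return 0.0
    unproven = sum(1 for c in strict if not c.has_valid_proof_trace)
    return unproven / len(strict)
```

Only strict claims enter the ratio, so a likely claim without a proof trace does not change the value.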

Evidential Entanglement

Weighted evidence overlap between SNOs.

Used for pair selection, not as a measure of final truth.

Chiral tension

Combination of graph, evidence-polarity, and language–logic chirality.

Productive Conflict Precision@K

Fraction of top-K selected SNO pairs that yield at least one of:

  • accepted synthesis;
  • useful latent predicate;
  • explicitly preserved unresolved contradiction.
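A sketch of this metric, assuming each ranked pair has already been labeled with a single outcome string. The outcome labels and the function name are hypothetical; the sketch follows the usual Precision@K convention of dividing by K even when fewer than K pairs are available.

```python
from typing import Sequence

# Hypothetical labels for the three productive outcomes listed above.
PRODUCTIVE_OUTCOMES = frozenset({
    "accepted_synthesis",
    "useful_latent_predicate",
    "preserved_contradiction",
})


def productive_conflict_precision_at_k(ranked_outcomes: Sequence[str], k: int) -> float:
    """Fraction of the top-k ranked SNO pairs whose outcome is productive.

    ranked_outcomes[i] is the outcome label of the pair ranked i-th by the selector.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for outcome in ranked_outcomes[:k] if outcome in PRODUCTIVE_OUTCOMES)
    # Standard P@K: missing pairs below rank k count as misses.
    return hits / k
```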

Residual energy

Unresolved support/refute contradiction mass after proof closure and accepted predicates.

Predicate-Invention Utility

$$ \mathrm{PIU} = \frac{\Delta \mathrm{ResidualEnergy}}{1 + \mathrm{PredicateComplexity}} $$
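A small worked sketch of the ratio. It assumes that the residual-energy change is measured as energy before the predicate is accepted minus energy after (so a positive value is a reduction) and that predicate complexity is a non-negative number; neither convention is fixed by the formula itself, and the function name is hypothetical.

```python
def predicate_invention_utility(
    energy_before: float,
    energy_after: float,
    predicate_complexity: float,
) -> float:
    """Residual-energy reduction per unit of (1 + predicate complexity)."""
    if predicate_complexity < 0:
        raise ValueError("predicate complexity is assumed to be non-negative")
    # Assumed sign convention: positive delta means residual energy went down.
    delta_residual_energy = energy_before - energy_after
    return delta_residual_energy / (1.0 + predicate_complexity)
```

For example, a predicate of complexity 2 that lowers residual energy from 10 to 4 scores 6 / 3 = 2, the same as a predicate of complexity 1 that lowers it by 4. The denominator penalizes complex predicates that explain no more than simple ones.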

False Predicate Rate

Fraction of accepted latent predicates that fail grounding or do not generalize.

Orthesis convergence

Fraction of synthesized SNOs satisfying round-trip and proof criteria.

Round-trip residual

$$ \chi_{LL}=\|G(S(T))-T\|_\Omega $$

Beta-1 reduction

$$ \Delta \beta_1 = \beta_1(G_{input}) - \beta_1(G_{synth}) $$

CNS should not force cycles to zero when the contradiction is real. Preserved contradictions must be explicit.
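One way to compute the quantity above is to treat each claim graph as a simple undirected graph and use the cycle rank, beta_1 = E - V + C, where C is the number of connected components. That reading of beta_1, and the function name below, are assumptions for illustration; the specification may define the graph and its homology differently.

```python
from typing import Hashable, Iterable, Tuple


def beta_1(nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Cycle rank E - V + C of an undirected graph (edge direction and duplicates ignored)."""
    edge_set = {frozenset((u, v)) for u, v in edges}
    node_set = set(nodes)
    for edge in edge_set:
        node_set.update(edge)  # include endpoints that were not listed as nodes

    # Union-find to count connected components.
    parent = {n: n for n in node_set}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(node_set)
    for edge in edge_set:
        members = list(edge)
        root_u, root_v = find(members[0]), find(members[-1])
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    return len(edge_set) - len(node_set) + components


# Usage: a three-claim cycle before synthesis, one edge resolved afterwards.
g_input = [("a", "b"), ("b", "c"), ("c", "a")]
g_synth = [("a", "b"), ("b", "c")]
delta_beta_1 = beta_1("abc", g_input) - beta_1("abc", g_synth)
```

A positive `delta_beta_1` only says that cycles were removed. It does not say whether removing them was justified, which is why preserved contradictions still need to be reported explicitly.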

Calibration ECE

Expected calibration error for likely claims.
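A minimal sketch of expected calibration error using the common equal-width binning scheme: claims are bucketed by stated confidence, and the gap between mean confidence and observed accuracy in each bucket is averaged, weighted by bucket size. The bin count of 10 and the function name are assumptions; the document does not fix a binning scheme.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Size-weighted mean of |mean confidence - accuracy| over equal-width bins on [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0  # assumption: no likely claims means no calibration error

    conf_sum = [0.0] * n_bins
    hit_count = [0] * n_bins
    bin_count = [0] * n_bins
    for p, is_correct in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        conf_sum[idx] += p
        hit_count[idx] += 1 if is_correct else 0
        bin_count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if bin_count[b] == 0:
            continue
        mean_conf = conf_sum[b] / bin_count[b]
        accuracy = hit_count[b] / bin_count[b]
        ece += (bin_count[b] / n) * abs(mean_conf - accuracy)
    return ece
```

For instance, four likely claims all stated at confidence 0.9, of which three are correct, land in one bin with accuracy 0.75, giving an ECE of about 0.15.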

Acceptance bands

| Metric | MVP | Research target |
| --- | --- | --- |
| SNO validity | ≥ 95% | ≥ 98% |
| citation validity strict | 100% | 100% |
| ZTHR strict | 0 | 0 |
| mean entailment strict | ≥ 0.75 | ≥ 0.85 |
| pair-selection P@10 | ≥ 0.60 | ≥ 0.80 |
| latent recovery F1 synthetic | ≥ 0.60 | ≥ 0.85 |
| orthesis convergence | ≥ 0.40 | ≥ 0.70 |
| ECE likely claims | ≤ 0.15 | ≤ 0.08 |
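The bands above can be encoded as data and checked mechanically. The following is a sketch only: the metric keys, the `check_bands` name, and the choice to treat a missing measurement as a failure are all assumptions. Percentages are expressed as fractions, the 100% citation target as "≥ 1.0", and the ZTHR target of 0 as "≤ 0.0".

```python
import operator
from typing import Callable, Dict, Tuple

# (comparison, MVP threshold, research-target threshold); keys are hypothetical names.
BANDS: Dict[str, Tuple[Callable[[float, float], bool], float, float]] = {
    "sno_validity": (operator.ge, 0.95, 0.98),
    "citation_validity_strict": (operator.ge, 1.00, 1.00),
    "zthr_strict": (operator.le, 0.00, 0.00),
    "mean_entailment_strict": (operator.ge, 0.75, 0.85),
    "pair_selection_p_at_10": (operator.ge, 0.60, 0.80),
    "latent_recovery_f1_synthetic": (operator.ge, 0.60, 0.85),
    "orthesis_convergence": (operator.ge, 0.40, 0.70),
    "ece_likely": (operator.le, 0.15, 0.08),
}


def check_bands(measured: Dict[str, float], level: str = "mvp") -> Dict[str, bool]:
    """Return a pass/fail flag per metric for the chosen level ("mvp" or "research")."""
    column = {"mvp": 1, "research": 2}[level]
    results = {}
    for name, band in BANDS.items():
        if name not in measured:
            results[name] = False  # assumption: an unmeasured metric does not pass
            continue
        compare, threshold = band[0], band[column]
        results[name] = compare(measured[name], threshold)
    return results
```

Returning one flag per metric, rather than a single boolean, keeps a failing band visible instead of folding it into an overall verdict.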

Report categories

Final outputs must separate:

  • strict;
  • likely;
  • hypothesis;
  • unresolved;
  • rejected.

Do not collapse these into one confidence score.
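One way to keep the five categories separate in a report structure is to key the output by an enumeration, so that no single scalar score is ever produced. The type and function names below are hypothetical, and representing a claim as a plain string is a simplification.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple, Union


class ReportCategory(Enum):
    STRICT = "strict"
    LIKELY = "likely"
    HYPOTHESIS = "hypothesis"
    UNRESOLVED = "unresolved"
    REJECTED = "rejected"


def group_by_category(
    claims: Iterable[Tuple[str, Union[str, ReportCategory]]],
) -> Dict[ReportCategory, List[str]]:
    """Bucket (claim_text, category) pairs; every category appears, even when empty."""
    report: Dict[ReportCategory, List[str]] = {cat: [] for cat in ReportCategory}
    for text, category in claims:
        # ReportCategory(...) accepts either a member or its string value,
        # and raises ValueError for an unknown category.
        report[ReportCategory(category)].append(text)
    return report
```

Emitting empty buckets explicitly makes it clear when a run produced, say, no unresolved contradictions, as opposed to silently omitting them.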

Failure taxonomy

  • citation hallucination;
  • weak entailment;
  • unsupported synthesis;
  • predicate overfit;
  • access-state misuse;
  • hidden oracle leakage;
  • round-trip drift;
  • topology overclaim;
  • possible-world substitution;
  • LLM judgments.