Part of CNS 8.0 / Grounded Dialectical Orthesis

12 — Experiment and Evaluation Plan

Updated May 15, 2026

Experiment 1 — Synthetic latent-context recovery

Goal

Test whether contradiction-driven predicate invention recovers hidden context variables.

Dataset

Generate examples with claims that appear contradictory until a hidden variable is introduced:

time;
subgroup;
measurement method;
jurisdiction;
dosage;
source condition;
definition boundary.
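
As an illustration of the dataset construction above, here is a minimal sketch of a generator for one such example. The record layout, the `Claim` class, the `make_contradiction_pair` function, and the example values are all hypothetical and are not part of the CNS specification. Only three of the listed hidden variables are included to keep the sketch short.

```python
import random
from dataclasses import dataclass

# Hypothetical value pairs for a few of the hidden variables listed above.
HIDDEN_VARIABLES = {
    "time": ("2010", "2020"),
    "subgroup": ("adults", "children"),
    "dosage": ("low dose", "high dose"),
}


@dataclass(frozen=True)
class Claim:
    text: str          # surface claim shown to the system
    polarity: bool     # True asserts the effect, False denies it
    hidden_value: str  # gold context value, withheld at runtime


def make_contradiction_pair(rng: random.Random) -> dict:
    """Build two claims that conflict until the hidden variable is revealed."""
    variable = rng.choice(sorted(HIDDEN_VARIABLES))
    value_a, value_b = HIDDEN_VARIABLES[variable]
    claim_a = Claim("Treatment X reduces symptom Y.", True, value_a)
    claim_b = Claim("Treatment X does not reduce symptom Y.", False, value_b)
    return {
        "claims": [claim_a, claim_b],
        "gold_latent_variable": variable,  # evaluation-only label
    }


example = make_contradiction_pair(random.Random(0))
```

The gold variable and the per-claim hidden values are kept in the record only for scoring. Under the oracle-boundary audit in Experiment 6, they would have to be stripped before the example reaches the runtime.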

Baselines

RAG summary;
LLM debate;
claim-level fact verification;
possible-world ranking without predicate invention;
CNS without chirality/entanglement pair selection.

Metrics

latent predicate recovery F1;
residual energy reduction;
PIU;
orthesis acceptance rate;
false predicate rate.
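
Two of these metrics can be sketched as set comparisons between invented and gold predicates. This is an assumption about how they are scored: it treats predicates as exact-match strings and defines the false predicate rate as the share of invented predicates that have no gold counterpart. The function name `predicate_recovery_scores` is hypothetical.

```python
def predicate_recovery_scores(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Score invented predicates against gold latent predicates (exact match)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    # Assumed definition: fraction of invented predicates that match no gold one.
    false_rate = len(predicted - gold) / len(predicted) if predicted else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_predicate_rate": false_rate,
    }


scores = predicate_recovery_scores(
    predicted={"time", "dosage", "weather"},
    gold={"time", "dosage", "subgroup"},
)
```

In practice, matching invented predicates to gold ones would probably need an alignment step (for example, synonym or paraphrase matching) rather than string equality.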

Experiment 2 — Productive conflict selection

Goal

Test whether the Productive Conflict Score selects better synthesis pairs than the baselines.

Data

Construct SNO pairs with known categories:

agreement over shared evidence;
disagreement over shared evidence;
disagreement over unrelated evidence;
unrelated topics;
extraction-error conflicts.

Metrics

pair-selection precision@k;
synthesis yield;
critic failure rate;
human or oracle-rated productive conflict label.
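
Pair-selection precision@k can be computed as follows. This is a minimal sketch that assumes each candidate pair has a scalar selection score and a boolean productive-conflict label. The pair identifiers and the function name `precision_at_k` are illustrative.

```python
def precision_at_k(scores: dict[str, float], productive: dict[str, bool], k: int) -> float:
    """Fraction of the k highest-scored pairs that are labeled productive."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank pair identifiers by descending selection score and keep the top k.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for pair_id in top_k if productive[pair_id])
    return hits / k


pair_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.4, "p4": 0.1}
pair_labels = {"p1": True, "p2": False, "p3": True, "p4": False}
p_at_2 = precision_at_k(pair_scores, pair_labels, k=2)
```

Dividing by `k` rather than by the number of returned pairs penalizes a selector that returns fewer than `k` candidates. Whether that is the desired behavior is a design choice left open here.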

Experiment 3 — Grounded synthesis on SciFact/FEVER

Goal

Evaluate evidence-grounded extraction and synthesis under known labels.

Tasks

extract SNOs from evidence;
verify citations and entailment;
identify support/refute contradictions;
generate constrained synthesis when applicable.

Metrics

citation validity;
rationale recovery;
entailment score;
label accuracy as diagnostic;
strict-claim ZTHR;
proof trace completeness.

Experiment 4 — Orthesis round-trip stability

Goal

Measure whether synthesized SNOs survive render/re-ground cycles.

Protocol

For each synthesized SNO:

render to natural language;
re-extract SNO;
align proof-critical atoms;
compute $\chi_{LL}$;
repeat for $n$ cycles.
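
The protocol loop can be sketched as below. The definition of $\chi_{LL}$ is not given in this section, so the sketch substitutes a Jaccard-based residual over proof-critical atoms, measured against the original SNO, as a stand-in. The `render` and `extract` callables are placeholders for the real rendering and extraction components, and every name here is hypothetical.

```python
from typing import Callable, FrozenSet, List

Atoms = FrozenSet[str]


def jaccard_residual(reference: Atoms, recovered: Atoms) -> float:
    """Stand-in residual: 0.0 means all atoms preserved, 1.0 means none."""
    union = reference | recovered
    if not union:
        return 0.0
    return 1.0 - len(reference & recovered) / len(union)


def round_trip(
    atoms: Atoms,
    render: Callable[[Atoms], str],
    extract: Callable[[str], Atoms],
    n_cycles: int,
) -> List[float]:
    """Run render/re-extract cycles and record drift from the original atoms."""
    original = atoms
    current = atoms
    residuals = []
    for _ in range(n_cycles):
        text = render(current)        # render to natural language
        current = extract(text)       # re-extract the SNO atoms
        residuals.append(jaccard_residual(original, current))
    return residuals


# Toy lossless components, used only to exercise the loop.
def toy_render(atoms: Atoms) -> str:
    return "; ".join(sorted(atoms))


def toy_extract(text: str) -> Atoms:
    return frozenset(part for part in text.split("; ") if part)


residuals = round_trip(frozenset({"a(x)", "b(x,y)"}), toy_render, toy_extract, 3)
```

With the lossless toy components the residual stays at zero on every cycle. A real extractor would be expected to produce a nonzero sequence, and its trend over cycles is what the convergence metric would summarize.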

Metrics

round-trip residual;
proof atom preservation;
claim drift;
evidence drift;
orthesis convergence rate.

Experiment 5 — Topology and synthesis difficulty

Goal

Test whether topology metrics predict synthesis difficulty.

Metrics

$\beta_1$ before/after synthesis;
persistence features;
chiral tensor norm;
residual energy;
human-rated difficulty;
number of synthesis iterations.

Hypothesis: chirality + entanglement + residual topology predicts difficulty better than embedding distance.
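
For reference, if the reasoning structure is treated as a simple undirected graph (an assumption; this section does not say which complex $\beta_1$ is computed on), then $\beta_1 = |E| - |V| + C$, where $C$ is the number of connected components. A minimal sketch with a hypothetical `betti_1` helper:

```python
from typing import Iterable, Tuple


def betti_1(num_vertices: int, edges: Iterable[Tuple[int, int]]) -> int:
    """First Betti number (count of independent cycles) of a simple undirected graph."""
    parent = list(range(num_vertices))

    def find(v: int) -> int:
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Drop self-loops and duplicate edges so the graph is simple.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    components = num_vertices
    for edge in unique_edges:
        u, v = tuple(edge)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return len(unique_edges) - num_vertices + components


before = betti_1(3, [(0, 1), (1, 2), (2, 0)])  # a triangle has one cycle
after = betti_1(3, [(0, 1), (1, 2)])           # removing an edge breaks it
```

Comparing the value before and after synthesis, as in the first metric above, then reduces to two calls on the corresponding graphs.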

Experiment 6 — Oracle-boundary audit

Goal

Ensure that the runtime does not use training labels or hidden gold states.

Checks

prompt label leakage;
dataset split contamination;
synthetic generator parameter leakage;
LLM judge truth-vote leakage;
calibration/training metadata separation.
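
Two of these checks lend themselves to simple automated tests. The sketch below assumes that examples are plain strings and that gold labels appear as literal tokens. The helper names are hypothetical, and the normalization (lowercasing and whitespace collapsing) is only one possible choice.

```python
import hashlib
from typing import Iterable, List, Set


def fingerprint(text: str) -> str:
    """Hash a case- and whitespace-normalized copy of the text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_contamination(train_texts: Iterable[str], test_texts: Iterable[str]) -> Set[str]:
    """Return test texts whose normalized form also occurs in the training split."""
    train_prints = {fingerprint(text) for text in train_texts}
    return {text for text in test_texts if fingerprint(text) in train_prints}


def prompt_label_leaks(prompt: str, forbidden_labels: Iterable[str]) -> List[str]:
    """Return gold-label strings that appear verbatim (case-insensitive) in a prompt."""
    lowered = prompt.lower()
    return [label for label in forbidden_labels if label.lower() in lowered]


overlap = split_contamination(
    ["Aspirin  reduces fever."],
    ["aspirin reduces fever.", "Zinc shortens colds."],
)
leaks = prompt_label_leaks("Claim: X causes Y. Gold: REFUTES", ["SUPPORTS", "REFUTES"])
```

Substring matching will over-flag prompts where a label word occurs in ordinary prose, so hits from `prompt_label_leaks` are best treated as candidates for manual review, not confirmed leaks. Near-duplicate contamination would need fuzzier matching than an exact hash.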

Experiment 7 — Ablation suite

Ablate:

Antagonist;
Evidential Entanglement;
graph chirality;
language–logic round-trip;
tensor proof closure;
predicate invention;
access states;
possible-world posterior;
orthesis loop.
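
One way to organize the suite is a configuration object with one switch per component, from which single-component ablations are derived. This is a sketch under the assumption that each component can be disabled independently. The `PipelineConfig` class and its field names are hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict


@dataclass(frozen=True)
class PipelineConfig:
    # One flag per ablatable component; True means the component is enabled.
    antagonist: bool = True
    evidential_entanglement: bool = True
    graph_chirality: bool = True
    language_logic_round_trip: bool = True
    tensor_proof_closure: bool = True
    predicate_invention: bool = True
    access_states: bool = True
    possible_world_posterior: bool = True
    orthesis_loop: bool = True


def single_ablations(base: PipelineConfig) -> Dict[str, PipelineConfig]:
    """Return one config per component, with only that component disabled."""
    return {
        field.name: replace(base, **{field.name: False})
        for field in fields(base)
    }


ablations = single_ablations(PipelineConfig())
```

Each resulting config differs from the full system in exactly one flag, so any metric change can be attributed to that component. Pairwise ablations, for example predicate invention together with entanglement, would need an extra helper.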

Statistical reporting

Report:

bootstrap confidence intervals;
effect sizes;
calibration curves;
per-domain breakdown;
failure taxonomy;
examples with proof traces.
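
For the confidence intervals, a percentile bootstrap over per-example scores is one common choice. The sketch below uses only the standard library. The resample count, the significance level, and the function name `bootstrap_ci` are assumptions, not values fixed by this plan.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lower_index], means[upper_index]


per_example_f1 = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]
low, high = bootstrap_ci(per_example_f1)
```

The `per_example_f1` values are made-up placeholders used only to exercise the function. For paired system comparisons, the same routine can be applied to per-example score differences.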

Minimum publishable result

A strong first paper needs:

synthetic latent context recovery;
proof-trace examples;
pair selection outperforming baselines;
strict zero-temperature hallucination rate (ZTHR) of zero on the constrained subset;
orthesis round-trip residual reduction;
ablation showing predicate invention and entanglement matter.

