Evidence & Validation
What we measure, what would disprove us, and what happens if we're wrong.
The Commitment
CTE uses pre-defined validation gates at Day 30, 60, and 90 to determine whether the Decision Load Index produces meaningful results. If any gate fails, the experiment halts and findings are published. Gates measure signal detection, behavioral change, and economic viability.
If this succeeds, we'll scale it.
If it fails, we'll say so.
These are the gates that decide.
Validation Gates
CTE validation gates are pre-defined falsifiability criteria: Day 30 tests whether DLI scores show meaningful variance and correlate with self-reported overwhelm. Day 60 tests behavioral change and test-retest reliability. Day 90 tests economic viability and willingness to pay. Failure halts the experiment.
We've defined specific, measurable criteria at three checkpoints. Failure on any criterion triggers the halt path. No exceptions, no reinterpretation.
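The halt rule above can be sketched as a small decision procedure. This is a minimal illustration, not the project's actual tooling: the function name `run_gates`, the result dictionary, and the example outcomes are all hypothetical.

```python
def run_gates(gate_results):
    """Walk the gates in order and halt at the first one with a failed criterion.

    gate_results: ordered list of (gate_name, {criterion_name: passed_bool}).
    The structure is an assumption made for this sketch.
    """
    for gate_name, criteria in gate_results:
        failed = [name for name, ok in criteria.items() if not ok]
        if failed:
            # Any single failed criterion triggers the halt path.
            return {"status": "halt", "gate": gate_name, "failed": failed}
    return {"status": "pass", "gate": None, "failed": []}


# Hypothetical outcomes, for illustration only.
results = [
    ("Day 30", {"dli_variance": True, "completion_rate": True, "self_correlation": True}),
    ("Day 60", {"retention": True, "observable_change": False, "test_retest": True}),
    ("Day 90", {"retention": True, "willingness_to_pay": True, "sponsor_interest": True}),
]
outcome = run_gates(results)
```

In this example the run halts at Day 60 because one criterion failed, and the Day 90 gate is never reached.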
Day 30 (signal detection): Can we detect meaningful signal in the data? Does DLI correlate with anything real?
- DLI Variance: standard deviation > 10 points across the cohort (proves the metric differentiates)
- Completion Rate: > 40% of participants complete weekly check-ins
- Self-Correlation: DLI correlates with self-reported overwhelm at r > 0.5
If failed: DLI doesn't measure anything real. Publish findings. Halt experiment.
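As a rough sketch, the three Day 30 criteria could be checked like this. Several choices here are assumptions, not part of the published criteria: the sample standard deviation (rather than population), Pearson's r as the correlation measure, the function names, and the example data.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def day30_gate(dli_scores, overwhelm_scores, completed, enrolled):
    """Return (passed, per-criterion results) for the Day 30 gate."""
    checks = {
        # Assumption: sample standard deviation of DLI scores, in points.
        "dli_variance": stdev(dli_scores) > 10,
        # Share of enrolled participants who complete weekly check-ins.
        "completion_rate": completed / enrolled > 0.40,
        # Assumption: Pearson's r between DLI and self-reported overwhelm.
        "self_correlation": pearson_r(dli_scores, overwhelm_scores) > 0.5,
    }
    return all(checks.values()), checks


# Hypothetical data, for illustration only.
dli = [22, 35, 48, 51, 63, 70, 84]
overwhelm = [2, 3, 5, 4, 6, 7, 9]
passed, checks = day30_gate(dli, overwhelm, completed=5, enrolled=10)
```

All thresholds use strict inequalities, matching the ">" in the criteria: a completion rate of exactly 40% would not pass.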
Day 60 (behavioral change): Does awareness plus tooling change actual behavior? Can participants observe differences?
- Retention: > 50% of Day-30 participants still active
- Observable Change: at least 3 participants report measurable work changes
- Test-Retest Reliability: DLI stability > 0.7 under the same conditions
If failed: Awareness doesn't drive change. The tool doesn't work. Publish findings. Halt experiment.
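A minimal sketch of the Day 60 checks follows. The criteria do not say how "DLI stability" is computed, so this example assumes Pearson's r between two administrations of the index under the same conditions; an intraclass correlation would be another reasonable reading. Participant IDs, function names, and scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def day60_gate(day30_active, day60_active, change_reports, first_scores, retest_scores):
    """Return (passed, per-criterion results) for the Day 60 gate.

    day30_active / day60_active: sets of participant IDs active at each checkpoint.
    change_reports: number of participants reporting measurable work changes.
    first_scores / retest_scores: paired DLI scores from repeated measurement.
    """
    still_active = day60_active & day30_active
    checks = {
        "retention": len(still_active) / len(day30_active) > 0.50,
        "observable_change": change_reports >= 3,
        # Assumption: stability measured as Pearson's r between the two runs.
        "test_retest": pearson_r(first_scores, retest_scores) > 0.7,
    }
    return all(checks.values()), checks


# Hypothetical data, for illustration only.
day30 = {"p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"}
day60 = {"p1", "p2", "p3", "p4", "p5"}
first = [30, 45, 52, 61, 78]
retest = [33, 43, 55, 60, 75]
passed, checks = day60_gate(day30, day60, change_reports=3,
                            first_scores=first, retest_scores=retest)
```

Retention is measured against the Day-30 participants, not the original cohort, which is why the denominator is the Day 30 set.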
Day 90 (economic viability): Is there external willingness to pay for this signal? Can this become sustainable?
- Retention: > 50% of the original cohort still active at Day 90
- Willingness to Pay: at least 20% indicate they'd pay to continue
- Sponsor Interest: at least 3 sponsor conversations initiated
If failed: No viable business model exists. Publish aggregate findings. Halt experiment.
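The Day 90 criteria reduce to simple ratio and count checks, sketched below. The criteria do not state the denominator for willingness to pay; this example assumes it is the number of participants who answered the question. The function name and all numbers are hypothetical.

```python
def day90_gate(original_cohort, active_day90, would_pay, respondents, sponsor_conversations):
    """Return (passed, per-criterion results) for the Day 90 gate."""
    checks = {
        # Retention is measured against the original cohort size.
        "retention": active_day90 / original_cohort > 0.50,
        # Assumption: share of survey respondents who say they would pay.
        "willingness_to_pay": would_pay / respondents >= 0.20,
        "sponsor_interest": sponsor_conversations >= 3,
    }
    return all(checks.values()), checks


# Hypothetical numbers, for illustration only.
passed, checks = day90_gate(
    original_cohort=40,
    active_day90=22,
    would_pay=5,
    respondents=22,
    sponsor_conversations=2,
)
```

Here retention (22 of 40) and willingness to pay (5 of 22) both clear their thresholds, but only two sponsor conversations were initiated, so the gate fails and the halt path applies.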
What Success Looks Like
If we pass all gates:
- DLI is a validated signal: it varies across participants, correlates with self-reported overwhelm, and is stable on retest
- Participants report measurable changes in how they work
- There's demonstrated willingness to pay for the tool
- We have evidence to support scaling responsibly
- Published findings contribute to productivity research
What Failure Looks Like
If we fail any gate:
- We publish exactly what we learned (including why it failed)
- We return any unused funds to participants
- We halt the experiment publicly and transparently
- We do NOT pivot to a different model or reframe the failure
- The data becomes public research for others to build on
Why We're Publishing This
Most productivity tools launch with bold claims and vague success metrics. If they don't work, they quietly pivot or shut down. Nobody learns anything.
We think that's backwards.
By publishing our validation gates in advance, we're committing to a specific, falsifiable hypothesis. If we're wrong, the world learns something. If we're right, the evidence is credible because it was defined before we knew the outcome.
This is what research-first actually means.
Published Framework
The Decision Load Index methodology and our other preprints are publicly documented and citable:
Cognitive Thought Engine. (2026). Decision Load Index: A conceptual framework for measuring cognitive burden in knowledge work. Zenodo. https://doi.org/10.5281/zenodo.18217577
Cognitive Thought Engine. (2026). Constitutional Self-Governance for Autonomous AI Systems. Zenodo. https://doi.org/10.5281/zenodo.19162104
Saleme, M.K. (2026). Detecting Normalization of Deviance in Multi-Agent Systems: Empirical Evidence for Graph-Based Behavioral Drift Detection. Zenodo. https://doi.org/10.5281/zenodo.19195516
Saleme, M.K. (2026). Beyond Identity Governance: A Protocol-Level Security Testing Framework for Multi-Agent AI Systems. Zenodo. https://doi.org/10.5281/zenodo.19343034
Saleme, M.K. (2026). Community-Driven Security for AI Agents: Evolution of an Adversarial Testing Framework. Zenodo. https://doi.org/10.5281/zenodo.19343108
These preprints establish our theoretical foundations, component definitions, and validation approach. We publish methodology before validation so our framework can be scrutinized independently.