Falsifiability Anchor

Evidence & Validation

What we measure, what would disprove us, and what happens if we're wrong.

The Commitment

Cognitive Thought Engine (CTE) uses predefined validation gates at Day 30, 60, and 90 to determine whether the Decision Load Index (DLI) produces meaningful results. The gates measure signal detection, behavioral change, and economic viability. If any gate fails, the experiment halts and the findings are published.

If this succeeds, we'll scale it.

If it fails, we'll say so.

These are the gates that decide.

Validation Gates

CTE validation gates are predefined falsifiability criteria. Day 30 tests whether DLI scores show meaningful variance and correlate with self-reported overwhelm. Day 60 tests behavioral change and test-retest reliability. Day 90 tests economic viability and willingness to pay. Failure at any gate halts the experiment.

We've defined specific, measurable criteria at three checkpoints. Failure on any criterion triggers the halt path. No exceptions, no reinterpretation.
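The halt rule can be sketched in a few lines of Python. This is a minimal illustration rather than CTE's actual tooling: the `evaluate` function and the data layout are hypothetical, and the True/False values are placeholders, not real results.

```python
from typing import Dict, List, Tuple

# Each gate is a name plus a mapping of criterion label -> passed?
Gate = Tuple[str, Dict[str, bool]]


def evaluate(gates: List[Gate]) -> str:
    """Walk the gates in order and stop at the first one with a failed criterion."""
    for name, criteria in gates:
        failed = [label for label, ok in criteria.items() if not ok]
        if failed:
            # Any single failed criterion triggers the halt path.
            return f"HALT at {name}: failed {', '.join(failed)}"
    return "PASS: all gates cleared"


# Placeholder outcomes for illustration only (not actual experiment results).
gates = [
    ("Day 30", {"DLI Variance": True, "Completion Rate": True, "Self-Correlation": True}),
    ("Day 60", {"Retention": True, "Observable Change": False, "Test-Retest Reliability": True}),
    ("Day 90", {"Retention": True, "Willingness to Pay": True, "Sponsor Interest": True}),
]
print(evaluate(gates))  # prints "HALT at Day 60: failed Observable Change"
```

Later gates are never inspected once an earlier gate fails, which mirrors the "no exceptions, no reinterpretation" rule.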

Day 30
Signal Detection
Day 30 Passed

Can we detect meaningful signal in the data? Does DLI correlate with anything real?

  • DLI Variance: standard deviation > 10 points across the cohort (shows the metric differentiates participants)
  • Completion Rate: > 40% of participants complete weekly check-ins
  • Self-Correlation: DLI correlates with self-reported overwhelm at r > 0.5

If failed: DLI doesn't measure anything real. Publish findings. Halt experiment.
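As a rough sketch, the three Day 30 criteria could be checked like this. It assumes the sample standard deviation and a Pearson correlation coefficient, neither of which the criteria above specify. The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def day30_gate(dli_scores, overwhelm_scores, completed, enrolled):
    """Return per-criterion results and an overall pass flag for the Day 30 gate."""
    checks = {
        # Assumption: sample standard deviation (n - 1 denominator).
        "dli_variance": stdev(dli_scores) > 10,
        "completion_rate": completed / enrolled > 0.40,
        # Assumption: "r" means Pearson's r.
        "self_correlation": pearson_r(dli_scores, overwhelm_scores) > 0.5,
    }
    return checks, all(checks.values())


# Made-up numbers for illustration only.
dli = [20, 35, 50, 65, 80]
overwhelm = [2, 3, 5, 6, 9]
checks, passed = day30_gate(dli, overwhelm, completed=9, enrolled=20)
print(checks, passed)
```

The thresholds are strict inequalities, so a completion rate of exactly 40% would not pass.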

Day 60
Behavioral Change
Evaluating

Does awareness plus tooling change actual behavior? Can participants observe differences?

  • Retention: > 50% of Day-30 participants still active
  • Observable Change: at least 3 participants report measurable work changes
  • Test-Retest Reliability: DLI stability coefficient > 0.7 under the same conditions

If failed: Awareness doesn't drive change. The tool doesn't work. Publish findings. Halt experiment.
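The Day 60 criteria could be evaluated along these lines. This sketch assumes test-retest reliability is measured as the Pearson correlation between two DLI administrations under the same conditions, which is a common convention but is not stated above. Names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def retest_correlation(first, second):
    """Pearson correlation between two administrations of the same measure."""
    m1, m2 = mean(first), mean(second)
    num = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    den = sqrt(sum((a - m1) ** 2 for a in first) * sum((b - m2) ** 2 for b in second))
    return num / den


def day60_gate(active_day60, active_day30, change_reports, dli_first, dli_retest):
    """Return per-criterion results and an overall pass flag for the Day 60 gate."""
    checks = {
        "retention": active_day60 / active_day30 > 0.50,
        "observable_change": change_reports >= 3,
        # Assumption: "stability" is the test-retest correlation coefficient.
        "test_retest": retest_correlation(dli_first, dli_retest) > 0.7,
    }
    return checks, all(checks.values())


# Made-up numbers for illustration only.
checks, passed = day60_gate(
    active_day60=12,
    active_day30=20,
    change_reports=4,
    dli_first=[30, 45, 60, 75],
    dli_retest=[32, 44, 63, 71],
)
print(checks, passed)
```

Note that retention here is measured against Day-30 participants, not the original cohort.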

Day 90
Economic Viability
Day 90 — Apr 3

Is there external willingness to pay for this signal? Can this become sustainable?

  • Retention: > 50% of original cohort still active at Day 90
  • Willingness to Pay: at least 20% indicate they'd pay to continue
  • Sponsor Interest: at least 3 sponsor conversations initiated

If failed: No viable business model exists. Publish aggregate findings. Halt experiment.
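A comparable sketch for the Day 90 criteria follows. The denominator for willingness to pay is not specified above, so this example assumes it is the number of participants who answered the question. The function name and all figures are hypothetical.

```python
def day90_gate(active_day90, original_cohort, would_pay, respondents, sponsor_conversations):
    """Return per-criterion results and an overall pass flag for the Day 90 gate."""
    checks = {
        # Retention is measured against the original cohort, not Day-30 participants.
        "retention": active_day90 / original_cohort > 0.50,
        # Assumption: the 20% is a share of respondents to the pricing question.
        "willingness_to_pay": would_pay / respondents >= 0.20,
        "sponsor_interest": sponsor_conversations >= 3,
    }
    return checks, all(checks.values())


# Made-up numbers for illustration only.
checks, passed = day90_gate(
    active_day90=11,
    original_cohort=20,
    would_pay=4,
    respondents=10,
    sponsor_conversations=3,
)
print(checks, passed)
```

"At least" criteria use `>=`, while the retention threshold is a strict `>`, matching the wording of each bullet.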

What Success Looks Like

If we pass all gates:

  • DLI is a validated signal that correlates with self-reported cognitive load
  • Participants report measurable changes in how they work
  • There's demonstrated willingness to pay for the tool
  • We have evidence to support scaling responsibly
  • Published findings contribute to productivity research

What Failure Looks Like

If we fail any gate:

  • We publish exactly what we learned (including why it failed)
  • We return any unused funds to participants
  • We halt the experiment publicly and transparently
  • We do NOT pivot to a different model or reframe the failure
  • The data becomes public research for others to build on

Why We're Publishing This

Most productivity tools launch with bold claims and vague success metrics. If they don't work, they quietly pivot or shut down. Nobody learns anything.

We think that's backwards.

By publishing our validation gates in advance, we're committing to a specific, falsifiable hypothesis. If we're wrong, the world learns something. If we're right, the evidence is credible because it was defined before we knew the outcome.

This is what research-first actually means.

Published Framework

The Decision Load Index methodology and related work are publicly documented and citable:

Cognitive Thought Engine. (2026). Decision Load Index: A conceptual framework for measuring cognitive burden in knowledge work. Zenodo. https://doi.org/10.5281/zenodo.18217577

Cognitive Thought Engine. (2026). Constitutional Self-Governance for Autonomous AI Systems. Zenodo. https://doi.org/10.5281/zenodo.19162104

Saleme, M.K. (2026). Detecting Normalization of Deviance in Multi-Agent Systems: Empirical Evidence for Graph-Based Behavioral Drift Detection. Zenodo. https://doi.org/10.5281/zenodo.19195516

Saleme, M.K. (2026). Beyond Identity Governance: A Protocol-Level Security Testing Framework for Multi-Agent AI Systems. Zenodo. https://doi.org/10.5281/zenodo.19343034

Saleme, M.K. (2026). Community-Driven Security for AI Agents: Evolution of an Adversarial Testing Framework. Zenodo. https://doi.org/10.5281/zenodo.19343108

These preprints establish our theoretical foundations, component definitions, and validation approach. We publish methodology before validation so our framework can be scrutinized independently.
