Evidence & Validation Gates | CTE Research Initiative

The Commitment

CTE uses pre-defined validation gates at Day 30, Day 60, and Day 90 to determine whether the Decision Load Index (DLI) produces meaningful results. If any gate fails, the experiment halts and the findings are published. The gates measure signal detection, behavioral change, and economic viability.

If this succeeds, we'll scale it.

If it fails, we'll say so.

These are the gates that decide.

Validation Gates

CTE validation gates are pre-defined falsifiability criteria: Day 30 tests whether DLI scores show meaningful variance and correlate with self-reported overwhelm. Day 60 tests behavioral change and test-retest reliability. Day 90 tests economic viability and willingness to pay. Failure halts the experiment.

We've defined specific, measurable criteria at three checkpoints. Failure on any criterion triggers the halt path. No exceptions, no reinterpretation.
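The halt rule can be sketched in a few lines of Python. This is an illustration only, not CTE's actual tooling: the `Gate` class, the `run_gates` function, and the pass/fail values are hypothetical names and made-up data, and the sketch assumes each criterion has already been reduced to a pass/fail result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One checkpoint: a day number and named pass/fail criteria."""
    day: int
    criteria: dict[str, bool]  # criterion name -> did it pass?


def run_gates(gates: list[Gate]) -> tuple[str, list[str]]:
    """Walk gates in day order and stop at the first gate with a failed criterion.

    Returns ("continue", []) if every gate passes, otherwise
    ("halt", [failed criteria at the first failing gate]).
    """
    for gate in sorted(gates, key=lambda g: g.day):
        failed = [name for name, ok in gate.criteria.items() if not ok]
        if failed:
            # Any single failed criterion halts; later gates are never evaluated.
            return "halt", [f"Day {gate.day}: {name}" for name in failed]
    return "continue", []


# Hypothetical outcomes, for illustration only.
gates = [
    Gate(30, {"DLI variance": True, "Completion rate": True, "Self-correlation": True}),
    Gate(60, {"Retention": True, "Observable change": False, "Test-retest": True}),
    Gate(90, {"Retention": True, "Willingness to pay": True, "Sponsor interest": True}),
]
decision, reasons = run_gates(gates)
print(decision, reasons)
```

The function has no override parameter, which mirrors the "no exceptions, no reinterpretation" rule: the only inputs are the criteria results themselves.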

Day 30

Signal Detection

Status: Day 30 passed

Can we detect meaningful signal in the data? Does DLI correlate with anything real?

DLI Variance: standard deviation > 10 points across the cohort (shows the metric differentiates between participants)
Completion Rate: > 40% of participants complete weekly check-ins
Self-Correlation: correlation between DLI and self-reported overwhelm, r > 0.5

If failed: DLI doesn't measure anything real. Publish findings. Halt experiment.

Day 60

Behavioral Change

Status: Evaluating

Does awareness plus tooling change actual behavior? Can participants observe differences?

Retention: > 50% of Day-30 participants still active
Observable Change: at least 3 participants report measurable work changes
Test-Retest Reliability: DLI stability > 0.7 when re-measured under the same conditions

If failed: Awareness doesn't drive change. The tool doesn't work. Publish findings. Halt experiment.

Day 90

Economic Viability

Day 90: Apr 3

Is there external willingness to pay for this signal? Can this become sustainable?

Retention: > 50% of original cohort still active at Day 90
Willingness to Pay: at least 20% indicate they'd pay to continue
Sponsor Interest: at least 3 sponsor conversations initiated

If failed: No viable business model exists. Publish aggregate findings. Halt experiment.
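The Day 90 criteria reduce to arithmetic on a few counts, sketched below. The function `day90_gate` and the numbers are hypothetical. One assumption matters here: the willingness-to-pay share is computed over participants still active at Day 90, because the denominator is not stated above.

```python
def day90_gate(original_cohort, active, would_pay, sponsor_conversations):
    """Evaluate the three Day 90 criteria from simple counts."""
    # Retention is measured against the original cohort, unlike Day 60.
    retention = active / original_cohort
    # Assumption: willingness-to-pay share is taken over participants still
    # active at Day 90; guard against division by zero if nobody is active.
    wtp_share = would_pay / active if active else 0.0
    results = {
        "retention": retention > 0.50,            # strictly more than half
        "willingness_to_pay": wtp_share >= 0.20,  # "at least" is inclusive
        "sponsor_interest": sponsor_conversations >= 3,
    }
    return all(results.values()), results


# Made-up counts for illustration only.
passed, detail = day90_gate(original_cohort=40, active=22, would_pay=5, sponsor_conversations=3)
print(passed, detail)
```

Note the mix of strict and inclusive thresholds: retention of exactly 50% fails, while exactly 20% willingness to pay or exactly 3 sponsor conversations passes.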

What Success Looks Like

If we pass all gates:

DLI is a validated signal that predicts cognitive load
Participants report measurable changes in how they work
There's demonstrated willingness to pay for the tool
We have evidence to support scaling responsibly
Published findings contribute to productivity research

What Failure Looks Like

If we fail any gate:

We publish exactly what we learned (including why it failed)
We return any unused funds to participants
We halt the experiment publicly and transparently
We do NOT pivot to a different model or reframe the failure
The data becomes public research for others to build on

Why We're Publishing This

Most productivity tools launch with bold claims and vague success metrics. If they don't work, they quietly pivot or shut down. Nobody learns anything.

We think that's backwards.

By publishing our validation gates in advance, we're committing to a specific, falsifiable hypothesis. If we're wrong, the world learns something. If we're right, the evidence is credible because it was defined before we knew the outcome.

This is what research-first actually means.

Published Framework

The Decision Load Index methodology and our related preprints are publicly documented and citable:

Cognitive Thought Engine. (2026). Decision Load Index: A conceptual framework for measuring cognitive burden in knowledge work. Zenodo. https://doi.org/10.5281/zenodo.18217577

Cognitive Thought Engine. (2026). Constitutional Self-Governance for Autonomous AI Systems. Zenodo. https://doi.org/10.5281/zenodo.19162104

Saleme, M.K. (2026). Detecting Normalization of Deviance in Multi-Agent Systems: Empirical Evidence for Graph-Based Behavioral Drift Detection. Zenodo. https://doi.org/10.5281/zenodo.19195516

Saleme, M.K. (2026). Beyond Identity Governance: A Protocol-Level Security Testing Framework for Multi-Agent AI Systems. Zenodo. https://doi.org/10.5281/zenodo.19343034

Saleme, M.K. (2026). Community-Driven Security for AI Agents: Evolution of an Adversarial Testing Framework. Zenodo. https://doi.org/10.5281/zenodo.19343108

These preprints establish our theoretical foundations, component definitions, and validation approach. We publish methodology before validation so our framework can be scrutinized independently.