The Problem
Current opt-out mechanisms are fragmented. Meta tags, robots.txt directives, and platform-specific settings exist, but none provide timestamped proof of when a preference was expressed. When disputes arise, this missing timestamp is often the crux of the matter.
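To illustrate the fragmentation, a site today might combine a robots.txt rule with a per-page "noai" meta tag (a convention adopted by some platforms). Both are real signals, but note that neither carries any record of when it was added:

```
# robots.txt — asks OpenAI's GPTBot crawler not to fetch the site
User-agent: GPTBot
Disallow: /

<!-- Per-page HTML meta tag using the "noai" convention -->
<meta name="robots" content="noai, noimageai">
```

A crawler that honours these signals sees only the current state of the file or page, with no way to tell whether the preference existed at the time of any earlier crawl.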
Meanwhile, courts and lawmakers are hesitant to set precedents that might unfairly advantage either creators or AI developers. Both sides are waiting for clarity that isn't coming.
"Timestamps are the missing piece."
Our Approach
TDR doesn't take sides. We build infrastructure.
By creating a transparent, queryable registry with immutable timestamps, we provide:
For creators
Dated evidence that preferences were expressed before potential misuse occurred.
For AI companies
A single, reliable source to check opt-out status, reducing legal uncertainty and due diligence overhead.
For regulators
Working infrastructure that can support future legislation, audits, and compliance frameworks — without requiring them to build it from scratch.
Why Timestamps Matter
Recent court orders have required AI companies to retain conversation logs and training records as potential evidence. This underscores a growing legal reality: time-based records matter.
Existing opt-out signals (noai tags, text-and-data-mining reservations) express a preference but don't prove when it was expressed. TDR fills this gap. Every registration is timestamped and preserved, creating a verifiable record that can be referenced if needed.
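As an illustrative sketch of what "timestamped and verifiable" can mean in practice (not TDR's actual schema — the field names and hashing scheme here are assumptions), a registration record can commit to its own contents with a cryptographic digest, so any later alteration is detectable:

```python
# Illustrative only: field names and scheme are hypothetical,
# not TDR's actual record format.
import hashlib
import json
from datetime import datetime, timezone

def make_record(work_url: str, preference: str) -> dict:
    """Build an opt-out record with a UTC timestamp and a digest
    committing to the record's contents."""
    record = {
        "work_url": work_url,
        "preference": preference,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    # Hash the canonical JSON form so any later edit is detectable.
    canonical = json.dumps(record, sort_keys=True).encode()
    record["digest"] = hashlib.sha256(canonical).hexdigest()
    return record

def verify_record(record: dict) -> bool:
    """Recompute the digest over everything except the digest field;
    True only if the record is intact."""
    body = {k: v for k, v in record.items() if k != "digest"}
    canonical = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest() == record["digest"]
```

A digest alone proves integrity, not the time of creation; in practice the timestamp gains evidentiary weight from being held by an independent registry whose records are preserved and publicly browsable.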
"Establish your rights before they're needed."
Why Now
The gap between what AI companies spend on content and what they spend on everything else is stark.
The entire AI training data licensing market is currently valued at around $3–4 billion annually. Meanwhile, leading AI companies have committed over $100 billion to chip procurement alone, and company valuations run into the hundreds of billions.
This disproportion isn't sustainable. As legal and reputational pressures mount, licensing infrastructure will become essential — not optional. Recent settlements have already reached $1.5 billion for a single case, signalling that the cost of not having clear consent records is rising fast.
Infrastructure built today shapes how that licensing ecosystem develops. TDR provides a foundation that benefits everyone: creators gain protection, AI companies gain certainty, and lawmakers gain a working model to build upon.
Who We Are
TDR is an independent project based in the UK, founded by Daniel Gage.
We believe infrastructure should exist before it's mandated, not after. Our goal is to provide practical tools that work today while contributing to a fairer ecosystem tomorrow.
Questions or feedback: contact@trainingdataregistry.org
Our Principles
Transparency
The registry is publicly browsable. Opt-out status can be verified by anyone.
Neutrality
We serve creators and AI companies equally. A working system benefits both.
Durability
Records are timestamped and preserved. Evidence should outlast disputes.
