For AI Companies

Infrastructure for responsible AI training. Check content preferences before you train.

The Compliance Challenge

As regulations such as the EU AI Act and the text and data mining (TDM) provisions of the EU Copyright Directive take effect, AI companies need a reliable way to verify content permissions before training. Scattered robots.txt files and inconsistent meta tags make compliance difficult and legally risky.

The Training Data Registry provides a single, searchable source for content creator preferences. Domain ownership is verified, and registration dates are recorded in public GitHub commits so that anyone can check them independently.

Get Early API Access

Be among the first to integrate compliance checking into your AI training pipeline. Early adopters will help shape the API design.

Prefer to talk first? contact@trainingdataregistry.org

We'll be in touch when the API is available.

API Plans

Standard

  • 1,000 queries/month
  • Single URL & hash check
  • For testing and evaluation

Pro

  • 10,000 queries/month
  • Everything in Standard
  • Bulk check endpoint

Enterprise

  • 100,000 queries/month
  • Everything in Pro
  • Full database access
  • Priority support
  • Bespoke agreements available

API Access

Check Single URL

GET /api/v1/check?url=example.com/article

Returns registration status, AI-use permissions, and verification level for any URL.

Bulk Check

POST /api/v1/check/bulk

Check multiple URLs in a single request. Available on Pro and Enterprise plans.
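As a rough sketch of preparing bulk requests, the helper below splits a URL list into batches and serialises each batch as JSON. The request schema is not documented on this page, so the "urls" key and the batch size are assumptions for illustration only, and the function names are hypothetical.

```python
import json
from typing import Iterator


def batched(urls: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


def bulk_request_body(urls: list[str]) -> str:
    """Serialise one batch for POST /api/v1/check/bulk."""
    # Hypothetical body shape: the "urls" key is an assumption, not a documented field.
    return json.dumps({"urls": urls})


if __name__ == "__main__":
    pending = ["example.com/a", "example.com/b", "example.com/c"]
    for batch in batched(pending, batch_size=2):
        print(bulk_request_body(batch))
```

Batching on the client keeps each request small, whatever limit the endpoint ends up enforcing.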

Domain Status

GET /api/v1/check/domain?domain=example.com

Check if an entire domain has registered preferences (full domain coverage).
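The two GET endpoints above take their target as a query parameter. A minimal sketch of building those request URLs follows. The host in BASE_URL is an assumption (the page gives only the paths), the function names are hypothetical, and authentication is not covered because it is not described here.

```python
from urllib.parse import urlencode

# Assumption: the API is served from this host; the page documents only the paths.
BASE_URL = "https://trainingdataregistry.org"


def single_check_url(content_url: str, base_url: str = BASE_URL) -> str:
    """Build the request URL for GET /api/v1/check."""
    # urlencode percent-escapes the target, so its own "/", "?" and "&"
    # cannot be mistaken for parts of the API request.
    return f"{base_url}/api/v1/check?{urlencode({'url': content_url})}"


def domain_check_url(domain: str, base_url: str = BASE_URL) -> str:
    """Build the request URL for GET /api/v1/check/domain."""
    return f"{base_url}/api/v1/check/domain?{urlencode({'domain': domain})}"


if __name__ == "__main__":
    print(single_check_url("example.com/article"))
    print(domain_check_url("example.com"))
```

Escaping matters most when the checked URL carries its own query string, which would otherwise be split into separate API parameters.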

Response Format

{
  "registered": true,
  "source": "url",
  "trust": "verified",
  "url": "https://example.com/article",
  "registration_id": "vBETdLGWCvY0",
  "allow_training": false,
  "allow_inference": true,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-02-01T10:00:00Z",
  "checked_at": "2026-02-20T10:00:00Z"
}

Granular permissions: Content creators can independently control training, inference, and archive permissions.

source: "url" for a specific registered URL; "domain" when the entire domain is covered (returns a domain field instead of url and registration_id).

trust: "verified" when domain ownership is proven; "unverified" for user-claimed registrations. Use verified records for compliance decisions.

registration_id: the Registry ID, linking to the public certificate at /verify/{id}. Present for URL-level matches. For domain-level responses (source: "domain"), the domain field is the unique identifier, since all URLs on that domain share the same verified domain record.
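The field rules above can be sketched as a small parser that picks the right identifier for URL-level and domain-level matches. The RegistryRecord class and parse_record function are hypothetical names, and the exact shape of a domain-level response beyond the domain field is assumed.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryRecord:
    registered: bool
    source: Optional[str]            # "url" or "domain"
    trust: Optional[str]             # "verified" or "unverified"
    identifier: Optional[str]        # registration_id for URL matches, domain otherwise
    certificate_path: Optional[str]  # /verify/{id}, only for URL-level matches


def parse_record(payload: str) -> RegistryRecord:
    """Parse one JSON response body into a RegistryRecord."""
    data = json.loads(payload)
    source = data.get("source")
    identifier = None
    certificate_path = None
    if source == "url":
        identifier = data.get("registration_id")
        if identifier:
            certificate_path = f"/verify/{identifier}"
    elif source == "domain":
        # Domain-level responses carry no registration_id; the domain is the key.
        identifier = data.get("domain")
    return RegistryRecord(
        registered=bool(data.get("registered", False)),
        source=source,
        trust=data.get("trust"),
        identifier=identifier,
        certificate_path=certificate_path,
    )
```

Storing the identifier alongside each training document gives an audit trail back to the public certificate or the verified domain record.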

What the Registry Tracks

Training

Pretraining, fine-tuning, RLHF, synthetic data

Inference

Summarisation, search, translation, Q&A

Archive

Datasets, vector DBs, cached corpora

Three independent permissions per URL or domain. Default: all prohibited.
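One way to honour the "all prohibited" default on the client side is to treat a use as allowed only when its flag is explicitly true. The sketch below assumes the allow_training, allow_inference and allow_archive field names from the response format; the is_allowed helper is hypothetical, and how to treat content with no registration at all is a policy decision this page does not settle.

```python
USES = ("training", "inference", "archive")


def is_allowed(record: dict, use: str) -> bool:
    """Return True only if the record explicitly permits the given use."""
    if use not in USES:
        raise ValueError(f"unknown use: {use!r}")
    # Default deny: a missing flag, false, or any non-boolean value
    # is treated as prohibited.
    return record.get(f"allow_{use}") is True


if __name__ == "__main__":
    record = {"allow_training": False, "allow_inference": True, "allow_archive": False}
    for use in USES:
        print(use, is_allowed(record, use))
```

Checking each flag separately reflects that the three permissions are independent: allowing inference says nothing about training or archiving.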

Trust Levels

Unverified: User-claimed
Verified: Domain ownership proven

Verified registrations are recommended for compliance decisions.
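A training pipeline might act on this recommendation by deciding automatically only on verified records and sending everything else to a human. The sketch below is one such gate. The three outcomes and the choice to route unregistered and unverified records to review are assumptions about pipeline policy, not Registry behaviour, and the names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    EXCLUDE = "exclude"
    REVIEW = "review"


def training_decision(record: dict) -> Decision:
    """Decide whether a document may enter a training set."""
    if not record.get("registered"):
        # No preference on record: left to the pipeline's own policy.
        return Decision.REVIEW
    if record.get("trust") != "verified":
        # User-claimed registration: not relied on for an automatic decision.
        return Decision.REVIEW
    if record.get("allow_training") is True:
        return Decision.ALLOW
    return Decision.EXCLUDE
```

For the sample response shown earlier (trust "verified", allow_training false), this gate returns Decision.EXCLUDE.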