For AI Companies

Infrastructure for responsible AI training. Check content preferences before you train.

The Compliance Challenge

As regulations such as the EU AI Act and the text and data mining (TDM) provisions of the EU Copyright Directive take effect, AI companies need a reliable way to verify content permissions before training. Scattered robots.txt files and inconsistent meta tags make compliance difficult and legally risky.

The Training Data Registry provides a single, searchable source for content creator preferences. Domain ownership is verified, and registration dates are recorded in public GitHub commits so that anyone can verify them independently.

Get Early API Access

Be among the first to integrate compliance checking into your AI training pipeline. Early adopters will help shape the API design.

Prefer to talk first? Email contact@trainingdataregistry.org.

API Plans

Standard

1,000 queries/month
Single URL & hash check
For testing and evaluation

Pro

10,000 queries/month
Everything in Standard
Bulk check endpoint

Enterprise

100,000 queries/month
Everything in Pro
Full database access
Priority support
Bespoke agreements available

API Access

Check Single URL

GET /api/v1/check?url=example.com/article

Returns registration status, AI-use permissions, and verification level for any URL.
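A minimal Python sketch of how a client might call this endpoint. The host in BASE_URL is a placeholder, the helper names are illustrative, and authentication is left out because this page does not describe it. The main point is that the target URL should be percent-encoded so that its own "?" and "&" characters are not read as part of the registry request.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host (assumption): replace with the registry's actual API host.
BASE_URL = "https://registry.example"


def build_check_url(target: str, base_url: str = BASE_URL) -> str:
    """Build the single-URL check request for `target`."""
    # urlencode percent-encodes '/', '?', '&' and '=' inside the target.
    return f"{base_url}/api/v1/check?{urlencode({'url': target})}"


def fetch_check(target: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (no authentication shown)."""
    with urlopen(build_check_url(target, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, build_check_url("example.com/article") yields a request whose url parameter is example.com%2Farticle.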

Bulk Check

POST /api/v1/check/bulk

Check multiple URLs in a single request. Available on Pro and Enterprise plans.
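This page does not specify the request body for the bulk endpoint, so the sketch below rests on assumptions: a JSON object with a "urls" array, a placeholder host, and an arbitrary batch size of 100. Authentication is omitted. It shows how a large URL list might be split into batches and turned into POST requests.

```python
import json
from typing import Iterator, List, Sequence
from urllib.request import Request

# Placeholder host (assumption): replace with the registry's actual API host.
BASE_URL = "https://registry.example"


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` entries each."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_bulk_request(urls: Sequence[str], base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a bulk check request.

    The {"urls": [...]} body shape is an assumption, not documented here.
    """
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return Request(
        f"{base_url}/api/v1/check/bulk",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_all(urls: Sequence[str], batch_size: int = 100) -> List[Request]:
    """One request per batch; the batch size of 100 is an arbitrary choice."""
    return [build_bulk_request(batch) for batch in chunked(urls, batch_size)]
```

Each Request can then be passed to urllib.request.urlopen, or the same body can be sent with any other HTTP client.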

Domain Status

GET /api/v1/check/domain?domain=example.com

Check if an entire domain has registered preferences (full domain coverage).
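A crawler usually holds full URLs rather than bare domains, so a small helper can derive the domain before querying this endpoint. The sketch below uses a placeholder host and illustrative function names. Whether the registry treats "www." hosts or subdomains as separate domains is not stated here, so the hostname is passed through unchanged apart from lowercasing.

```python
from urllib.parse import urlencode, urlsplit

# Placeholder host (assumption): replace with the registry's actual API host.
BASE_URL = "https://registry.example"


def domain_of(url: str) -> str:
    """Return the lowercased hostname of `url`, without port or path."""
    # urlsplit only recognises a host after a scheme, so bare inputs such as
    # "example.com/article" are given a default one first.
    if "://" not in url:
        url = "https://" + url
    return urlsplit(url).hostname or ""


def build_domain_check_url(url: str, base_url: str = BASE_URL) -> str:
    """Build the domain status request for the domain that hosts `url`."""
    return f"{base_url}/api/v1/check/domain?{urlencode({'domain': domain_of(url)})}"
```

Checking the domain once and caching the answer may save queries when many URLs from the same site are queued, provided a domain-level record is what your pipeline needs.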

Response Format

{
  "registered": true,
  "source": "url",
  "trust": "verified",
  "url": "https://example.com/article",
  "registration_id": "vBETdLGWCvY0",
  "allow_training": false,
  "allow_inference": true,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-02-01T10:00:00Z",
  "checked_at": "2026-02-20T10:00:00Z"
}

Granular permissions: Content creators can independently control training, inference, and archive permissions.

source — "url" for a specific registered URL; "domain" when the entire domain is covered (returns domain field instead of url and registration_id).

trust — "verified" when domain ownership is proven; "unverified" for user-claimed registrations. Use verified records for compliance decisions.

registration_id — the Registry ID, linking to the public certificate at /verify/{id}. Present for URL-level matches. For domain-level responses (source: "domain"), the domain field is the unique identifier — all URLs on that domain share the same verified domain record.
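The field rules above can be sketched as a small Python parser. The class and function names are illustrative, and the use of None for absent fields is an assumption, since this page does not show what an unregistered response contains. The sketch handles the two documented shapes: URL-level matches carry url and registration_id, while domain-level matches carry domain instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryRecord:
    registered: bool
    source: Optional[str]           # "url" or "domain"
    trust: Optional[str]            # "verified" or "unverified"
    identifier: Optional[str]       # registration_id (URL match) or domain
    allow_training: Optional[bool]
    allow_inference: Optional[bool]
    allow_archive: Optional[bool]

    @property
    def certificate_path(self) -> Optional[str]:
        """Public certificate path, available for URL-level matches only."""
        if self.source == "url" and self.identifier:
            return f"/verify/{self.identifier}"
        return None


def parse_record(payload: dict) -> RegistryRecord:
    """Convert a decoded JSON response into a RegistryRecord."""
    source = payload.get("source")
    # Domain-level responses identify the record by domain, not registration_id.
    key = "domain" if source == "domain" else "registration_id"
    return RegistryRecord(
        registered=bool(payload.get("registered", False)),
        source=source,
        trust=payload.get("trust"),
        identifier=payload.get(key),
        allow_training=payload.get("allow_training"),
        allow_inference=payload.get("allow_inference"),
        allow_archive=payload.get("allow_archive"),
    )
```

Applied to the sample response shown earlier, parse_record gives identifier "vBETdLGWCvY0" and certificate_path "/verify/vBETdLGWCvY0".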

What the Registry Tracks

Training

Pretraining, fine-tuning, RLHF, synthetic data

Inference

Summarisation, search, translation, Q&A

Trust Levels

Unverified: User-claimed

Verified: Domain ownership proven

Verified registrations are recommended for compliance decisions.
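One way a training pipeline might act on this recommendation is sketched below in Python. The policy and its return labels are illustrative assumptions, not something the registry prescribes: only verified records produce an automatic allow or deny, and unverified records are routed to manual review.

```python
def training_decision(record: dict) -> str:
    """Classify a decoded check response for use as training data.

    Returns one of "allow", "deny", "needs-review" or "no-preference".
    These labels are hypothetical; adapt them to your own policy.
    """
    if not record.get("registered"):
        # Nothing is registered, so the registry expresses no preference.
        return "no-preference"
    if record.get("trust") != "verified":
        # User-claimed registration: do not decide automatically.
        return "needs-review"
    # Treat a missing allow_training flag as a refusal, to stay conservative.
    return "allow" if record.get("allow_training") is True else "deny"
```

With the sample response shown earlier (trust "verified", allow_training false), this policy returns "deny".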