For AI Companies
Infrastructure for responsible AI training. Check content preferences before you train.
The Compliance Challenge
As regulations such as the EU AI Act and the text and data mining (TDM) rules in the EU Copyright Directive take effect, AI companies need reliable ways to verify content permissions before training. Scattered robots.txt files and inconsistent meta tags make compliance difficult and legally risky.
The Training Data Registry provides a single, searchable source for content creator preferences. Domain ownership is verified, and registration dates are recorded in public GitHub commits so that anyone can verify them independently.
Get Early API Access
Be among the first to integrate compliance checking into your AI training pipeline. Early adopters will help shape the API design.
Prefer to talk first? contact@trainingdataregistry.org
API Plans
Standard
- 1,000 queries/month
- Single URL & hash check
- For testing and evaluation
Pro
- 10,000 queries/month
- Everything in Standard
- Bulk check endpoint
Enterprise
- 100,000 queries/month
- Everything in Pro
- Full database access
- Priority support
- Bespoke agreements available
API Access
Check Single URL
GET /api/v1/check?url=example.com/article
Returns registration status, AI-use permissions, and verification level for any URL.
Bulk Check
POST /api/v1/check/bulk
Check multiple URLs in a single request. Available on Pro and Enterprise plans.
Domain Status
GET /api/v1/check/domain?domain=example.com
Check if an entire domain has registered preferences (full domain coverage).
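A minimal sketch of how the two GET endpoints above might be addressed from Python. The base URL is an assumption (inferred from the contact address, not stated on this page), and the helper names are hypothetical. Authentication is not described here, so none is shown.

```python
from urllib.parse import urlencode

# Assumed host: this page does not state the API base URL.
BASE_URL = "https://trainingdataregistry.org"


def single_check_url(content_url: str) -> str:
    """Build the request URL for GET /api/v1/check?url=..."""
    return f"{BASE_URL}/api/v1/check?{urlencode({'url': content_url})}"


def domain_check_url(domain: str) -> str:
    """Build the request URL for GET /api/v1/check/domain?domain=..."""
    return f"{BASE_URL}/api/v1/check/domain?{urlencode({'domain': domain})}"
```

The helpers percent-encode the query value, so a path separator in the checked URL is sent as %2F. This is the standard way to pass a URL as a query parameter; whether the API also accepts the unencoded form shown above is not confirmed here.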
Response Format
{
  "registered": true,
  "source": "url",
  "trust": "verified",
  "url": "https://example.com/article",
  "registration_id": "vBETdLGWCvY0",
  "allow_training": false,
  "allow_inference": true,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-02-01T10:00:00Z",
  "checked_at": "2026-02-20T10:00:00Z"
}
Granular permissions: Content creators can independently control training, inference, and archive permissions.
source — "url" for a specific registered URL; "domain" when the entire domain is covered (returns domain field instead of url and registration_id).
trust — "verified" when domain ownership is proven; "unverified" for user-claimed registrations. Use verified records for compliance decisions.
registration_id — the Registry ID, linking to the public certificate at /verify/{id}. Present for URL-level matches. For domain-level responses (source: "domain"), the domain field is the unique identifier — all URLs on that domain share the same verified domain record.
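The field descriptions above could be applied along these lines. This is a sketch rather than an official client: the function names and the four outcome labels are hypothetical, and how to treat unregistered or unverified content is a policy choice this page does not prescribe.

```python
from typing import Any, Mapping, Optional


def training_decision(response: Mapping[str, Any]) -> str:
    """Classify a check response for a training use case.

    Returns one of four illustrative labels:
    "no-record", "unverified", "allowed", "prohibited".
    """
    if not response.get("registered"):
        # The registry holds no preference for this URL or domain.
        return "no-record"
    if response.get("trust") != "verified":
        # User-claimed registration: flag it rather than rely on it.
        return "unverified"
    # Only an explicit true counts as permission.
    return "allowed" if response.get("allow_training") is True else "prohibited"


def record_identifier(response: Mapping[str, Any]) -> Optional[str]:
    """Return the identifier to log for audit purposes.

    Domain-level matches are identified by the domain field,
    URL-level matches by registration_id.
    """
    if response.get("source") == "domain":
        return response.get("domain")
    return response.get("registration_id")
```

For the sample response shown earlier, training_decision returns "prohibited" and record_identifier returns "vBETdLGWCvY0".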
What the Registry Tracks
Training
Pretraining, fine-tuning, RLHF, synthetic data
Inference
Summarisation, search, translation, Q&A
Archive
Datasets, vector DBs, cached corpora
Three independent permissions per URL or domain. Default: all prohibited.
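One way to model the three permissions in a pipeline is a small default-deny record. The sketch below is illustrative only: the activity labels are hypothetical keys chosen to mirror the categories listed above, and it assumes that "default: all prohibited" means a missing permission field should be read as prohibited.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical activity labels, grouped by the three categories listed above.
ACTIVITY_PERMISSION = {
    "pretraining": "allow_training",
    "fine-tuning": "allow_training",
    "rlhf": "allow_training",
    "synthetic-data": "allow_training",
    "summarisation": "allow_inference",
    "search": "allow_inference",
    "translation": "allow_inference",
    "qa": "allow_inference",
    "dataset": "allow_archive",
    "vector-db": "allow_archive",
    "cached-corpus": "allow_archive",
}


@dataclass(frozen=True)
class Permissions:
    # Default: all prohibited.
    allow_training: bool = False
    allow_inference: bool = False
    allow_archive: bool = False

    @classmethod
    def from_response(cls, response: Mapping[str, Any]) -> "Permissions":
        # Assumption: a missing or non-boolean field is treated as prohibited.
        return cls(**{
            name: response.get(name) is True
            for name in ("allow_training", "allow_inference", "allow_archive")
        })

    def permits(self, activity: str) -> bool:
        field = ACTIVITY_PERMISSION.get(activity)
        # Unknown activities are denied rather than guessed at.
        return field is not None and getattr(self, field)
```

Because each permission is independent, a record can allow summarisation while still prohibiting fine-tuning and dataset inclusion.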
Trust Levels
Verified registrations are recommended for compliance decisions.
