API Documentation

Technical reference for the Training Data Registry API. API access requires a key, which you can apply for on the For AI Companies page.

Authentication

All API requests must include your API key in the X-API-Key request header. API keys are issued through your dashboard.

GET /api/v1/check?url=https://example.com/article
X-API-Key: tdr_live_your_api_key_here

API keys follow the format tdr_live_ followed by 32 hexadecimal characters. Keys are shown only once at creation — store yours securely. Do not embed keys in client-side code or public repositories.
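A quick format check can catch a truncated or mistyped key before any request is sent. The following is a minimal sketch based only on the format described above; the function name is hypothetical, and accepting uppercase hex digits is an assumption. Passing this check says nothing about whether the key is actually valid or active.

```python
import re

# Format described above: "tdr_live_" followed by 32 hexadecimal characters.
# Assumption: both lowercase and uppercase hex digits are accepted here.
KEY_PATTERN = re.compile(r"tdr_live_[0-9a-fA-F]{32}")


def looks_like_api_key(value: str) -> bool:
    """Return True if value matches the documented key format."""
    return KEY_PATTERN.fullmatch(value) is not None
```

A check like this is useful at application start-up, for example to fail fast when the environment variable holding the key is empty or still contains a placeholder.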

Single check

GET /api/v1/check

Check whether a URL or content hash is registered in the Training Data Registry. Supply either url or hash — not both.

Query parameters

Parameter	Type	Description
url	string	The URL to check. Must be a valid HTTP or HTTPS URL. Required if `hash` is not provided.
hash	string	SHA-256 content hash to check (64 lowercase hex characters). Required if `url` is not provided.
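As an illustration of the two parameters, the sketch below computes a SHA-256 hash in the 64-character lowercase hex form and builds the query parameters while enforcing the "either url or hash" rule. The helper names are hypothetical, and which bytes the registry expects to be hashed (for example raw file bytes versus extracted text) is an assumption you should confirm before relying on hash lookups.

```python
import hashlib
import re
from typing import Optional

HASH_PATTERN = re.compile(r"[0-9a-f]{64}")


def content_hash(content: bytes) -> str:
    """SHA-256 of the given bytes as 64 lowercase hex characters."""
    return hashlib.sha256(content).hexdigest()


def build_check_params(url: Optional[str] = None,
                       hash_hex: Optional[str] = None) -> dict:
    """Build query parameters for a single check: exactly one of url or hash."""
    if (url is None) == (hash_hex is None):
        raise ValueError("Supply either url or hash, not both")
    if hash_hex is not None:
        if HASH_PATTERN.fullmatch(hash_hex) is None:
            raise ValueError("hash must be 64 lowercase hex characters")
        return {"hash": hash_hex}
    return {"url": url}
```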

Example requests

# Check by URL
GET /api/v1/check?url=https://example.com/my-article
X-API-Key: tdr_live_...

# Check by content hash
GET /api/v1/check?hash=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-API-Key: tdr_live_...

URL normalisation

URLs are normalised before lookup: schemes are lowercased, trailing slashes removed, and fragments stripped. URLs without a scheme are assumed to be HTTPS. You do not need to pre-normalise URLs before sending them.
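If you key a local cache or deduplicate a dataset by URL, it can help to apply roughly the same rules on your side. The sketch below approximates the three rules listed above using the Python standard library. It is not the registry's implementation; details such as how a trailing slash before a query string is handled are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a leading scheme such as "https://" or "HTTP://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalise_url(raw: str) -> str:
    """Client-side approximation of the normalisation rules described above."""
    raw = raw.strip()
    if not _SCHEME_RE.match(raw):
        raw = "https://" + raw  # no scheme: assume HTTPS
    parts = urlsplit(raw)
    return urlunsplit((
        parts.scheme.lower(),     # lowercase the scheme
        parts.netloc,
        parts.path.rstrip("/"),   # remove trailing slashes
        parts.query,
        "",                       # strip the fragment
    ))
```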

Bulk check

POST /api/v1/check/bulk

Check multiple URLs in a single request. Available on paid tiers only. Each URL in the batch counts as one query against your monthly allowance.

Tier limits

Tier	Bulk access	Max URLs per request
Free	Not available	—
Pro	Available	100
Enterprise	Available	1,000
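A dataset larger than the per-request limit has to be split into several bulk requests. The sketch below shows one way to do that using the limits from the table above; the lowercase tier keys and the function name are hypothetical.

```python
from typing import Iterator

# Per-request limits from the tier table above (keys are illustrative names).
MAX_URLS_PER_REQUEST = {"pro": 100, "enterprise": 1000}


def batches(urls: list[str], tier: str) -> Iterator[list[str]]:
    """Yield consecutive slices of urls no larger than the tier's limit."""
    limit = MAX_URLS_PER_REQUEST[tier]
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]
```

Each yielded batch can then be sent as the `urls` array of one bulk request. Batching does not change the number of queries counted against your allowance, since each URL still counts as one query.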

Request body

POST /api/v1/check/bulk
X-API-Key: tdr_live_...
Content-Type: application/json

{
  "urls": [
    "https://example.com/article-1",
    "https://example.com/article-2",
    "https://blog.example.com/post-3"
  ]
}

Response

{
  "success": true,
  "stats": {
    "total": 3,
    "checked": 3,
    "registered": 1,
    "not_registered": 2,
    "errors": 0,
    "processing_time_ms": 142
  },
  "results": [
    {
      "url": "https://example.com/article-1",
      "registered": true,
      "source": "domain",
      "trust": "verified",
      "domain": "example.com",
      "allow_training": false,
      "allow_inference": false,
      "allow_archive": false,
      "verification_status": "domain-verified",
      "registered_at": "2026-01-15T10:00:00.000Z"
    },
    {
      "url": "https://example.com/article-2",
      "registered": false
    },
    {
      "url": "https://blog.example.com/post-3",
      "registered": false
    }
  ],
  "rate_limit": {
    "limit": 10000,
    "used": 3,
    "remaining": 9997
  },
  "checked_at": "2026-02-26T12:00:00.000Z"
}

If any URLs fail validation, they appear in an errors array alongside the results array. Results maintain the same order as the input array.
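As a sketch of how a bulk response might be consumed, the function below collects the URLs that are registered and not cleared for training. It relies only on the `results` fields shown in the example response. Treating a registered entry with no `allow_training` field as opted out is a conservative assumption, and the function name is hypothetical. The shape of entries in the errors array is not specified here, so the sketch does not read it.

```python
def opted_out_of_training(bulk_response: dict) -> list[str]:
    """Return URLs from a bulk response that are registered and disallow training."""
    return [
        item["url"]
        for item in bulk_response.get("results", [])
        # Assumption: a missing allow_training on a registered item means "not allowed".
        if item.get("registered") and not item.get("allow_training", False)
    ]
```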

Response format

Not registered

{
  "registered": false,
  "checked_at": "2026-02-26T12:00:00.000Z"
}

Registered — URL-level

{
  "registered": true,
  "source": "url",
  "trust": "verified",
  "url": "https://example.com/specific-article",
  "registration_id": "reg_abc123",
  "allow_training": false,
  "allow_inference": true,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-01-15T10:00:00.000Z",
  "checked_at": "2026-02-26T12:00:00.000Z"
}

Registered — domain-level

{
  "registered": true,
  "source": "domain",
  "trust": "verified",
  "domain": "example.com",
  "allow_training": false,
  "allow_inference": false,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-01-10T09:00:00.000Z",
  "checked_at": "2026-02-26T12:00:00.000Z"
}

Response fields

Field	Type	Description
registered	boolean	Whether the content is registered in the registry.
source	"url" | "domain"	Whether the match came from a URL-level registration or a domain-wide registration.
trust	"verified" | "unverified"	verified: the registrant has proven domain ownership. unverified: a self-declared registration with no domain proof. Weight your compliance decisions accordingly.
allow_training	boolean	Whether use for AI model training is permitted (pre-training, fine-tuning, RLHF, distillation, etc.).
allow_inference	boolean	Whether ephemeral processing for inference outputs is permitted (summarisation, translation, Q&A, etc.).
allow_archive	boolean	Whether long-term storage or indexing is permitted (dataset storage, vector databases, cached corpora, etc.).
verification_status	string	"unverified" or "domain-verified". Reflects the verification tier of the registration.
registered_at	ISO 8601	When the registration was created. For domain-verified registrations, this is the domain verification timestamp.
checked_at	ISO 8601	When this API response was generated. Log this alongside cached results as part of your compliance records.
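The three permission fields can be mapped to a single decision per intended use. The sketch below is one possible policy, not a rule set by the registry: it treats an unregistered result as "no preference on record" and a registered result with a missing field as not permitted. The function name and the use labels are hypothetical.

```python
# Maps an intended use to the response field that governs it.
USE_FIELDS = {
    "training": "allow_training",
    "inference": "allow_inference",
    "archive": "allow_archive",
}


def is_use_permitted(response: dict, use: str) -> bool:
    """Decide whether a use is permitted according to a single-check response."""
    field = USE_FIELDS[use]
    if not response.get("registered", False):
        # Assumption: no registration means no preference has been expressed.
        return True
    # Assumption: a missing permission field is treated as "not permitted".
    return bool(response.get(field, False))
```

You may want to combine this with the `trust` field, for example by logging unverified registrations separately for review.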

Rate limits

Monthly query limits apply based on your subscription tier. Limits reset at the start of each calendar month (UTC). Current tier allowances are shown on our pricing page.

Rate limit headers

Every response includes the following headers:

Header	Description
X-RateLimit-Limit	Your monthly query allowance.
X-RateLimit-Remaining	Queries remaining this month.
X-RateLimit-Used	Queries used this month.

When a request is rejected due to quota exhaustion, a 429 response is returned with an additional X-RateLimit-Reset header containing the UTC timestamp when your allowance resets.
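Reading these headers after each response lets a client slow down or stop before the allowance runs out. The sketch below parses the documented headers from any mapping of header names to values. The exact format of the X-RateLimit-Reset value is not specified above, so it is returned as a raw string rather than parsed. The function name is hypothetical.

```python
from typing import Mapping, Optional


def parse_rate_limit(headers: Mapping[str, str]) -> dict:
    """Extract the documented rate limit headers into a plain dict."""

    def as_int(name: str) -> Optional[int]:
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "used": as_int("X-RateLimit-Used"),
        # Only present on 429 responses; format unspecified, so kept as-is.
        "reset": headers.get("X-RateLimit-Reset"),
    }
```

Note that a plain dict is case-sensitive, while HTTP header names are not. The `headers` attribute of a `requests` response is a case-insensitive mapping and can be passed in directly.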

Error codes

Error responses follow a consistent format:

{
  "error": "Human-readable error message",
  "code": "MACHINE_READABLE_CODE"
}

HTTP status	Code	Meaning
400	MISSING_PARAMETER	Neither `url` nor `hash` was provided.
400	INVALID_URL	The URL could not be parsed as a valid HTTP/HTTPS URL.
400	INVALID_HASH	Hash is not a valid SHA-256 value (must be 64 lowercase hex characters).
400	MISSING_URLS	Bulk request body missing or `urls` is not an array.
400	EMPTY_URLS	Bulk request `urls` array is empty.
400	BATCH_TOO_LARGE	Bulk request exceeds the URL limit for your tier.
400	INVALID_JSON	Bulk request body is not valid JSON.
401	INVALID_API_KEY	API key is missing or not recognised.
403	KEY_SUSPENDED	API key has been suspended. Contact contact@trainingdataregistry.org.
403	TIER_NOT_ALLOWED	Bulk endpoint is not available on your current tier.
429	RATE_LIMIT_EXCEEDED	Monthly query allowance exhausted. Resets on the first of next month (UTC).
500	INTERNAL_ERROR	Unexpected server error. If this persists, contact contact@trainingdataregistry.org.
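Because every error shares the same body format, a client can convert error responses into one exception type that carries the machine-readable code. The sketch below is one way to do this; the class, the `UNKNOWN` fallback code, and the choice of which codes are worth retrying later are all assumptions rather than part of the API.

```python
class RegistryError(Exception):
    """Raised for any error response; carries the machine-readable code."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def should_retry_later(self) -> bool:
        # Assumption: only quota exhaustion and server errors can succeed
        # on a later attempt without changing the request or the key.
        return self.code in {"RATE_LIMIT_EXCEEDED", "INTERNAL_ERROR"}


def raise_for_error(status: int, body: dict) -> None:
    """Raise RegistryError if the status indicates an error; otherwise do nothing."""
    if status < 400:
        return
    raise RegistryError(status, body.get("code", "UNKNOWN"), body.get("error", ""))
```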

Code examples

curl

# Single URL check
curl -s -H "X-API-Key: tdr_live_your_key" \
  "https://trainingdataregistry.org/api/v1/check?url=https://example.com/article"

# Bulk check
curl -s -X POST \
  -H "X-API-Key: tdr_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://example.com/article-1","https://example.com/article-2"]}' \
  "https://trainingdataregistry.org/api/v1/check/bulk"

JavaScript

const API_KEY = process.env.TDR_API_KEY;
const BASE_URL = 'https://trainingdataregistry.org/api/v1';

// Single check
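async function checkUrl(url) {
  const response = await fetch(
    `${BASE_URL}/check?url=${encodeURIComponent(url)}`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  // Surface API errors instead of treating an error body as "not registered"
  if (!response.ok) {
    throw new Error(`Registry check failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Bulk check
async function checkUrls(urls) {
  const response = await fetch(`${BASE_URL}/check/bulk`, {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ urls }),
  });
  // Surface API errors instead of treating an error body as a result set
  if (!response.ok) {
    throw new Error(`Registry bulk check failed with HTTP ${response.status}`);
  }
  return response.json();
}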

// Example usage
const result = await checkUrl('https://example.com/my-article');
if (result.registered && !result.allow_training) {
  console.log('Content is opted out of AI training');
}

Python

import os
import requests

API_KEY = os.environ['TDR_API_KEY']
BASE_URL = 'https://trainingdataregistry.org/api/v1'
HEADERS = {'X-API-Key': API_KEY}

def check_url(url: str) -> dict:
    # requests URL-encodes query parameters passed via params
    response = requests.get(f'{BASE_URL}/check', params={'url': url}, headers=HEADERS)
    response.raise_for_status()
    return response.json()

def check_urls(urls: list[str]) -> dict:
    response = requests.post(
        f'{BASE_URL}/check/bulk',
        headers={**HEADERS, 'Content-Type': 'application/json'},
        json={'urls': urls},
    )
    response.raise_for_status()
    return response.json()

# Example usage
result = check_url('https://example.com/my-article')
if result['registered'] and not result.get('allow_training', False):
    print('Content is opted out of AI training')

Best practices

Cache responses

Cache API responses for up to the period permitted by your tier (see API Terms). Log the checked_at timestamp alongside your cached results — this timestamp is your evidence of when you queried the registry relative to any content use.
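One way to apply this is to store each response as returned and decide at read time whether it is still fresh, using its checked_at value. The sketch below assumes the timestamp format shown in the example responses. The `max_age` argument is a hypothetical parameter; the period you may actually cache for is set by the API Terms for your tier.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_checked_at(value: str) -> datetime:
    """Parse a timestamp such as "2026-02-26T12:00:00.000Z" as an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_fresh(response: dict, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True if the cached response is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - parse_checked_at(response["checked_at"]) <= max_age
```

Keeping the original response, including checked_at, in the cache entry means the same record serves both as the cached result and as the compliance log entry.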

Use bulk for datasets

When processing large datasets, use the bulk endpoint rather than looping over single checks. This is faster and uses the same number of queries against your monthly allowance.

Check before each training run

Preferences change over time — creators may register or withdraw URLs at any point. We recommend re-checking content against the registry before each training run or data ingestion cycle, rather than relying on a one-time historical check.

Respect both trust levels

Responses include a trust field: verified (domain ownership confirmed) or unverified (self-declared). While verified registrations carry higher evidential weight, we recommend giving appropriate weight to both — an unverified registration is still a documented expression of preference.

Corporate networks and VPNs

We monitor API keys for unusual usage patterns. If your team accesses the API from multiple office locations, a corporate VPN, or distributed infrastructure, requests may appear to originate from diverse network ranges. If your key is suspended and you believe this is the reason, contact contact@trainingdataregistry.org and we will review promptly.

Keep keys secure

Store your API key as an environment variable, never in source code or client-side applications. Do not share keys across teams or systems — create separate keys for each use case so you can revoke them independently if needed.

Questions?

For integration support or questions not covered here, contact contact@trainingdataregistry.org. API access is currently by application; apply on the For AI Companies page.