API Documentation

Technical reference for the Training Data Registry API. API access requires a key, which you can apply for on the "For AI companies" page.

Authentication

All API requests must include your API key in the X-API-Key request header. API keys are issued through your dashboard.

GET /api/v1/check?url=https://example.com/article
X-API-Key: tdr_live_your_api_key_here

API keys follow the format tdr_live_ followed by 32 hexadecimal characters. Keys are shown only once at creation — store yours securely. Do not embed keys in client-side code or public repositories.
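A quick local check of the key format can catch copy-paste mistakes before a request is sent. The sketch below is illustrative only: the function name is hypothetical, and it assumes that "hexadecimal" permits both upper- and lower-case digits, which this reference does not specify.

```python
import re

# Assumption: upper- and lower-case hex digits are both accepted; the
# reference only says "32 hexadecimal characters".
KEY_PATTERN = re.compile(r"tdr_live_[0-9a-fA-F]{32}")


def looks_like_api_key(key: str) -> bool:
    """Return True if key has the documented tdr_live_ + 32 hex shape."""
    return KEY_PATTERN.fullmatch(key) is not None
```

A key that passes this check is only well-formed; whether it is valid is decided by the API.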

Single check

GET /api/v1/check

Check whether a URL or content hash is registered in the Training Data Registry. Supply either url or hash — not both.

Query parameters

Parameter | Type | Description
url | string | The URL to check. Must be a valid HTTP or HTTPS URL. Required if hash is not provided.
hash | string | SHA-256 content hash to check (64 lowercase hex characters). Required if url is not provided.
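The "either url or hash, not both" rule can be enforced client-side before a request is built. This is a minimal sketch with hypothetical helper names. It assumes the hash is taken over the raw content bytes, since this reference does not say which bytes the registry hashes.

```python
import hashlib
import re
from typing import Optional

# 64 lowercase hex characters, as required for the hash parameter.
HASH_PATTERN = re.compile(r"[0-9a-f]{64}")


def sha256_hex(content: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of content.

    Assumption: the registry hashes raw bytes with no preprocessing.
    """
    return hashlib.sha256(content).hexdigest()


def build_check_params(
    url: Optional[str] = None, content_hash: Optional[str] = None
) -> dict:
    """Build the query parameters for GET /api/v1/check."""
    # Exactly one of the two must be supplied.
    if (url is None) == (content_hash is None):
        raise ValueError("supply exactly one of url or content_hash")
    if content_hash is not None:
        if HASH_PATTERN.fullmatch(content_hash) is None:
            raise ValueError("hash must be 64 lowercase hex characters")
        return {"hash": content_hash}
    return {"url": url}
```

The returned dictionary can be passed to an HTTP client as query parameters, leaving URL encoding to the client.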

Example requests

# Check by URL
GET /api/v1/check?url=https://example.com/my-article
X-API-Key: tdr_live_...

# Check by content hash
GET /api/v1/check?hash=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-API-Key: tdr_live_...

URL normalisation

URLs are normalised before lookup: schemes are lowercased, trailing slashes removed, and fragments stripped. URLs without a scheme are assumed to be HTTPS. You do not need to pre-normalise URLs before sending them.
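To illustrate the rules above, here is an approximate client-side equivalent. It is a sketch, not the server's implementation: the function name is hypothetical, the "no scheme" test is an assumption, and the server may handle edge cases differently.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(raw: str) -> str:
    """Approximate the documented normalisation rules."""
    candidate = raw
    # Assumption: a URL "without a scheme" is one with no "://" separator.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    scheme = parts.scheme.lower()
    # Remove trailing slashes from the path.
    path = parts.path.rstrip("/")
    # An empty final component drops the fragment.
    return urlunsplit((scheme, parts.netloc, path, parts.query, ""))
```

Because the API normalises on its side, a helper like this is mainly useful for deduplicating URLs locally or for building cache keys.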

Bulk check

POST /api/v1/check/bulk

Check multiple URLs in a single request. Available on paid tiers only. Each URL in the batch counts as one query against your monthly allowance.

Tier limits

Tier | Bulk access | Max URLs per request
Free | Not available | -
Pro | Available | 100
Enterprise | Available | 1,000
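A dataset larger than the per-request cap has to be split into several bulk requests. The sketch below shows one way to do that. The lowercase tier labels and the function name are hypothetical; the caps are taken from the table above.

```python
from typing import Iterator, List, Sequence

# Per-request URL caps from the tier table. The Free tier has no bulk access.
MAX_URLS_PER_REQUEST = {"pro": 100, "enterprise": 1000}


def batches(urls: Sequence[str], tier: str) -> Iterator[List[str]]:
    """Yield consecutive slices of urls that fit the tier's bulk limit."""
    try:
        size = MAX_URLS_PER_REQUEST[tier]
    except KeyError:
        raise ValueError(
            f"bulk endpoint is not available on tier {tier!r}"
        ) from None
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each yielded list can be sent as the urls array of one bulk request.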

Request body

POST /api/v1/check/bulk
X-API-Key: tdr_live_...
Content-Type: application/json

{
  "urls": [
    "https://example.com/article-1",
    "https://example.com/article-2",
    "https://blog.example.com/post-3"
  ]
}

Response

{
  "success": true,
  "stats": {
    "total": 3,
    "checked": 3,
    "registered": 1,
    "not_registered": 2,
    "errors": 0,
    "processing_time_ms": 142
  },
  "results": [
    {
      "url": "https://example.com/article-1",
      "registered": true,
      "source": "domain",
      "trust": "verified",
      "domain": "example.com",
      "allow_training": false,
      "allow_inference": false,
      "allow_archive": false,
      "verification_status": "domain-verified",
      "registered_at": "2026-01-15T10:00:00.000Z"
    },
    {
      "url": "https://example.com/article-2",
      "registered": false
    },
    {
      "url": "https://blog.example.com/post-3",
      "registered": false
    }
  ],
  "rate_limit": {
    "limit": 10000,
    "used": 3,
    "remaining": 9997
  },
  "checked_at": "2026-02-26T12:00:00.000Z"
}

If any URLs fail validation, they appear in an errors array alongside the results array. Results maintain the same order as the input array.
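One way to consume a bulk response is to group its results by what they say about training. The sketch below makes two assumptions: a registered entry with no allow_training field is treated as opted out, and the errors array is not inspected because its shape is not documented here. The function name is hypothetical.

```python
from typing import Dict, List


def split_by_training_permission(response: dict) -> Dict[str, List[str]]:
    """Group the URLs in a bulk response by their training preference."""
    groups: Dict[str, List[str]] = {
        "opted_out": [],
        "allowed": [],
        "not_registered": [],
    }
    for item in response.get("results", []):
        if not item.get("registered", False):
            groups["not_registered"].append(item["url"])
        elif item.get("allow_training", False):
            groups["allowed"].append(item["url"])
        else:
            # Assumption: a missing allow_training is read conservatively.
            groups["opted_out"].append(item["url"])
    return groups
```

Because results keep the input order, each group also preserves the order in which the URLs were submitted.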

Response format

Not registered

{
  "registered": false,
  "checked_at": "2026-02-26T12:00:00.000Z"
}

Registered — URL-level

{
  "registered": true,
  "source": "url",
  "trust": "verified",
  "url": "https://example.com/specific-article",
  "registration_id": "reg_abc123",
  "allow_training": false,
  "allow_inference": true,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-01-15T10:00:00.000Z",
  "checked_at": "2026-02-26T12:00:00.000Z"
}

Registered — domain-level

{
  "registered": true,
  "source": "domain",
  "trust": "verified",
  "domain": "example.com",
  "allow_training": false,
  "allow_inference": false,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-01-10T09:00:00.000Z",
  "checked_at": "2026-02-26T12:00:00.000Z"
}

Response fields

Field | Type | Description
registered | boolean | Whether the content is registered in the registry.
source | "url" or "domain" | Whether the match came from a URL-level registration or a domain-wide registration.
trust | "verified" or "unverified" | verified: the registrant has proven domain ownership. unverified: a self-declared registration with no domain proof. Weight your compliance decisions accordingly.
allow_training | boolean | Whether use for AI model training is permitted (pre-training, fine-tuning, RLHF, distillation, etc.).
allow_inference | boolean | Whether ephemeral processing for inference outputs is permitted (summarisation, translation, Q&A, etc.).
allow_archive | boolean | Whether long-term storage or indexing is permitted (dataset storage, vector databases, cached corpora, etc.).
verification_status | string | "unverified" or "domain-verified". Reflects the verification tier of the registration.
registered_at | ISO 8601 | When the registration was created. For domain-verified registrations, this is the domain verification timestamp.
checked_at | ISO 8601 | When this API response was generated. Log this alongside cached results as part of your compliance records.
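For typed access to these fields, a single-check response can be parsed into a small record. This is a sketch with hypothetical class and function names. It expects a top-level checked_at, so it suits single-check responses rather than individual bulk result entries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-02-26T12:00:00.000Z."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class CheckResult:
    registered: bool
    checked_at: datetime
    source: Optional[str] = None
    trust: Optional[str] = None
    allow_training: Optional[bool] = None
    allow_inference: Optional[bool] = None
    allow_archive: Optional[bool] = None


def parse_check_result(payload: dict) -> CheckResult:
    """Convert a single-check JSON payload into a CheckResult."""
    return CheckResult(
        registered=payload["registered"],
        checked_at=parse_timestamp(payload["checked_at"]),
        source=payload.get("source"),
        trust=payload.get("trust"),
        allow_training=payload.get("allow_training"),
        allow_inference=payload.get("allow_inference"),
        allow_archive=payload.get("allow_archive"),
    )
```

The permission fields stay None for unregistered content, which keeps "no preference recorded" distinct from an explicit false.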

Rate limits

Monthly query limits apply based on your subscription tier. Limits reset at the start of each calendar month (UTC). Current tier allowances are shown on our pricing page.

Rate limit headers

Every response includes the following headers:

Header | Description
X-RateLimit-Limit | Your monthly query allowance.
X-RateLimit-Remaining | Queries remaining this month.
X-RateLimit-Used | Queries used this month.

When a request is rejected due to quota exhaustion, a 429 response is returned with an additional X-RateLimit-Reset header containing the UTC timestamp when your allowance resets.
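These headers make it possible to check the remaining allowance before sending a large batch. The sketch below uses hypothetical names. It keeps X-RateLimit-Reset as a raw string because the exact timestamp format is not specified in this reference.

```python
from typing import Mapping, NamedTuple, Optional


class Quota(NamedTuple):
    limit: int
    remaining: int
    used: int
    reset: Optional[str]  # raw header value; only sent with 429 responses


def read_quota(headers: Mapping[str, str]) -> Quota:
    """Extract the monthly quota figures from response headers."""
    return Quota(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        used=int(headers["X-RateLimit-Used"]),
        reset=headers.get("X-RateLimit-Reset"),
    )


def can_afford(quota: Quota, url_count: int) -> bool:
    """Return True if url_count more queries fit in the remaining allowance."""
    return url_count <= quota.remaining
```

Header lookups here are case-sensitive, so pass a case-insensitive mapping (such as the headers object of the requests library) or use the exact header names shown above.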

Error codes

Error responses follow a consistent format:

{
  "error": "Human-readable error message",
  "code": "MACHINE_READABLE_CODE"
}

HTTP status | Code | Meaning
400 | MISSING_PARAMETER | Neither url nor hash was provided.
400 | INVALID_URL | The URL could not be parsed as a valid HTTP/HTTPS URL.
400 | INVALID_HASH | Hash is not a valid SHA-256 value (must be 64 lowercase hex characters).
400 | MISSING_URLS | Bulk request body missing or urls is not an array.
400 | EMPTY_URLS | Bulk request urls array is empty.
400 | BATCH_TOO_LARGE | Bulk request exceeds the URL limit for your tier.
400 | INVALID_JSON | Bulk request body is not valid JSON.
401 | INVALID_API_KEY | API key is missing or not recognised.
403 | KEY_SUSPENDED | API key has been suspended. Contact contact@trainingdataregistry.org.
403 | TIER_NOT_ALLOWED | Bulk endpoint is not available on your current tier.
429 | RATE_LIMIT_EXCEEDED | Monthly query allowance exhausted. Resets on the first of next month (UTC).
500 | INTERNAL_ERROR | Unexpected server error. If this persists, contact contact@trainingdataregistry.org.
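Client code can turn the error body into an exception that carries the machine-readable code. This is a minimal sketch with hypothetical names. The retry rule is an assumption: it treats only INTERNAL_ERROR as worth retrying, since the 4xx codes describe problems that a repeat request will not fix.

```python
class RegistryError(Exception):
    """An error response from the API, with its status and code attached."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only unexpected server errors merit an immediate retry.
        return self.code == "INTERNAL_ERROR"


def error_from_response(status: int, body: dict) -> RegistryError:
    """Build a RegistryError from an HTTP status and a parsed JSON body."""
    return RegistryError(
        status,
        body.get("code", "UNKNOWN"),
        body.get("error", ""),
    )
```

A caller would raise the returned exception for any non-2xx status and branch on its code attribute rather than on the human-readable message.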

Code examples

curl

# Single URL check (-G sends the --data-urlencode value as an encoded query string)
curl -s -G -H "X-API-Key: tdr_live_your_key" \
  --data-urlencode "url=https://example.com/article" \
  "https://trainingdataregistry.org/api/v1/check"

# Bulk check
curl -s -X POST \
  -H "X-API-Key: tdr_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://example.com/article-1","https://example.com/article-2"]}' \
  "https://trainingdataregistry.org/api/v1/check/bulk"

JavaScript

const API_KEY = process.env.TDR_API_KEY;
const BASE_URL = 'https://trainingdataregistry.org/api/v1';

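// Parse the JSON body and throw on non-2xx responses so API errors are not silently returned
async function parseResponse(response) {
  const body = await response.json();
  if (!response.ok) {
    throw new Error(`${response.status} ${body.code}: ${body.error}`);
  }
  return body;
}

// Single check
async function checkUrl(url) {
  const response = await fetch(
    `${BASE_URL}/check?url=${encodeURIComponent(url)}`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  return parseResponse(response);
}

// Bulk check
async function checkUrls(urls) {
  const response = await fetch(`${BASE_URL}/check/bulk`, {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ urls }),
  });
  return parseResponse(response);
}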

// Example usage
const result = await checkUrl('https://example.com/my-article');
if (result.registered && !result.allow_training) {
  console.log('Content is opted out of AI training');
}

Python

import os
import requests

API_KEY = os.environ['TDR_API_KEY']
BASE_URL = 'https://trainingdataregistry.org/api/v1'
HEADERS = {'X-API-Key': API_KEY}
TIMEOUT_SECONDS = 10  # avoid hanging indefinitely on a stalled connection

def check_url(url: str) -> dict:
    # requests URL-encodes values passed via params
    response = requests.get(
        f'{BASE_URL}/check',
        params={'url': url},
        headers=HEADERS,
        timeout=TIMEOUT_SECONDS,
    )
    response.raise_for_status()
    return response.json()

def check_urls(urls: list[str]) -> dict:
    response = requests.post(
        f'{BASE_URL}/check/bulk',
        headers={**HEADERS, 'Content-Type': 'application/json'},
        json={'urls': urls},
        timeout=TIMEOUT_SECONDS,
    )
    response.raise_for_status()
    return response.json()

# Example usage
result = check_url('https://example.com/my-article')
if result['registered'] and not result.get('allow_training', False):
    print('Content is opted out of AI training')

Best practices

Cache responses

Cache API responses for no longer than the period your tier permits (see the API Terms). Log the checked_at timestamp alongside your cached results: it is your evidence of when you queried the registry relative to any content use.
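A small in-memory cache with a time-to-live illustrates the idea. This is a sketch with hypothetical names. The ttl_seconds value must come from your tier's terms, which this reference does not state, and stored results should keep their checked_at field so the query time is preserved.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CheckCache:
    """Cache check results per URL and expire them after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, url: str) -> Optional[dict]:
        """Return the cached result, or None if absent or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[url]
            return None
        return result

    def put(self, url: str, result: dict) -> None:
        """Store the full result, including its checked_at timestamp."""
        self._entries[url] = (self._clock(), result)
```

A production setup would more likely use a persistent store so that the logged checked_at values survive restarts.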

Use bulk for datasets

When processing large datasets, use the bulk endpoint rather than looping over single checks. This is faster and uses the same number of queries against your monthly allowance.

Check before each training run

Preferences change over time — creators may register or withdraw URLs at any point. We recommend re-checking content against the registry before each training run or data ingestion cycle, rather than relying on a one-time historical check.

Respect both trust levels

Responses include a trust field: verified (domain ownership confirmed) or unverified (self-declared). While verified registrations carry higher evidential weight, we recommend giving appropriate weight to both — an unverified registration is still a documented expression of preference.

Corporate networks and VPNs

We monitor API keys for unusual usage patterns. If your team accesses the API from multiple office locations, a corporate VPN, or distributed infrastructure, requests may appear to originate from diverse network ranges. If your key is suspended and you believe this is the reason, contact contact@trainingdataregistry.org and we will review promptly.

Keep keys secure

Store your API key as an environment variable, never in source code or client-side applications. Do not share keys across teams or systems — create separate keys for each use case so you can revoke them independently if needed.

Questions?

For integration support or questions not covered here, email contact@trainingdataregistry.org. API access is currently by application; apply on the "For AI companies" page.