Authentication
All API requests must include your API key in the X-API-Key request header. API keys are issued through your dashboard.
GET /api/v1/check?url=https://example.com/article
X-API-Key: tdr_live_your_api_key_here
API keys consist of the prefix tdr_live_ followed by 32 hexadecimal characters. Keys are shown only once, at creation, so store yours securely. Do not embed keys in client-side code or public repositories.
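A local format check can catch a truncated or mistyped key before any request is sent. The following is a minimal sketch based only on the key format described above. The helper name `looks_like_api_key` is hypothetical, and accepting uppercase hex digits is an assumption, since the case of the hexadecimal characters is not stated.

```python
import re

# Documented shape: the prefix "tdr_live_" followed by 32 hexadecimal characters.
# Assumption: uppercase hex digits are accepted as well as lowercase ones.
API_KEY_PATTERN = re.compile(r"tdr_live_[0-9a-fA-F]{32}")


def looks_like_api_key(key: str) -> bool:
    """Return True if the key has the documented shape.

    This only checks the format; it does not prove the key is active.
    """
    return API_KEY_PATTERN.fullmatch(key) is not None
```

A check like this is best run once at start-up, right after the key is read from the environment.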
Single check
GET /api/v1/check
Check whether a URL or content hash is registered in the Training Data Registry. Supply either url or hash — not both.
Query parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | The URL to check. Must be a valid HTTP or HTTPS URL. Required if hash is not provided. |
| hash | string | SHA-256 content hash to check (64 lowercase hex characters). Required if url is not provided. |
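The "url or hash, not both" rule and the hash format can be enforced on the client before a request is made. The sketch below is illustrative: `content_hash` and `build_check_params` are hypothetical helper names, and hashing the raw content bytes is an assumption, because this page does not specify which byte representation of the content the registry hashes.

```python
import hashlib
import re
from typing import Optional

# 64 lowercase hex characters, as required for the hash parameter.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def content_hash(content: bytes) -> str:
    """Return the SHA-256 digest of the bytes as 64 lowercase hex characters."""
    return hashlib.sha256(content).hexdigest()


def build_check_params(url: Optional[str] = None,
                       content_sha256: Optional[str] = None) -> dict:
    """Build query parameters for GET /api/v1/check with exactly one of url or hash."""
    if (url is None) == (content_sha256 is None):
        raise ValueError("supply exactly one of url or hash")
    if url is not None:
        return {"url": url}
    if SHA256_HEX.fullmatch(content_sha256) is None:
        raise ValueError("hash must be 64 lowercase hex characters")
    return {"hash": content_sha256}
```

The returned dictionary can be passed to an HTTP client as query parameters, which also takes care of URL encoding.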
Example requests
# Check by URL
GET /api/v1/check?url=https://example.com/my-article
X-API-Key: tdr_live_...
# Check by content hash
GET /api/v1/check?hash=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-API-Key: tdr_live_...
URL normalisation
URLs are normalised before lookup: schemes are lowercased, trailing slashes removed, and fragments stripped. URLs without a scheme are assumed to be HTTPS. You do not need to pre-normalise URLs before sending them.
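Although pre-normalisation is not required, applying the same rules locally can be useful for deduplicating inputs or building cache keys. The sketch below mirrors only the four rules listed above. The function name `normalise_url` is hypothetical, and the service's actual behaviour may differ in details this page does not describe (for example host case or query-string handling).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a leading scheme such as "https://" or "HTTP://".
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalise_url(raw: str) -> str:
    """Approximate the documented normalisation rules on the client side."""
    candidate = raw.strip()
    if not HAS_SCHEME.match(candidate):
        # URLs without a scheme are assumed to be HTTPS.
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    scheme = parts.scheme.lower()   # lowercase the scheme
    path = parts.path.rstrip("/")   # remove trailing slashes
    # Passing an empty fragment strips it; host and query are left untouched.
    return urlunsplit((scheme, parts.netloc, path, parts.query, ""))
```

Treat the output as a local convenience only; the registry's own normalisation remains authoritative.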
Bulk check
POST /api/v1/check/bulk
Check multiple URLs in a single request. Available on paid tiers only. Each URL in the batch counts as one query against your monthly allowance.
Tier limits
| Tier | Bulk access | Max URLs per request |
|---|---|---|
| Free | Not available | — |
| Pro | Available | 100 |
| Enterprise | Available | 1,000 |
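A list longer than the per-request limit has to be split before it is sent. The following sketch uses the limits from the tier table above; the function name `batches` and the lowercase tier labels are assumptions made for illustration.

```python
from typing import Iterator

# Maximum URLs per bulk request, taken from the tier table above.
MAX_URLS_PER_REQUEST = {"pro": 100, "enterprise": 1000}


def batches(urls: list[str], tier: str) -> Iterator[list[str]]:
    """Yield consecutive slices of urls, each within the tier's per-request limit."""
    if tier not in MAX_URLS_PER_REQUEST:
        raise ValueError(f"bulk checks are not available on tier {tier!r}")
    size = MAX_URLS_PER_REQUEST[tier]
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each yielded batch can be posted to the bulk endpoint as one request. Splitting does not change the total number of queries counted, since every URL counts once either way.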
Request body
POST /api/v1/check/bulk
X-API-Key: tdr_live_...
Content-Type: application/json
{
  "urls": [
    "https://example.com/article-1",
    "https://example.com/article-2",
    "https://blog.example.com/post-3"
  ]
}
Response
{
  "success": true,
  "stats": {
    "total": 3,
    "checked": 3,
    "registered": 1,
    "not_registered": 2,
    "errors": 0,
    "processing_time_ms": 142
  },
  "results": [
    {
      "url": "https://example.com/article-1",
      "registered": true,
      "source": "domain",
      "trust": "verified",
      "domain": "example.com",
      "allow_training": false,
      "allow_inference": false,
      "allow_archive": false,
      "verification_status": "domain-verified",
      "registered_at": "2026-01-15T10:00:00.000Z"
    },
    {
      "url": "https://example.com/article-2",
      "registered": false
    },
    {
      "url": "https://blog.example.com/post-3",
      "registered": false
    }
  ],
  "rate_limit": {
    "limit": 10000,
    "used": 3,
    "remaining": 9997
  },
  "checked_at": "2026-02-26T12:00:00.000Z"
}
If any URLs fail validation, they appear in an errors array alongside the results array. Results maintain the same order as the input array.
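Once a bulk response arrives, it is often convenient to look results up by URL and to keep validation failures separate. The sketch below assumes each result's `url` field echoes the submitted URL, as in the example response. The function name `summarise_bulk` is hypothetical, and because the shape of entries in the errors array is not documented on this page, they are passed through untouched.

```python
from typing import Any


def summarise_bulk(response: dict[str, Any]) -> dict[str, Any]:
    """Index bulk results by URL and collect any validation errors."""
    # Duplicate URLs in one batch collapse to the last matching result.
    by_url = {item["url"]: item for item in response.get("results", [])}
    return {
        "by_url": by_url,
        "registered": [url for url, item in by_url.items() if item["registered"]],
        # Assumption: "errors" is either absent or a list of opaque entries.
        "errors": list(response.get("errors", [])),
    }
```

With the example response above, the `registered` list would contain only `https://example.com/article-1`.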
Response format
Not registered
{
  "registered": false,
  "checked_at": "2026-02-26T12:00:00.000Z"
}
Registered (URL-level)
{
  "registered": true,
  "source": "url",
  "trust": "verified",
  "url": "https://example.com/specific-article",
  "registration_id": "reg_abc123",
  "allow_training": false,
  "allow_inference": true,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-01-15T10:00:00.000Z",
  "checked_at": "2026-02-26T12:00:00.000Z"
}
Registered (domain-level)
{
  "registered": true,
  "source": "domain",
  "trust": "verified",
  "domain": "example.com",
  "allow_training": false,
  "allow_inference": false,
  "allow_archive": false,
  "verification_status": "domain-verified",
  "registered_at": "2026-01-10T09:00:00.000Z",
  "checked_at": "2026-02-26T12:00:00.000Z"
}
Response fields
| Field | Type | Description |
|---|---|---|
| registered | boolean | Whether the content is registered in the registry. |
| source | "url" or "domain" | Whether the match came from a URL-level registration or a domain-wide registration. |
| trust | "verified" or "unverified" | verified: the registrant has proven domain ownership. unverified: a self-declared registration with no domain proof. Weight your compliance decisions accordingly. |
| allow_training | boolean | Whether use for AI model training is permitted (pre-training, fine-tuning, RLHF, distillation, etc.). |
| allow_inference | boolean | Whether ephemeral processing for inference outputs is permitted (summarisation, translation, Q&A, etc.). |
| allow_archive | boolean | Whether long-term storage or indexing is permitted (dataset storage, vector databases, cached corpora, etc.). |
| verification_status | string | "unverified" or "domain-verified". Reflects the verification tier of the registration. |
| registered_at | ISO 8601 | When the registration was created. For domain-verified registrations, this is the domain verification timestamp. |
| checked_at | ISO 8601 | When this API response was generated. Log this alongside cached results as part of your compliance records. |
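The three allow_* flags can be reduced to a single question per intended use. The sketch below follows the same logic as the usage examples in this documentation (registered and not allowed means opted out). The function name `is_opted_out` is hypothetical, and treating a missing flag on a registered result as "not permitted" is a conservative assumption, not documented behaviour.

```python
from typing import Any

# The three uses covered by the allow_* response fields.
USES = ("training", "inference", "archive")


def is_opted_out(result: dict[str, Any], use: str) -> bool:
    """Return True if a registration exists and withholds permission for this use."""
    if use not in USES:
        raise ValueError(f"unknown use: {use!r}")
    if not result.get("registered", False):
        # No registration was found, so no preference is recorded.
        return False
    # Assumption: a missing allow_* flag is treated as "not permitted".
    return not result.get(f"allow_{use}", False)
```

For the URL-level example above, this returns True for "training" and "archive" and False for "inference".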
Rate limits
Monthly query limits apply based on your subscription tier. Limits reset at the start of each calendar month (UTC). Current tier allowances are shown on our pricing page.
Rate limit headers
Every response includes the following headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Your monthly query allowance. |
| X-RateLimit-Remaining | Queries remaining this month. |
| X-RateLimit-Used | Queries used this month. |
When a request is rejected due to quota exhaustion, a 429 response is returned with an additional X-RateLimit-Reset header containing the UTC timestamp when your allowance resets.
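Reading these headers on every response makes it possible to slow down or stop before the allowance runs out. The sketch below assumes the three counters are sent as decimal integers. The function name `read_rate_limit` is hypothetical, and the reset value is returned as a raw string because its exact timestamp format is not specified here.

```python
from typing import Mapping, Optional


def read_rate_limit(headers: Mapping[str, str]) -> dict:
    """Extract the documented rate limit headers from a response."""

    def as_int(name: str) -> Optional[int]:
        # Assumption: counter headers hold decimal integers when present.
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "used": as_int("X-RateLimit-Used"),
        # Only present on 429 responses; kept as the raw header value.
        "reset": headers.get("X-RateLimit-Reset"),
    }
```

A plain dictionary lookup is case-sensitive, so pass the header mapping from your HTTP client (for example `response.headers` in requests, which is case-insensitive) rather than a hand-built dictionary with different casing.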
Error codes
Error responses follow a consistent format:
{
  "error": "Human-readable error message",
  "code": "MACHINE_READABLE_CODE"
}
| HTTP status | Code | Meaning |
|---|---|---|
| 400 | MISSING_PARAMETER | Neither url nor hash was provided. |
| 400 | INVALID_URL | The URL could not be parsed as a valid HTTP/HTTPS URL. |
| 400 | INVALID_HASH | Hash is not a valid SHA-256 value (must be 64 lowercase hex characters). |
| 400 | MISSING_URLS | Bulk request body missing or urls is not an array. |
| 400 | EMPTY_URLS | Bulk request urls array is empty. |
| 400 | BATCH_TOO_LARGE | Bulk request exceeds the URL limit for your tier. |
| 400 | INVALID_JSON | Bulk request body is not valid JSON. |
| 401 | INVALID_API_KEY | API key is missing or not recognised. |
| 403 | KEY_SUSPENDED | API key has been suspended. Contact contact@trainingdataregistry.org. |
| 403 | TIER_NOT_ALLOWED | Bulk endpoint is not available on your current tier. |
| 429 | RATE_LIMIT_EXCEEDED | Monthly query allowance exhausted. Resets on the first of next month (UTC). |
| 500 | INTERNAL_ERROR | Unexpected server error. If this persists, contact contact@trainingdataregistry.org. |
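Because every error carries a machine-readable code, a client can turn error responses into a typed exception and branch on the code rather than on the message text. The sketch below is one possible shape. The names `RegistryError` and `error_from_response` are hypothetical, the "UNKNOWN" fallback is a placeholder for bodies that lack a code, and the retry rule is an assumption rather than guidance from the registry.

```python
class RegistryError(Exception):
    """An error response from the API, keeping the status and code accessible."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: only unexpected server errors merit an automatic retry.
        # RATE_LIMIT_EXCEEDED does not clear until the monthly reset.
        return self.code == "INTERNAL_ERROR"


def error_from_response(status: int, body: dict) -> RegistryError:
    """Build a RegistryError from an HTTP status and the parsed JSON error body."""
    return RegistryError(status, body.get("code", "UNKNOWN"), body.get("error", ""))
```

Callers can then handle, say, `BATCH_TOO_LARGE` by splitting the batch and `TIER_NOT_ALLOWED` by falling back to single checks.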
Code examples
curl
# Single URL check
curl -s -H "X-API-Key: tdr_live_your_key" \
"https://trainingdataregistry.org/api/v1/check?url=https://example.com/article"
# Bulk check
curl -s -X POST \
  -H "X-API-Key: tdr_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"urls":["https://example.com/article-1","https://example.com/article-2"]}' \
  "https://trainingdataregistry.org/api/v1/check/bulk"
JavaScript
const API_KEY = process.env.TDR_API_KEY;
const BASE_URL = 'https://trainingdataregistry.org/api/v1';

// Single check
async function checkUrl(url) {
  const response = await fetch(
    `${BASE_URL}/check?url=${encodeURIComponent(url)}`,
    { headers: { 'X-API-Key': API_KEY } }
  );
  return response.json();
}

// Bulk check
async function checkUrls(urls) {
  const response = await fetch(`${BASE_URL}/check/bulk`, {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ urls }),
  });
  return response.json();
}

// Example usage
const result = await checkUrl('https://example.com/my-article');
if (result.registered && !result.allow_training) {
  console.log('Content is opted out of AI training');
}
Python
import os
import requests
from urllib.parse import urlencode

API_KEY = os.environ['TDR_API_KEY']
BASE_URL = 'https://trainingdataregistry.org/api/v1'
HEADERS = {'X-API-Key': API_KEY}

# Single check
def check_url(url: str) -> dict:
    params = urlencode({'url': url})
    response = requests.get(f'{BASE_URL}/check?{params}', headers=HEADERS)
    response.raise_for_status()
    return response.json()

# Bulk check
def check_urls(urls: list[str]) -> dict:
    response = requests.post(
        f'{BASE_URL}/check/bulk',
        headers={**HEADERS, 'Content-Type': 'application/json'},
        json={'urls': urls},
    )
    response.raise_for_status()
    return response.json()

# Example usage
result = check_url('https://example.com/my-article')
if result['registered'] and not result.get('allow_training', False):
    print('Content is opted out of AI training')
Best practices
Cache responses
Cache API responses for up to the period permitted by your tier (see API Terms). Log the checked_at timestamp alongside your cached results — this timestamp is your evidence of when you queried the registry relative to any content use.
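A small in-memory cache illustrates the idea of keeping each result, including its checked_at timestamp, for a bounded period. This is a sketch only: the class name `CheckCache` and the `ttl_seconds` parameter are hypothetical, and the lifetime you pass in should come from the period your tier permits, which this page does not state.

```python
import time
from typing import Any, Callable, Optional


class CheckCache:
    """In-memory cache of check results keyed by URL, with a fixed lifetime."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: dict[str, tuple[float, dict[str, Any]]] = {}

    def put(self, url: str, result: dict[str, Any]) -> None:
        """Store the full result, so its checked_at field is kept with it."""
        self._entries[url] = (self._clock(), result)

    def get(self, url: str) -> Optional[dict[str, Any]]:
        """Return the cached result, or None if it is missing or has expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[url]
            return None
        return result
```

For compliance records, a durable store (a database table or append-only log) would be a better fit than process memory; the expiry logic stays the same.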
Use bulk for datasets
When processing large datasets, use the bulk endpoint rather than looping over single checks. This is faster and uses the same number of queries against your monthly allowance.
Check before each training run
Preferences change over time — creators may register or withdraw URLs at any point. We recommend re-checking content against the registry before each training run or data ingestion cycle, rather than relying on a one-time historical check.
Respect both trust levels
Responses include a trust field: verified (domain ownership confirmed) or unverified (self-declared). While verified registrations carry higher evidential weight, we recommend giving appropriate weight to both — an unverified registration is still a documented expression of preference.
Corporate networks and VPNs
We monitor API keys for unusual usage patterns. If your team accesses the API from multiple office locations, a corporate VPN, or distributed infrastructure, requests may appear to originate from diverse network ranges. If your key is suspended and you believe this is the reason, contact contact@trainingdataregistry.org and we will review promptly.
Keep keys secure
Store your API key as an environment variable, never in source code or client-side applications. Do not share keys across teams or systems — create separate keys for each use case so you can revoke them independently if needed.
Questions?
For integration support or questions not covered here, contact contact@trainingdataregistry.org. API access is currently by application — apply here.
