Lab integration runbook

Audience: Engineering lead at a lab estimating integration effort, followed by the engineers actually wiring this up.

Time to integrate (estimate): A single engineer, 1–2 days for the minimal path (single lookup endpoint plus audit logging). Add 1 day for bulk lookup. Add 0.5 day for full client-side verification.

Prerequisites: Read the technical specification for the cryptographic model, audit-defensibility properties, and the production substrate the registry is built on.

1. The 30-second integration story

At training-data ingestion time, for each unique source domain in your pipeline, issue one HTTPS GET:

GET https://api.akaeon-registry.com/v1/lookup?domain=example-publisher.com
Authorization: Bearer <lab_api_key>

The response is a JSON bundle. Three things to do with it:

Log the entire bundle to your immutable audit store, keyed by (domain, training_run_id, lookup_at). The bundle is your defense against later challenge — log it verbatim.
If optouts is empty, the domain has no registered opt-outs; proceed with your training pipeline's default policy.
If optouts is non-empty, enforce per the policy your pipeline already implements. The registry doesn't tell you what to do; it tells you what the publisher said.

That's the minimum integration. You can ship this in a day. The rest of this document covers the optional-but-recommended layers: client-side cryptographic verification (so you can prove later that what you logged is genuine), bulk lookup (so you don't pay per-domain round-trip cost at scale), and the public verification surface (so a successor verifier can re-check your audit log without the registry's cooperation).
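
The minimal path above might be sketched as follows. This is an illustration under stated assumptions rather than the registry's reference client: `lookupAndLog` and the `auditStore.append` interface are hypothetical names, and handling of 429 and 503 responses (§9) is left out for brevity.

```javascript
// Minimal lookup-and-log sketch. `auditStore.append` is a hypothetical
// interface standing in for your immutable audit store.
const LOOKUP_URL = 'https://api.akaeon-registry.com/v1/lookup'

async function lookupAndLog(domain, trainingRunId, apiKey, auditStore, fetchFn = fetch) {
  const url = `${LOOKUP_URL}?domain=${encodeURIComponent(domain)}`
  const res = await fetchFn(url, {
    headers: { Authorization: `Bearer ${apiKey}`, Accept: 'application/json' },
  })
  // Simplification: any non-2xx status is treated as a hard failure here.
  if (!res.ok) throw new Error(`lookup failed with HTTP ${res.status}`)

  // Keep the raw text so the logged bundle is byte-for-byte what was served.
  const rawBody = await res.text()
  const bundle = JSON.parse(rawBody)

  await auditStore.append({
    key: [domain, trainingRunId, bundle.lookup_at],
    registry_response_raw: rawBody,
  })

  // The registry reports what the publisher said; the policy decision is yours.
  return { hasOptouts: bundle.optouts.length > 0, bundle }
}
```

Logging the raw response text, rather than a re-serialized object, keeps the stored bundle identical to what the registry sent.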

2. Authentication and credentialing

2.1 Getting credentials

To obtain a lab API key, contact integrations@akaeon.com. The credentialing flow:

Initial conversation to confirm scope (which models/training pipelines the key will be used for, expected QPS, jurisdictional considerations).
Lab-side signing of a one-page acceptable use document. This is not a license to data — the registry doesn't license data; it serves metadata about publisher preferences. The document covers non-redistribution of the API key, log-retention commitments, and rate-limit acknowledgment.
Issuance of a bearer token. The token is a 256-bit random value transmitted once over an encrypted channel and never again; the registry stores only an Argon2id hash. If you lose the token, it must be reissued (the registry cannot retrieve it).

2.2 Using the token

Every authenticated endpoint accepts the token in an Authorization header:

Authorization: Bearer akr_live_<32_random_alphanumeric_chars>

Tokens are scoped to one of two environments:

akr_test_... — staging environment, no cost for lookups, no rate-limit enforcement, no production data.
akr_live_... — production environment.
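
Because the environment is encoded in the token prefix, a pipeline can cheaply refuse to start with the wrong kind of token. The guard below is a small sketch; `tokenEnvironment` and `assertTokenMatches` are hypothetical helper names, not part of any published SDK.

```javascript
// Returns 'test' or 'live' based on the token prefix; throws on anything else.
function tokenEnvironment(token) {
  if (token.startsWith('akr_test_')) return 'test'
  if (token.startsWith('akr_live_')) return 'live'
  throw new Error('unrecognized token prefix')
}

// Hypothetical startup guard: refuse to run with a token from the wrong environment.
function assertTokenMatches(token, expectedEnv) {
  const actual = tokenEnvironment(token)
  if (actual !== expectedEnv) {
    throw new Error(`expected a ${expectedEnv} token but got a ${actual} token`)
  }
}
```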

2.3 Token rotation

Rotate tokens by issuing a new one and revoking the old one. Both operations are available via POST /v1/keys/rotate (lab admin only, returns the new token in the response body). The old token continues to work for 30 days after the new one is issued, to allow staged rollout.

2.4 Webhook signature verification (optional, future)

For future endpoints that push to a lab webhook (e.g. "this domain you queried previously now has a new opt-out"), the registry HMAC-signs the request body with a per-lab webhook secret distinct from the API key. The header is X-Akaeon-Signature: sha256=<hex_hmac>. Verify with:

import crypto from 'node:crypto'

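function verifyWebhook(rawBody, signatureHeader, webhookSecret) {
  // rawBody must be the exact bytes received, before any JSON parsing.
  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', webhookSecret).update(rawBody).digest('hex'),
  )
  const received = Buffer.from(signatureHeader ?? '')
  // timingSafeEqual throws on a length mismatch, so reject unequal lengths first.
  if (received.length !== expected.length) return false
  return crypto.timingSafeEqual(expected, received)
}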

This endpoint family is not yet live; the signature format is specified here so client code can be written against it now.

3. The lookup endpoint

3.1 Request

GET https://api.akaeon-registry.com/v1/lookup?domain=example-publisher.com
Authorization: Bearer akr_live_...
Accept: application/json

Query parameters:

| Param | Required | Description |
|---|---|---|
| domain | Yes | The domain to look up. Lowercase, punycode-encoded for IDNs, no trailing dot, no scheme, no port. |
| include_withdrawn | No | true to include withdrawn opt-outs in the response. Default false. |
| as_of | No | RFC 3339 timestamp. Returns the registry's state as of that time. Used by labs that want to reproduce a historical lookup. Default: current time. |
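
Pipelines usually hold URLs rather than clean hostnames, so the domain rules above have to be applied before calling the endpoint. One possible sketch uses the WHATWG URL parser built into Node. `normalizeDomain` is a hypothetical helper, and the registry's own validation may be stricter than what is shown.

```javascript
// Normalizes a URL or bare hostname into the form the `domain` parameter
// expects. Returns null for inputs that cannot be a domain name.
function normalizeDomain(input) {
  const candidate = input.includes('://') ? input : `http://${input}`
  let host
  try {
    // The WHATWG URL parser lowercases the host and punycode-encodes IDNs;
    // reading `hostname` drops the scheme, port, path, and query.
    host = new URL(candidate).hostname
  } catch {
    return null
  }
  host = host.replace(/\.$/, '') // drop a trailing dot
  const isIpv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host)
  const isIpv6 = host.startsWith('[')
  if (host === '' || isIpv4 || isIpv6) return null
  return host
}
```

For example, `normalizeDomain('https://Example-Publisher.COM.:8443/a')` yields `example-publisher.com`, and IP-address inputs yield `null` so they can be skipped instead of triggering INVALID_DOMAIN.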

3.2 Response — domain with opt-outs

200 OK:

{
  "domain": "example-publisher.com",
  "lookup_at": "2026-05-12T09:14:00Z",
  "registry_version": "v1",
  "optouts": [
    {
      "submission_id": "01J9XW8ZQ7K3F0HXFZBV4QXRH3",
      "status": "anchored",
      "canonical_record": {
        "version": 1,
        "type": "domain_optout",
        "submission_id": "01J9XW8ZQ7K3F0HXFZBV4QXRH3",
        "domain": "example-publisher.com",
        "policy": "no-training",
        "scope": "domain",
        "effective_from": "2026-05-11T00:00:00Z",
        "submitted_at": "2026-05-11T14:23:00Z",
        "dns_verified_at": "2026-05-11T14:31:00Z",
        "dns_challenge_record_sha256": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
        "publisher_account_id": "01J9XW7TR8H4M2GXFZBV4QXRH0",
        "app": "akaeon-registry",
        "network": "arweave"
      },
      "registry_signature": {
        "canonical_message": "akaeon-registry:optout:v1|01J9XW8ZQ7K3F0HXFZBV4QXRH3|example-publisher.com|no-training",
        "signature": "MEUCIQDx4...base64...",
        "public_key": "VGhpcyBpcyBhIDMyLWJ5dGUgRWQyNTUxOSBwdWJsaWMga2V5",
        "signature_scheme": "ed25519",
        "version": "v1"
      },
      "merkle_inclusion": {
        "leaf_hash": "a3f5d8b9c2e1...",
        "leaf_index": 142,
        "tree_size": 2814,
        "merkle_proof": [
          "b4f1e2d8c7a9...",
          "c5a2f3e9d8b1...",
          "d6b3a4f1e2c8..."
        ],
        "merkle_root": "f3a9d8b4c2e1...",
        "tree_construction": "rfc6962-style"
      },
      "anchor": {
        "arweave_tx_id": "ABC123_arweave_tx_id_here_KXYZ",
        "arweave_url": "https://arweave.net/ABC123_arweave_tx_id_here_KXYZ",
        "arweave_block_height": 1234567,
        "arweave_block_timestamp": "2026-05-11T15:00:14Z",
        "anchored_at": "2026-05-11T15:00:00Z"
      }
    }
  ]
}

3.3 Response — domain with no opt-outs

200 OK:

{
  "domain": "example-publisher.com",
  "lookup_at": "2026-05-12T09:14:00Z",
  "registry_version": "v1",
  "optouts": [],
  "no_optouts_attestation": {
    "canonical_message": "akaeon-registry:no-optouts:v1|example-publisher.com|2026-05-12T09:14:00Z",
    "signature": "MEUCIQ...base64...",
    "public_key": "VGhpcy...",
    "signature_scheme": "ed25519",
    "version": "v1"
  }
}

The no_optouts_attestation is signed by the registry's key at the time of lookup. This is important for audit defensibility: it gives you a cryptographically-signed statement that as of lookup_at, the registry had no anchored opt-outs for this domain. Log this just as carefully as a positive opt-out — it's your defense if a publisher later claims they "already had" an opt-out at training time.

The no-opt-outs attestation is not Arweave-anchored (the cost would be prohibitive at lookup volume); it's an ephemeral registry signature. The trust property is weaker than for anchored opt-outs — a colluding registry could re-sign a fake "no opt-outs" attestation. The mitigation is the public verification surface (§7), which lets you cross-check that no opt-out for that domain was anchored before your lookup_at. For high-assurance training runs, run that cross-check; for ordinary ingestion, the attestation is sufficient.
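
Checking the attestation's signature before logging it might look like the sketch below. Two assumptions are baked in: the canonical message layout is inferred from the single sample above rather than from the technical specification, and `verifyNoOptoutsAttestation` is a hypothetical name. The public key is taken from the response itself, so a passing result only means the attestation is self-consistent; comparing that key against the registry key catalog (§7) is a separate step.

```javascript
import crypto from 'node:crypto'

// DER prefix that wraps a raw 32-byte Ed25519 key as SubjectPublicKeyInfo.
const SPKI_HEADER = Buffer.from('302a300506032b6570032100', 'hex')

function verifyNoOptoutsAttestation(response) {
  const att = response.no_optouts_attestation
  if (!att || response.optouts.length !== 0) return false

  // ASSUMPTION: message layout inferred from the sample response.
  const expectedMessage =
    `akaeon-registry:no-optouts:v1|${response.domain}|${response.lookup_at}`
  if (att.canonical_message !== expectedMessage) return false

  const rawKey = Buffer.from(att.public_key, 'base64')
  if (rawKey.length !== 32) return false
  const publicKey = crypto.createPublicKey({
    key: Buffer.concat([SPKI_HEADER, rawKey]),
    format: 'der',
    type: 'spki',
  })
  // Ed25519 takes no separate digest algorithm, hence the null.
  return crypto.verify(
    null,
    Buffer.from(att.canonical_message, 'utf8'),
    publicKey,
    Buffer.from(att.signature, 'base64'),
  )
}
```

Binding the message to the response's own domain and lookup_at stops a valid attestation for one domain from being replayed under another.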

3.4 Error responses

| Status | Code | Meaning |
|---|---|---|
| 400 | INVALID_DOMAIN | Domain failed validation (malformed, IP address, etc.). |
| 401 | INVALID_TOKEN | Bearer token missing, malformed, or revoked. |
| 403 | TOKEN_EXPIRED | Token expired; rotate. |
| 429 | RATE_LIMITED | Per-token rate limit hit. Response includes Retry-After header. |
| 503 | REGISTRY_DEGRADED | Registry is operational but downstream verification is degraded. Includes degraded_components array in body. |

The 503 case is rare and load-bearing: it signals that you should not trust the response's anchoring claims for high-assurance lookups. The response body still contains the registry's best-effort view; your pipeline decides whether to proceed or defer.

4. The bulk lookup endpoint

For ingestion pipelines that process millions of URLs per training run, per-domain HTTP requests are too slow. The bulk endpoint takes a list of domains in a single request.

4.1 Request

POST https://api.akaeon-registry.com/v1/lookup/bulk
Authorization: Bearer akr_live_...
Content-Type: application/json

{
  "domains": [
    "example-publisher.com",
    "another-publisher.com",
    "third-publisher.com"
  ],
  "include_withdrawn": false,
  "as_of": null
}

Maximum domains per request: 1,000. For longer lists, split the domains across multiple requests.
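
Splitting a long domain list to respect the 1,000-domain cap can be done with a helper along these lines. `toBulkBatches` is a hypothetical name; deduplicating first is a suggestion to avoid paying for repeated domains, not a registry requirement.

```javascript
const MAX_DOMAINS_PER_REQUEST = 1000

// Deduplicates domains and splits them into request-sized batches.
function toBulkBatches(domains, batchSize = MAX_DOMAINS_PER_REQUEST) {
  const unique = [...new Set(domains)]
  const batches = []
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push(unique.slice(i, i + batchSize))
  }
  return batches
}
```

Each batch then becomes the `domains` array of one POST /v1/lookup/bulk request.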

4.2 Response

{
  "lookup_at": "2026-05-12T09:14:00Z",
  "registry_version": "v1",
  "results": [
    {
      "domain": "example-publisher.com",
      "optouts": [ /* same structure as single-lookup */ ]
    },
    {
      "domain": "another-publisher.com",
      "optouts": []
    },
    {
      "domain": "third-publisher.com",
      "error": { "code": "INVALID_DOMAIN", "message": "..." }
    }
  ]
}

Notes:

Per-domain errors are surfaced in the error field on that domain's entry; the overall response is still 200 OK as long as the request itself was valid.
The no_optouts_attestation is not included per-domain in the bulk response (it would bloat the response). Instead, a single bulk_attestation is included at the response root, attesting that as of lookup_at, the registry had no anchored opt-outs for any domain in the result set with empty optouts. The signed canonical message is akaeon-registry:bulk-attestation:v1|<sha256-of-sorted-empty-domain-list>|<lookup_at>.
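
Since a 200 OK can still carry per-domain errors, callers need to sort each entry into one of three outcomes. A minimal sketch, with `partitionBulkResults` as a hypothetical helper name:

```javascript
// Splits a bulk response into domains with opt-outs, domains with none
// (the ones covered by the root-level bulk_attestation), and per-domain errors.
function partitionBulkResults(body) {
  const optedOut = []
  const clear = []
  const failed = []
  for (const entry of body.results) {
    if (entry.error) failed.push({ domain: entry.domain, code: entry.error.code })
    else if (entry.optouts.length > 0) optedOut.push(entry.domain)
    else clear.push(entry.domain)
  }
  return { optedOut, clear, failed }
}
```

Entries in `failed` have not been looked up at all, so they should be corrected and re-queried, not treated as clear.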

4.3 Performance characteristics

Target latency for a 1,000-domain bulk request: under 500ms p99. Bulk requests are cheaper than per-domain requests on rate-limit accounting (one request, not 1,000). Use bulk wherever you'd otherwise loop.

5. Client-side verification — the audit-defensibility code path

This is the optional-but-strongly-recommended step that converts the registry's records from "claim by a vendor" to "cryptographically-checkable fact." Run this code on the bundle you log; if it ever reports a failure (ok: false), your audit log entry is suspect and you should not rely on it.

The code is about 120 lines of Node.js using only the standard library (Node 18+ for the global fetch). Equivalent implementations in Python, Go, and Rust are mechanically straightforward.

// audit-verify.mjs: verify a registry lookup bundle entry.
// verifyOptoutBundle resolves to { ok: true } iff the signature, Merkle proof,
// and Arweave anchor all check; otherwise to { ok: false, reason }.

import crypto from 'node:crypto'

const SPKI_HEADER = Buffer.from('302a300506032b6570032100', 'hex')

function ed25519VerifyRaw(message, signatureB64, publicKeyB64) {
  const pubkeyDer = Buffer.concat([SPKI_HEADER, Buffer.from(publicKeyB64, 'base64')])
  const pubKey = crypto.createPublicKey({ key: pubkeyDer, format: 'der', type: 'spki' })
  return crypto.verify(
    null,
    Buffer.from(message, 'utf8'),
    pubKey,
    Buffer.from(signatureB64, 'base64'),
  )
}

function sha256(buf) {
  return crypto.createHash('sha256').update(buf).digest()
}

function canonicalSerialize(obj) {
  // Canonical JSON: lexicographic key sort, no whitespace, UTF-8.
  const sorted = (v) => {
    if (Array.isArray(v)) return v.map(sorted)
    if (v && typeof v === 'object') {
      return Object.keys(v).sort().reduce((acc, k) => {
        acc[k] = sorted(v[k])
        return acc
      }, {})
    }
    return v
  }
  return Buffer.from(JSON.stringify(sorted(obj)), 'utf8')
}

function verifyMerkleInclusion(leafHashHex, leafIndex, treeSize, proofHexArray, rootHex) {
  // Inclusion-proof verification for an RFC 6962-style tree (the step-by-step
  // algorithm is given in RFC 9162 §2.1.3.2).
  //
  // The registry's tree uses odd-count PROMOTION (the last node at an odd-
  // count level is promoted unchanged to the next level), NOT Bitcoin-style
  // duplication. A naive "shift right on each level" verifier produces wrong
  // intermediate hashes whenever the path passes through a promoted node,
  // so the verifier needs `treeSize` to know where the right edge of each
  // level sits.
  if (leafIndex < 0 || leafIndex >= treeSize) return false
  if (treeSize === 0) return false

  let fn = leafIndex
  let sn = treeSize - 1
  let r = Buffer.from(leafHashHex, 'hex')

  for (const siblingHex of proofHexArray) {
    const p = Buffer.from(siblingHex, 'hex')
    if (sn === 0) {
      // Path is exhausted but the proof still has entries — invalid.
      return false
    }
    if ((fn & 1) === 1 || fn === sn) {
      r = sha256(Buffer.concat([Buffer.from([0x01]), p, r]))
      while ((fn & 1) === 0) {
        fn >>= 1
        sn >>= 1
      }
    } else {
      r = sha256(Buffer.concat([Buffer.from([0x01]), r, p]))
    }
    fn >>= 1
    sn >>= 1
  }

  return sn === 0 && r.toString('hex') === rootHex
}

export async function verifyOptoutBundle(optout, arweaveFetchFn = fetch) {
  // Step 1: registry signature on the per-record canonical message.
  // NOTE: sig.public_key comes from the bundle itself. Check it separately
  // against the registry key catalog (§7) before relying on a passing result.
  const sig = optout.registry_signature
  const signatureOk = ed25519VerifyRaw(sig.canonical_message, sig.signature, sig.public_key)
  if (!signatureOk) return { ok: false, reason: 'registry signature invalid' }

  // Step 2 — Recompute the leaf hash from the canonical record and compare.
  const canonicalBytes = canonicalSerialize(optout.canonical_record)
  const recomputedLeafHash = sha256(
    Buffer.concat([Buffer.from([0x00]), canonicalBytes]),
  ).toString('hex')
  if (recomputedLeafHash !== optout.merkle_inclusion.leaf_hash) {
    return { ok: false, reason: 'leaf hash does not match canonical record' }
  }

  // Step 3 — Merkle inclusion proof rolls up to the claimed root.
  const merkleOk = verifyMerkleInclusion(
    optout.merkle_inclusion.leaf_hash,
    optout.merkle_inclusion.leaf_index,
    optout.merkle_inclusion.tree_size,
    optout.merkle_inclusion.merkle_proof,
    optout.merkle_inclusion.merkle_root,
  )
  if (!merkleOk) return { ok: false, reason: 'merkle proof does not reconstruct root' }

  // Step 4: the Arweave-anchored batch payload's merkle_root matches.
  const arResponse = await arweaveFetchFn(optout.anchor.arweave_url)
  if (!arResponse.ok) {
    return { ok: false, reason: `arweave fetch failed with HTTP ${arResponse.status}` }
  }
  const arBody = await arResponse.json()
  if (arBody.merkle_root_sha256_hex !== optout.merkle_inclusion.merkle_root) {
    return { ok: false, reason: 'arweave-anchored root does not match claimed root' }
  }

  // Step 5 — Arweave batch payload is itself signed by the registry's key.
  const batchSig = arBody.registry_signature
  const batchSigOk = ed25519VerifyRaw(
    batchSig.canonical_message,
    batchSig.signature,
    batchSig.public_key,
  )
  if (!batchSigOk) return { ok: false, reason: 'arweave batch payload signature invalid' }

  return { ok: true }
}

What this code does, by step:

Verify the registry's Ed25519 signature on the per-record canonical message.
Verify the canonical record actually hashes to the claimed leaf hash.
Verify the Merkle inclusion proof actually reconstructs the claimed root from the leaf.
Fetch the Arweave-anchored batch payload and verify the on-chain root matches the claimed root.
Verify the registry's signature on the on-chain batch payload itself.

If all five pass, the chain from publisher-stated opt-out to Arweave-anchored public timestamp is intact. If any one fails, the bundle is not audit-defensible and your pipeline should treat it as suspect.

Run this at lookup time if your pipeline has the latency budget. If not, run it async post-lookup against the logged bundle and alert on failures. Either way, run it before you'd need to defend the audit log in a deposition.

6. Audit log structure

Per defensible practice, the lab's audit log entry for each lookup contains everything the registry returned, verbatim, plus a few fields the lab adds. Recommended log row schema:

{
  "lookup_id": "<lab-internal-uuid>",
  "training_run_id": "<lab-internal-identifier>",
  "queried_domain": "example-publisher.com",
  "lookup_at": "2026-05-12T09:14:00Z",
  "registry_response": { /* verbatim response body from the registry */ },
  "client_verification": {
    "verified_at": "2026-05-12T09:14:00.342Z",
    "verifier_version": "akaeon-verify-1.0.0",
    "result": "ok",
    "per_optout_results": [
      { "submission_id": "01J9XW...", "ok": true }
    ]
  },
  "training_pipeline_decision": {
    "decision": "exclude",
    "policy_version": "lab-policy-v3"
  }
}
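
The client_verification block of this row can be produced by running a verifier over each opt-out in the response. The sketch below takes the verifier as a parameter (for example the function from §5) so it stays self-contained. `buildClientVerification` is a hypothetical name, and the "failed" value for result is an assumption, since the schema above only shows "ok".

```javascript
// `verifyFn` is whatever verifier you use; it is expected to resolve to
// { ok, reason? } for a single opt-out entry.
async function buildClientVerification(registryResponse, verifyFn, verifierVersion) {
  const perOptoutResults = []
  for (const optout of registryResponse.optouts) {
    let outcome
    try {
      outcome = await verifyFn(optout)
    } catch (err) {
      // A thrown error (network failure, malformed bundle) is recorded, not swallowed.
      outcome = { ok: false, reason: `verifier threw: ${err.message}` }
    }
    perOptoutResults.push({ submission_id: optout.submission_id, ...outcome })
  }
  return {
    verified_at: new Date().toISOString(),
    verifier_version: verifierVersion,
    // With no opt-outs there is nothing to verify, so the result is 'ok'.
    result: perOptoutResults.every((r) => r.ok) ? 'ok' : 'failed',
    per_optout_results: perOptoutResults,
  }
}
```

Recording a thrown error as a failed result keeps the audit row honest about verifications that never completed.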

Storage recommendations:

Append-only. No row should ever be updated or deleted. Use an immutable storage backend (S3 with object lock, BigQuery without delete permissions, an append-only Kafka topic feeding into long-term storage).
Cryptographic chain (optional). Hash each row including the prior row's hash, producing a hash chain. This protects against later tampering of the log itself. The registry doesn't require this; some labs find it strengthens their internal audit story.
Retention. Retain for the longer of: training run lifetime + 7 years, or the maximum statute of limitations for relevant copyright / AI-Act claims in your jurisdiction.
Backup. Mirror to a second region. The audit log is the lab's defense; losing it is losing the defense.
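
The optional hash chain mentioned above can be quite small. The sketch below is one way to do it, not a format the registry prescribes: the all-zero genesis value and the row_json and chain_hash field names are assumptions made for illustration.

```javascript
import crypto from 'node:crypto'

const GENESIS_HASH = '0'.repeat(64) // hypothetical starting value for the first row

// Returns the chain hash for `rowJson` given the previous row's chain hash.
function chainHash(prevHashHex, rowJson) {
  return crypto
    .createHash('sha256')
    .update(Buffer.from(prevHashHex, 'hex'))
    .update(Buffer.from(rowJson, 'utf8'))
    .digest('hex')
}

// Recomputes the chain over stored rows and reports the first broken link.
function verifyChain(rows) {
  let prev = GENESIS_HASH
  for (let i = 0; i < rows.length; i++) {
    if (chainHash(prev, rows[i].row_json) !== rows[i].chain_hash) {
      return { ok: false, brokenAt: i }
    }
    prev = rows[i].chain_hash
  }
  return { ok: true }
}
```

Hashing the stored JSON text, instead of a re-serialized object, avoids false alarms caused by key-order or whitespace differences.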

The lab's audit log is not stored by the registry. The lab is responsible for its own logging discipline; the registry's records and signatures are what makes that discipline meaningful.

7. The public verification surface

These endpoints require no authentication. Use them when you want to cross-check the registry's claims without depending on your own API key (useful for spot-audits, for offline verification, and for a successor verifier re-checking records from an audit log produced in a deposition).

| Endpoint | Returns |
|---|---|
| GET /v1/public/optouts/:submission_id/verify | Same shape as a lookup-bundle entry, served unauthenticated. |
| GET /v1/public/batches/:batch_id | Batch metadata, Merkle root, leaf count, Arweave tx id. Used to cross-check that a claimed root matches the registry's view. |
| GET /v1/public/registry-key | Current and historical Ed25519 public keys with validity windows. |
| GET /v1/public/registry-key/history | Arweave transaction ids of the key catalog records, so a verifier can recover the historical keys from Arweave directly without trusting the registry. |

The registry's commitment is that these endpoints remain available for as long as the registry operates, and that the data they return is mirrored to at least one independent secondary domain (initial proposal: mirror.akaeon-registry.com, different hosting provider, read-only).

For the strictest audit posture: verify every claim against Arweave directly, treating the registry's public verify endpoint as a convenience-only mirror. Every claim the registry makes is committed on-chain; the registry's continued cooperation is convenient but not strictly necessary.

8. SDKs and reference implementations

8.1 Officially-maintained

The registry team commits to maintaining and supporting:

@akaeon/registry-client (TypeScript / Node) — thin HTTP client with built-in client-side verification. Wraps the endpoints in §3, §4, §7. Returns typed responses. Includes the verifier from §5 as a reusable function. Available on npm under MIT license. (Status: in development; will publish at v0.1 alongside the registry v1 launch.)
akaeon-registry (Python) — equivalent client. Type hints under Python 3.10+, supports httpx and requests as backends. (Status: in development.)

These are deliberately thin clients — the registry's protocol is HTTP+JSON and there's no benefit to a thick SDK. The verifier code (§5) is the load-bearing logic; everything else is convenience wrapping.

8.2 Reference implementations

The verifier is reference-implemented in:

Node.js: see §5 of this document, plus the technical specification for the cryptographic details.
Python, Go, Rust — to be published alongside v1 launch. The cryptographic specification is sufficient to implement against; we expect community contributions to provide ports.

8.3 Self-implementation

For labs that prefer to implement against the spec rather than depend on an SDK: the technical specification §5 contains everything needed. The verifier code in §5 of this document is the canonical reference. If your implementation passes the test vectors (test vectors will be published at v1 launch), it is conformant.

The registry does not require lab implementations to be open-source or certified. The verifier code is small enough to audit by reading.

9. Rate limits, error handling, and operational concerns

9.1 Rate limits (initial proposal; final at v1 launch)

| Endpoint | Per-token limit | Notes |
|---|---|---|
| GET /v1/lookup | 100/s, 1,000/min, 100,000/day | Burst tolerant; soft-throttles at 80% with an X-Akaeon-RateLimit-Remaining header. |
| POST /v1/lookup/bulk | 10/s, 100/min, 10,000/day | Each request can include up to 1,000 domains. Effective domain throughput: ~10,000/s in bursts, ~100,000/min sustained. |
| GET /v1/public/* | 50/s per IP, no daily cap | Unauthenticated; protected by per-IP limit and Cloudflare-like DDoS protection. |
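
For capacity planning, the bulk limits translate into a simple back-of-the-envelope calculation. The sketch below hard-codes the proposed limits from the table, which may change before v1 launch; `bulkBudget` is a hypothetical helper.

```javascript
// Proposed per-token limits for POST /v1/lookup/bulk, taken from the table above.
const BULK_LIMITS = { perSecond: 10, perMinute: 100, perDay: 10000, domainsPerRequest: 1000 }

function bulkBudget(domainCount, limits = BULK_LIMITS) {
  const requests = Math.ceil(domainCount / limits.domainsPerRequest)
  return {
    requests,
    // Rough estimate at the per-minute limit, which binds for any sustained
    // run; runs under 100 requests can finish faster at the burst rate.
    sustainedMinutes: requests / limits.perMinute,
    fitsInOneDay: requests <= limits.perDay,
  }
}
```

A run of 2.5 million unique domains comes to 2,500 requests, or roughly 25 minutes at the sustained limit, and stays inside the daily cap; anything above 10 million domains per day needs a negotiated tier.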

For higher limits, contact the registry team. Production labs with multi-million-domain training runs are expected to negotiate a higher tier.

9.2 Retry semantics

All GET endpoints are safe to retry. The registry returns deterministic results within a lookup_at second — retrying within one second produces the same bundle (subject to clock skew). For POST /v1/lookup/bulk, retry on 5xx with idempotency: include an Idempotency-Key header to deduplicate within a 24-hour window.

| Status | Retry policy |
|---|---|
| 200 | Success; don't retry. |
| 4xx (except 429) | Don't retry. The request will keep failing until its cause is fixed (malformed input, or a token that needs reissuing or rotating). |
| 429 | Retry after Retry-After seconds (typically 1–60). |
| 5xx | Retry with exponential backoff (initial 1s, max 30s, max 5 attempts). |
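
The retry table can be encoded as a pure function that the HTTP layer consults after each response. This is a sketch: `retryDecision` is a hypothetical name, the 1-second fallback for a missing Retry-After is an assumption, and jitter is omitted for clarity.

```javascript
const MAX_ATTEMPTS = 5
const INITIAL_DELAY_MS = 1000
const MAX_DELAY_MS = 30000

// `attempt` is 1 for the first try. Returns { retry, delayMs }.
function retryDecision(status, attempt, retryAfterSeconds = null) {
  if (status === 429) {
    // ASSUMPTION: fall back to 1s if Retry-After is missing or unparsable.
    const seconds = Number.isFinite(retryAfterSeconds) ? retryAfterSeconds : 1
    return { retry: true, delayMs: seconds * 1000 }
  }
  if (status >= 500 && status <= 599) {
    if (attempt >= MAX_ATTEMPTS) return { retry: false, delayMs: 0 }
    const delayMs = Math.min(INITIAL_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)
    return { retry: true, delayMs }
  }
  // 2xx succeeded; other 4xx will keep failing until the request is fixed.
  return { retry: false, delayMs: 0 }
}
```

The table sets no attempt cap for 429, so none is applied here; a production client would probably add one, along with random jitter on the backoff delays.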

9.3 Degraded modes

If the registry's downstream verifier or Arweave gateway is degraded, the lookup endpoint returns 503 REGISTRY_DEGRADED with a body indicating which components are degraded. The response includes the registry's best-effort view (DB records, but with reduced confidence in anchoring claims). Labs decide whether to:

Proceed with the best-effort view (acceptable for low-assurance training runs).
Defer the lookup and retry later (acceptable for high-assurance training runs with a queue-based ingestion pipeline).
Fail closed and treat the domain as if it had an opt-out (the most conservative posture; appropriate for jurisdictions with strict TDM liability).

The registry does not prescribe a policy; the response gives you the information to choose.
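
Making the degraded-mode choice explicit in code keeps it reviewable. The sketch below maps a per-run assurance level to one of the three options above. The level names (low, high, strict) and `onRegistryDegraded` are hypothetical; only the degraded_components field comes from the error description in §3.4.

```javascript
// Hypothetical assurance levels; the three actions mirror the options above.
const DEGRADED_POLICY = {
  low: 'proceed',
  high: 'defer',
  strict: 'fail_closed',
}

// Called when a lookup returns 503 REGISTRY_DEGRADED with the parsed body.
function onRegistryDegraded(assuranceLevel, body) {
  const action = DEGRADED_POLICY[assuranceLevel]
  if (!action) throw new Error(`no degraded-mode policy for "${assuranceLevel}"`)
  return {
    action,
    // Logged alongside the decision so a reviewer can see what was degraded.
    degradedComponents: body.degraded_components ?? [],
  }
}
```

Throwing on an unknown level is deliberate: a run without a declared policy should stop, not silently proceed.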

9.4 Latency expectations

Single GET /v1/lookup: p50 50ms, p99 200ms.
Bulk lookup of 1,000 domains: p50 200ms, p99 500ms.
Public verify endpoint: p50 100ms, p99 400ms (additional Arweave-gateway latency may apply for the arweave_url round-trip).
Latency does not include the lab-side Arweave fetch in the client verifier (§5 step 4). Budget another 200ms–2s for that; cache on the lab side.

9.5 Caching

The registry serves all responses with Cache-Control: private, max-age=60 to keep shared proxies from caching authenticated responses (a response is only reproducible within its lookup_at second, and proxy caching would muddy the audit timestamp).

The lab can locally cache lookup results for a configurable duration appropriate to its pipeline — but the cached entry's lookup_at is the audit timestamp, not the time the cached entry was used. The registry recommends caching only within a single training-run boundary and re-fetching at the start of the next run.
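
A cache that follows this advice can be very small. The sketch below scopes entries to one training run and stores the bundle untouched so its lookup_at stays the audit timestamp. `RunScopedLookupCache` is a hypothetical class, not part of any SDK.

```javascript
class RunScopedLookupCache {
  constructor(trainingRunId) {
    this.trainingRunId = trainingRunId
    this.entries = new Map()
  }

  // Stores the bundle as returned; its own lookup_at remains the audit timestamp.
  put(domain, bundle) {
    this.entries.set(domain, bundle)
  }

  // Returns the cached bundle, or undefined on a miss or a run mismatch.
  get(domain, trainingRunId) {
    if (trainingRunId !== this.trainingRunId) return undefined
    return this.entries.get(domain)
  }
}
```

Creating a fresh instance at the start of each run gives the re-fetch behavior described above without any expiry logic.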

10. Integration checklist

For an engineering lead estimating effort, here is the concrete checklist.

10.1 Day 1 — Minimum viable integration (one engineer)

Obtain an akr_test_... token from the registry team.
Wire GET /v1/lookup into your ingestion pipeline at the point where the source domain is known.
Log the verbatim response body to your audit store, keyed by (domain, training_run_id, lookup_at).
Implement the policy decision (what to do when optouts is non-empty). The registry doesn't dictate this — your existing compliance policy does.
Smoke-test against staging with a list of 100 real domains and a list of 100 known-opted-out test domains (the registry will provide test domains).
Confirm rate-limit headers are surfaced in your monitoring.

Estimated effort: 1 day for a competent engineer familiar with the ingestion pipeline.

10.2 Day 2 — Production hardening

Switch to akr_live_... token. Store the secret in your secret manager.
Add the client verifier (§5) to your pipeline. Run it at lookup time if latency allows; otherwise run it async-post-lookup against the logged bundle.
Alert on verifier failures. A verifier failure means either a registry bug or a tampering attempt; both are worth waking someone up for.
Add error-handling for 429 (rate-limited) and 503 (degraded). Make the degraded-mode policy explicit and reviewable.
Replicate audit log to a second region.

Estimated effort: 1 day.

10.3 Day 3 — Bulk + scale

If your domain count per training run exceeds ~10,000, switch from GET /v1/lookup (per-domain) to POST /v1/lookup/bulk. Split the domain list into batches of at most 1,000 per request.
Implement client-side caching scoped to the training run. Cache keyed on (domain, training_run_id); do not share across runs.
Add a load test against staging at your target QPS. Confirm X-Akaeon-RateLimit-Remaining doesn't reach zero.

Estimated effort: 0.5–1 day.

10.4 Audit-readiness — engineering-lead-level review

Audit log schema reviewed by compliance and signed off.
Retention period confirmed against the maximum applicable statute of limitations.
Verifier failure runbook documented (who responds, what they check, escalation path).
Public verification surface (§7) bookmarked by compliance. A compliance engineer can demonstrate ad-hoc verification of any audit log entry without the lab's API key.
Registry's wind-down protocol acknowledged. Where does the lab's audit defense come from if the registry stops operating? Answer: from the Arweave-anchored records, which survive the registry.

11. Frequently asked engineering questions

Q: Is there a webhook for "a domain I previously queried now has an opt-out"?

Not in v1. The model is pull, not push. Re-query at the start of each training run. A future webhook is plausible for labs with very long training runs; talk to the registry team if your use case requires it.

Q: Can I bulk-export the entire registry?

The registry will publish a daily snapshot manifest (Arweave-anchored list of all anchored batches as of midnight UTC) at v1 launch. The manifest doesn't replace the lookup endpoint — it's an offline-friendly audit aid, not a substitute for at-ingestion verification.

Q: Do I need to verify the Arweave anchor at lookup time?

Recommended but not strictly required. The audit-defensibility argument gets stronger the closer to lookup time you do the verification. The strongest posture is to verify in-process at lookup; the most pragmatic is to verify async-post-lookup and alert on failures. The weakest posture (trust the registry's response without verifying) is still defensible against most challenges but doesn't survive a registry compromise.

Q: What if a publisher submits an opt-out and the DNS challenge takes days to propagate?

The publisher's submission is pending_dns_verification until the challenge is observed. Your GET /v1/lookup does not see the submission during that window. The opt-out is effective from the dns_verified_at timestamp the registry records (which is also the moment the canonical record is built and signed). A publisher who intends an earlier effective date can specify effective_from, but the audit-credible time is the Arweave block timestamp.

Q: Are there test vectors I can run against?

At v1 launch, yes. The repository will include a test-vectors/ directory with canonical inputs (canonical record JSON, leaf hash, expected Merkle path, expected signature) and expected outputs. Your verifier passes if it agrees with the test vectors.

Q: What does the registry know about my lab beyond the token?

The IP address of the request (logged for rate limiting), the user-agent (logged for SDK adoption tracking), the queried domain, and the timestamp. The registry does not log per-domain decisions you make downstream; the registry doesn't know what you do with the response.

Q: What if the registry returns an opt-out and we trained on the content anyway — does the registry tell on us?

No. The registry has no relationship with the publisher beyond their submission. The registry does not crawl your training output, monitor your models, or send abuse complaints. Enforcement of the opt-out is your lab's compliance decision; the registry's role ends at delivering the cryptographic evidence.

Q: Is the registry on-chain end-to-end?

No. The registry is a normal HTTP service backed by Postgres for state and Arweave for anchoring. "On-chain" applies to the records' Merkle roots, which are anchored. Treating the registry as a fully on-chain protocol would add complexity without adding trust beyond what Arweave-anchoring already provides.

Q: What if Arweave goes away?

The design allows migration to a different substrate. Existing records remain verifiable against existing Arweave transactions for as long as Arweave or any mirror of it exists. Future records can be anchored elsewhere by changing the network field in the canonical payload; the verifier code (§5) is substrate-agnostic and just needs the right gateway URL.

Q: How do I report a bug or request a feature?

engineering@akaeon.com for production issues. feedback@akaeon.com for feature requests. The registry maintains a public changelog documenting every API change.