
Don’t Train On Me—A Practical Way to Attach Permissions to Your Digital Content
We’ve hit a weird moment on the internet. You publish a blog post, pour in thought and craft, and—surprise!—some Large Language Model slurps it up for training. You didn’t consent. Reading and sharing? Great. Turning your words into model food? Different game.
Quick note up front: yes, this uses a tiny bit of blockchain—but only as a tamper‑evident receipt, not as a lifestyle. No wallets for readers, no crypto detours, just a durable anchor for your policy.
The problem
The web’s default is “public = fair game.” Opt‑out signals exist—robots.txt, meta tags, “noai”—but they’re scattered, inconsistent, and easy to ignore. When your content gets copied, mirrored, republished, or shared everywhere, your intentions and license rarely travel with it. Even when they do, there’s no tamper‑evident record tying your policy to the exact version of your words.
What if we tried this?
Instead of hoping platforms “do the right thing,” attach your rules to the work itself—and make those rules travel. Do you think this would work?
- Publish with a clear policy: “read freely, share with attribution, no LLM training,” plus license details.
- Bind the policy to the exact content at the moment of publication.
- Keep that policy visible and machine‑readable wherever the link or file goes.
- Preserve a transparent history of edits and shares so changes are obvious.
Sound reasonable? Pick your explainer:
- Option A: the human‑friendly version—plain language, no jargon.
- Option B: the technical deep‑dive—hashes, IPFS, and a tiny on‑chain anchor.
Quick note on novelty
Pieces of this exist today (opt‑out signals, provenance manifests, on‑chain attestations). What’s new here is linking them end‑to‑end for digital publishing so the policy is cryptographically bound to the content, versioned, and propagated across shares—a cohesive “policy + provenance” layer that travels with your work.
Option A: The human‑friendly explainer
Think of it as a “policy passport” for your content.
- Policy manifest: Every piece ships with a short, plain‑English notice (plus a machine‑readable JSON twin) stating what’s allowed: sharing, remixing, monetization, AI training, etc.
- Portable signals: Pages include standard metadata (JSON‑LD, Open Graph, robots/meta tags). Platforms and crawlers actually read these, so your policy shows up in link previews and discovery.
- Visible badge: A tiny badge says “Policy: No LLM training” (or whatever you set) and points to a public, immutable record of your terms.
- Version history: Edit the content? The badge updates, but the old version and its policy remain visible—no quiet switch‑ups.
- Shares: Use the share button and your endorsement gets recorded. Paste the URL anywhere and your policy still travels via preview cards and metadata.
Outcome: Clarity for readers, signals for good‑faith platforms, and a public, timestamped trail you can reference if someone crosses the line.
Option B: The technical deep‑dive
Same idea—now with receipts.
- Content + policy hashing: At publish, hash the canonical content and its policy manifest. That fingerprint says, “these exact words and these exact terms.”
- Content‑addressed storage: Put content and policy in IPFS/Arweave and get CIDs—stable IDs derived from the data itself.
- On‑chain anchor: Write a minimal record on a low‑cost chain (e.g., Base or Polygon) that includes content ID, content hash, policy hash, author identity (wallet/DID), and timestamp. Edits append; nothing overwrites.
- Propagation: Embed the policy pointer (CID + tx hash) in HTML metadata, JSON‑LD, and link previews so the policy travels with URL shares.
- Shares and mentions: Signed shares (via your site’s share button) become on‑chain attestations. Plain URL shares are discovered off‑chain (backlinks, webmentions) and batch‑anchored via Merkle roots for a verifiable distribution trail.
- Verification widget: A small badge resolves the content/policy pair and shows “Verified,” version, and policy summary in the UI.
Outcome: A durable audit trail that ties your work to your terms, with cryptographic authorship and timestamps. Not a force field—but now there’s a public record and portable signals that responsible actors can honor.
Beyond blogs: the same pattern for books, music, art, and video
This approach is media‑agnostic. You attach a policy, bind it to the exact work, and make it travel.
- Books and long‑form articles: Hash the EPUB/PDF or chapter text; include a policy page inside the file plus a machine‑readable manifest. Add TDM‑reservation signals on your site. Each edition becomes a new version in the audit trail.
- Music: Hash the master file and metadata; embed the policy pointer in ID3/XMP tags; store stems with content addressing. Streaming pages display the badge; remixes attest to the original track and its terms.
- Digital art and images: Embed a signed provenance manifest (C2PA‑style) with origin, edits, and the policy pointer directly in the image. The on‑chain anchor ties the asset hash to policy. Galleries surface the badge; derivatives reference the source.
- Video: Hash the final file and a mezzanine proxy; include content credentials and policy in the manifest; publish pages carry JSON‑LD. Clips and reposts can append attestations, building a chain of custody.
Pros and cons
Pros
- Clear intent, everywhere: Your policy rides along in previews, metadata, and embedded file tags—even on naked URL pastes.
- Tamper‑evident: Hashes + anchors make edits and policy changes visible.
- Versioned history: Every edit and share is traceable; no silent revisions.
- Interoperable: Works with web standards and emerging media credentials; optional Web3 adds durable receipts.
Cons
- Not DRM: Determined scrapers can ignore you; this is deterrence and documentation, not a hard lock.
- Setup overhead: A publish step, a badge, and (for deep dives) storage + a tiny contract.
- Metadata stripping: Some platforms remove embedded tags; you counter with on‑page signals and public manifests.
Summary
Publishing shouldn’t mean surrendering control. Attach a clear policy to each work, bind it to the content, and make those rules travel wherever your words, sounds, or images go. Start with portable web signals and a visible badge. If you’re building a platform, add content addressing and a tiny on‑chain anchor for verifiable provenance. It won’t stop every bad actor, but it raises the bar, guides good actors, and gives you solid evidence when they don’t.
A quick closer
I wanted to share the concept while I explore a proof‑of‑concept. Blockchain isn’t my home turf, so building and testing the pipeline may take a minute—but I think the architecture is sound, practical, and worth the effort.