Demystifying Structured Data: The Journey from HTML to the Block Protocol

For decades, the web has primarily served human readers with content designed for visual consumption. However, beneath the surface of paragraphs and styling lies a missed opportunity: machines struggle to understand the meaning behind the text. This article explores why structured data matters, the challenges of adding it, and how a new solution—the Block Protocol—aims to bridge the gap between human-friendly publishing and machine-readable intelligence.

Why has the web historically lacked meaningful structure?

The web was born as a medium for sharing human-readable documents. HTML provides basic formatting—titles, paragraphs, emphasis—but rarely conveys the type of content. For instance, a book mention might simply be bold text with no indication that it's a publication with an author, illustrator, publisher, and ISBN. Early pioneers like Tim Berners-Lee envisioned a Semantic Web where computers could analyze this data, yet the dream stalled. The main reason: adding structured markup (like schema.org annotations) feels like homework after you've already published a polished post. It's technical, time-consuming, and offers no immediate benefit unless a machine is already parsing your page.

Source: www.joelonsoftware.com

What was Tim Berners-Lee's original vision for the Semantic Web?

In his 1999 book Weaving the Web, Tim Berners-Lee described a web in which computers could analyze the content, links, and transactions between people and computers. He imagined intelligent agents handling daily tasks such as trade, bureaucracy, and life management by parsing structured data. To achieve this, publishers would need to embed semantic annotations in formats such as RDF or, more recently, JSON-LD. For example, instead of just displaying a book title, you'd mark it up to tell machines: 'This is a Book, authored by Margaret Wise Brown, published in 1947 by Harper & Brothers.' Unfortunately, this required extra effort that most creators never put in, leaving the web largely unstructured: a place where human readability ruled, but machine comprehension remained elusive.
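
The book statement above can be written down as a JSON-LD document. The following is a minimal sketch that builds it in Python using schema.org's Book, Person, and Organization types. The title is an assumption: the text names only the author, year, and publisher, and Goodnight Moon is filled in here because it matches those details.

```python
import json

# Illustrative data. The title is an assumption inferred from the author,
# year, and publisher mentioned in the text.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Goodnight Moon",
    "author": {"@type": "Person", "name": "Margaret Wise Brown"},
    "datePublished": "1947",
    "publisher": {"@type": "Organization", "name": "Harper & Brothers"},
}

# Serialize to the JSON text a publisher would place in the page.
book_jsonld = json.dumps(book, indent=2)
print(book_jsonld)
```

Nothing in this output changes what a human sees on the page, which is part of why the annotation feels like optional extra work.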

How does schema.org help, and why hasn't it become widespread?

Schema.org provides a shared vocabulary for adding structured data to web pages. It defines types like 'Book', 'Person', or 'Recipe' so you can annotate content consistently. However, using it is far from trivial. You must first find the right type and its properties, then embed them in your HTML using Microdata, RDFa, or JSON-LD. Many content management systems (CMS) offer plugins, but for individual bloggers or small businesses, it's a daunting step after the creative work of writing. Without an immediate payoff, like better search results or automated processing, most people skip it. As a result, more than a decade after schema.org launched in 2011, much hand-authored content still carries no structured markup.
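
To make the embedding step concrete, here is a rough sketch of the JSON-LD route: the annotation is serialized and wrapped in a script tag of type application/ld+json. The helper name jsonld_script_tag and the recipe data are hypothetical, chosen only for illustration.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Wrap a JSON-LD dictionary in a script tag for embedding in HTML."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a value containing "</script>" cannot end the tag early.
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical example data using schema.org's Recipe type.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Example pancakes",
    "recipeIngredient": ["flour", "milk", "eggs"],
}

tag = jsonld_script_tag(recipe)
print(tag)
```

Even in this small form, the author has to know the type name, the property names, and the embedding syntax, all of which sit outside the normal writing workflow.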

What fundamental problem does the Block Protocol aim to solve?

The Block Protocol tackles the chicken-and-egg problem of structured data: why would creators add semantic markup if no machines are reading it, and why would machines parse it if no one creates it? It proposes making structured data a first-class citizen in content creation tools, not an afterthought. By embedding blocks that inherently carry semantic meaning—like a 'Book' block that automatically exposes author, ISBN, and genre—the protocol eliminates the extra homework. Creators get rich, interactive components; machines get predictable, parseable data. The goal is to align incentives: if building with blocks is faster and more feature-rich than raw HTML, people will adopt structured markup organically.
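
The idea of a block that carries its own meaning can be sketched as a single object with two outputs. This is not the Block Protocol's actual API; the BookBlock class, its fields, and its method names are all hypothetical, and the sketch only illustrates the principle that one set of fields can drive both the visible rendering and the machine-readable data.

```python
import html
import json
from dataclasses import dataclass


@dataclass
class BookBlock:
    """Hypothetical block: one set of fields, two outputs."""

    title: str
    author: str
    isbn: str
    genre: str

    def to_jsonld(self) -> dict:
        """Machine-readable view using schema.org Book properties."""
        return {
            "@context": "https://schema.org",
            "@type": "Book",
            "name": self.title,
            "author": {"@type": "Person", "name": self.author},
            "isbn": self.isbn,
            "genre": self.genre,
        }

    def render_html(self) -> str:
        """Human-readable view with the structured data embedded alongside."""
        payload = json.dumps(self.to_jsonld()).replace("</", "<\\/")
        return (
            '<div class="book-block">'
            f"<strong>{html.escape(self.title)}</strong> "
            f"by {html.escape(self.author)} ({html.escape(self.genre)})"
            f'<script type="application/ld+json">{payload}</script>'
            "</div>"
        )


# Placeholder values, not a real book.
block = BookBlock(
    title="An Example Book",
    author="A. N. Author",
    isbn="0000000000000",
    genre="Fiction",
)
print(block.render_html())
```

The author fills in the fields once, and the structured data comes along without a separate annotation step.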

Can you give an example of how the Block Protocol would change a typical web page?

Imagine you're writing a book review. Today, you'd format the title in bold, list details in plain text, and if you're diligent, add hidden JSON-LD. With the Block Protocol, you'd insert a 'Book Block' that asks for title, author, illustrator, publisher, ISBN, and cover image. This block renders the information beautifully on-screen and simultaneously exposes structured data to any machine visiting the page. A search engine could then instantly know the book's exact details. A reading list app could fetch it. An AI agent could compare prices or find related titles. The effort shifts from 'manually annotate after writing' to 'drop in a smart block while writing'—no extra steps required.
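
On the consuming side, a tool such as a reading list app could pull the book details straight out of the page. The sketch below assumes the block exposes its data as embedded JSON-LD, which is one possible mechanism and not something the text specifies. The JsonLdExtractor class and the sample page are hypothetical; only Python's standard html.parser and json modules are used.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect every JSON-LD object embedded in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list = []
        self._in_jsonld = False
        self._buffer: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical review page with placeholder book data.
page = """
<article>
  <p>My review of <b>An Example Book</b>.</p>
  <script type="application/ld+json">
  {"@context": "https://schema.org", "@type": "Book",
   "name": "An Example Book", "isbn": "0000000000000"}
  </script>
</article>
"""

extractor = JsonLdExtractor()
extractor.feed(page)
extractor.close()

# Keep only the items that declare themselves as books.
books = [item for item in extractor.items if item.get("@type") == "Book"]
print(books[0]["name"], books[0]["isbn"])
```

The consumer never has to guess which bold phrase is a title, because the page states it explicitly.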

Why does human progress depend on solving the structured data challenge?

As we enter an era of AI assistants, smart agents, and automated workflows, the web must evolve from a document repository to a data platform. Structured data enables computers to answer questions, recommend products, aggregate facts, and perform tasks without human intervention. Without it, these systems rely on fragile natural language parsing or fall back to walled-garden APIs. The Block Protocol aims to unlock the web's latent potential, making every page both human-friendly and machine-readable. This shift could accelerate research, commerce, education, and daily life—just as Berners-Lee dreamed. The key is making structured data feel like a natural part of creation, not a chore.
