Revisiting the Semantic Web: How Structured Data Can Finally Become Mainstream

The Web's Original Promise and Its Limits

Since the early days of the World Wide Web in the 1990s, the platform has primarily served as a medium for publishing documents meant for human eyes. Those documents are built with HTML, a language that offers a basic level of structure—like marking a section as a paragraph or emphasizing a word with italics. Then designers add CSS to make things visually appealing, such as turning paragraphs into tiny gray sans-serif text. While that might look stylish to some, it can alienate older readers who struggle with low contrast and small fonts.

Source: www.joelonsoftware.com

But this kind of “structure” is shallow. It tells a web browser how to display content, not what the content means. For example, if someone mentions a book on a page:

Goodnight Moon by Margaret Wise Brown
Illustrated by Clement Hurd
Harper & Brothers, 1947
ISBN 0-06-443017-0

A simple computer program scanning that page might not even recognize it as a book reference. The only hint may be that the title is set in bold, and that is not enough for a machine to understand the context. This gap between what humans read and what computers can interpret has been a known limitation since the web's inception.
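To make the gap concrete, here is a minimal sketch of what such a "simple program" gets to work with. The HTML snippet is an assumption (the markup of the original page is not shown here), and the TagAndTextCollector class is a hypothetical helper built on Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical markup for the book mention above; the real page may differ.
PAGE = (
    "<p><b>Goodnight Moon</b> by Margaret Wise Brown<br>"
    "Illustrated by Clement Hurd<br>"
    "Harper &amp; Brothers, 1947<br>"
    "ISBN 0-06-443017-0</p>"
)


class TagAndTextCollector(HTMLParser):
    """Records every start tag and every run of text the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


collector = TagAndTextCollector()
collector.feed(PAGE)
collector.close()

# The tags describe presentation only (paragraph, bold, line break).
print(sorted(set(collector.tags)))
# The text is just strings; nothing labels them as title, author, or ISBN.
print(collector.text)
```

Everything a program could use to infer "this is a book" would have to come from guessing at the text itself, because none of the tags carries that meaning.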

The Vision of a Semantic Web

As early as 1999, Tim Berners-Lee, the inventor of the web, articulated a dream for a “Semantic Web.” In his book Weaving the Web, he wrote:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

The idea was to embed rich, machine-readable metadata alongside human-readable content. If you publish information about a book, you wouldn't just rely on bold text. You would use a structured data format such as RDF or, more recently, JSON-LD, and reference a shared vocabulary such as schema.org to say explicitly “this is a book, these are its authors, this is its ISBN.” Computers could then automatically extract, link, and reason about that information.
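As a rough illustration, the Goodnight Moon mention from earlier could be described in JSON-LD using the schema.org Book type. This is a sketch, not the only valid encoding: the choice of properties (name, author, illustrator, publisher, datePublished, isbn) is an assumption about how one might map the four lines of text onto the vocabulary.

```python
import json

# One possible schema.org description of the book mentioned above.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Goodnight Moon",
    "author": {"@type": "Person", "name": "Margaret Wise Brown"},
    "illustrator": {"@type": "Person", "name": "Clement Hurd"},
    "publisher": {"@type": "Organization", "name": "Harper & Brothers"},
    "datePublished": "1947",
    "isbn": "0-06-443017-0",
}

# Serialize to the JSON-LD text a page would publish.
serialized = json.dumps(book, indent=2)
print(serialized)

# A consumer no longer has to guess: the type and fields are explicit.
parsed = json.loads(serialized)
if parsed.get("@type") == "Book":
    print(parsed["name"], "by", parsed["author"]["name"], "ISBN", parsed["isbn"])
```

The second half plays the role of the machine reader: it checks the declared type and reads the author and ISBN by name instead of inferring them from formatting.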

What Makes It So Hard?

Despite the vision being decades old, widespread adoption of semantic markup has been slow. The main obstacle is effort. Adding structured data to a web page is, frankly, homework. Once a blogger finishes writing a human-readable post and presses publish, it's draining to then figure out the extra markup that makes the page computer-friendly. Unless a search engine or another service is already actively consuming that data, most people simply don't bother. As a result, genuinely rich structured data remains rare on the open web.

This is a lost opportunity, because human progress depends on making information more accessible—not just to people, but to the algorithms and applications that can aggregate, compare, and analyze data at scale.

A Path Forward: Making Semantic Markup Effortless

To change this, we need a fundamental shift in how we approach semantic markup. The key insight is simple: people will only add structured data to their pages if the process is easy, intuitive, and built directly into their workflow. Taken together, pulling in a separate vocabulary, learning serialization formats, and manually embedding JSON-LD blocks set too high a barrier for most content creators.

What if, instead, the tools we already use—content management systems, blogging platforms, and web frameworks—automatically generated semantic markup from the content we provide? For instance, when a writer uploads a book cover image and fills in fields like title, author, and ISBN, the system could produce the appropriate schema.org markup in the background. The writer never needs to see a line of JSON-LD or RDF.
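A minimal sketch of that background step might look like the following. The function name book_fields_to_jsonld_script and the field names (title, author, isbn, cover_image_url) are hypothetical; a real content management system would have its own form schema and template hooks.

```python
import json


def book_fields_to_jsonld_script(fields: dict) -> str:
    """Turn hypothetical CMS form fields into an embeddable JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": fields["title"],
        "author": {"@type": "Person", "name": fields["author"]},
        "isbn": fields["isbn"],
    }
    # The cover image is optional in this sketch.
    if fields.get("cover_image_url"):
        data["image"] = fields["cover_image_url"]

    # Escape "<" so a field value containing "</script>" cannot close the tag early.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


# What the writer typed into the form (example values).
form_fields = {
    "title": "Goodnight Moon",
    "author": "Margaret Wise Brown",
    "isbn": "0-06-443017-0",
    "cover_image_url": "https://example.com/covers/goodnight-moon.jpg",
}

# What the platform would add to the published page on the writer's behalf.
print(book_fields_to_jsonld_script(form_fields))
```

The writer only fills in the form; the script tag is assembled by the platform and emitted alongside the human-readable page.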

Another promising direction is the Block Protocol, which aims to standardize how structured data blocks work within web applications. By creating reusable, interoperable blocks that both humans and machines can understand, we can embed semantics without requiring manual coding. This approach could make the Semantic Web vision finally practical.

We believe that with the right tools and a focus on developer and writer experience, we can bring the dream of the Semantic Web to life—one effortless block at a time.
