The Race to Shape the Semantic Web – Score One Microsoft


Brooke Aker, CEO of Expert System USA

The world wide web is going semantic. But what does that mean? A semantic web is an advanced version of the current web. It is a place where information about the information is more than a URL, a title, an author, a date and some keywords that get picked up by the large search engines. A semantic web is a web that has metadata about every paragraph, sentence and word embedded behind the page.

With the extra metadata, search engines will be able to find, understand and deliver web pages about cancer in the context of zodiac signs and skip the web pages with cancer in the context of out-of-control cell growth. Won’t that be a relief? No more sorting the pages you want from ones that don’t matter.

A semantic web can distinguish between events where GE sues GM versus where GM sues GE. Then the semantic web can count, tally, follow, infer, model and analyze how GE is suing GM over time and keep that separate from the other lawsuit where GM is suing GE. All from the written record sprinkled across our newspapers, legal documents, blogs and other written forms of information.
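One way to picture the GE and GM distinction is to store each extracted statement as a subject, predicate and object. The sketch below is only an illustration: the `Triple` type and the sample statements are hypothetical, and it assumes some earlier step has already pulled the statements out of the text.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted statement: who did what to whom."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements, as an extraction step might emit them from news text.
statements = [
    Triple("GE", "sues", "GM"),
    Triple("GM", "sues", "GE"),
    Triple("GE", "sues", "GM"),
]

# Subject and object are separate fields, so the two lawsuits never merge.
tally = Counter(statements)

ge_vs_gm = tally[Triple("GE", "sues", "GM")]
gm_vs_ge = tally[Triple("GM", "sues", "GE")]
print(ge_vs_gm, gm_vs_ge)
```

A plain keyword index would see the same three words in every statement. Keeping the direction of the relation is what lets each lawsuit be counted on its own.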

There are many who see how powerful and productive this semantic web is becoming. Tim Berners-Lee, inventor of the web, and now the primary proponent of the semantic web, saw the value in a semantic web over 10 years ago when he and the W3C began to architect it. But now we are at a turning point.

The Semantic Web Race

As recently as 1 year ago, you could find Marissa Mayer of Google saying that the most successful search engine didn’t need semantics. After all, Google had copies of every page on the web, combined with a ton of statistics about those pages and user behavior, to tweak out even better search results. This is no longer the case. In late May, Google announced support for RDFa, one of the standards the semantic web is built on.

So Google is jumping straight into the deep end of the pool. And because of its size and influence in the industry, Google has the power to sway lots of technologists who are on the fence. But there is something amiss in Google’s semantic strategy. Its website in support of RDFa instructs webmasters worldwide how to manually add the correct tags in the construction of each web page.

Even a quick look shows how ridiculous this is. There are many estimates of the size of the internet. Conservative figures go something like this: the number of pages on the internet is about 48 billion, with 1 billion new pages added per day. An average web page is about 600 words long. The average time to read that page is 2.7 minutes. If someone also had to add RDFa markup tags while reading that page, the time would roughly double, to 5.3 minutes.

So it would take 177 million work days, counting each work day as a full 24 hours, to go back through every existing page on the internet and manually add the correct tags to each web page. But let’s forget the old web pages. To cover the 1 billion new web pages per day would take 3.7 million work days – every single day. There are roughly 15 million IT workers worldwide – I don’t know the number of webmasters within this. But let’s assume all the IT pros in the world would love to help mark up the new web pages with RDFa. The 3.7 million work days per day would represent nearly a 25% increase in every IT professional’s daily workload.
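The manual-tagging arithmetic can be checked in a few lines. This is a sketch under one stated assumption: a "work day" is counted as a full 24 hours, since that is the reading that reproduces the figures above. All other inputs are the article's own numbers.

```python
PAGES_EXISTING = 48e9       # pages already on the web (article's figure)
PAGES_NEW_PER_DAY = 1e9     # new pages added per day (article's figure)
MINUTES_PER_PAGE = 5.3      # time to read and tag one page (article's figure)
IT_WORKERS = 15e6           # IT workers worldwide (article's figure)
MINUTES_PER_DAY = 24 * 60   # assumption: a "work day" is a full 24 hours


def work_days(pages: float) -> float:
    """Work days needed to read and tag the given number of pages."""
    return pages * MINUTES_PER_PAGE / MINUTES_PER_DAY


backlog_days = work_days(PAGES_EXISTING)
daily_days = work_days(PAGES_NEW_PER_DAY)
share_per_worker = daily_days / IT_WORKERS

print(f"backlog: {backlog_days / 1e6:.0f} million work days")   # 177 million
print(f"new pages: {daily_days / 1e6:.1f} million work days")   # 3.7 million
print(f"per IT worker: {share_per_worker:.1%} of a day")        # 24.5%
```

With an 8-hour work day instead, every figure would be three times larger, which only strengthens the point.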

Google’s business model, the source of its riches, is what businessmen and women have longed for since time began. Their raw materials – the web pages that make up the internet – cost them nothing. They take them for free. Now they want to ask the world’s IT professionals to do, for free, the labor that will greatly enhance the Google product. What a business!

I don’t really think Google expects the world’s IT professionals to work 25% harder for free. What is probably going on is that Google had the RDFa specification partly finished, without any processing power behind it, but felt compelled to respond to a competitive event. On June 3, Microsoft launched its long-awaited new search engine, named Bing. Under the hood is technology from the company’s acquisition, the previous year, of semantic technology provider Powerset.

Just as Google talked up Google Squared as a response to the release of WolframAlpha, Google quickly talked up RDFa as a response to what they knew was coming in Bing. But score one for Microsoft on the semantic web front. Any truly good technology is baked in, invisible and useful. Saying you support RDFa and advising webmasters to mark up their pages with your standard is not the same as embedding semantics in the technology that collects, indexes and understands pages and reveals that understanding to web users everywhere. As big and as powerful as Google is, you must conclude that such a flat-footed response suggests Microsoft is up to something big and it scares them.

By the way, look for Google to get a semantic inference engine in gear to compete directly with Microsoft. They know as well as anyone that automation is the key to making a go of it in a semantic web world.

Some calculation on the automation side suggests the following. At Expert System, our semantic inference engine, Cogito, would take a tad more than 1 minute to process an average web page for RDFa. To keep up with the 1 billion new web pages per day, Cogito would need about 730 thousand work days of processing. While that sounds like a lot, there are 75 million servers working worldwide around the clock. So in the end, we are only talking about adding about 0.97% to the current server capacity in order to make the transition to a semantic web. Less than 1%.
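The automation estimate follows the same pattern as the manual one. In this sketch the page volume and server count are the article's figures. The per-page time of 1.05 minutes is an assumption, picked because it reproduces the roughly 730 thousand work days and 0.97% quoted above.

```python
PAGES_NEW_PER_DAY = 1e9     # new pages added per day (article's figure)
SERVERS_WORLDWIDE = 75e6    # servers running around the clock (article's figure)
MINUTES_PER_PAGE = 1.05     # assumption: matches the quoted ~730 thousand days
MINUTES_PER_DAY = 24 * 60   # a machine works the full 24 hours

# Days of processing needed to mark up one day's worth of new pages.
machine_days = PAGES_NEW_PER_DAY * MINUTES_PER_PAGE / MINUTES_PER_DAY

# One server delivers one machine-day per day, so this ratio is the extra capacity.
extra_capacity = machine_days / SERVERS_WORLDWIDE

print(f"{machine_days / 1e3:.0f} thousand machine-days per day")   # 729 thousand
print(f"{extra_capacity:.2%} of current server capacity")          # 0.97%
```

A faster per-page time would shrink both numbers in proportion, so the "less than 1%" conclusion holds for any processing time at or below this assumed value.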

This is a small price to pay so we can have a web that finds dots, connects dots and understands dots instead of giving us a long list of results.

The Shape of the Semantic Web

There are some important additional clues from both Google and Microsoft about the shape of the semantic web beyond the announcement war.

Google has picked the buying experience for consumers as a first foray into the semantic web. The RDFa formats they have set to process include reviews, people, products and businesses or organizations. The specifications for reviews are instructive. Product names, author, dates, the review itself and then a rating are part of the markup that goes behind a consumer’s comments or opinions.
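To make the review fields concrete, here is a minimal sketch that wraps one review in RDFa-style attributes. The namespace URL, the `v:` prefix and the property names are assumptions for illustration, as are the `review_to_rdfa` helper and the sample data. Check Google's published instructions for the exact vocabulary.

```python
from html import escape

# Assumed namespace and property names, used here for illustration only.
VOCAB = "http://rdf.data-vocabulary.org/#"


def review_to_rdfa(review: dict) -> str:
    """Render one review as an RDFa-annotated HTML fragment."""
    fields = [
        ("itemreviewed", review["product"]),
        ("reviewer", review["author"]),
        ("dtreviewed", review["date"]),
        ("description", review["text"]),
        ("rating", review["rating"]),
    ]
    lines = [f'<div xmlns:v="{VOCAB}" typeof="v:Review">']
    for prop, value in fields:
        # Escape the text so characters such as "&" stay valid in the markup.
        lines.append(f'  <span property="v:{prop}">{escape(str(value))}</span>')
    lines.append("</div>")
    return "\n".join(lines)


# Hypothetical review data.
sample = {
    "product": "Acme Blender",
    "author": "J. Smith",
    "date": "2009-06-01",
    "text": "Quiet & powerful.",
    "rating": 4.5,
}
markup = review_to_rdfa(sample)
print(markup)
```

Each visible piece of the review gets a machine-readable label, so a crawler can tell the rating from the date without guessing.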

Together these specifications provide a broad approach to enhancing one important social aspect of the web – that of consumer feedback about products and services that another consumer can find, see, compare and contrast. To see an existing example of this from Expert System, try this site, which automatically understands the sentiment of about 2000 automotive blog posts per day and graphically shows you the results. You can try this site here: http://www.cogitomonitor.com/demo

Microsoft Bing takes a similar approach, but the semantics are baked in already. They have a shopping section on their site. The moment you search for a specific product, you are presented with a graphical summary of existing owners’ sentiment about the product or service. This is semantics at work. Microsoft doesn’t have a small army of reviewers behind the scenes looking for the reviews, deciding what they mean and assigning a score. The semantics do this, just like in the auto sentiment site above.

So for both Google and Microsoft, it should not be a surprise that smart shopping would be the first place for semantics on the web to show up. The web is so broad and vast and caters to consumers, with ads financing the whole thing. So if you make it better, you can sell more of the product and more ads.

But Microsoft has a vision beyond the obvious shopping improvements. Today, travel agents are few and far between. Instead, you do it yourself on the web and it’s time-consuming. Microsoft has intelligently improved this – mostly with structured data on fare trends, seat loads, etc. But semantics come into play when you are looking for something to do once at your destination. Semantics categorize the search results into attractions, maps, weather, tourism, rentals, etc. This helps the user ratchet through the results quickly and efficiently. This feature goes beyond first-generation faceted search, since the contents of each web page are contextually examined and placed in one or more categories.

At Expert System, we do something similar and then take it a step further. A full-blown semantic web understands the context of the search you are conducting and then extends it to include the same concepts under different keywords. In other words, the machine is smart and brings back all the material whether you know all the keyword variations for a concept or not.

The screen below shows this. We semantically indexed every page from recovery.gov, and when I search on the keyword “money,” I get back “money” but also “amount,” “cash” and more that you cannot see. It is easy to see that “cash” is an alternative to “money.” It is “amount” that is most interesting. You can see in the context of the sentences shown that “amount” implies “money,” though that is not a common usage of the word. In government, however, you are likely to find entire documents that use “amount” and never the word “money,” since it is implied and government loves nothing more than shorthand and acronyms.

focus-gov-1

focus-gov-2
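The idea of returning the same concept under different keywords can be sketched with a simple lookup. This is a deliberately naive illustration, not how Cogito works: the `CONCEPTS` table and the sample sentences are hypothetical, and a hand-written table cannot judge from context whether “amount” really means money, which is the part a semantic engine adds.

```python
import re

# Hypothetical concept table; a semantic engine would derive this from context.
CONCEPTS = {
    "money": {"money", "cash", "amount"},
}


def expand(keyword: str) -> set:
    """Return every term that shares a concept with the keyword."""
    kw = keyword.lower()
    for terms in CONCEPTS.values():
        if kw in terms:
            return terms
    return {kw}


def search(keyword: str, sentences: list) -> list:
    """Return the sentences that mention the keyword or a related term."""
    terms = expand(keyword)
    hits = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & terms:
            hits.append(sentence)
    return hits


# Hypothetical sentences in the style of a government site.
sentences = [
    "The amount awarded to the state was released in March.",
    "Agencies must report how cash was spent.",
    "The bridge reopened last week.",
]
hits = search("money", sentences)
print(hits)
```

Neither matching sentence contains the word “money,” yet both come back for that query, while the unrelated sentence is left out.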

There is a health section on Bing and it follows pretty much the categorization routine of the travel section with some additional features. Semantics finds informative articles, advice and other information that helps those with health issues, and pushes them to the top. It pushes down those with a product to sell, those with some form of advocacy, etc. Semantics is needed to distinguish between these kinds of content and provides a real service.

A section called Local uses semantics to reveal the sentiment on restaurants, just as the shopping section does for products and services. But here it’s all combined with directions and maps, and it is localized by nothing more than backtracking the location of your IP address. Smart and helpful.

The Semantic Web to Come

Google and Microsoft have started to inject semantics into the consumer web. By comparison, much more has been done for private corporations and organizations, albeit on a smaller scale; we at Expert System have done this for hundreds of corporations, organizations and agencies around the world. But these experiences have a valuable application on the web.

Think about a web that answers questions directly and accurately – about the hours at the library, instructions for home repairs, or how to operate one of your complicated electronic devices. Think about a web that monitors and prevents fraud or helps solve crime, protects the homeland, saves energy, and discovers new medicine.

These and many more ideas will come about in the not-so-distant future. The semantic web will make it so – a web smart and connected, a web that can infer, understand, predict and warn. Economists like to talk about how efficient free markets are so long as information about a market is free, flowing and, most importantly, complete. The semantic web pushes us much closer to the ideal of complete information.
