Semantics In, Popularity Out


Guest post by Dr. Riza C Berkan, CEO

We congratulate Powerset on their launch. Some people must have a gun pointed at their heads, given how they rush to a conclusion from one or two examples: Powerset is good, Powerset is bad, and so on. Well, I think they are all missing the point. So much for encouragement!

The clear message is this: Semantic technology is here, and will evolve to challenge and eventually push out the popularity-based search methods. Here are the main reasons why:

1- DYNAMIC CONTENT
Dynamic Web pages and news articles move at such a fast pace that there is no time to collect the statistics (link referrals) that popularity algorithms need to do their job. By the time such referrals are made, these pages have become “history”. Thus, the only means to analyze them is via semantic algorithms, which do not depend on statistics collection.

2- LONG-TAIL
A recent study shows that the average Web page has 474 words and 41 links, 10 of which point outside the domain. Any linguist would confirm that as many as 1000 queries can be asked of a Web page of 474 words. If only 10 links point out on average, then 99% of the meaningful word sequences (queries) are not wrapped in links pointing to any other Web site. That is what creates the long-tail “relevancy” problem. So much valuable information is left out by a popularity method. We, at hakia, call it “the hidden failure”. Semantic algorithms do not depend on statistics collection, and thus are the only means to tackle the long-tail problem.
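
The 99% figure can be checked with a short sketch. The two numbers (about 1000 possible queries, 10 outbound links) come from the paragraph above. The idea that each outbound link covers exactly one query is a simplification assumed here for illustration, and the function name is hypothetical.

```python
def uncovered_fraction(possible_queries: int, outbound_links: int) -> float:
    """Fraction of candidate queries with no outbound link attached.

    Assumes, for illustration only, that each outbound link covers
    exactly one candidate query.
    """
    if possible_queries <= 0:
        raise ValueError("possible_queries must be positive")
    covered = min(outbound_links, possible_queries)
    return 1 - covered / possible_queries


# Figures quoted in the text: about 1000 possible queries, 10 outbound links.
share = uncovered_fraction(possible_queries=1000, outbound_links=10)
print(f"{share:.0%} of candidate queries have no outbound link")
```

Under that one-link-per-query assumption, 10 links out of 1000 possible queries leaves 99% of the queries without any link signal for a popularity method to use.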

3- USER INTERACTION
The current generation of Web searchers is accustomed to using a pidgin keyword language. But the average length of a Web query is on the rise. That means elevated expectations, problem solving, and communication in something closer to natural language. Eventually, people would love to talk back and forth with a search engine, pretending to be Mr. Spock. None of this can be handled by popularity algorithms. We need semantic systems to understand text and speech.

4- CREDIBILITY
Search results that are ranked by popularity algorithms are destined to be commercially biased. I am not talking about those “sponsored links.” If you are suffering from back pain, you may have to sift through popular results about massage parlors, spas, and mud baths before you encounter a credible source. With semantic technology, the credibility of a source is not compromised by the ranking algorithm. It can be controlled to the full extent by expert advice.

5- ADVERTISEMENT ACCURACY
As a suitcase producer, you don’t want your ads placed next to a murder story in which the body was disposed of using a suitcase. Content understanding is essential in on-line advertising, and can only be delivered on a consistent basis by semantic advertising systems.

At hakia, we call the combination of these five points Quality search, as opposed to Popularity search. Quality is a new perspective for consumers, who had never been exposed to it until recently, and semantic technology is the enabling force behind it.

It is no longer a big secret that all existing search players are also looking into semantic technology. The question at this point is how good and comprehensive these technological developments are. It is just a matter of time until consumers decide the winner and silence all those shotguns. Of course, when the tide changes, we may see roses popping out of their barrels.

For those who are interested, I have written about what it takes to test a semantic search engine properly. It requires at least a couple of hundred queries specially crafted to test competency in various areas. Then one can compare it with Google, provided both have the same corpus to work on for the search queries. That is how it is supposed to be done, instead of a shotgun approach.
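
As a rough illustration of that kind of test, the sketch below runs one categorized query set against two engines over the same corpus and reports the hit rate per category. Everything in it is an assumption made for illustration: the `EvalQuery` structure, the category labels, the tiny corpus, and the two toy engines are hypothetical and do not describe hakia, Powerset, or Google. A real test would use a few hundred queries, not two.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# An engine is any callable mapping (query, corpus) to ranked document ids.
Engine = Callable[[str, Dict[str, str]], List[str]]


@dataclass(frozen=True)
class EvalQuery:
    text: str
    category: str      # competency area being tested (hypothetical labels)
    expected_doc: str  # id of the document judged to be the right answer


def hit_rate_by_category(engine: Engine, queries: List[EvalQuery],
                         corpus: Dict[str, str], top_k: int = 3) -> Dict[str, float]:
    """Share of queries per category whose expected document is in the top_k."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for q in queries:
        totals[q.category] += 1
        if q.expected_doc in engine(q.text, corpus)[:top_k]:
            hits[q.category] += 1
    return {category: hits[category] / totals[category] for category in totals}


def keyword_engine(query: str, corpus: Dict[str, str]) -> List[str]:
    """Toy stand-in: rank documents by the number of words shared with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, text in corpus.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


def alphabetical_engine(query: str, corpus: Dict[str, str]) -> List[str]:
    """Toy baseline: ignores the query and returns document ids in sorted order."""
    return sorted(corpus)


# Both engines see the same corpus, as the comparison requires.
corpus = {
    "d1": "back pain causes and treatment from a medical clinic",
    "d2": "luxury spa and massage offers",
    "d3": "suitcase buying guide for travelers",
}
queries = [
    EvalQuery("what causes back pain", "health", "d1"),
    EvalQuery("best suitcase for travelers", "shopping", "d3"),
]

for name, engine in [("keyword", keyword_engine), ("alphabetical", alphabetical_engine)]:
    print(name, hit_rate_by_category(engine, queries, corpus, top_k=1))
```

Reporting per category instead of one overall score is the point of the design: it shows in which areas an engine is competent, which one or two hand-picked examples cannot.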

Congratulations Powerset. Keep it coming.

2 Responses to “Semantics In, Popularity Out”

  1. gav Says:

    This is a really clear, useful argument in favour of semantics. They’ll add many more degrees of freedom to search in, which obviously results in better control over the returned results. It’s a top-down solution to information overload.

    However, I think all five points in this post & a couple more besides (like users being able to share their own understanding of what’s relevant!) can also be addressed by developing popularity-based searching within a more dynamic environment. For instance, by letting information develop context amongst other bits of information dynamically, the total information pool spontaneously self-organizes into smaller-scale chunks, and all five points are dealt with just the same.

    On oddflower for instance, the context of a page, news item or person on the site (i.e. its ‘semantics’) emerges from the traffic flows on the site, so it’s completely bottom-up & potentially even more dynamic, democratic, high quality, credible and accurate for advertising. A bottom-up popularity approach like this should not be dismissed, because it does something that a top-down one can’t; it allows users to develop & share their ‘semantic’ knowledge.

  2. Riza Berkan Says:

    Within the semantically controlled (or identified) subsets of content, the answer is yes. It will add human flavor to it.

    Otherwise (that is, if popularity is used to identify content), then no, because it is an approximation by statistics. It will work for some cases, but not for the cases where the statistics are not there, like the long tail, etc.

    The credibility measure is actually “popularity” by experts’ voting.

    Good points. Thanks.
