The Long Tail’s Impact on Search Relevance

December 20th, 2007 by Charles S. Knight
Posted in Guest Authors | 3 Comments »



Guest Author
Melek Pulatkonak
President – Hakia
Search for Meaning

Long tail discussions in search mainly revolve around advertising, in particular around ways of using tail keywords to bolster ad campaign performance. I would like to pose a different question today: What is the impact of the long tail on search relevance?

For starters, I would like to refresh your memory regarding long tail statistics in search. The long tail is estimated to account for over 95% of search volume. As far back as 2001 (see the Excite query distribution chart below), the area under the long tail comprised 97% of the search query volume.

The long tail phenomenon persists and is confirmed by recent statements from large search players. A Google spokesperson stated that “20 to 25% of the queries we see today, we have never seen before” and Ask.com’s CEO, Jim Lanzone, was quoted as saying “On any given day, 60% of the search requests we get, we have never seen before”. Jim gave a great presentation (“New search engine relies on power to find the best search results,” Associated Press, May 30, 2007) at the Web 2.0 conference in 2005, and I will share two data points from it with you:

o Long tail searches make up 95% of the queries

o Head query = 1.57 keywords. Tail query = 5.01 keywords.

So far, we have established three facts: 1) The area under the long tail comprises more than 95% of the search volume; 2) Long tail queries are longer, unique and complex; 3) The majority of queries are as unique and complex as we are: complicated creatures with different information needs and uses of language.
 
Now, let’s move on to the relevancy discussion.

Today’s search retrieval technology relies on popularity systems in one form or another. The number of possible queries of three words or more that can be posed to a search engine is huge compared to the available statistical material (the number of link referrals). Thus, popularity-based engines can augment only a tiny fraction of long queries, and then only the most popular ones. Any long query referring to a slightly unpopular topic will never attract enough votes to improve search relevance.
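To get a feel for the mismatch described above, here is a rough sketch in Python. The vocabulary size and the number of link referrals are hypothetical round numbers chosen purely for illustration (neither comes from this article), and the count treats every ordered word sequence as a possible query, which overstates the number of meaningful ones. The point is only the trend: the space of possible queries grows exponentially with query length, while the pool of popularity "votes" stays fixed.

```python
# Hypothetical figures for illustration only; neither number comes from the article.
VOCABULARY_SIZE = 100_000        # assumed number of distinct words users might type
LINK_SIGNALS = 100_000_000_000   # assumed number of link referrals usable as "votes"


def possible_queries(vocabulary_size: int, query_length: int) -> int:
    """Count ordered word sequences of exactly query_length words."""
    return vocabulary_size ** query_length


def signals_per_query(link_signals: int, vocabulary_size: int, query_length: int) -> float:
    """Average number of popularity signals available per possible query."""
    return link_signals / possible_queries(vocabulary_size, query_length)


if __name__ == "__main__":
    for length in (1, 2, 3, 5):
        count = possible_queries(VOCABULARY_SIZE, length)
        ratio = signals_per_query(LINK_SIGNALS, VOCABULARY_SIZE, length)
        print(f"{length}-word queries: {count:.1e} possible, "
              f"{ratio:.1e} signals each on average")
```

Under these assumed numbers, single-word queries have plenty of signals each, while queries of three words or more have far less than one signal per possible query on average.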

It sounds like a conflict, doesn’t it? The long tail is a phenomenon in search. The volume of long tail search queries makes up almost the full universe. Yet today’s popularity search systems cannot improve relevance in the long tail due to a design-imposed constraint. This is exactly what we observe today: current search engines satisfy most people for their most common and often shortest queries, most of the time. 50% of the searches go unanswered.

The answer to the question, “What is the impact of the long tail on search relevance?” is simple: To improve relevancy, try a new approach that moves away from popularity systems.

It is exciting to see so many young companies taking up the challenge to build for the future.

Alternative Search Engine GoLexa, Go!

December 20th, 2007 by Charles S. Knight
Posted in Reviews, Updates | No Comments »





Here’s some neat GoLexa information:

Links to the top bookmarking sites:

del.icio.us, Digg it, Furl, Google, StumbleUpon, YahooMyWeb, Reddit

Uses SnapShot technology from Snap.com for the thumbnail preview of each site.

Site:www.yoursite.com search operator if you want to analyze pages from a particular domain.

Provides Search Suggestions for popular queries.

Approximately 33 different site/page analysis tools for each result.

AlexaRank and Traffic Stats from www.Alexa.com

Speed Report from www.websiteoptimization.com

Archived copies from www.Archive.org

12 different site reports from www.urltrends.com

Page Rank from www.digpagerank.com

Whois and other site data from http://centralops.net/co/DomainDossier.aspx

Site Report from http://toolbar.netcraft.com

Cached pages, Similar pages, Text only pages, Link Alerts, Linked Pages and Site search from Google

Adsense Sandbox from http://www.labnol.org/google-adsense-sandbox

IP, KW Suggestions, Header, Domain Stats, Spider Simulation and KW cloud from www.webconfs.com

Page Monitoring from www.changedetection.com

Anchor Text from www.submitexpress.com

Translation service from http://world.altavista.com

Content Extraction from www.zafel.com

View Source Code from http://web-sniffer.net

W3C Validation Check from http://validator.w3.org

Short url service from www.TinyUrl.com

Link Extraction from www.webmaster-toolkit.com

Number of pages, Inbound Links and Site Tracking from http://siteexplorer.search.yahoo.com

 

Open Web Awards First Finalists – Vote Now!

December 20th, 2007 by Charles S. Knight
Posted in Uncategorized | No Comments »

One of the guiding principles of AltSearchEngines is that we are here to promote the success of the Alternative Search Engines.  Well, Search Engine of the Year Quintura and Top 100 People Search Engine Wink are going all out for the Open Web Awards.  If you feel that they are the best choices (see below), please vote for them now! 


To vote now, just click here.







The economical path towards the future search engine

December 20th, 2007 by Charles S. Knight
Posted in Guest Authors | 6 Comments »

Today’s first Guest Author is Alex Ginzburg of the NooTag Team

At present, the search engine market is one of the few fields on the Internet that has proven to be very profitable. This fact encourages hundreds of entrepreneurs to challenge Google, which currently captures that income almost exclusively. In the process, a debate has arisen regarding the nature of the innovation that will create a worthy alternative. For instance, there is an ongoing argument between those who believe that the answer will be found in a revolutionary search algorithm and those who are confident that a new interface will do the job. There is no consensus even among the followers of each approach. There is, though, one thing common to most of them: they all seek a technological solution. However, I’m not sure that their choice suits the natural direction of search engines’ development. I’d like to present an alternative approach: an economic evolution of this field.

Today, Google dominates the search market, almost solely due to its economic strength. It keeps an army of the smartest people available by providing them with the most desirable workplace on earth (7). It pays billions to companies like Dell (1), Mozilla (2), Adobe and others just to spread its search service through their computers and programs. It spends hundreds of millions of dollars on dozens of non-profitable products. The launching of those free services is used to obtain positive PR and to maintain Google’s reputation as a generous and innovative company. Under those circumstances it’s difficult to imagine how anyone can dislodge Google from its leading position. It’s not enough to provide search results that are a bit better than Google’s, since people won’t change their habits just because some new SE manages to present a satisfactory result on the third row instead of the fifth. On the other hand, in most cases competitors won’t be able to gain a significant advantage over Google, and even if they do, it probably won’t last long due to Google’s economic strength. The conclusion is that Google won’t be significantly challenged as long as it maintains its solid financial position. Therefore, in order to predict how the SE market will evolve, Google’s financial future needs to be discussed first. But before we do that, we must understand what makes Google so economically strong in the first place.

Google can be seen as an information mediator that supplies its audience with the content they want and in exchange collects the income from advertisements presented along with that content. This function is fundamentally similar to what happens in other kinds of media, like radio and TV stations, newspapers and magazines. However, unlike the others, Google enjoys phenomenal profitability (above 50% gross margin). The reason is simple: unlike other kinds of media, Google doesn’t pay for the content that it supplies. While radio stations pay royalties to songwriters and TV networks purchase or produce programs, Google harvests the web pages on the Internet for free. Those extra funds are used to fortify its monopolistic position, and they are the source of its economic power. Hence, this power will keep existing as long as Google doesn’t have to pay for the content that it presents in its search results.

There are very few reasons that may cause Google to start paying for the products it sells. First of all there are the legislators, who may change the legal status of content on the Internet so that any commercial usage, like that made by the search engines, would entitle its creator to compensation. However, this scenario is extremely unlikely.

Another option is that the content producers themselves, at least the key ones, would charge search engines for benefiting from their content (the right to show it in the search results). Take Wikipedia, for instance. Currently Google refers users to it about 1.67 billion times per month (5), meaning that Wikipedia provides service to about 5% of all Google’s users (3). Since Google’s income is generated by advertisements, it depends entirely on its number of visitors. Therefore, Wikipedia generates about $800 million of income for Google every year (6). If Wikipedia demanded just a quarter of a percent of that amount, its income would exceed the $1.7 million collected in donations to finance its operation (4). Even if the demand were ten or twenty times higher, it would still be accepted, since no search engine would ever disregard Wikipedia or any other major content producer in its search results. Even though Wikipedia is a non-profit organization, it could use the extra funds to expand and improve its free services. It may take a while for this scenario to happen, since it will be realistic only after the content producers realize that they are essential to the search engines and not vice versa.

The third and most likely reason for Google’s financial status to weaken would be competition. We’ve already established that it’s extremely difficult to compete with Google as long as you offer nothing but the same search results ordered or presented in a different manner. Consequently, the natural course for a competitor would be to look for unique and valuable content. The only way to prevent Google from getting this content is by making it exclusive. Therefore, we should expect the next battle of the search market war to be fought over content. If just one major competitor offers content producers a reward for exclusive usage, the whole market would change, and the new equilibrium would settle at a point where every search engine, including Google, has to pay for high quality content just to prevent others from getting exclusivity.
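The Wikipedia arithmetic above can be checked with a few lines of Python. This is only a sketch that takes the article’s own figures at face value ($800 million attributed to Wikipedia, $1.7 million in donations); the function name and the use of basis points (hundredths of a percent) are choices made here for illustration, so that the calculation stays in exact integers.

```python
# Figures as quoted in the article; they are not independently verified here.
INCOME_ATTRIBUTED_TO_WIKIPEDIA = 800_000_000  # dollars per year
WIKIPEDIA_DONATIONS = 1_700_000               # dollars collected in donations


def royalty(income: int, rate_basis_points: int) -> int:
    """Return the payment for a rate given in basis points (1 bp = 0.01%)."""
    return income * rate_basis_points // 10_000


if __name__ == "__main__":
    # 25 bp is "a quarter of a percent"; 250 and 500 bp are ten and twenty times that.
    for basis_points in (25, 250, 500):
        payment = royalty(INCOME_ATTRIBUTED_TO_WIKIPEDIA, basis_points)
        verdict = "exceeds" if payment > WIKIPEDIA_DONATIONS else "does not exceed"
        print(f"{basis_points / 100:.2f}% of the attributed income = ${payment:,} "
              f"({verdict} the ${WIKIPEDIA_DONATIONS:,} in donations)")
```

A quarter of a percent of $800 million is $2 million, which is indeed more than the $1.7 million in donations; demands ten and twenty times higher would come to $20 million and $40 million.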

One way or another, the profitability of the search market will decrease. As a result, Google won’t be able to maintain its dominance, and only then will the opportunity emerge for the alternative search engines to present their merchandise. This will eventually lead to a competitive market that is much more efficient and innovative than it is today. Another benefit of this development would be the stream of income to content producers, which would definitely improve the quality of content on the Internet and drive it from amateurism (like YouTube and Wikipedia) to professionalism (like Hollywood and the Academy).

(1) http://www.forbes.com/2006/05/25/google-dell-0525markets15.html

(2) http://www.informationweek.com/news/showArticle.jhtml?articleID=181501852

(3) http://www.telegraph.co.uk/connected/main.jhtml?xml=/connected/2007/10/11/dlgoogle111.xml

(4) http://fundraising.wikimedia.org/en/fundcore/list?page=1233

(5) http://leuksman.com/log/2007/06/07/wikimedia-page-views/

(6) http://finance.yahoo.com/q/is?s=GOOG

(7) http://money.cnn.com/magazines/fortune/bestcompanies/2007/

Alex Ginzburg is a co-founder of the NooTag initiative. For more information visit their blog at http://nootag.com/blog