The Prediction Market Search Engine (pmia)

November 20th, 2007 by Charles S. Knight
Posted in Alts



Editor’s note: If someone understands what this Prediction Market search engine does, would you be so kind as to leave a comment and explain it to me? Thanks!

Editor’s note: Another reader to the rescue! 
Please see Alex’s helpful explanation below.

London (Oct 22 2007) – The recent Prediction Market Summit held in London, UK, concluded with the creation of an international industry association tasked with promoting awareness, education and validation for the rapidly growing field.

“It’s essentially a search that is restricted to all the prediction market sites on the internet. As I understand it, it is limited to sites which actually host prediction markets, as opposed to sites that talk about prediction markets (such as UsableMarkets, or MidasOracle).

It’s helpful, of course, if you understand what a prediction market is. Wikipedia has a good description. In short, it’s a betting market on the outcomes of future events, in which the price of a specific outcome (Hillary is the next US president) is read as the probability of that outcome occurring. If a contract that pays out $1 is priced at 25 cents, then the market believes that the outcome has a 25% chance of occurring.

What would be really powerful, but what PMIA search doesn’t do yet, is the ability to search for a specific contract – Hillary wins, for example – across all markets which have such a contract, and compare their predictions (or prices). That is something that doesn’t currently exist (although Slate groups together some predictions for political markets), but that people have been looking for.”

-alex
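
Alex’s two points, reading a price as a probability and comparing one contract across several markets, can be sketched in a few lines. This is only an illustration under stated assumptions: the market names and prices below are made up, each contract is assumed to pay out 100 cents if the outcome occurs, and nothing here reflects how PMIA search itself works.

```python
def implied_probability(price_cents: float, payout_cents: float = 100.0) -> float:
    """Read a contract price as the market's estimated probability of the outcome."""
    return price_cents / payout_cents


def compare_markets(quotes: dict) -> list:
    """Return (market, implied probability) pairs, highest probability first."""
    rows = [(name, implied_probability(price)) for name, price in quotes.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical prices (in cents) for the same contract on three made-up markets.
quotes = {"Market A": 25.0, "Market B": 28.0, "Market C": 22.5}

for market, probability in compare_markets(quotes):
    print(f"{market}: {probability:.1%}")
```

A spread between markets, as in these invented numbers, is exactly what a cross-market contract search would make visible.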

Debate: Arabic / English Search

November 20th, 2007 by Charles S. Knight
Posted in Debates



Every Tuesday night on AltSearchEngines, we invite two vertical search engines to discuss the similarities and differences between their projects. Tonight’s debate is one of the most interesting that we have ever had: the exotic (to me) world of Arabic / English search engines, featuring two very respected search engines, Onkosh and Tayait.


1) For all of our readers who know English but not Arabic, please summarize the challenges that you have faced in building an Arabic / English search engine and how you solved them.

In order to build a search engine, you must first give your system the ability to understand the relationship between various different factors – specifically, the relationships between individual words and the overall structure of the language.

A computer’s ability to understand and process a language is based on Natural Language Processing (NLP) – and while many languages have been comprehensively covered by NLP technology, Arabic is one of the few major languages left on earth in which major headway is still being made today.

It is the nature and complexity of the language itself that has caused this delay in its development. There are two major hurdles that need to be overcome when it comes to the application of NLP technology to Arabic:

Ambiguity: Arabic is what we call a highly inflected language, i.e. many words have an incredible number of synonyms as well as derivatives, and hence during search a single keyword may have many different meanings and interpretations.

Absence of Vowels: Unlike English, where the vowels in the language are letters themselves, Arabic vowels are accents placed above and below the letters themselves (known as diacritical marks). This in and of itself is not a problem, but in written Arabic these vowels are almost never used – certainly not on the Internet. The pronunciation of the word is gleaned through the structure of the sentence and the inferred meaning of the word, which is learnt from childhood for native speakers. Obviously this lack of vowels in written Arabic increases the problem of ambiguity even more.

Both of these realities about the Arabic language create a level of intricacy that we, as humans comfortable with the language, have no problem understanding, but that becomes very difficult to process once it must be reduced to a rule-based system.
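
The effect of unwritten vowels can be seen in a small sketch. It relies on the fact that the Arabic short-vowel signs are Unicode combining marks (category Mn), so they can be removed with Python’s standard `unicodedata` module. The function name `strip_tashkeel` is made up for this illustration; this is not a description of tama or of any search engine’s actual processing.

```python
import unicodedata


def strip_tashkeel(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include Arabic vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Two fully vocalized words, written with escapes so the vowel marks are explicit.
KATABA = "\u0643\u064e\u062a\u064e\u0628\u064e"  # "kataba" (he wrote), with fatha marks
KUTUB = "\u0643\u064f\u062a\u064f\u0628"         # "kutub" (books), with damma marks

# Without the vowel marks, as text is normally written online, both words
# become the same three letters (kaf, teh, beh).
print(strip_tashkeel(KATABA) == strip_tashkeel(KUTUB))  # True
```

A search engine that only sees the three bare letters has to use context to decide which word was meant, which is the ambiguity described above.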

In order to solve these problems we partnered with an international company that has dedicated part of its R&D to Arabic language processing for the last 15 years, and brought in several of our own NLP specialists to develop our own NLP module, tama (taya Arabic Morphological Analyzer). Using tama we can choose to search all the inflections, derivatives, and possible synonyms of a word, and we can also search for a term in English and return all the Arabic results that are related to that term.

[Same question] The main challenge lies in the fact that the Arabic language is much more complex than languages written in the Roman alphabet.

The Arabic language has a complex script and rules. On top of that, there are characters that are usually left unwritten (called diacritics or ‘tashkeel’) which can alter the way a word is pronounced and give it a completely different meaning and root. English, by comparison, is quite a ‘straightforward’ language.

Thus, it is hard to judge the relevance of Arabic search results. In Onkosh, these issues are handled in a smart way that maintains accuracy in practice and, most importantly, keeps search time to much less than a second for 92% of the queries, even under a high load of concurrent queries.

It is worth mentioning that the ‘giant’ SEs mostly treat Arabic blindly (i.e. they do exact matching). This gives a user seeking relevant Arabic information only exact-match results.

Another main challenge is auto-identifying the Arabic-related portion of the web (index coverage). Onkosh tries as much as possible to include not only the Arabic-language pages, but also the ‘Arabic-related’ content in other languages (especially English and French). Building a smart and Arabic-oriented crawler was in itself very challenging.

2) We (in the U.S.) hear a lot about the U.S. and Chinese markets (Baidu, etc.) Where would you place the Arabic market for Search in a global context? How large is it (how many users)? Is it growing, at what rate, etc.?

Well, the easiest way to look at the Arabic market and compare it to other language-specific markets around the world is to look at the total number of Internet users currently online and use their historical growth rates to extrapolate how this number will continue to grow.

Historically the Chinese market has been considered one of the largest developing markets in the world, and within the last seven years (2000 – 2007) it has seen an increase of roughly 620% in its total number of Internet users, going up from 22.5 to 162 million users (about 7.2 times the starting figure). Japan, another developing market considered to have massive potential, has had an increase of about 86%, going up from 47 to 87.5 million users online.

The interesting thing about the Arabic market is that it cannot be tied to a single country, as Arabic is spoken to varying degrees throughout an entire region of countries (including not only the Middle East but North Africa as well). The North African region (specifically Morocco, Algeria, Egypt, and Libya) has seen a growth rate of almost 2500% in the last seven years in the total number of internet users.

The Middle East (including Jordan, Kuwait, Lebanon, Saudi Arabia, Syria, and the UAE) has seen another 700% increase. This comes out to an estimated 22 million users online today. Though this total number may not compare to that of China or Japan currently – this is a market with an incredible potential and we only expect the adoption and penetration rates of Internet technology to increase in the future.
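
The growth arithmetic in this answer can be worked through in a short sketch. It uses only the user counts quoted above (in millions); the function names are made up for illustration, and the extrapolation simply assumes the historical yearly rate stays constant, which is a strong assumption. By the usual definition, (end − start) / start, going from 22.5 to 162 million users is a 620% increase, which is the same thing as 7.2 times the starting figure.

```python
def percent_increase(start: float, end: float) -> float:
    """Percentage growth from start to end: (end - start) / start * 100."""
    return (end - start) / start * 100.0


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly rate that turns start into end after the given number of years."""
    return (end / start) ** (1.0 / years) - 1.0


def extrapolate(current: float, rate: float, years_ahead: int) -> float:
    """Project a figure forward, assuming the yearly rate stays the same."""
    return current * (1.0 + rate) ** years_ahead


# User counts in millions for 2000 and 2007, as quoted in the answer above.
china = percent_increase(22.5, 162.0)
japan = percent_increase(47.0, 87.5)
print(f"China: {china:.0f}%  Japan: {japan:.0f}%")  # China: 620%  Japan: 86%

# Yearly rate implied by the Chinese figures over the seven-year span.
china_rate = annual_growth_rate(22.5, 162.0, 7)
print(f"Implied yearly growth for China: {china_rate:.1%}")
```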

[Same question]  A lake, small by nature, can be nothing within a huge ocean. This is the case for the Arabic web as part of the global Internet.

We can safely offer a few estimates based on pseudo-scientific observation and analysis:

First: the Arabic web is currently only somewhere in the range of 200-300 million pages.

Second: the growth rate is very aggressive.

Third: a big portion of the old Arabic web is not SEO-friendly. This is getting fixed as sites are added or revamped, since SEM has now started to get webmasters’ attention in the Middle East and North Africa (MENA) arena, where our target audience primarily resides. Of course, we also target all the Arabs and Arabic-speaking users around the globe, most of whom live in the U.S., Great Britain, and Canada.

Referring to the Internet statistics website, at the time of writing this (Nov. 18, 2007), we have these growth rates in Internet penetration. The analysis below suggests parallel growth in the ‘related’ content over the years 2000-2007 (please compare to the total growth of the world, which is 244.7%):

2647% growth in North Africa (stats calculated based on the top six Arab African countries: Egypt, Algeria, Morocco, Sudan, Tunisia, and Libya – Excel sheet attached).

Internet Usage Statistics for Africa

920% growth in the Middle East: although the region holds only about 2.9% of the world’s population, its Internet penetration is growing much faster than that of the rest of the world. The Middle East is not limited to the Arab countries, but the general indicator does the job.

3) You both have the keyboard icon near your search box. Could you explain how it is used, and also, for someone like me, can you explain the term “Bel-3araby” to a non-Arabic user?

Here in Egypt, as in the rest of the Arabic speaking world, when you buy a keyboard, laptop, cell phone, etc – alongside the normal QWERTY keyboard we also have the Arabic alphabet printed (this is the extent to which the Arabic language is engrained in the community – the majority of SMS’s are even sent in Arabic on specially manufactured phones with Arabic letters printed on their keypads).

That being so, Tayait is not designed to be used only by users living within the Arabic world. We realize that there are a huge number of Arabic speakers throughout the globe – who would love to have access to Internet content in their native language – yet who may not have an Arabic keyboard. We, therefore, offer our online Arabic keyboard as a means through which users can quickly and effectively input search terms of their choice – whether they have an Arabic keyboard or not.

We realize though that sometimes using an online keyboard can be cumbersome for entering search terms – especially for power searchers who initiate many different queries at the same time to try and get the best result – this is why we have offered our Cross Language Functionality – offering our users the ability to use a normal English QWERTY keyboard and input the desired search term in plain English, while tama will take care of returning all the relevant Arabic search results to this English query.

[Same question] The keyboard usage is quite simple. Once you click the icon, your Arabic queries become a few clicks away, even if your computer does not support Arabic at all. This even helps users who find typing in Arabic difficult or slow. Additionally, it is ideal for the Arabs living in the U.S. and other places where it is very rare to find Arabic keyboards.

As for the feature “Bel-3araby”, this is first of all an Arabic phrase pronounced ‘bel-a’araby’, which means ‘in Arabic’. The feature is patent-pending, and the term “Bel-3araby” itself is copyrighted to Onkosh. On that note, I am very proud to say that I am one of the five co-inventors of the patent.

This feature enables you to use your Latin-based keyboard to write Arabic words the way they are phonetically uttered, using the Roman characters and numerals that have become very popular in people’s chats and mobile messaging. In short, Bel-3araby is a transliteration service from Romanized alphanumeric input to Arabic output. It is an intelligent service employing lots of careful heuristics and AI techniques tailored from the ground up to understand the Arabic user’s needs. (You may refer to this previous post.)
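
To give non-Arabic readers a feel for the idea, here is a deliberately naive sketch of transliterating chat-style Romanized input into Arabic letters. It is not Onkosh’s method: the mapping table is a small subset of commonly used chat conventions chosen for this example, the function name is hypothetical, and short vowels are simply dropped, which a real service would have to handle with far more care.

```python
# Partial, illustrative mapping from chat-style Roman characters and numerals
# to Arabic letters (an assumption for this sketch, not a complete standard).
ROMAN_TO_ARABIC = {
    "2": "\u0621",  # hamza
    "3": "\u0639",  # ain
    "5": "\u062e",  # khah
    "7": "\u062d",  # hah
    "b": "\u0628",  # beh
    "t": "\u062a",  # teh
    "d": "\u062f",  # dal
    "r": "\u0631",  # reh
    "s": "\u0633",  # seen
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "m": "\u0645",  # meem
    "n": "\u0646",  # noon
    "w": "\u0648",  # waw
    "y": "\u064a",  # yeh
}

# Short vowels are usually not written in Arabic, so this sketch drops them.
SHORT_VOWELS = set("aeiou")


def transliterate(text: str) -> str:
    """Map each Roman character to an Arabic letter, skipping short vowels."""
    output = []
    for ch in text.lower():
        if ch in SHORT_VOWELS:
            continue
        # Characters without a mapping are passed through unchanged.
        output.append(ROMAN_TO_ARABIC.get(ch, ch))
    return "".join(output)


# "3araby" becomes the four letters ain, reh, beh, yeh: the word for "Arabic".
print(transliterate("3araby"))
```

A character-by-character table like this breaks down quickly (long vowels, doubled letters, and several possible spellings per sound), which is why a production service needs heuristics on top of it.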

Many attempts are now under way to imitate “Bel-3araby”, after its importance was recognized at Onkosh.

4) Are there positive things that you see in the other search engine in this debate? Are you competing for exactly the same users, or are there some differences in either a) your objectives or b) your approaches that you could share, as far as you understand your debate partner?

The fact that there are other Arabic search engines out there is in and of itself a positive thing. We believe that Arabic speakers have been held back long enough from being able to effortlessly search the Internet without having to learn a second language – and in that pursuit it’s incredibly beneficial for all parties to have a variety of companies and people trying out different things in order to provide the best quality search results for Arabic Internet users all around the world. Though the specifics of Onkosh’s objectives are privy only to them, I think we both would agree that the Arabic Internet has incredible potential – if given the infrastructure to thrive. Offering search in Arabic is one of the most critical parts of that infrastructure – but there is much more to come.

We differentiate ourselves and our search results not only by using excellent search technology, but also by having a team of individuals who make the effort of identifying the best quality Arabic websites for the most common search terms to be crawled, in the hope of providing our users with the highest quality content on the Arabic Internet today. This means that we actively ensure that the most active blogs, news, Wikipedia, and a whole host of Arabic websites are always crawled and indexed – and we continue to add and develop our database daily using the best content we can find.

[Same question] Yes, definitely. They are doing an excellent job in integrating with Exalead, which I find a very good engine indeed. Tayait also has Arabic synonym search, which is not yet public in our release. Their product Tama has been around for a long time, which indicates good performance and reliability. Our audiences overlap for sure. I am not sure about their preferred segment. Since the Internet has no boundaries, Onkosh defines its audience as all those who use, understand, and/or are interested in Arabic-related content around the globe.

5) I find both of your projects very impressive; are you the two dominant Arabic / English search engines, or are there others, perhaps less well known ones, that we should also know about that are also good?

Tayait and Onkosh have been in the news quite recently because we have both come out at around the same time in full force. But we haven’t come from a completely nonexistent industry – there are other companies that have been offering Arabic search for a while now. These firms include, but are not limited to:

Araby.com, Arabo.com, Ayna.com, Ajeeb.com, and 4arabs.com

Once again – we think it’s great that there are other like-minded individuals working to provide better services to the online Arabic speaking community.  

Where do you hope to be two years from now?

In the next two years we obviously plan on devoting considerable efforts towards constantly improving our results – as search is our number one priority. But it doesn’t stop there – we want to provide Arabic Internet users the same range and quality of services available on the English Internet and in the rest of the world. We know it’s a considerable goal – but it’s something we believe in and are willing to work towards.

[Same question] There have been other attempts at building an Arabic search engine. Two have succeeded in building a good audience: araby.com and ayna.com. We are aware of some other projects that were announced but not yet released, like Sawafi, whose latest news said it will be renamed and launched ‘soon.’

Expectation for 2010: Two years from now seems a long time span in the fast-moving SE industry and science. What I can safely promise your respected readers is that Onkosh will be aggressively enhancing its services over the coming few months. We aim to be the number one local search engine, and to be the most reliable engine for the Arabic-interested user in general. We are very optimistic, and we hope we can help build a better Arabic web.

Onkosh has a mission to help the Arabic-speaking user to start sharing and using the Arabic language in search. We are comprehensive in our services, not only depending on the basic form of web search. Onkosh offers other distinguished search flavors including, but not limited to: news, blogs, forums, and files. Not to forget, Onkosh also has a family filter for safe search, in addition to Onkosh.mobi that brings the Arabic web to your handheld device. At Onkosh, we are sincerely happy to see others recognize Onkosh as a ‘role model’ in Arabic search. We believe we did a good job, and we have received a lot of positive feedback that keeps us motivated to challenge ourselves even further and work around the clock on more quality features, and we will continue to raise the bar!

AltSearchEngines:  We owe a great debt of thanks to Hany at Onkosh and Noha at Tayait for all of the time and effort that they donated for this debate. If you found it useful, I hope you will print it or email it for anyone else that you think might benefit from this detailed discussion of Arabic search engines. [but please link back to this post]

Of course, I also encourage you to try both of them, Onkosh and Tayait, today, to see their great features for yourselves. If you have a question or comment for Onkosh or Tayait, please leave a comment and we will ask them to check back and respond as they have time.

Search with SortFix “Start Dragging Stop Typing”

November 20th, 2007 by Charles S. Knight
Posted in Reviews

SortFix is a young and innovative search technology company, devoted to enriching your search experience.
They’ve created an intuitive graphical interface that isn’t only cool and fun to use but also boosts your search skills and abilities.

AltSearchEngines:  Why did you develop Sortfix?

Sortfix: Our initial interest in search technology began as basic users: we were frustrated by our inability to find needed information quickly and efficiently. Sure, finding facts on a popular band or the newest mp3 player is easy, but trying to find an answer to a more complex question is usually not so simple.

We often couldn’t find exactly what we were looking for, or gave up in the middle. Not only that – when we thought that the search was going to be too complicated, we didn’t even try to look it up.

AltSearchEngines: So what is your solution?

Sortfix: We’ve developed a system that does all the hard work and leaves you only the easy part. Behind the scenes an intelligent algorithm imitates a professional searcher – by scanning and examining the results, it reveals the significant keywords and terms that will help you to define a better question. Then comes the best part: by using SortFix’s unique interface you can play with the suggested keywords and create your own individual and precise query, and when you ask a precise question you usually get the right answer.

Note: Please watch the Demo movie!

AltSearchEngines:  What’s next for Sortfix?

Sortfix: The amount of information is ever growing, and efficient searching tools are a necessity like never before. We are here to provide them. We are constantly working to improve our services – implementing new ideas and adding features for our advanced users. There is much more to come… new ideas and improvements are processed on a daily basis. Stay with us, and together we will find the answer.

From ASE Italia: Songza, and the whole world sings!

November 20th, 2007 by Charles S. Knight
Posted in Global

Songza is an invention of twenty-three-year-old Aza Raskin, president of the Chicago software company Humanized and son of Apple Macintosh founder Jef Raskin. For a month, Aza and Humanized’s Web/Systems architect Scott Robbin worked on weekends to build this project. Bloggers and journalists have praised its elegant user interface, pleasant design and surrounding features.

Songza’s primary goal is to illustrate how to manage content using a “humane interface”, the term used by Jef Raskin to describe interfaces that reflect the way people currently use software. Songza expresses this concept through its remote control: clean, tidy and transparent. New features will be added in the future. Unlike KaZaa or BitTorrent, Songza users can only listen to songs, without being able to download them to their own computer. But, unlike Last.fm or Rhapsody, Songza lets users choose exactly the song or artist they want to listen to, requiring neither registration nor payment for this service.  -Thanks Federico!

Tickex and NinjaTickets – meet SeatQuest

November 20th, 2007 by Charles S. Knight
Posted in Innovations, Newcomers

SeatQuest is dedicated to digitizing seating charts for concerts, theatre, and sporting events.  It is the first and only visual search engine for tickets.

SeatQuest’s mission is for you to “Know where you sit” before purchasing tickets for upcoming events by going beyond the usual seating charts.

The way SeatQuest works is by aggregating tickets on sale from its affiliate partners and conveniently displaying them on the event’s seating chart. 

I strongly suggest that you check them out today by clicking the “View Demo” link on the homepage! 

The secondary market for tickets in North America prides itself on being a fair market, in sharp contrast to the primary ticket market, which is dominated by one or two companies offering nonspecific seats at artificially set ticket prices.

This weekend in Charlottesville, VA there is a very big college football game: the University of Virginia vs. Virginia Tech. It is a “sold out” game, where you have to get your tickets on eBay.

I’m going to send this post to our good friends at Tickex and NinjaTickets and invite them to leave their impressions either as comments or additions to this post.