
Dr. Erwin Stegentritt
LexiQuo, online since April 2006, is a multi-lingual meta search engine based on linguistic processing of the user’s request. The service is offered by Media-Novia GmbH, located in Saarbrücken, Germany, in cooperation with my company, TEXTEC Software, which is the developer of the linguistic engine.
Depending on your language, „standard“ search engines very often do not find some word variants.
Searching in English, the absence of word variants is not a real problem because the (morphological) system of the English language is quite regular. The only major differences are between US English and British English.
But for German, this drawback can be more severe because German has many (morphological) variants of a given word. An adjective like „gut“ (good) has the following word forms: gut, gute, guten, gutem, guter, gutes, bessere, besseren, besserem, besserer, besseres, beste, besten, bestem, bester, bestes.
A noun, like „Haus“ (house) has the following forms: Haus, Hause, Hauses, Häuser, Häusern.
Furthermore, German (like most non-English languages) has some specific characters: the „Umlaut“ characters, as in „Häuser“ and „Häusern“ in the list above, which can also be written as „Haeuser“ and „Haeusern“.
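The umlaut case can be illustrated with a few lines of Python. This is only a minimal sketch and not the code used in LexiQuo; the names `UMLAUT_MAP`, `transliterate` and `spelling_variants` are made up for the illustration, and the mapping is assumed to cover the German umlauts and „ß“ only.

```python
# Hypothetical helper names; the mapping is an assumption limited to German.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def transliterate(word):
    """Replace every umlaut (and ß) by its two-letter ASCII spelling."""
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in word)

def spelling_variants(word):
    """Return the word itself plus its ASCII spelling, if the two differ."""
    ascii_form = transliterate(word)
    return [word] if ascii_form == word else [word, ascii_form]

print(spelling_variants("Häuser"))
print(spelling_variants("Haus"))
```

A word without special characters yields a single variant, so nothing superfluous is added to the request.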
Last but not least, there is the phenomenon of compound words (word groups written as one single word without a blank). Some of them are lexicalized, which means they are listed in lexicons, but every German speaker can create new ones ad hoc, like „presidential election(s)“: Präsidentschaftswahl, Präsidentschaftswahlen. A new compound can even be part of an even longer compound word: Präsidentschaftswahlentscheidung („outcome of the presidential election“). But the concept within this compound word can also be expressed in another way (using a prepositional structure), in which the sequence of word parts is reversed: „Entscheidung der Präsidentschaftswahl“.
Several years ago, German orthography was reformed, and now different official spellings exist side by side (sometimes in the same newspaper). For instance, the word for dolphin can be spelled as „Delphin“ (old version) or „Delfin“ (new version).
These problems can only be solved either by a linguistic analysis of the indexed text data (impossible if you don’t have access to the index) or by adding all the different spellings of the search terms to the request, combined with logical OR and AND operators.
This is the core idea of LexiQuo: add all these variants to the user’s request. The only setting needed is the source language, i.e. the language in which the request is expressed. Depending on linguistic information gained during the analysis (word category, number of words, etc.), more or fewer variants are added to the original request. The user can, however, ask at any time for all variants or for no variants at all by pressing the appropriate buttons, and thus has full control over the extensions added to the request.
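To make the idea concrete, here is a minimal Python sketch of such an expansion. It is an illustration under assumptions, not the LexiQuo code: the function names are hypothetical, the variants are passed in as a hand-made table, and the exact query syntax sent to each search engine is assumed to be a plain combination of OR, AND and parentheses.

```python
def or_group(forms):
    """Join the variants of one term with OR; use parentheses if there are several."""
    unique = list(dict.fromkeys(forms))  # drop duplicates but keep the order
    if len(unique) == 1:
        return unique[0]
    return "(" + " OR ".join(unique) + ")"

def expand_request(terms, variants_of):
    """Replace every term by the OR group of its variants and AND the groups."""
    groups = [or_group([term] + variants_of.get(term, [])) for term in terms]
    return " AND ".join(groups)

# Toy variant table, using word forms quoted in this article.
VARIANTS = {
    "Delfin": ["Delphin"],
    "Haus": ["Hause", "Hauses", "Häuser", "Häusern"],
}

print(expand_request(["Delfin", "Haus"], VARIANTS))
```

Passing an empty table corresponds to the "no variants at all" button: the request is then sent unchanged.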
If the request consists of an adjective and a noun, like „greek island“, LexiQuo is able to generate all combinations of these 2 words and define a phrasal search. In English, this is not very impressive, because you will get only „greek island“ and „greek islands“. But in German the request would be „griechische Insel“, and the combinations of all variants of the adjective (griechisch = Greek) and the noun (Insel = island) are: “griechische insel” OR “griechische Insel” OR “griechische Inseln” OR “griechischem Insel” OR “griechischem Inseln” OR “griechischen Insel” OR “griechischen Inseln” OR “griechischer Insel” OR “griechischer Inseln” OR “griechisches Insel” OR “griechisches Inseln”. Some of these combinations are not grammatically correct groups (like „griechisches Insel“), but as you can imagine, these incorrect groups will also get some hits.
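The combination step amounts to a cross product of the two lists of word forms. The following Python sketch shows this under simplifying assumptions: the forms are listed by hand, the function name `phrase_query` is hypothetical, and the user's original spelling is not added as an extra phrase.

```python
from itertools import product

# Word forms taken from the example above.
ADJECTIVE_FORMS = ["griechische", "griechischem", "griechischen",
                   "griechischer", "griechisches"]
NOUN_FORMS = ["Insel", "Inseln"]

def phrase_query(adjective_forms, noun_forms):
    """Build one quoted phrase per adjective/noun pair and join them with OR."""
    phrases = ['"' + adj + " " + noun + '"'
               for adj, noun in product(adjective_forms, noun_forms)]
    return " OR ".join(phrases)

print(phrase_query(ADJECTIVE_FORMS, NOUN_FORMS))
```

No grammatical agreement check is made here, which is why ungrammatical pairs such as "griechisches Insel" end up in the request as well.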
Snapshot with phrasal search „griechische Inseln“ in Yahoo:

More examples
Let’s take the above-mentioned differences between US and British English as an example: a search with „harmonization“ or „harmonisation“ finds a different number of hits on Google.com: more than 2.1 million for harmonization and 2.9 million for harmonisation. Even the regular plural forms (harmonizations and harmonisations) produce different numbers of hits.
LexiQuo, instead, generates the 2 different spellings plus the plural forms, puts them into the appropriate request structure and sends this enriched request to the search engine. The user can enter any form of harmonisation (harmonisation, harmonisations, harmonization, harmonizations) and LexiQuo generates the same request (although the results can differ depending on whether harmonisation or harmonization comes first within the request).
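A toy version of this normalization can be written in a few lines of Python. It is a sketch under strong assumptions: LexiQuo uses dictionaries, whereas this example relies on two naive string rules (strip a final "s", treat "-isation" and "-ization" as the same ending), and it always emits the spellings in one fixed order. The function names are hypothetical.

```python
def canonical(word):
    """Reduce a form to a lowercase singular with the -ization spelling (naive rules)."""
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]  # naive plural stripping, good enough for this example only
    return w.replace("isation", "ization")

def variants(word):
    """Return both spellings, each in singular and plural, in a fixed order."""
    us = canonical(word)
    uk = us.replace("ization", "isation")
    result = []
    for stem in (us, uk):
        for form in (stem, stem + "s"):
            if form not in result:
                result.append(form)
    return result

for entered in ("harmonisation", "harmonisations", "harmonization", "harmonizations"):
    print(entered, "->", " OR ".join(variants(entered)))
```

Whichever of the four forms is entered, the same four variants come out.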
There is no real advantage in getting 2.9 million instead of 2.1 million hits. The effect is only noticeable if you search for very specific terms, which lead to only a small number of hits.
We also have to admit that since 2006 the big search engines have made progress: they are now able to identify word variants, even if the variant is an irregular form, like German „Museum“ and „Museen“ (plural).
In the case of compound words, they can identify co-occurrences of the compound parts (Restaurantempfehlungen = Restaurant + Empfehlung), but only if the parts are not separated by other words. If the compound parts are separated by other words, they are not recognized, e.g. „Eine Empfehlung von Asiago das thailändische Restaurant Napalai in Duisburg…“ (a recommendation of Asiago the Thai restaurant Napalai in Duisburg…).
If you search with less frequent words or word forms, it makes a difference whether or not the request undergoes a (better) linguistic treatment.
Here are some examples in German, which get much better results with LexiQuo than with the stand-alone search engine – the gain in quality is sometimes a better ranking of the results.
· Lithochoro Restaurantempfehlungen (restaurant recommendations for Lithochoro)
· geheimes Leben Jesu Kaschmir russischer Autor (secret life of Jesus in Kashmir Russian author)
· Baßtubakonzert von Vaughan Williams (bass tuba concerto by Vaughan Williams)
· Lernprogramme Grammatik Deutsch Orthographie (Learning programs German Orthography)
· Schneekatastrophen Dacheinstürze (snow disasters roof collapses)
· Reiterprogrammierung Javascript (programming of tabs in Javascript)
Multi-Lingual Search
Our world, and especially the Internet, is neither national nor mono-lingual: therefore, LexiQuo covers several languages and allows the translation of the search terms into other languages. This is not a translation process like Systran’s or other machine translation systems (translating a given text from a source language into a text of the target language); it is simply a word-by-word translation. This translation process does not return the translation of the request as a sentence, but only the translations of the isolated words of the request (which is much simpler).
The result of the translation is not added automatically to the request; instead, it is displayed to the user, who can add the individual terms and has to define how these terms should be attached to the original request by selecting the OR or AND operator.
A similar process of expanding the request is to add synonyms and related terms. This is realized within LexiQuo by the same operation as the translation: synonyms and related terms are displayed for a controlled expansion of the request. The „more options“ window with German as source language, English as target language for „Magersucht“:

By default, this window opens as soon as the linguistic results from a request are coming in. By clicking on one of the additional terms, this term is sent directly to the search engine (terms from the source language) or it is added to the original request (with an OR or with an AND operator), and a new search starts immediately. This window allows a quick overview of possible request combinations and result sets. By default the request terms are translated into English or (if English is the source language) into German.
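The effect of clicking on an additional term can be sketched as follows in Python. This is only an illustration: the function name `attach` is hypothetical, the grouping with parentheses is an assumption about the request syntax, and „anorexia“ is used here simply as one English translation of „Magersucht“.

```python
def attach(request, term, operator):
    """Attach a selected term to the original request with OR or AND."""
    if operator not in ("OR", "AND"):
        raise ValueError("operator must be 'OR' or 'AND'")
    # Multi-word terms are quoted so that they are searched as a phrase.
    term_text = '"' + term + '"' if " " in term else term
    return "(" + request + ") " + operator + " " + term_text

print(attach("Magersucht", "anorexia", "OR"))
print(attach("Magersucht", "anorexia nervosa", "AND"))
```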
Snapshot of a German search with „Wintereinbrüche“ (onsets of winter), Synonyms, Word families (derivation) and translations into English:

The compound is split into its parts „Winter“ and „Einbruch“, and the synonyms, derivations and translations of the compound itself (if there are any) and of both parts are displayed. The parts are processed as isolated words, and therefore the additional terms can reflect ambiguities, like „Einbruch“ = burglary, which is not a correct translation in the „context“ of this compound.
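Compound splitting against a full-form dictionary can be sketched like this in Python. The sketch is not the EXTRAKT algorithm: it assumes a tiny hand-made dictionary (`FULL_FORMS`, a hypothetical name), tries the longest possible first part, and ignores linking elements such as the „s“ in Präsidentschaftswahl.

```python
# Toy dictionary: inflected form (lowercase) -> basic form. Invented sample data.
FULL_FORMS = {
    "winter": "Winter",
    "einbruch": "Einbruch",
    "einbrüche": "Einbruch",
    "restaurant": "Restaurant",
    "empfehlung": "Empfehlung",
    "empfehlungen": "Empfehlung",
}

def split_compound(word):
    """Return the basic forms of the parts, or None if the word cannot be split."""
    w = word.lower()
    if w in FULL_FORMS:
        return [FULL_FORMS[w]]
    # Try the longest first part that is a known word, then split the rest.
    for i in range(len(w) - 1, 0, -1):
        head, tail = w[:i], w[i:]
        if head in FULL_FORMS:
            rest = split_compound(tail)
            if rest is not None:
                return [FULL_FORMS[head]] + rest
    return None

print(split_compound("Wintereinbrüche"))
print(split_compound("Restaurantempfehlungen"))
```

Each part comes back as an isolated basic form, which is exactly why a context-free lookup of „Einbruch“ can then offer „burglary“.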
LexiQuo covers the following source languages: German, English, French, Italian and Spanish. From each of these languages LexiQuo can translate into English, and it can also translate from German into French and Italian and vice versa.
Yahoo, MSN, Google, Exalead or Wikipedia can be accessed.
The source language setting is used to switch to the appropriate version of the search engine, i.e. with source language Italian, the request will be sent to yahoo.it or google.it or wikipedia.it.
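In code, this switch is little more than a lookup table. The following Python sketch is an assumption-laden illustration: only the Italian case (".it") is taken from the text above, the other table entries and the names `TLD_BY_LANGUAGE` and `engine_host` are made up for the example.

```python
# Only the Italian entry is documented above; the others are assumptions.
TLD_BY_LANGUAGE = {
    "German": "de",
    "English": "com",
    "French": "fr",
    "Italian": "it",
    "Spanish": "es",
}

def engine_host(engine, source_language):
    """Pick the national version of a search engine for the source language."""
    return engine + "." + TLD_BY_LANGUAGE[source_language]

print(engine_host("google", "Italian"))
print(engine_host("yahoo", "Italian"))
```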
More functions
Besides the standard Web search, there are more options: image search (if supported by the selected search engine), video search (if supported) and people search.
Site search option with German „linguistische Suche“ in www.altsearchengines.com:

If the LexiQuo widget is used, you can also search in some newspapers – German (Spiegel, FAZ, Netzeitung), French (Figaro, Monde, Libération), or English (Times, Washington Post, Time).
Snapshot of the Widget:
Background
The linguistic treatment is carried out by the linguistic engine EXTRAKT from TEXTEC Software (www.textec.de). The concept of the expansion of the request by generating all (inflected) forms of a word was first implemented within the European project VILIB (Virtual library, which was co-funded by the Commission of the European Union, cf. http://cordis.europa.eu/libraries/en/projects/vilib.html). The first steps towards a product were made in another international project, in which a multi-lingual (cross-lingual) retrieval of full text was implemented and tested for the first time („EMIR – European Multi-Lingual Information Retrieval“, cf. http://cslu.cse.ogi.edu/HLTsurvey/ch8node7.html or http://www.isoc.org/inet97/proceedings/A8/A8_1.HTM).
The EXTRAKT engine itself is based on huge dictionaries (with full forms, i.e. every inflected form is stored in the dictionary). The size of the dictionaries varies from several hundred thousand entries to more than 2.2 million entries. These dictionaries are compressed and loaded into RAM for high-speed access.
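What a full-form dictionary makes possible can be shown with a small in-memory Python sketch. The data structure below is an assumption for illustration only (two plain lookup tables with hypothetical names); the real dictionaries are compressed and far larger, and their format is not described here.

```python
from collections import defaultdict

# Toy entries: (inflected form, basic form), using word forms from this article.
ENTRIES = [
    ("Haus", "Haus"), ("Hause", "Haus"), ("Hauses", "Haus"),
    ("Häuser", "Haus"), ("Häusern", "Haus"),
    ("Museum", "Museum"), ("Museen", "Museum"),
]

form_to_base = {}                 # analysis: any form -> basic form
base_to_forms = defaultdict(list) # generation: basic form -> all forms
for form, base in ENTRIES:
    form_to_base[form.lower()] = base
    base_to_forms[base].append(form)

def all_forms(word):
    """Look up the basic form of a word and return every stored form of it."""
    base = form_to_base.get(word.lower())
    if base is None:
        return [word]  # unknown word: nothing to add
    return list(base_to_forms[base])

print(all_forms("Häusern"))
print(all_forms("Museen"))
```

Because every inflected form is stored explicitly, irregular forms such as „Museen“ need no special rule.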
Architecture
The LexiQuo system works as follows:
- From the user interface, with Javascript and AJAX, the request is sent via a Perl script to the EXTRAKT engine (in a server configuration).
The linguistic tasks are:
a) the analysis of the incoming term(s), i.e. the identification of the basic forms of the incoming terms, including compound splitting if the source language is German
b) the generation of all variants (of the basic forms from task a))
c) the access of synonym dictionaries
d) the access of derivation dictionaries (word families)
e) the translation of the basic forms of the request into another language.
- The „linguistic“ result is returned to the user’s PC.
- From the user’s PC, the enriched request is sent to the selected search engine.
- The search result is displayed on the user’s PC.
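The flow of these steps can be imitated in a short Python sketch. Everything in it is a stand-in: the real system uses Javascript/AJAX on the client and a Perl script in front of EXTRAKT, whereas here one function fakes the linguistic result from a hand-made table, and the host `search.example` and the URL parameter `q` are placeholders rather than the interface of any particular search engine.

```python
from urllib.parse import urlencode

def linguistic_result(request, full_forms):
    """Stand-in for the server step: map every request term to its variants."""
    return {term: full_forms.get(term, [term]) for term in request.split()}

def enriched_request(result):
    """Client step: turn the linguistic result into one enriched request string."""
    groups = []
    for forms in result.values():
        if len(forms) == 1:
            groups.append(forms[0])
        else:
            groups.append("(" + " OR ".join(forms) + ")")
    return " ".join(groups)

def search_url(host, query):
    """Client step: build the URL that is sent to the selected search engine."""
    return "https://" + host + "/search?" + urlencode({"q": query})

result = linguistic_result("Delfin Haus", {"Delfin": ["Delfin", "Delphin"]})
query = enriched_request(result)
print(query)
print(search_url("search.example", query))
```

The point of the split is visible in the sketch: the linguistic work happens in one place, while the search engine is contacted directly from the user's side.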
Future Extensions
Besides the continual updating of the dictionaries, we want to work on the following issues:
- Thesaurus
We have licensed a multi-lingual thesaurus (EUROVOC) from the Commission of the European Union, and we will add the thesaurus to the mono-lingual search (for synonyms), as well as to the multi-lingual search (‚official‘ translation of specific terms used by the administration of the EU).
- Clustering
We plan to add a clustering engine in order to get a concise presentation of the results. We will probably take the XSEARCH Clustering from Weitkämper Technology (www.weitkamper.com); the test results were much better than those of Clusty from Vivisimo.
LexiLib
We have developed a similar portal (Lexilib), which is based on the same principles as LexiQuo. Instead of searching via search engines, Lexilib searches within national libraries (German National Library, British National Library, Library of Congress). In the case of library search, the search is carried out in very short pieces of text (title, sub-title and key words or descriptors) and the absence of a single word variant can more easily result in a zero hit situation.

January 10th, 2009 at 8:48 am
Dr. Stegentritt: thank you very much for this article. The level of detail in explaining how LexiQuo works is both informative and rare. While most search engine developers speak vaguely about buzzwords like semantics and algorithms, you have cut to the chase and explained your approach in terms which the average person can comprehend. I like that, and I hope more people will follow your example. LexiQuo sounds like it will be particularly useful to students of language. Good luck with the project!
January 24th, 2009 at 1:17 pm
Thank you for your positive feedback. And thanks to all who came from altsearchengines for a test!
As I mentioned in my article, we have now added the thesaurus EUROVOC (it is a thesaurus in about 23 languages from the Publication Office of the European Commission) in German and English. And there is now a direct link from LexiQuo to the XCLUSTERING of Weitkamper Technology.
E