Over the last few years the World Wide Web has become a digital Gutenberg which has unleashed a completely new business and information sharing scenario. Publishers of all types of content have chosen the Web as repository for content previously found in papers or private archives. The Web has even become a medium of publication of native content such as blogs, forums and twitters. Therefore, we can only expect an exponential growth of publisher and user-generated content.
In order to get hold of the explosion of content, searching technologies continue to be the only tool available to individual users. Search itself can be construed as an implementation of dynamic and limitless hyperlinking since every time we do a search we are linking different documents according to the keywords in the search query. And for the time being search remains to be the only technology that can make the web manageable for end users, particularly as a self-service which is simple and intuitive for the average person.
However, search is an old technology which dates back to the sixties and it was not designed to solve the challenge of an increasing number of users and growing complexity in an also increasing number of documents. In fact, for end users search has shifted from being a service provided by librarians to a self-service similar to ATMs. This change generates frustration for users and puts pressure on search engine providers to improve performance and user-friendliness. As a result, the Web community realizes that most of the potential of Web and the knowledge it contains are underexploited or are even unknown.
And here is where Semantics comes to the rescue: the Web community is looking at Semantics as the source of solutions for exploiting all the potential of the Web since Semantics is the science of meaning, and it is the meaning of Web texts the challenge to be addressed. The so-called Semantic Web is the tag under which various research efforts are merging, such as knowledge representation, automatic reasoning, etc. But so far results are falling short of expectations because implementing Semantic Web principles at web level becomes an impossible task even if the task could be handled in an automated fashion, and this becomes a stumbling block to creating semantic knowledge.
That is why Natural Language Processing (NLP) is the solution to automate the knowledge acquisition problem because current NLP technologies provide one of the key ingredients for the Semantic Web to become a reality: text analytics or the ability to extract content from text. This ability can be turned into two highly needed tasks: automatic text tagging of entities, concepts and events; and automatic population of ontologies with selected entities, concepts and facts. In addition, NLP technologies can also provide interfaces capable of natural language understanding which are required by self-service end users.
Since 2007 Bitext is applying this approach to real-life projects in areas such as citizen services and business intelligence.
A. Valderrábanos
Original post here


















At Kosmix we strive to provide rich snippets for each result we surface and now all the big search engines – Yahoo! with SearchMonkey; Google with rich snippets; and Bing with Smart Captions – have started to do the same. The benefit is that the user has more information before clicking which increases the quality of traffic that a publisher gets. Nick Cox from Yahoo! reported up to 15 percent greater click-through-rate because of richer presentation of results using semantic techniques.
Standard internet search engines do little more than trawl documents for key words and by and large there is no intelligence involved. A Dresden-based spin-off of the EU-funded Sealife project is helping search engines understand what they are actually looking for, which is a huge benefit to companies and researchers in terms of helping them to organize their vast knowledge resources.
The free public web site, a semantic browser for the life sciences community, demonstrates what a semantic browser is all about. It is based on the standard database PubMed, provided by the US National Library of Medicine. PubMed is widely used among biomedical researchers. But it is far from perfect: “PubMed returns some 50,000 articles if you enter ‘heart diseases’. In reality, though, there are more than 800,000 articles on this topic. Most of them do not use ‘heart diseases’ as a key word, so the standard PubMed search engine won’t find them,” Alvers explains.


Imagine what the Internet would be like if computers could read your mind. 
