Language and Semantics: What can you do for my search engine (and for me)?

January 31st, 2010 by Guest Author
Posted in Guest Authors, Semantic | 3 Comments »

Over the last few years the World Wide Web has become a digital Gutenberg which has unleashed a completely new business and information sharing scenario. Publishers of all types of content have chosen the Web as a repository for content previously found in papers or private archives. The Web has even become a medium for publishing native content such as blogs, forums and tweets. Therefore, we can only expect an exponential growth of publisher and user-generated content.

To get a handle on this explosion of content, search technologies continue to be the only tool available to individual users. Search itself can be construed as an implementation of dynamic and limitless hyperlinking, since every time we do a search we are linking different documents according to the keywords in the search query. And for the time being search remains the only technology that can make the Web manageable for end users, particularly as a self-service which is simple and intuitive for the average person.

However, search is an old technology which dates back to the sixties, and it was not designed to cope with a growing number of users and the growing complexity of an equally growing number of documents. In fact, for end users search has shifted from being a service provided by librarians to a self-service similar to ATMs. This change generates frustration for users and puts pressure on search engine providers to improve performance and user-friendliness. As a result, the Web community realizes that most of the potential of the Web and the knowledge it contains is underexploited or even unknown.

And here is where Semantics comes to the rescue: the Web community is looking to Semantics as the source of solutions for exploiting the full potential of the Web, since Semantics is the science of meaning and the meaning of Web texts is the challenge to be addressed. The so-called Semantic Web is the label under which various research efforts, such as knowledge representation and automatic reasoning, are converging. But so far results are falling short of expectations, because implementing Semantic Web principles at web scale becomes an impossible task even if it could be handled in an automated fashion, and this is a stumbling block to creating semantic knowledge.

That is why Natural Language Processing (NLP) is the way to automate knowledge acquisition: current NLP technologies provide one of the key ingredients for the Semantic Web to become a reality, namely text analytics, or the ability to extract content from text. This ability serves two highly needed tasks: automatic tagging of entities, concepts and events in text; and automatic population of ontologies with selected entities, concepts and facts. In addition, NLP technologies can provide interfaces capable of natural language understanding, which self-service end users require.
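As a rough illustration of the two tasks named above (tagging text and populating an ontology), here is a minimal Python sketch. It is not Bitext's method: the `GAZETTEER` lookup table, the tag names and both function names are assumptions made up for this example, and a real system would rely on a full NLP pipeline rather than a word list.

```python
import re

# Hypothetical gazetteer (surface form -> tag), invented for illustration.
# A production system would use linguistic analysis, not a lookup table.
GAZETTEER = {
    "dresden": "PLACE",
    "germany": "PLACE",
    "inflation": "CONCEPT",
}

def tag_text(text):
    """Return (word, tag) pairs for known terms, in order of appearance."""
    tags = []
    for match in re.finditer(r"\w+", text):
        word = match.group(0)
        tag = GAZETTEER.get(word.lower())
        if tag is not None:
            tags.append((word, tag))
    return tags

def populate_ontology(tagged, ontology=None):
    """Group tagged words under their tag: a toy stand-in for ontology population."""
    ontology = {} if ontology is None else ontology
    for word, tag in tagged:
        ontology.setdefault(tag, set()).add(word)
    return ontology

tagged = tag_text("Inflation is rising in Dresden, Germany.")
ontology = populate_ontology(tagged)
```

Keeping the tagging step separate from the population step mirrors the split described in the text: the same tags can feed either a search index or a knowledge base.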

Since 2007 Bitext has been applying this approach to real-life projects in areas such as citizen services and business intelligence.

A. Valderrábanos

Original post here

Web 3.0 and Semantic Search

January 31st, 2010 by Charles S. Knight
Posted in Guest Authors, Semantic | No Comments »


By Abhishek Gattani

I recently attended the Web 3.0 conference held in Santa Clara (January 26-27). While there I had the chance to hear from Google’s Johanna Wright (Director of Product Management in Search) and Microsoft’s Scott Prevost (Principal Development Manager at Bing) about how they are using semantic technologies to drive innovation in search.

The conference focused on the Semantic Web which is something that we at Kosmix have been innovating in for the last three years. Our goal has been simple: to provide consumers with the best experience for exploring a topic and following topics that they care about.

Here is my take on the evolution of search and the semantic web…

If web 1.0 was about linking web pages and web 2.0 about linking people, then web 3.0 is about linking data. Tim Berners-Lee, father of the Web, has made it amply clear that linking data is where the future of the Web lies. The semantic web is about annotating facets and attributes associated with web content and linking data. In other words, the semantic web is about teaching machines to read web pages, which are designed to be read by humans. So how can semantics improve search?

Search so far has been about finding the best web pages for a given query. However, the purpose of searching is to complete a task. Say you want to find the lowest price for a camera, pick a romantic restaurant, or research the effect of pollution as a function of GDP. The information needed to complete such tasks resides in different web pages, so it is no longer possible to find one page that will complete the task. For instance, pollution levels by country and country GDP are in separate places on the Web. However, by using semantic understanding, search engines can connect these web pages and fuse the two datasets to complete the desired task.
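To make the idea of fusing two datasets concrete, here is a small Python sketch that joins a pollution table and a GDP table on the country name. The country names and all figures are invented placeholders, not real statistics, and the function names `fuse` and `pollution_per_gdp` are assumptions for this example only.

```python
# Placeholder figures invented purely for illustration; not real statistics.
pollution_by_country = {"Atlantis": 12.0, "Freedonia": 30.0, "Ruritania": 18.0}
gdp_by_country = {"Atlantis": 400.0, "Freedonia": 1500.0, "Sylvania": 900.0}

def fuse(left, right):
    """Inner-join two {country: value} mappings on the shared country key."""
    shared = sorted(left.keys() & right.keys())
    return [(country, left[country], right[country]) for country in shared]

def pollution_per_gdp(rows):
    """Compute pollution per unit of GDP for each fused row."""
    return {country: pollution / gdp for country, pollution, gdp in rows}

rows = fuse(pollution_by_country, gdp_by_country)
ratios = pollution_per_gdp(rows)
```

The join only works because both sources use the same key for a country. Recognizing that two pages talk about the same entity is exactly the part that needs semantic understanding.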

Another instructive example is figuring out what to cook. If search engines understood the structure of recipes, then you could narrow your search to recipes based on course, ingredients, occasion, cuisine, convenience, and user ratings. A variant of the idea is semantic snippets, where search snippets present the structure behind the linked page. For instance, for events we list the date, time, event snippet, and even ticket prices, which let you decide whether to click through and book a ticket or whether the event does not fit your schedule and budget.

At Kosmix we strive to provide rich snippets for each result we surface and now all the big search engines – Yahoo! with SearchMonkey; Google with rich snippets; and Bing with Smart Captions – have started to do the same. The benefit is that the user has more information before clicking, which increases the quality of traffic that a publisher gets. Nick Cox from Yahoo! reported up to 15 percent greater click-through rate because of richer presentation of results using semantic techniques.

Semantic techniques can also be used to rank web pages. Today, the rankings are largely a function of keyword matches and the popularity of the page. However, if we searched for “drop in currency value”, what we really mean is “inflation”. If search engines understood the meaning of documents and used it in ranking, then higher quality documents about “inflation” would surface, which need not even contain the search terms!
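The ranking idea above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real engine ranks: the `CONCEPTS` table mapping phrases to a concept is hand-written and hypothetical, and the scoring puts concept matches first and plain keyword overlap second.

```python
# Hypothetical phrase-to-concept table; a real engine would have to learn
# or curate such mappings at a much larger scale.
CONCEPTS = {
    "inflation": {"inflation", "drop in currency value", "rising prices"},
}

def concepts_in(text):
    """Return the set of concepts whose phrases occur in the text."""
    lowered = text.lower()
    return {
        concept
        for concept, phrases in CONCEPTS.items()
        if any(phrase in lowered for phrase in phrases)
    }

def rank(query, documents):
    """Order documents by shared concepts first, then by keyword overlap."""
    query_concepts = concepts_in(query)
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        concept_score = len(query_concepts & concepts_in(doc))
        keyword_score = len(query_words & set(doc.lower().split()))
        scored.append((concept_score, keyword_score, doc))
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [doc for _, _, doc in scored]

docs = [
    "Stock value tips for beginners",
    "Central banks respond to inflation",
]
ranked = rank("drop in currency value", docs)
```

Here the second document ranks first even though it shares no keyword with the query, while the first document matches the word "value" but not the concept.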

As you can see, semantic techniques have already made inroads into search and have started benefiting users. However, there is still a long way to go before the promise is fully realized. After all, the Web’s content was designed for our consumption, and machines need a great deal of our help to understand it.

GoPubMed turns a dull search engine into a brainiac

December 22nd, 2009 by Charles S. Knight
Posted in Semantic, Updates | 1 Comment »

Standard internet search engines do little more than trawl documents for key words and by and large there is no intelligence involved. A Dresden-based spin-off of the EU-funded Sealife project is helping search engines understand what they are actually looking for, which is a huge benefit to companies and researchers in terms of helping them to organize their vast knowledge resources.

“If you google ‘IT investment’ and ‘Germany’, you will find a wealth of articles. But what you won’t find is a specific article on a new investment in Dresden because Google does not know that Dresden is located in Germany,” says Michael R. Alvers, CEO and co-founder of Transinsight. Building on results of the “Sealife” project funded under the umbrella of the EU’s 6th Research Framework Program, the company has developed an intelligent, “semantic” browser that is now being marketed to companies and which is also freely available on the internet for certain applications.

The free public web site, a semantic browser for the life sciences community, demonstrates what a semantic browser is all about. It is based on the standard database PubMed, provided by the US National Library of Medicine. PubMed is widely used among biomedical researchers. But it is far from perfect: “PubMed returns some 50,000 articles if you enter ‘heart diseases’. In reality, though, there are more than 800,000 articles on this topic. Most of them do not use ‘heart diseases’ as a key word, so the standard PubMed search engine won’t find them,” Alvers explains.

To expand the query considerably, GoPubMed adds what Alvers calls “ontologies”. An ontology is a kind of dictionary or vocabulary. GoPubMed uses the Medical Subject Headings (MeSH), an international medical vocabulary, and the Gene Ontology (GO) to better “understand” which basic research articles are related, for example, to heart disease.

But having to choose from more than 800,000 articles instead of only 50,000 is not necessarily a big leap forward for the user. This is why GoPubMed goes a step further by narrowing the choice of articles via a tree-like user interface based on the same ontologies used to expand the query. Alvers: “After a few clicks, the user arrives at a choice of articles that is both far more comprehensive and far more precise than what would be offered by a standard PubMed search.”
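The two steps described above, expanding a query through an ontology and then narrowing the results by walking down the same tree, can be sketched as follows. This is only an illustration and not GoPubMed's implementation: the tiny `ONTOLOGY` hierarchy is a hand-made toy rather than an extract of MeSH or GO, and the article titles are made up.

```python
# Toy hierarchy (term -> narrower terms), invented for this example.
ONTOLOGY = {
    "heart diseases": ["myocardial infarction", "cardiomyopathy"],
    "cardiomyopathy": ["dilated cardiomyopathy"],
}

def expand(term):
    """Return the term plus all of its descendants in the toy hierarchy."""
    terms = [term]
    for child in ONTOLOGY.get(term, []):
        terms.extend(expand(child))
    return terms

def search(term, articles):
    """Return articles mentioning the term or any of its narrower terms."""
    wanted = expand(term)
    return [a for a in articles if any(t in a.lower() for t in wanted)]

# Made-up article titles for illustration.
articles = [
    "Outcomes after myocardial infarction",
    "Dilated cardiomyopathy in adults",
    "Heart diseases: an overview",
    "Knee surgery recovery",
]

broad = search("heart diseases", articles)   # expanded query, more hits
narrow = search("cardiomyopathy", articles)  # one click down the tree
```

Searching on the top-level term finds articles that never mention "heart diseases" literally, and choosing a child node in the tree trims the result list again.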

The life sciences community is realizing just how useful GoPubMed can be, says Alvers: “The number of users has increased rapidly during the last couple of months. We now have around 20,000 unique visitors a day.” The combination of an extended database query with an intuitive interface also convinced the jury of the prestigious red dot communication design award: GoPubMed beat more than 6,000 competitors to this year’s award.

“In an age of ever growing online resources, there is undoubtedly a need for semantic browsing,” says Joel Bacquet, project officer for the Sealife project from the European Commission. “The SeaLife project convinced us because it combined an elaborate technical approach to a common problem in many industries and research branches, the problem of information overload, with a comprehensible business plan.”

Source: HealthTech Wire

TipTop reveals the best and most popular gifts of 2009!

December 20th, 2009 by Charles S. Knight
Posted in Federated Search, News, Realtime, Semantic | 1 Comment »


TipTop, a real-time, semantic, social, search solutions provider, has brought together the best of real-time search, social media and comparison shopping for this gift giving season with its 2009 Holiday Search Special.

Following on the heels of TipTop Shopping’s successful launch, the unique Gift Guide site at http://ftt.nu/gifts presents the Top 50 Gifts for 2009. TipTop’s semantic engine analyzed tens of millions of tweets to determine the top gift items based upon frequency, relevancy to holiday gift-giving, text sentiment scores, and the quality of the messages themselves.

Some top gifts by category this year are:
# Animal: Dog, ZhuZhu, Cat, Horse
# Unusual: Tattoo, New Baby, AK-47, New Brain
# Boring: Money, Gift Card, Certificate, Furniture
# Electronics: HDTV, Laptop, Macbook, GPS, Kindle, Flipcam
# Smartphones: iPod Touch, iPhone, BlackBerry, 3GS, Droid
# Jewelry: Watch, Ring, Diamonds, Necklace, Earrings
# On-The-Go: Car, Ticket, Bike
# Toys & Games: Guitar Hero, Legos, Wii, PS3, Xbox 360, Ball
# Clothing: Snuggie, Shoes, Dress, Boots, Shirt, Hat, Coat, Socks

For each gift item, TipTop also computed an aggregate sentiment by identifying positive Tips, negative piTs, and neutral messages.

For example, the piT sentiment around gift items such as “sweater”, “boots” and “droid” is hovering around 20%. On the other hand, the sentiment in TipTop Search at http://FeelTipTop.com around gift items such as “Dress”, “GPS” and “CD” is 45% positive.
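As a sketch of how such aggregate percentages could be computed once each message has a label, consider the Python below. TipTop's actual classifier and scoring are not described in the source, so the label names (`tip`, `pit`, `neutral`), the function name and the sample labels are all assumptions for illustration.

```python
from collections import Counter

def sentiment_shares(labels):
    """Return the percentage of messages labelled tip, pit and neutral."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No messages for this item: report zero shares instead of dividing by zero.
        return {"tip": 0.0, "pit": 0.0, "neutral": 0.0}
    return {key: 100.0 * counts[key] / total for key in ("tip", "pit", "neutral")}

# Made-up labels for one gift item: 9 positive, 4 negative, 7 neutral messages.
labels = ["tip"] * 9 + ["pit"] * 4 + ["neutral"] * 7
shares = sentiment_shares(labels)
```

Reporting shares rather than raw counts makes items comparable even when one gift is mentioned far more often than another.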


Analysis of tweets also revealed what people plan to do and where they plan to travel over the next few weeks. Some of the top activities people are most excited about this year include Shopping (33% Tips & 13% piTs), Sleep (46% Tips & 7% piTs) and Disneyland (52% Tips & 95 piTs). Places where people do not have that many good things to say about spending their holidays include My House, Your House, and My Sister’s. My Parents is the top place to stay, sentimentally speaking, at 44% Tips and 14% piTs.

According to TipTop’s analysis of tweets, places people are most looking forward to spending time in are Phoenix, Paris, Miami and Australia. Lots of people are going to New York, Las Vegas, and England but they don’t seem to be tweeting as joyfully.

For more information about TipTop’s 2009 Holiday Search Special and how results were determined, please visit TipTop’s blog at http://blog.FeelTipTop.com. If you are interested in researching, comparing product reviews and ratings, or purchasing any of the thousands of gift ideas, please check out TipTop Shopping at http://ftt.nu/shopping.

Source: TipTop Technologies

What about this, Shyam?


New in Primal Labs, a website generator prototype

December 9th, 2009 by Charles S. Knight
Posted in Semantic, Updates | No Comments »

Imagine what the Internet would be like if computers could read your mind. Primal Fusion is building that future today with a ground-breaking approach called thought networking. Through the use of “software assistants” that act upon your thoughts, Primal Fusion is developing services that will make it faster, easier and more convenient to get stuff done on the Web such as writing papers, investigating purchases, tracking topics, collecting information, and making social connections. Rather than sifting through page after page of content on your own, our software assistants will automate many of the time-consuming activities done online to provide you with a better, more productive Web experience.

Primal Fusion is also leveraging thought networking to meet the needs of the producers of content, such as publishers, retailers, bloggers, and service organizations, by giving them tools so visitors can create content dynamically to meet their needs at the moment they need it. This greatly reduces the costs of creating and organizing content, while giving customers a more personal experience based on their unique thoughts and intentions.

Thought networking brings the power of thinking online, and helps everyone get more from the Web.

Driving these innovations around thought networking is Primal Fusion’s semantic technology platform. Since 2005, our team has developed this platform, drawing upon expertise in semantic technology, knowledge representation, statistical computing, information extraction, facet analysis, adaptive classification, and interaction design.

Guiding Primal Fusion is a strong management team with many years of experience in the high-technology business, from successful start-ups to large, fast-growing companies. Together, we have a great track record of building successful high-tech companies — and we’re excited to be doing it again!

To keep up with what’s new, visit our Ideas blog or our Products blog. Or request an invitation to our private alpha, and try thought networking for yourself!

When you do research on the Web, it’s common to collect information from many different sources. You visit a variety of websites, glean what you can from them, and put together some sort of summary of your research.

Today, we’re pleased to introduce a prototype that automates some of that otherwise manual activity. It searches the Web on your behalf, gathering and organizing information about a subject you’re researching. The end result is a dynamically generated website, complete with text, images, and links to more information.

Using the website generator is simple: type a few words to describe a subject, then click Generate a website. The prototype builds a website about your subject using text and images from sources such as Wikipedia, Yahoo!, and Flickr. For example, here’s a website about 19th century civil wars:

[Screenshot: generated website about 19th century civil wars]

And here’s one about statues of Elvis:

[Screenshot: generated website about statues of Elvis]

Try it yourself:

To use the website generator prototype:

1. Make sure you’re signed in to Primal Fusion. If you don’t yet have a Primal Fusion account, you can request an invitation to our private alpha.
2. Launch the website generator in Primal Labs.
3. Type a topic of interest and click Generate a website.
4. Click on your website’s navigation links to view related pages.
5. To build another website, just type a different topic in the text box.

By Robert Barlow-Busch here