“Charles has left the building…”

February 4th, 2010 by Charles S. Knight
Posted in Uncategorized | 2 Comments »

rww

Here is the post that started it all: The Top 100 Alternative Search Engines for January 2007

ase

And here is the blog that published “The Most Wonderful Search Engines You’ve Never Seen”

the-future

Here is where I will be writing about All Things Search starting on February 1st, 2010.

A plug for Science Commons Symposium – Pacific Northwest

January 31st, 2010 by Hope Leman
Posted in Guest Authors | 2 Comments »

Okay, this post is going to be a shameless plug for a conference I am helping to organize that I think search industry folks should come to: Science Commons Symposium – Pacific Northwest.

So brace yourselves: objectivity is not going to come into play in this post. Forget about that for the moment. The key thing is to get search people into the world of online science, e-science, Big Data, Open Science, and Science 2.0 way more than they have been. Opportunity knocks, group.

Why do I care? Because I have loved people who have been seriously ill and I want the scientific process to become more efficient so that medical research can advance and search is a key part of that. I also don’t object to anyone in the search industry making some money and I think that that can happen, too.

Let’s consider what you would learn at Science Commons Symposium – Pacific Northwest.

First of all, Microsoft Research is generously hosting it (and Science Commons is the organizer). This is a good chance for those of us interested in actually seeing what the Microsoft campus looks like to do so and to get an idea of what Microsoft is doing in the realm of tools for higher education and scientific research. There are important things going on.

For instance, one of the slated attendees of the symposium is Lee Dirks, Director of Education & Scholarly Communications in Microsoft’s Research division. And given the troves of scholarship being produced that will have to be rendered searchable, it behooves those in the search industry to follow these things.

Another of the featured speakers will be the chemist and Open Notebook Science advocate, Jean-Claude Bradley. If you need a quick grasp of the mountains of data that are going to have to be rendered searchable, check out Bradley’s slideshow “Leveraging Transparency and Crowdsourcing in Chemistry Using Open Notebook Science.”

Don’t be put off by the fact that the presentation is specific to chemistry. Just click through the slides to get a quick grasp of the vast amounts of data and material that not only chemistry but many fields are producing in a wide range of formats and uploading to a plethora of locations on the Internet (wikis, Second Life, Google Docs, scientific social networking sites and specialized search engines like ChemSpider).

And speaking of ChemSpider, one of its creators, Antony Williams, is also to be one of the speakers at the symposium. Williams is a man to know in search because he is an exemplar of someone who bridges the world of Open Science search tools, academia, prestigious scientific societies (in his case the Royal Society of Chemistry) and industry. See my interview with Williams here.

He is a good guy (and a brilliant one) to know if you want to get a grasp on such key matters as how to obtain scientific information on mobile devices—or, in the case of the search industry, how to cater lucratively to those that need to.

As another of the speakers at the Symposium, Cameron Neylon, has said, “Search is crucial for the success of Open Science. As we put more stuff online it has to be possible to find it. This means developments in semantic search, as well as improved computational engines like Wolfram Alpha.”

See my interview with Neylon here.

And what about the matter of Big Data, say in biology and the life sciences? People in search need to gird themselves for the coming avalanche of Big Data. That is where you could benefit by hearing what another of the symposium speakers, Stephen Friend, President and CEO of Sage Bionetworks, will say. Search people need to get a handle on these things. Sage is working with massive amounts of data.

And speaking of enormous amounts of data, the Obama administration is trying to come up with policies on the best way to provide maximum public access to the published results of taxpayer-funded research. The search industry does not seem to have been following this important public policy debate even though it will directly affect its interests. After all, the more good stuff on the Web, the more central the role of search. I strongly advise those interested in search to peruse the archived pages of the policy forum on these matters hosted by the White House’s Office of Science and Technology Policy.

Much of the discussion in that forum related to such search-related topics as institutional repositories, PDF versus XML, and what information will be released. It was really fascinating to see the divergent views of publishers and Open Access advocates and to read about such things as the peer-review process (and that alone should interest those in search because more and more of it is moving online and will need to be searched).

Search people really must possess a basic understanding of the Open Access movement, given that public opinion is trending towards greater public access to taxpayer-funded research, which means that the mainstream, monopolistic behemoths of sci/tech publishing (think Elsevier) are going to have to adapt to a more Open Access-ruled world. More Open Access, more stuff to search out. And, luckily for those planning to attend the symposium, one of the speakers will be Heather Joseph, Executive Director of SPARC, the Scholarly Publishing and Academic Resources Coalition, who can explain these potentially momentous changes in the world of scientific communication.

And as more and more data is released and more and more scientific articles are written, how will the worth and influence of those articles be measured? That is what another of the speakers at the symposium, Peter Binfield, Managing Editor of the online journal PLoS ONE, will discuss.

I heard Mr. Binfield’s talk on PLoS’s innovative article metrics program at ScienceOnline2010 (and that conference is an absolute must for those interested in scientific communication, which should include savvy people in search).

Thus, there are clearly opportunities for startups in the realm of scientific communication. Just look at the popularity of the research management tool Mendeley, for example.

Okay, so we have seen that a huge amount of science-related material is being produced and that people are coming up with tools to make sense of it all: Antony Williams in certain disciplines like chemistry, and Peter Binfield at the article level, whatever the discipline. What about the bigger picture of online science and the implications for search, information science, libraries and librarians? That is where the keynoter of the symposium, John Wilbanks, comes in. He is not only a big picture thinker, he is also a doer like Antony Williams in that Wilbanks is working closely with Stephen Friend on the Sage project. These are people that search people need to get to know, and you can do that at the symposium. There is even a t-shirt design contest for those who need help with travel money and hotel expenses during the symposium.

There are some search firms that do get Open Science. For instance, Deep Web Technologies is a leader in designing search engines for online science. Take a look at WorldWideScience.org, for example.

And DeepDyve has an innovative agreement with the reference manager/citation manager CiteULike to enable CiteULike users to get previews of articles of interest to them for around ninety-nine cents in order to determine if they want to pay a considerably larger sum for access to the whole article. DeepDyve’s CEO William Park is one smart fellow—he sees the connections between researchers, established publishers, and tools that researchers use (like CiteULike) that seem to be entirely escaping his counterparts elsewhere in search.

Science Commons Symposium – Pacific Northwest is the place for those in search to get up to speed on key developments in scientific communication.

Thus endeth today’s lecture and sales pitch. -Hope

Language and Semantics: What can you do for my search engine (and for me)?

January 31st, 2010 by Guest Author
Posted in Guest Authors, Semantic | No Comments »

Over the last few years the World Wide Web has become a digital Gutenberg that has unleashed a completely new scenario for business and information sharing. Publishers of all types of content have chosen the Web as a repository for content previously found on paper or in private archives. The Web has even become a publication medium for native content such as blogs, forums and tweets. We can therefore only expect exponential growth in publisher and user-generated content.

To get a handle on this explosion of content, search technologies continue to be the only tool available to individual users. Search itself can be construed as an implementation of dynamic and limitless hyperlinking, since every time we do a search we are linking different documents according to the keywords in the search query. And for the time being search remains the only technology that can make the Web manageable for end users, particularly as a self-service that is simple and intuitive for the average person.

However, search is an old technology that dates back to the sixties, and it was not designed to cope with a growing number of users and increasing complexity across an equally growing number of documents. In fact, for end users search has shifted from being a service provided by librarians to a self-service similar to ATMs. This change generates frustration for users and puts pressure on search engine providers to improve performance and user-friendliness. As a result, the Web community realizes that most of the potential of the Web and the knowledge it contains is underexploited or even unknown.

And here is where Semantics comes to the rescue: the Web community is looking at Semantics as the source of solutions for exploiting all the potential of the Web, since Semantics is the science of meaning, and the meaning of Web texts is the challenge to be addressed. The so-called Semantic Web is the label under which various research efforts are merging, such as knowledge representation, automatic reasoning, etc. But so far results are falling short of expectations, because implementing Semantic Web principles at web scale is an impossible task to do by hand and remains very hard even when automated, and this becomes a stumbling block to creating semantic knowledge.

That is why Natural Language Processing (NLP) is the way to automate knowledge acquisition: current NLP technologies provide one of the key ingredients for the Semantic Web to become a reality, namely text analytics, or the ability to extract content from text. This ability can be applied to two much-needed tasks: automatic text tagging of entities, concepts and events; and automatic population of ontologies with selected entities, concepts and facts. In addition, NLP technologies can also provide interfaces capable of natural language understanding, which self-service end users require.
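To make the two tasks a little more concrete, here is a minimal sketch in Python. It is not Bitext's method: the `GAZETTEER` dictionary, the class names and the function names are all hypothetical, and a simple dictionary lookup stands in for real text analytics. The first function tags entities and concepts in a sentence; the second populates a toy ontology (class name to set of instances) from those tags.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: surface form -> (canonical name, ontology class).
# A real text-analytics system would use linguistic analysis, not a lookup table.
GAZETTEER = {
    "bitext": ("Bitext", "Organization"),
    "semantic web": ("Semantic Web", "Concept"),
    "natural language processing": ("Natural Language Processing", "Concept"),
}

def tag_entities(text):
    """Return (start, end, canonical name, class) for every gazetteer match, in text order."""
    tags = []
    for surface, (canonical, cls) in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            tags.append((match.start(), match.end(), canonical, cls))
    return sorted(tags)

def populate_ontology(tags):
    """Group tagged items by ontology class: {class name: set of instances}."""
    ontology = defaultdict(set)
    for _start, _end, canonical, cls in tags:
        ontology[cls].add(canonical)
    return dict(ontology)

text = "Bitext applies natural language processing to the Semantic Web."
tags = tag_entities(text)
ontology = populate_ontology(tags)
print(tags)
print(ontology)
```

The point of the sketch is the pipeline shape: tagging produces annotations tied to positions in the text, and ontology population turns those annotations into structured knowledge that other tools can query.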

Since 2007 Bitext has been applying this approach to real-life projects in areas such as citizen services and business intelligence.

A. Valderrábanos

Original post here

Web 3.0 and Semantic Search

January 31st, 2010 by Charles S. Knight
Posted in Guest Authors, Semantic | No Comments »

By Abhishek Gattani

I recently attended the Web 3.0 conference held in Santa Clara (January 26-27). At the conference I had the chance to hear from Google’s Johanna Wright (Director of Product Management in Search) and Microsoft’s Scott Prevost (Principal Development Manager at Bing) about how they are using semantic technologies to drive innovation in search.

The conference focused on the Semantic Web which is something that we at Kosmix have been innovating in for the last three years. Our goal has been simple: to provide consumers with the best experience for exploring a topic and following topics that they care about.

Here is my take on the evolution of search and the semantic web…

If Web 1.0 was about linking web pages and Web 2.0 about linking people, then Web 3.0 is about linking data. Tim Berners-Lee, father of the Web, has made it amply clear that linking data is where the future of the Web is. The Semantic Web is about annotating facets and attributes associated with web content and linking data. In other words, the Semantic Web is about teaching machines to read web pages, which are designed to be read by humans. So how can semantics improve search?

Search so far has been about finding the best web pages for a given query. However, the purpose of searching is to complete a task. Say you want to find the lowest price for a camera, pick a romantic restaurant, or research the effect of pollution as a function of GDP. The information needed to complete such tasks often resides in different web pages, so no single page will complete the task. For instance, the pollution levels by country and country GDP are in separate places on the Web. However, by using semantic understanding, search engines can connect these web pages and fuse the two datasets to complete the desired task.
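As a rough illustration of what fusing the two datasets means, here is a small Python sketch. Everything in it is an assumption made for the example: the country names are placeholders, the numbers are made up and are not real statistics, and the `fuse` helper is hypothetical. It assumes the hard part, extracting the two tables from separate pages and agreeing on a shared key, has already been done.

```python
# Placeholder values only: these are NOT real pollution or GDP figures.
pollution_by_country = {"Country A": 10.0, "Country B": 20.0, "Country C": 30.0}
gdp_by_country = {"Country A": 100.0, "Country B": 400.0, "Country D": 50.0}

def fuse(left, right):
    """Inner-join two {key: value} datasets on the keys they share."""
    shared = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in shared]

fused = fuse(pollution_by_country, gdp_by_country)
for country, pollution, gdp in fused:
    # With both values side by side, the original question becomes answerable.
    print(f"{country}: pollution={pollution}, gdp={gdp}, ratio={pollution / gdp:.3f}")
```

Only countries present in both sources survive the join, which is why a shared, machine-readable identifier for "country" matters so much: without it the two pages cannot be connected at all.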

Another instructive example is figuring out what to cook. If search engines understood the structure of recipes, then one could narrow down a search to recipes based on course, ingredients, occasion, cuisine, convenience, and user ratings. A variant of the idea is semantic snippets, where search snippets present the structure behind the linked page. For instance, for events we list the date, time, event snippet, and even ticket prices, which really lets you decide whether to click through and book a ticket or whether the event does not fit your schedule and budget.
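The recipe example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Recipe` fields, the sample records and the `narrow` function are hypothetical, and the sketch assumes the recipe structure has already been extracted from the pages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Recipe:
    title: str
    course: str
    cuisine: str
    ingredients: frozenset
    rating: float  # user rating on a hypothetical 0-5 scale

# Made-up records standing in for structure extracted from recipe pages.
RECIPES = [
    Recipe("Tomato soup", "starter", "italian", frozenset({"tomato", "basil"}), 4.5),
    Recipe("Margherita pizza", "main", "italian", frozenset({"tomato", "mozzarella", "basil"}), 4.8),
    Recipe("Miso soup", "starter", "japanese", frozenset({"miso", "tofu"}), 4.2),
]

def narrow(recipes, course=None, cuisine=None, must_include=(), min_rating=0.0):
    """Keep only recipes that satisfy every facet the user has set."""
    required = set(must_include)
    return [
        recipe for recipe in recipes
        if (course is None or recipe.course == course)
        and (cuisine is None or recipe.cuisine == cuisine)
        and required <= recipe.ingredients
        and recipe.rating >= min_rating
    ]

matches = narrow(RECIPES, cuisine="italian", must_include=["basil"], min_rating=4.6)
titles = [recipe.title for recipe in matches]
print(titles)
```

None of this filtering is possible with keyword matching alone; it depends on the engine knowing which part of the page is the course, which is the ingredient list, and which is the rating.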

At Kosmix we strive to provide rich snippets for each result we surface, and now all the big search engines (Yahoo! with SearchMonkey, Google with rich snippets, and Bing with Smart Captions) have started to do the same. The benefit is that the user has more information before clicking, which increases the quality of traffic that a publisher gets. Nick Cox from Yahoo! reported up to 15 percent greater click-through rate because of richer presentation of results using semantic techniques.

Semantic techniques can also be used to rank web pages. Today, the rankings are largely a function of keyword matches and the popularity of the page. However, if we searched for “drop in currency value”, what we really mean is “inflation”. If search engines understood the meaning of documents and used it in ranking, then higher quality documents about “inflation” would surface, which need not even contain the search terms!
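Here is a toy Python sketch of the idea, assuming a hand-built concept table. The `CONCEPTS` mapping, the sample documents and the scoring rule are all hypothetical, and this is not how Kosmix or any search engine actually ranks pages. Instead of comparing words, it maps both the query and each document to concepts and ranks by concept overlap.

```python
# Hypothetical concept table: concept -> phrases that express it.
CONCEPTS = {
    "inflation": {"inflation", "drop in currency value", "rising prices"},
}

def concepts_in(text):
    """Return the set of concepts whose phrases appear in the text."""
    lowered = text.lower()
    return {
        concept
        for concept, phrases in CONCEPTS.items()
        if any(phrase in lowered for phrase in phrases)
    }

def rank(query, documents):
    """Rank documents by how many concepts they share with the query."""
    query_concepts = concepts_in(query)
    scored = [(len(query_concepts & concepts_in(doc)), doc) for doc in documents]
    # sorted() is stable, so documents with equal scores keep their input order.
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ordered if score > 0]

docs = [
    "Central banks raise rates to fight inflation.",
    "A guide to camera prices.",
]
results = rank("drop in currency value", docs)
print(results)
```

The first document shares no word with the query, yet it is the only result, because both map to the same concept. The second document mentions prices but expresses no concept in the table, so it is dropped.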

As you can see, semantic techniques have already made inroads into search and have started benefiting users. However, there is still a long way to go before the promise is fully realized. After all, the Web’s content was designed for our consumption, and machines need a lot of our help in understanding it.

Searchzooka – A long tour of features

January 30th, 2010 by Charles S. Knight
Posted in Uncategorized | No Comments »