Yahoo! Research introduces Ideological Search

April 1st, 2009 by Charles S. Knight
Posted in Unique Interfaces | 3 Comments »


Scientists at Yahoo! today released Ideological Search, allowing users to control the ideology of their search results for the first time in the history of search technology. Until now, many Web search users were offended by the facts, pages, articles, and blogs in their search results that contradicted their own personal beliefs and values. Furthermore, search engines were often accused of being biased in one direction or another. Rather than try to comply with a hard-to-define “search fairness doctrine,” Yahoo! Ideological Search will allow its users to personally control the ideological perspective of their search. Users can now search with confidence, knowing that their search results will perfectly match their ideology and that no results will offend them. “This is a gigantic step in the personalization of search and we are excited to bring this key technology to market ahead of our competitors,” said Dr. Prabhakar Raghavan, Chief Strategist of Yahoo! Search and head of the Yahoo! Labs organization that pioneered the technology.

The team applied the latest research from the fields of sentiment analysis, intent detection, eye tracking, clustering, and empathetic reasoning to create Ideological Search. Current research directions stop at simple social search, faceted search, safe search, and contextual search, while Ideological Search goes much further to give users only the information they want.

Every person in Yahoo! Research was involved in developing Ideological Search. For example, Pig, the large-scale data processing environment, had to be specifically adapted to Ideological Search. “Pig was used to build the index for Ideological Search; however, not all ideologies were compatible with Pig, especially those against the exploitation of animals,” said Benjamin Reed from the Pig project. “We created a sister project, called Tofu, with the same Pig codebase to process some ideologies. Tofu was less optimized, but more flexible and gelatinous.”

The motivation for Ideological Search originated largely from the Microeconomics group. “It is not just deep science, it is also a strategic move to accelerate the fractionalization of the search marketplace which can only benefit us,” said Preston McAfee, head of the group.


According to Andrei Broder from Computational Advertising, “Our customer is king! We should believe what they believe.” This made the advertising infrastructure much more complicated. “Switching between ideologies millions of times per second was extremely challenging,” he said. “Although it is difficult to keep up, the accuracy of our targeted advertising has greatly increased.”

Dr. Raghavan explained that dedicated members of each core ideology were brought into the lab through a judicious and ideologically appropriate choice of financial incentives and appeals to their strongly held convictions. The subjects were injected with a mildly radioactive dye, which crossed the blood-brain barrier, allowing Yahoo! scientists to monitor emotional responses in the mid cortex to the results presented to them. Based on this data, the search team trained IMLR (Ideological Machine Learned Ranking) functions on a per-ideology basis, which were then applied as appropriate to incoming searches.

The Search Technologies group took a different approach to detecting ideology. They found that when they took on the belief systems of Yahoo! users with more extreme ideologies to process their searches, they could hack into election offices and gather voting records. The group indexed and clustered these records using their proprietary Ideological Detection Indexing Of Terms (IDIOT) technology. With IDIOT, users can enter their name, birthday, and last voting location to have their ideology determined. According to Pras Sarkar, Lead Engineer on the project, “Building a smarter IDIOT technology was no easy feat. We created a new framework called Clorocks – an asynchronous platform for harvesting vital ideological information through users’ social networks, inbox reading habits, TIVO programming, etc. Clorocks is built on previous Yahoo! Research advancements like Getoutameway (queue processing) and BigBird (high performance database system).”


One side benefit of the ideology detection is that users no longer need to wonder about their world view. Now that the ideology of a user can be identified, the team will be developing Yahoo! Me, a new service that will tell users what they should see, where they should work and live, what they should buy, who they should date, and how they should vote. The anticipated outcome is that the service will allow users to spend less time worrying about these stressful topics and more time patronizing advertisers. Tired of being inundated with the contradictory and offensive beliefs of others?

Try Ideological Search and reinforce the values you embrace to make your life happy, peaceful and non-contentious.

Source: Yahoo! Research

The Leman Report – Web 2.0 Expo Day One

April 1st, 2009 by Hope Leman
Posted in News | 1 Comment »

It is about 1:46 a.m. as I start this post on my attendance at Web 2.0 Expo San Francisco. I have given a quick look at the coverage of it elsewhere, which seems to note that it is indeed smaller and more subdued than past gatherings, in keeping with the flaccidness of the overall economy. Even refreshments were dispensed grudgingly and in niggardly amounts. Welcome to the age of diminished expectations. It probably was a mistake on the part of O’Reilly Media to tar the conference with the rather downbeat slogan “The Power of Less,” which ranks as one of the most uninspiring calls to arms ever issued.

Nevertheless, I am delighted to be here and only wish that there were a much more robust presence of the healthcare and library sectors, Science 2.0 and the education community. This is a space we need to make our presence felt in and a learning community in which we need to participate. This means you medical librarians, librarians of all types, Science 2.0 bloggers, healthcare administrators and healthcare journalists and marketers, those in healthcare IT and informatics, biotech and big pharma and the world of foundations, small nonprofits, government at all levels, university research administration and any and all teachers at any level from elementary to graduate schools. It is just such a shame that we are all laboring under ever-dwindling allotments for business travel. This is really a conference that should be attended if you can possibly swing it.

I have been impressed by the quality and newsworthiness of what I have seen so far and kept thinking of the many professions that would benefit from attending this conference. Basically, that means anyone who has anything to do with even the most rudimentary of Web sites, which these days encompasses small businesses down to the florist shop on the corner. And certainly those in the search industry should be here given how much info is being given to attendees about rendering Web sites maximally searchable.

I can barely write coherently because I am so eager to get this story posted. I rarely use words like “awesome” and “mind-blowing,” but the session I attended yesterday, Watching Websites: A Report from the Frontlines of Web Monitoring, was fascinating and sobering. If you are in any sort of business, the forthcoming book Watching Websites will be must reading, I suspect. How weird, then (considering that we are dealing with a media company here, O’Reilly no less), that you have to drill down among several different sites to find hard and fast info on it. Start here, anyway: http://www.web2expo.com/webexsf2009/public/schedule/detail/5693

The presenters of the workshop made a cogent, compelling argument for the sophisticated use of Web analytics for anyone who runs a Web site, which is a huge number of people nowadays, not just techies. Presenter Alistair Croll discoursed fascinatingly on the use of the Google Analytics URL Builder
http://www.google.com/support/googleanalytics/bin/answer.py?hl=en&answer=55578
for stealth marketing, reputation tracking and site monitoring. Frankly, much of what he said was over my head. But I got, “URLs are the new cookies” and that the founders of companies like TinyURL are going to retire rich after their firms are acquired.
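
For readers who, like me, found this part heavy going, here is a minimal sketch of the idea behind tagged URLs, written in Python with only the standard library. The `utm_source`, `utm_medium` and `utm_campaign` parameters are the campaign tags Google Analytics reads from a link; the function name `tag_url` and the example address are my own inventions for illustration, not anything shown in the session.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, source, medium, campaign):
    """Return url with Google Analytics campaign parameters appended.

    tag_url is a hypothetical helper name; only the utm_* parameter
    names come from Google Analytics itself.
    """
    parts = urlsplit(url)
    # Keep whatever query parameters the link already carries.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [
        ("utm_source", source),      # where the link was published
        ("utm_medium", medium),      # how it was delivered (email, banner, ...)
        ("utm_campaign", campaign),  # which promotion it belongs to
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # example.com is a placeholder address.
    print(tag_url("http://example.com/book?ref=1", "newsletter", "email", "launch"))
```

Each place a link is published gets its own tagged copy, so the analytics report can tell visits from the newsletter apart from visits from, say, a forum post. That, as far as I could follow, is the sense in which the URL itself does the tracking.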

These two guys, Croll and Sean Power, struck me as brilliant and incredibly knowledgeable. Let us hope that their forthcoming book gets marketed better than has been the case up to this point. Like, say, how about something as basic as a flyer, guys, and stacks of business cards at the ready for attendees? To do them justice, though, they said that so much has been happening in this space that they had been revising their presentation up to the very last moment. Watch for this book. More later; this was just the first half of the first day.

One comment—the security presence was quite oppressive. Maybe that is just life in big cities these days. I am from a small town in Oregon so perhaps I live in blissful ignorance of the need for hyper-vigilance in major metros. But should I really be waved off from approaching a customer service man by a beefy security guard and not allowed to go to the media room though I had only just been issued my press pass and told to go to the media room? This wasn’t Fort Knox—this was a Web 2.0 conference. Web 2.0 is totally about openness.

Still, everyone in healthcare should add the Web 2.0 Expo to the list of conferences they should attend. Indeed, I would say that there was much less hype at this conference and more of value to serious businesspeople in healthcare at Web 2.0 than at the Health 2.0 conferences (http://health2con.com), valuable though those are too. I would argue that attending both is well worth the hefty price of admission for everyone in healthcare and science, from the solo nurse practitioner to the Open Science researcher at a well-endowed lab.

The World of Alternative Search Engines

April 1st, 2009 by Guest Author
Posted in Alts, Guest Authors, News | 1 Comment »

I recently spent a day immersed in the world of alternative search engines, at the AltSearchEngines Day II conference in San Francisco. It was a low-key, boutique conference, fully and easily contained by just one medium-sized conference room at the downtown InterContinental. According to Charles Knight, the engine behind the event, AltSearchEngines aims to represent everything search except Google. A year earlier, it apparently meant “without any of the major search engines.” Not this year: Yahoo!’s BOSS was one of the highlights of the conference (and Powerset was the lead sponsor!).

BOSS stands for Build your Own Search Service and is the embodiment of Yahoo!’s initiative of opening up their search index. At the time of the talk it looked like I was the only one in the room who didn’t know what Yahoo! BOSS was, considering the number of hands that rose when Bill Michaels, the General Manager and the Director of the Open Search Platform at Yahoo!, asked the audience whether they were familiar with, or already using, BOSS.

Yahoo! decided to open up access to their index and expose it over a RESTful API, apparently to encourage companies to innovate on top of it by adding their “special sauces” (this is how their web site describes it, not me). According to Bill, such “sauces” are social graphs, semantic technologies, and third party structured data, all layered on top of a 50 billion document index. There’s no point in re-crawling the web, he said. Yahoo! will do it for you and offer the index data as a commodity, on a pay-as-you-go or flat fee basis. The pricing model’s details are being worked on as we speak. Tapping into this resource would save your company capital expenditures in the range of $300 million, which I assume is how much Yahoo! is spending on its crawling and indexing infrastructure.

BOSS gives you query handling, ranking, indexing and crawling, which is pretty convenient considering the amount of computing power and bandwidth you would need to achieve that if you were to start on your own. All this and link counting is what made Google. Yahoo! exposes web index, image search, spelling, and result re-ranking APIs. They also came up with the concept of the “Vertical Lens,” described as “highly customized and tuned vertical and niche search engines”. If you’re only interested in a subset of the index, you have the option of specifying a “white list” of URLs that would restrict the domain. Companies like TechCrunch, OneRiot and SurfCanyon are already using BOSS.
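
To make the “white list” idea concrete, here is a small Python sketch of what restricting results to a set of domains amounts to. It is only an illustration done on the client side over a list of already-fetched results: it does not call BOSS, and the function names and the result format (a dict with a `url` key) are assumptions of mine, not Yahoo!’s API.

```python
from urllib.parse import urlsplit


def host_allowed(url, white_list):
    """True if the URL's host is a white-listed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in white_list)


def restrict(results, white_list):
    """Keep only results whose URL falls inside the white list.

    results is assumed to be a list of dicts with a "url" key; this is
    a made-up shape for the example, not the BOSS response format.
    """
    return [r for r in results if host_allowed(r["url"], white_list)]


if __name__ == "__main__":
    hits = [
        {"url": "http://techcrunch.com/a"},
        {"url": "http://blog.techcrunch.com/b"},
        {"url": "http://nottechcrunch.com/c"},
    ]
    for hit in restrict(hits, ["techcrunch.com"]):
        print(hit["url"])
```

Matching on the whole host name, or on a suffix that begins with a dot, keeps look-alike domains such as the third one above out of the vertical.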

The way I see it, Yahoo! is trying to do with web data what Amazon EC2 did for computing cycles and storage. I was actually intrigued by the idea, so I floated it to Bill, and I don’t remember him saying it’s not true. Or maybe he did; it was a crowded and noisy room. Anyway, I suggested he could repeat this presentation, possibly with a little bit more technical edge to it, in front of our gathering of Java geeks at SDForum JavaSIG. Check out the JavaSIG page of upcoming events or sign up for our mailing list if you’re interested in the subject.

Then there was the Semantic Search panel, with speakers from Digger, Truevert, TrueKnowledge, and Powerset. Some random points: everyone on the panel and in the audience seemed to agree that blind use of synonyms makes a mess, and that they’re more trouble than help; fully understanding what is in one’s index is a very difficult proposition, and not many companies are doing it, if any; and yes, semantic search is absolutely possible without a semantic web. Has anyone heard of “psyche”? Apparently it is a long-running project whose goal is to capture all common knowledge of the world. It was not easy to find during the first three minutes of search, so maybe I didn’t get the name right.

TrueKnowledge creates structured knowledge technology, enabling you to find the answer to a question directly, instead of being presented with a list of web pages where you may (or may not) find the answer. The knowledge base they work with is built through automatic mining of the web, databases, or it is manually entered by contributors. If you go to their site you can’t miss the “Add Knowledge” tab, which allows users to add entities and facts. To my delight, I found out that I may be an “agreement-making entity” (which, it’s true, TrueKnowledge currently doesn’t know anything about). They have an API for integrating their technology into third party systems.

OrcaTec is the technology provider behind Truevert, and they take a different approach to semantic search: they don’t use structured information, taxonomies, ontologies or thesauri, but derive meaning directly from documents. Apparently, they do so by expanding the query on the basis of a language model, learning the meaning of words from their context and using those words for search. When it comes to ranking, they use proprietary language modeling techniques and statistical linguistics. They claim they can build a vertical in about an hour. This sounds great, indeed.
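
I have no idea how OrcaTec actually does this, since their techniques are proprietary, but the general idea of learning related words from the documents themselves can be sketched in a few lines of Python. The toy below swaps in plain co-occurrence counting for a real language model; all function names and the sample documents are made up for the illustration.

```python
from collections import Counter, defaultdict


def build_cooccurrence(docs):
    """Count, for every word, which other words share a document with it."""
    co = defaultdict(Counter)
    for doc in docs:
        words = set(doc.lower().split())
        for w in words:
            for other in words:
                if other != w:
                    co[w][other] += 1
    return co


def expand_query(query, co, per_term=2):
    """Append to each query term the words that most often appear alongside it."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        # Terms never seen in the documents simply contribute nothing.
        for word, _count in co.get(t, Counter()).most_common(per_term):
            if word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    docs = ["jaguar car engine", "jaguar car dealer", "jaguar cat jungle"]
    co = build_cooccurrence(docs)
    print(expand_query("jaguar", co, per_term=1))
```

No thesaurus or ontology is involved: the only “knowledge” is which words keep company in the indexed documents, so a different document collection would expand the same query differently.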

Digger is another provider of semantic search technology; unfortunately their process of obtaining a beta invitation is not as streamlined as TrueKnowledge’s so I have to rely exclusively on what I remember from their presentation, which is mainly a control panel that guides the user into refining the query, by asking her to validate the terms of the query and further define the topic she is researching. My impression was they gather possible synonyms and conceptually close words and ask the user to validate them.

Other more or less new and cool stuff: mobile search (Taptu), visual search that is actually not based on keywords but on comparing images with visual examples (Imprezzeo, GazoPa, Viewzi, Cooliris), real time search (Collecta, OneRiot; the OneRiot presenter candidly admitted they don’t have any revenue model, they’re just burning money in building a real cool product), federated search (DeepWebTechnologies / Mednar), medical verticals (SearchMedica, RightHealth, and Yottalook).

Posted by Ovidiu Feodorov.

Job Search Engine Simply Hired Goes Global

April 1st, 2009 by Charles S. Knight
Posted in Global, Job Search, Verticals | No Comments »

Simply Hired just announced the launch of three localized job search websites for Germany, Spain and France. The three new localized job search engines join current country-specific Simply Hired destination websites in the U.S., Australia, Canada, India and the U.K.

Launching Simply Hired websites for Germany, Spain and France has been an integral part of Simply Hired’s efforts to provide simple and effective job search tools to users around the world. All Simply Hired websites allow job seekers to browse jobs from specific occupational categories or to filter their results by location, job type, education and experience. Without requiring job seekers to become members, Simply Hired immediately shows job seekers which new listings have been added since their last visit and offers an advanced search option to return job listings that match specific criteria.

“Despite unprecedented unemployment rates, Simply Hired has millions of open jobs in our U.S. database in need of the perfect people to fill them,” said Gautam Godhwani, co-founder and CEO, Simply Hired. “By expanding internationally and partnering with social networking organizations such as LinkedIn and Plaxo, we are confident that we are providing job seekers with every possible resource to find their dream job.” So what are you waiting for? Try Simply Hired and leave a comment!

Job Search Made Simple

Looking for a job shouldn’t be a full-time job! That’s why we built the biggest, smartest job search engine on the web. We search thousands of job sites and companies, just so you don’t have to. We eat, sleep and breathe job search, to help you find that dream job. Use our nifty tools to find local jobs, identify trends, research salaries, and secure that offer letter.

Job Search Made Easy (German site)

Searching for a job should not become a full-time job. That is why we developed one of the largest and smartest job search engines on the Internet. We search thousands of job sites and company websites for you. What are you waiting for?

Looking for Work the Simple Way (Spanish site)

Looking for a job shouldn’t be a full-time job! That is why we created the largest and smartest job search engine on the Internet. We search thousands of job sites and companies so that you don’t have to. What are you waiting for?

Job Search Simplified (French site)

Looking for a job shouldn’t be a full-time job! That is why we created the smartest and most comprehensive job search engine on the market. We canvass thousands of companies and job listing sites to make your search easier.

Webfluence! The next generation in searching stuff!?!

April 1st, 2009 by Charles S. Knight
Posted in Newcomers | No Comments »

Webfluence is the first search engine that enables any user to add any website to any search result. If you don’t see your website in a given search result, simply click “Add result” and provide us with information about your website. Your addition will instantly be added to our index and will propagate throughout our system so that your addition can be viewed globally. This allows people in all parts of the world to receive the benefits of your contributions.

Source: Webfluence