Stephen E. Arnold, August 12, 2008
I’m on record for my assertion, “Enterprise search is dead.” Web search is now a monopoly resting firmly in the claws of Googzilla. Software developers are creating “free” search solutions and making them available under various open source schemes. You can download Lucene or FLAX and be searching in a couple of hours. Other vendors are “baking in” search, slapping a buzzy new name on the service, and shifting the customer’s attention toward higher perceived value services such as business intelligence, predictive analytics, and enterprise publishing systems.
Despite all this effort, user dissatisfaction with search systems in organizations is in the 50 to 75 percent range, depending on which data set you wish to accept as true.
And poor old search? She’s damaged goods now.
The diagram below shows where we are. Most of my analyses attract little attention. I think the reason is that I conduct tedious research using documents that no person in his / her right mind would read. Example: Endeca’s patent documents. Now those are fascinating, but if you attend a meeting of search engine optimizers and bring up the subject of an Endeca tree, you will find yourself sitting alone in a corner.

Reuse permitted for libraries and academic institutions. Any other use requires the permission of Stephen E. Arnold. Write seaky2000 at yahoo dot com.
My take on the future of search is that no technology goes away. The deeply dissatisfying key word search technology that dates from the 1960s is still with us. Don’t believe me? Fire up Outlook Express and search for an email. You get the same parametric, one-thing-at-a-time, reverse chronological order that is available to you from Dialog Information Services.
You will note that the y axis represents computational demand. The more razzle dazzle you slap on a key word indexing system, the more storage, bandwidth, CPU cycles, and plumbing you will need. The way key word indexing can bring an older Pentium computer to its knees when it runs Google Desktop Search or another “free” desktop search system provides a real-life example of how search usurps resources. Even the simplest key word indexing requires a large part of a computer’s resources when indexes are updated and rebuilt. The more users and the more content you process, the more plumbing you need. When you slap on additional content processing, you are in a poker game that you cannot win. The computational odds are stacked against you. The fix is to process less content or turn off features. Believe me, even the big guys do this. Google and Microsoft, for example, have priorities for their indexing. If a site is a low priority like the US government’s maritime administration, Google won’t be rushing to index the latest white paper on brown water navigation rule changes. A Dark Knight review on the New York Times’s Web site? That’s a must-have item.
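The “process less content” fix can be pictured as a budgeted priority queue. The sketch below is only an illustration: the document names, the priority numbers, and the `plan_indexing` function are invented for this example and say nothing about how Google or Microsoft actually schedule their indexing.

```python
import heapq

def plan_indexing(pending, budget):
    """Pick the `budget` highest-priority items; ties keep their input order."""
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-priority, order, doc_id)
            for order, (doc_id, priority) in enumerate(pending)]
    heapq.heapify(heap)
    chosen = []
    while heap and len(chosen) < budget:
        _, _, doc_id = heapq.heappop(heap)
        chosen.append(doc_id)
    return chosen

# Hypothetical backlog: (document id, priority). The numbers are made up.
pending = [
    ("maritime-white-paper", 1),
    ("dark-knight-review", 9),
    ("press-release", 5),
]

# With capacity for only two items per cycle, the white paper waits.
this_cycle = plan_indexing(pending, budget=2)
print(this_cycle)
```

Whatever does not fit the budget simply stays in the backlog, which is the trade-off described above: fewer CPU cycles spent, staler low-priority content.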
The x axis is time. Search is getting on in years. As lousy as search is today, it has been consistently crappy. Since my fun-filled heart “event” in February 2007, I don’t think much beyond 2010, so I stick to that time horizon. The overall takeaway from my graphic is that over time, search vendors have stacked up hot, new functions. We’re making progress, if you believe in technological progress. I don’t, but that’s another essay topic for another day.
Let’s look at the stack. What my diagram attempts to convey is a layering of functions on top of basic key word indexing. Each of these layers persists. When a new, big thing comes along, the innovator slaps the functionality on top of key word search and previous content processing operations. A good example of this is Powerset’s natural language, semantic, “smart” system. Under the hood was older technology from Xerox’s Palo Alto Research Center. Powerset’s wizards added features, launched a killer public relations program, rolled out a demonstration on the Wikipedia content, and collected $100 million from Microsoft. Microsoft will probably add functions to Powerset’s system and then plop some or most of the Powerset functionality on Live.com, Microsoft’s Google killing Web search system.
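A minimal sketch of that layering, assuming a toy inverted index as the base and a synonym expander as the “new, big thing” stacked on top. The class names, the sample documents, and the synonym table are all hypothetical; no vendor’s architecture is being described.

```python
from collections import defaultdict

class KeywordIndex:
    """Base layer: an inverted index from lowercase tokens to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, query):
        # AND semantics: a document must contain every query token.
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result

class SynonymLayer:
    """Added layer: expands each query term, then still asks the key word index."""

    def __init__(self, index, synonyms):
        self.index = index
        self.synonyms = synonyms

    def search(self, query):
        result = None
        for token in query.lower().split():
            variants = {token} | set(self.synonyms.get(token, ()))
            term_hits = set()
            for variant in variants:
                term_hits |= self.index.search(variant)  # one base lookup per variant
            result = term_hits if result is None else result & term_hits
        return result or set()

base = KeywordIndex()
base.add(1, "boat rules for rivers")
base.add(2, "ship rules for harbors")

layered = SynonymLayer(base, {"boat": ["ship"]})
print(base.search("boat rules"))     # {1}
print(layered.search("boat rules"))  # {1, 2}
```

The added layer improves recall, but each query term now triggers several lookups in the base index. The old layer is still there and still doing the work, only more of it.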
Today’s search systems are like the big snowballs my friends and I made in Illinois during the winter. We started small and rolled the seed until we had a ball the size we needed. Search technology works a bit like this. The core is key word indexing, and then engineers roll on more features, services, and functions. The ball gets ever bigger, with the interesting consequences we observe today; for example:
Users don’t want to search or don’t know how to search effectively and, therefore, want systems that are easy to use and that expose information. The hot search engines today figure out what a user wants and either [a] provide suggestions that are hot linked to information or [b] use math to generate a result that most [previous?] users found helpful.
Vendors are sticking search and advanced content processing systems out of sight. The systems display information when a user reaches a point in a work flow script or just push the needed information to the user automatically. The “automatic” can be set up (Connotate’s approach) or figured out from a user’s actions (Google’s approach on “iGoogle” or “individualized Google”).
Search is there, just subordinate to other, higher value functions.
Search is reworked into “outputs” that present answers. Each item in the report is hot linked to underlying data. The idea is that once a report is set up, the user can get the information pre-packaged and ready to use. This is the digital equivalent of microwaving a burrito. Google has an invention called “I’m feeling doubly lucky.” The idea is that no search by the user is executed. The system figures out what the user needs and pushes it to the user’s mobile device. Search without search is in our future.
The future of search is going to look pretty much like today’s search. The flashier or more advanced features are in labs or available from lesser-known vendors; for example, Coveo, Exalead, ISYS Search Software, MarkLogic, Silobreaker, and others. I have the names of more than 300 companies engaged in search and content processing in my files. More companies call themselves to my attention each day. The down economy has not inhibited start ups in the search and content processing sector.
Let me wrap up with some general observations about the future of search:
I don’t see any major improvement in search and content processing in the near term. What we have will make incremental gains, but I don’t see a “red shift” in this market sector like some optimistic investors.
Furthermore, quite a few vendors are under severe financial pressure. The alleged missteps with Fast Search & Transfer’s calculation of FY2007 revenues are one modest example of far greater problems across numerous companies engaged in search, content processing, and information retrieval. Vendors are not forever. I heard about two this week who are walking a knife edge.
Organizations looking for a “silver bullet” to solve information retrieval and information access challenges will find that multiple systems will be needed today, tomorrow, and in the future. The notion of one system that will boil the information ocean is crazy. Point solutions can be more successful because requirements can be more precisely defined. A tight specification translates to budget control. There are several “search crises” brewing, and these will have a profound impact on the entire sector. I will try to identify these in my Web log Beyond Search.
The rush to embrace social search functions will face some hurdles in the months ahead. Social systems can be spoofed. Heck, anything online can be jimmied. As the upsides and downsides of social systems become better understood, this approach to information retrieval will undergo change.
The effort to make the system smarter is a response to users who are search challenged (think they know how to find information but don’t know much beyond 2.3 key words and pressing the Enter key), less able to know what is needed to solve a problem, or inept when it comes to time management. Vendors will pump their respective systems’ ability to “know” what an employee in a particular role needs to do his / her job.
These issues have business implications. Organizations will give up more of their control of their information. Cloud computing looks pretty tasty when the full time information technology staff can’t keep the search system up and working as advertised. Vendors will deliver solutions that require the niftiest gizmos to work at an acceptable speed.
One search vendor is betting that Intel’s 48 core processors will solve that firm’s computational bottleneck. Finally, the complexity of the layered systems will make troubleshooting and bug fixing difficult and expensive work.
The future of search? It’s here.
The future I envisioned decades ago continues to recede. Give me that old time key word system. It’s clunky, but like a 1965 Corvette, it goes down the road.

August 13th, 2008 at 9:41 am
The above is really fascinating. “Enterprise search is dead.” I would say that that is a good thing, as it democratizes information, which is good for science and the advancement of knowledge generally. I am in library school and marvel at the many databases each special librarian has to master and how that knowledge is restricted to so few. Gatekeeping begone!
August 13th, 2008 at 9:51 am
Awesome article!
Added to oriango.com
August 13th, 2008 at 1:23 pm
Mr. Arnold says that advances in search are just “layering of functions on top of basic key word indexing”. That is true of Google’s popularity and many statistically-based semantic search technologies.
However, one alternative method does not use the pattern or key word as the basis. That is a linguistic semantic approach in which the string is interpreted for meaning, one term at a time, so that the index is not of strings, but of word meanings (or concepts).
Take the string “strike”. It has 22 meanings in English, such as “hit or beat”, “discover”, “labor walkout”, “occur to someone”, “state of the game of baseball”, “ignite”, and so on. Pattern-matchers save the string “strike” in the index. A linguistic semantic search engine first determines the meaning of strike in context and then saves that meaning in the index. So “strike on the head” is interpreted as “strike1” meaning “hit or beat” and “head6” meaning a part of the body. On the other hand, “the workers went out on strike” is interpreted as “worker1” meaning “laborer”, “go20” meaning “walk out”, and “strike5” meaning “labor walkout”.
In searching, meanings in the query are matched to meanings in the document base, dramatically improving precision. Recall is improved while retaining precision, because synonyms of just the desired meanings of a term can be found. For example, if the query is “Did Fred strike Harry on the head”, a document with “Fred beat Harry on the noggin” is returned, but not “Fred struck the head of a match for Harry”, because strike and head don’t have the same meaning in the document as in the query.
Mr. Arnold mentions Powerset as an example of a technology that builds upon key word technology. If Powerset is disambiguating words as described above, it should not be classified as using a key word approach.
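A toy sketch of indexing meanings instead of strings, using the Fred and Harry example. Everything here is an assumption made for illustration: the labels “strike1”, “strike5”, and “head6” come from the comment, but “strike_ignite”, “head_tip”, the cue-word rules, and the tiny lemma and stopword tables are invented. A real linguistic engine would disambiguate with far richer analysis than a cue-word lookup.

```python
import re
from collections import defaultdict

STOPWORDS = {"did", "on", "the", "of", "for", "a"}
LEMMAS = {"struck": "strike"}  # hypothetical, deliberately tiny

# lemma -> ordered rules (cue words, sense). The first rule whose cue appears
# in the sentence wins; an empty cue set is the default sense.
SENSES = {
    "strike": [({"match"}, "strike_ignite"),
               ({"workers", "union"}, "strike5"),
               (set(), "strike1")],
    "beat": [(set(), "strike1")],
    "head": [({"match"}, "head_tip"), (set(), "head6")],
    "noggin": [(set(), "head6")],
}

def meanings(text):
    """Map a sentence to a set of sense labels (unknown words stand for themselves)."""
    tokens = [LEMMAS.get(t, t) for t in re.findall(r"[a-z]+", text.lower())]
    context = set(tokens)
    found = set()
    for token in tokens:
        if token in STOPWORDS:
            continue
        for cues, sense in SENSES.get(token, [(set(), token)]):
            if not cues or cues & context:
                found.add(sense)
                break
    return found

class SenseIndex:
    """Inverted index keyed by sense label rather than by surface string."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for sense in meanings(text):
            self.postings[sense].add(doc_id)

    def search(self, query):
        wanted = meanings(query)
        if not wanted:
            return set()
        return set.intersection(*(self.postings.get(s, set()) for s in wanted))

index = SenseIndex()
index.add("beat-doc", "Fred beat Harry on the noggin")
index.add("match-doc", "Fred struck the head of a match for Harry")

hits = index.search("Did Fred strike Harry on the head")
print(hits)  # {'beat-doc'}
```

The “beat” document is found even though it shares neither “strike” nor “head” with the query, and the “match” document is rejected even though it contains both strings.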
Another source of precision in linguistic semantics is the interpretation of phrases. “Bill of Rights” is interpreted as a fixed and frequent phrase, so that a document with “Bill has a lesion on his right leg” is not returned in response to a query about the “Bill of Rights”.
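The phrase handling can be sketched as a longest-match pass over a phrase lexicon before indexing. The one-entry lexicon and the `phrase_tokens` function below are hypothetical, and the sketch only shows the general technique, not how any particular product interprets phrases.

```python
import re

# Hypothetical phrase lexicon: word sequence -> single index term.
PHRASES = {("bill", "of", "rights"): "bill_of_rights"}
MAX_LEN = max(len(phrase) for phrase in PHRASES)

def phrase_tokens(text):
    """Tokenize, collapsing any known fixed phrase into one term."""
    words = re.findall(r"[a-z]+", text.lower())
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase starts here, so keep the single word.
            out.append(words[i])
            i += 1
    return out

query_terms = phrase_tokens("the Bill of Rights")
doc_terms = phrase_tokens("Bill has a lesion on his right leg")
print(query_terms)  # ['the', 'bill_of_rights']
print(doc_terms)
```

Because the query is indexed under the single term `bill_of_rights`, the lesion document no longer matches on the loose words “bill” and “right”.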
You can see all of this at work on the demo sites at http://www.cognition.com.
August 14th, 2008 at 4:39 am
Agreeing with Dr Kathleen here. I couldn’t have explained it better myself. I think she really does know what she’s talking about.
August 14th, 2008 at 6:35 am
Great post!
Is (was) Powerset using a form of WSD?