Terry Janssen of Lockheed Martin gave a very interesting presentation on Wednesday, 24 September 2008, at the Enterprise Search Summit West in San Jose. In his talk he discussed some of the technologies that Lockheed uses, as a systems integrator, to improve search within its own enterprise and within other enterprises.
His talk highlighted a theme that recurred in several other presentations, and among vendors, at the Summit and its sister conference, Taxonomy Bootcamp: taxonomies are the key to improving and refining enterprise search.
Several vendors at the conferences, such as Collexis and Endeca, offer tools of exactly this kind, which combine traditional algorithmic search with filtering based on specific, domain-based taxonomies. This approach effectively delivers the “best of both worlds”: users can retrieve information from a potentially huge set of resources spread across disparate sources and locations, and then apply the known concepts and terminology of existing taxonomies to make sense of those results, filtering them down to the specific items that provide precisely the desired information.
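To make the two-stage idea concrete, here is a minimal sketch in Python. It is not how Collexis, Endeca, or Lockheed implement it; the toy taxonomy, the `Document` type, and the functions `keyword_search`, `descendants`, and `taxonomy_filter` are all hypothetical. A broad keyword match runs first, and the results are then narrowed to documents tagged with a chosen concept or any narrower concept beneath it in the taxonomy.

```python
from dataclasses import dataclass, field

# Hypothetical toy taxonomy: each concept maps to its narrower child concepts.
TAXONOMY = {
    "aerospace": ["aircraft", "spacecraft"],
    "aircraft": ["fighter jet", "transport aircraft"],
    "spacecraft": ["satellite"],
}


@dataclass
class Document:
    doc_id: str
    text: str
    concepts: set = field(default_factory=set)  # taxonomy tags on this document


def descendants(concept: str) -> set:
    """Return the concept plus every narrower concept beneath it."""
    result = {concept}
    stack = [concept]
    while stack:
        for child in TAXONOMY.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def keyword_search(docs, query):
    """Broad algorithmic step: every document whose text contains all query terms."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.text.lower() for t in terms)]


def taxonomy_filter(results, concept):
    """Narrowing step: keep results tagged with the concept or a narrower one."""
    allowed = descendants(concept)
    return [d for d in results if d.concepts & allowed]


if __name__ == "__main__":
    docs = [
        Document("d1", "Engine maintenance schedule", {"fighter jet"}),
        Document("d2", "Engine maintenance for delivery trucks", {"ground vehicle"}),
        Document("d3", "Satellite engine maintenance notes", {"satellite"}),
    ]
    hits = keyword_search(docs, "engine maintenance")  # all three match
    print([d.doc_id for d in taxonomy_filter(hits, "aircraft")])  # ['d1']
```

The keyword step alone returns all three documents; filtering on "aircraft" keeps only the one tagged with a narrower aircraft concept, while filtering on "aerospace" would also keep the satellite document.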
Natural Language Processing
Technologies such as Natural Language Processing, metadata tagging, and entity extraction can be used together with taxonomies to narrow the broad set of results returned by a keyword or phrase search to a smaller, context-appropriate subset. In some cases, resource repositories are actively crawled (including content such as graphics), and key entities are extracted and indexed. In others, items that have already been indexed through metadata tagging are returned and then matched against the enterprise taxonomy, whose structure and terminology are familiar to the end user.
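As a rough illustration of the crawl-extract-index step, the sketch below uses simple dictionary (gazetteer) matching rather than statistical NLP, which is an assumption made for brevity. The vocabulary and the functions `build_matcher`, `extract_entities`, and `index_documents` are hypothetical. Each preferred taxonomy term carries synonyms; any mention of a synonym in a document is mapped back to the preferred term and recorded in an inverted index.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: preferred taxonomy term -> synonyms.
VOCABULARY = {
    "unmanned aerial vehicle": ["uav", "drone"],
    "global positioning system": ["gps"],
}


def build_matcher(vocabulary):
    """Map every lower-cased label (preferred term or synonym) to its preferred term."""
    label_to_term = {}
    for term, synonyms in vocabulary.items():
        for label in [term, *synonyms]:
            label_to_term[label.lower()] = term
    # Try longer labels first so multi-word terms win over shorter sub-phrases.
    labels = sorted(label_to_term, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern, label_to_term


def extract_entities(text, pattern, label_to_term):
    """Return the set of preferred taxonomy terms mentioned in the text."""
    return {label_to_term[m.group(1).lower()] for m in pattern.finditer(text)}


def index_documents(docs, vocabulary):
    """Build an inverted index: preferred term -> sorted list of document ids."""
    pattern, label_to_term = build_matcher(vocabulary)
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in extract_entities(text, pattern, label_to_term):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```

Because "UAV" and "drone" both resolve to the same preferred term, a user browsing the taxonomy under "unmanned aerial vehicle" finds documents regardless of which wording their authors chose.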
Graphics search already exists, and work is under way on identifying and extracting entities from video and audio, which will allow searching across a range of media formats. Overall, it was clear that this hybrid approach represents the leading edge of search technology. Algorithmic search alone cannot return the specific, context-relevant results that domain-specific taxonomies make possible, because those taxonomies capture enterprise knowledge that, at least for now, can only be generated from a human perspective.