Recently, I was asked by one of our clients to consider writing about faceted classification and, specifically, how to go about an analysis to determine which facets should be used for searching for information. The request is timely. Although faceted classification is not new, the difficulty in modeling taxonomies and ontologies for various use cases, including front end filters, is frequently a top consideration for our new and existing clients.
What is faceted classification? A facet is a particular aspect or a single side of a many-sided object, such as a cut diamond. Facets in knowledge organization are typically discrete categories (taxonomies or schemes) which are “mutually exclusive…and collectively exhaustive” (Wikipedia) in describing a domain. In other words, each facet is bound by the “is a” principle while collectively covering all aspects of the items being described. Facets are combined to classify content or used as filters to narrow and refine search results. Think of facets as entry points into information; providing different entry points singly or in combination allows users to narrow their searches and result set. You can learn more about facets and facet analysis in the ANSI/NISO Z39.19-2005 (R2010): Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies.
An example of faceted taxonomies from a product retailer.
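To make the idea of facets as combinable filters concrete, here is a minimal sketch in Python. The catalog entries, facet names, and helper functions are all hypothetical and chosen only for illustration; a real site would draw these values from its own taxonomy and product systems.

```python
from collections import Counter

# Hypothetical catalog: each item carries one value per facet.
ITEMS = [
    {"title": "Diamonds Are Forever", "Media": "Film", "Format": "Blu-ray"},
    {"title": "Diamonds Are Forever", "Media": "Book", "Format": "Paperback"},
    {"title": "Casino Royale", "Media": "Film", "Format": "DVD"},
    {"title": "Casino Royale", "Media": "Book", "Format": "Hardcover"},
]


def filter_items(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many of the remaining items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


# Narrow by one facet, then see which Format values remain available.
films = filter_items(ITEMS, {"Media": "Film"})
remaining_formats = facet_counts(films, "Format")

# Combine two facets to narrow further.
dvd_films = filter_items(ITEMS, {"Media": "Film", "Format": "DVD"})
```

Each selection is an independent entry point, and combining selections only ever shrinks the result set, which is the behavior users expect from search filters.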
The principle is straightforward enough, but how does one begin to determine which facets should be used? To approach the question of facet analysis, let’s talk about diamonds, how they are forever, and James Bond.
Principles of Facet Construction
I started building an ontology of James Bond, the Bondtology, to illustrate the creation from the ground up of a multi-scheme domain of knowledge. I describe the process in my blog here.
The Bondtology is a faceted classification knowledge organization system in that the schemes together are needed to fully classify content or filter search results on the subject of James Bond, even though no single scheme is required. It is also illustrative of the process of facet analysis and displays mutual exclusivity even if it is not yet collectively exhaustive. Each scheme is a vocabulary and each vocabulary is a facet.
The Bondtology is mutually exclusive in that any concept can be placed in one best place in the structure, especially if following the “is a” rule. If there is any ambiguity, such as the difference between Characters and People, we should clearly explain it in a description or scope note. Here, I’ve defined what types of concepts should exist in the Characters scheme.
Keeping with our facets, Diamonds Are Forever is the title of a James Bond film and should exist in the Films scheme and not exist in any of the other schemes. Aha! We have reached our first impasse because Diamonds Are Forever is also the title of Ian Fleming’s fourth James Bond novel and the film’s theme song by Shirley Bassey. However, although the titles are the same, we can address this in several ways.
First, in this initial version of the Bondtology, we are only modeling films and none of the original written source material. So, in terms of inclusion and scoping, novels and short stories will not exist in this ontology. If we are following the rules of mutual exclusivity, we should be able to extend the Bondtology by adding a new facet, such as Written Works, to cover the novels and short stories without major disruption to the existing schemes.
Second, while we are covering theme songs in the Themes facet, the context of the different schemes themselves provides enough disambiguation. Films belong in Films and themes belong in Themes, even if their titles are the same. We can make modeling choices, such as whether to allow polyhierarchy between schemes and have the same title in both locations, or to create a separate entry for each: Diamonds Are Forever (film), Diamonds Are Forever (novel), and Diamonds Are Forever (theme). Although qualifiers are not strictly necessary when each concept lives in a separate facet, parenthetical qualifiers make it explicit that each concept is a separate thing in a mutually exclusive category.
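One way to picture this disambiguation is to treat a concept's identity as the pair of its scheme and its label. The following is only a sketch: the Concept class and the qualifier mapping are hypothetical illustrations and do not reflect how any particular taxonomy management system stores concepts.

```python
from dataclasses import dataclass

# Hypothetical mapping from scheme name to a parenthetical qualifier.
QUALIFIERS = {"Films": "film", "Themes": "theme"}


@dataclass(frozen=True)
class Concept:
    scheme: str      # the facet the concept lives in
    pref_label: str  # the title as it appears in that scheme

    def qualified_label(self) -> str:
        """Append a parenthetical qualifier when one is defined for the scheme."""
        qualifier = QUALIFIERS.get(self.scheme)
        if qualifier is None:
            return self.pref_label
        return f"{self.pref_label} ({qualifier})"


film = Concept("Films", "Diamonds Are Forever")
theme = Concept("Themes", "Diamonds Are Forever")

# Same title, different schemes: the two concepts stay distinct.
concepts = {film, theme}
labels = sorted(c.qualified_label() for c in concepts)
```

Because the scheme is part of the identity, the shared title never collides, and the qualifier is purely a display aid for contexts where the scheme is not visible.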
If we follow the principles of mutually exclusive facets, we should be able to extend the knowledge domain by adding additional facets with little disruption to the original structure. For example, I mentioned that the Bondtology is not collectively exhaustive. We cannot fully categorize James Bond films with only the existing facets.
We love the use of gadgets in James Bond, but where do we describe them in this current structure? We often think of exotic locales for film settings, but where do we name the locations and link which films featured these locations? Right away, we see we are missing a faceted structure for Devices (or Props, or Gadgets, or Tools, or some other descriptive scheme name) and Geography. Right now, we have a facet called Themes, but this is reserved for James Bond film theme songs. Where do we put other pieces of music used in the films, or do we include them at all? What about thematic subjects? Do we create a new facet called Subjects, or do we need to repurpose the name of the Themes facet and create a new facet called Music?
These considerations are part of both taxonomy modeling and facet analysis. The taxonomy model should represent the domain of knowledge completely, while using faceted schemes allows us to describe content or filter search results for end user consumption.
Online retail is often cited when talking about faceted taxonomies used for product searches, so let’s mash up our faceted content Bondtology with the world of e-commerce to illustrate some considerations in facet analysis.
On an online retailer website, I conduct a broad search for “James Bond”, and I bring back a variety of content. The two most obvious are films and books. So, immediately, we need to include Media in our schemes. A level deeper, however, finds us considering which format of media. For films, we have DVDs, Blu-ray, and streaming (at a minimum), with other older formats which may need to be included. Similarly, for books, we have electronic, paperback, hardcover, and more. But what do we include as facets and which of those facets are included in our Bondtology? If we want to cover Media and Format, these seem like reasonable choices. Similarly, our Genres and People facets are useful for narrowing down our search results to only Action & Adventure films and only movies in which Daniel Craig is playing Bond.
The intersection of product description and product information is worth considering in facet analysis. Where do our ontology facets end and product information begin? Is the condition, new or used, something we manage as part of our domain? What about the release date? For films, what about ratings? Unlike content classification and searching, product taxonomies include a lot of information useful for filtering which should not necessarily be represented in taxonomies.
The combination of product information held in product information management (PIM) systems and product-descriptive metadata held in taxonomy and ontology management systems delivers a customer experience fit for purpose. Since product information, such as dimension, material, media format, and a host of other product details, is already being managed in a system dedicated to describing products, it makes sense to retain that data in place but still make it part of the faceted classification search and browse experience.
For example, I can begin to search and browse by descriptive categories which help to get to a product set which shares those common values. Again, searching for “James Bond” as a topic (a character) in a search box and then searching for or narrowing by a provided search facet, such as Films or a broader facet called Media, I can get to the film Diamonds Are Forever. Since the results are already narrowed to film, only facets which make sense for films will be available, such as Format (streaming, Blu-ray, etc.), Length (in hours, minutes, and seconds), or Genre.
As this example shows, managing all of those values in an ontology may not be practical, and a guide may be whether the content is a finite, or nearly infinite, set of values. Should the Length facet include only time lengths or dimension lengths? In what units? Given the incredible scope of values in this instance, taking data directly from the PIM makes more sense than trying to curate them as part of a very large faceted taxonomy.
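A rough sketch of that split follows, assuming a small curated vocabulary on the taxonomy side and free-form attributes on the PIM side. The vocabulary contents, attribute names, and sample values are placeholders for illustration, not real product data or any vendor's API.

```python
# Curated, finite vocabularies managed in a taxonomy system (hypothetical).
TAXONOMY = {
    "Media": {"Film", "Book"},
    "Genre": {"Action & Adventure", "Spy Fiction"},
}


def build_facets(taxonomy_tags, pim_attributes):
    """Merge curated taxonomy tags with open-ended PIM attributes.

    Taxonomy values are validated against the controlled vocabulary;
    PIM values are passed through untouched because their range is
    too large to curate.
    """
    facets = {}
    for facet, value in taxonomy_tags.items():
        if value not in TAXONOMY.get(facet, set()):
            raise ValueError(f"{value!r} is not a concept in the {facet} scheme")
        facets[facet] = value
    for attribute, value in pim_attributes.items():
        # setdefault keeps the curated value if both sides use the same name.
        facets.setdefault(attribute, value)
    return facets


product_facets = build_facets(
    {"Media": "Film", "Genre": "Action & Adventure"},
    {"Format": "Blu-ray", "Length": "120 min"},  # placeholder PIM values
)
```

The design choice here is that only the finite, meaningful categories pay the cost of curation, while values like Length come straight from the system that already owns them.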
The goal remains to provide facets which are both mutually exclusive and collectively exhaustive. Using a combination of a topical, faceted taxonomy from a taxonomy management system and facets including product information from a PIM provides several entry points into information.
Facets Are Forever
Modeling faceted taxonomies and determining which facets are appropriate for a user interface are not without their challenges.
Maintaining mutual exclusivity, as mentioned above, can lead to a proliferation of separate taxonomies to cover every aspect of content. The purity of a mutually exclusive vocabulary may need to be blurred to accommodate concepts which are similar enough for the user experience.
Likewise, a very large domain, or what are frequently multi-domain environments (for example, retailers who sell a very broad range of products), may require multiple faceted taxonomies to cover each domain logically. One can see this illustrated on retail sites such as big box home improvement stores or online groceries. Length for lumber is fairly standardized, but is not the same as the length of a rake or hammer handle. General product colors broken into the seven broad colors will not work for paint, an area in which paint manufacturers require their specific colors to be shown individually rather than as part of a larger palette.
Challenging environments like these frequently rely on taxonomy management systems working in conjunction with product information and web content management systems to provide separate facets and to display those facets according to the product type. Lumber has L x W, a refrigerator has L x W x H, and the handle of a rake has L. It does not make sense to display all three with empty values where they don’t apply.
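As a small illustration of showing facets by product type, the sketch below maps each type to the dimension facets that apply to it and drops anything empty. The product types, facet names, and numeric values are hypothetical, and the units are left unspecified on purpose.

```python
# Which dimension facets apply to which product type (hypothetical mapping).
DIMENSION_FACETS = {
    "lumber": ("Length", "Width"),
    "refrigerator": ("Length", "Width", "Height"),
    "rake": ("Length",),
}


def displayable_facets(product_type, attributes):
    """Return only the dimension facets that apply to the type and have a value."""
    wanted = DIMENSION_FACETS.get(product_type, ())
    return {
        name: attributes[name]
        for name in wanted
        if attributes.get(name) is not None
    }


# A rake record may carry empty Width and Height fields in the source system;
# only Length is shown.
rake_facets = displayable_facets(
    "rake", {"Length": 60, "Width": None, "Height": None}
)
fridge_facets = displayable_facets(
    "refrigerator", {"Length": 30, "Width": 32, "Height": 70}
)
```

An unknown product type simply yields no dimension facets, which is a safer default than displaying three empty filters.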
While there are general rules for facet analysis and construction, each use case and application will require its own set of rules. You, as the person responsible, will need to approach the problem from all sides.