Consider the small hierarchical structure shown at right; you might find something like it in any number of product, e-commerce, or indexing taxonomies:

[Figure: Skiing hierarchy]

Although this structure seems reasonable at first glance, there are actually several complex things going on here.

The structure as shown is intuitive: all of the Narrower Terms are definitely associated with Skiing; they are all properly Subtopics of the Topic of Skiing.

Clustering all of the related concepts under a single Broader Term (Skiing) is a common way to organize product taxonomies, which are designed to allow users to quickly find what they’re looking for. This organization is predicated on Skiing as a Topic.

Within this small structure, the terms “Cross-country skiing” and “Downhill skiing” are in fact types of Skiing: subclasses (or sub-genres) of Skiing. You might say, obtusely, that “Downhill skiing IS A Skiing” to denote that Downhill skiing and Cross-country skiing are subclasses of Skiing: they treat Skiing as a Concept.

On the other hand, Ski Equipment and Ski Resorts are not, strictly speaking, types (or Genres) of Skiing; we might (just as obtusely) say that “Ski Resorts IS NOT A Skiing.”

So we really have two things going on here: the terms “Downhill skiing” and “Cross-country skiing” treat Skiing as a Concept, while the terms describing apparel, equipment, and resorts treat Skiing as a Topic.

The problem of conflating (or mixing and matching) Topics and Concepts is, I think, one of the subtlest and most confusing in the field. Many taxonomies break down along these lines.

To reiterate: this kind of structure is useful in some kinds of taxonomies. In other, stricter kinds of taxonomies—specifically (but not exclusively) those used as the basis for ontologies, or for machine learning and other kinds of inferencing—this is highly problematic.
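To make the inferencing problem concrete, here is a minimal Python sketch. It assumes a toy reasoner that reads every Broader Term link as an "is a" link; the term names, the `BROADER` mapping, and the hand-labeled `IS_A_LINKS` set are illustrative assumptions, not taken from any particular tool.

```python
# Broader-term links for the small Skiing example (term names are illustrative).
BROADER = {
    "Cross-country skiing": "Skiing",
    "Downhill skiing": "Skiing",
    "Ski equipment": "Skiing",
    "Ski resorts": "Skiing",
    "Ski apparel": "Skiing",
}

# Hand-labeled assumption: terms whose broader-term link is a true subclass link.
IS_A_LINKS = {"Cross-country skiing", "Downhill skiing"}


def naive_is_a(term, ancestor):
    """Treat every broader-term link as a subclass link (the risky reading)."""
    while term in BROADER:
        term = BROADER[term]
        if term == ancestor:
            return True
    return False


def strict_is_a(term, ancestor):
    """Follow only the links labeled as true subclass links."""
    while term in BROADER and term in IS_A_LINKS:
        term = BROADER[term]
        if term == ancestor:
            return True
    return False


print(naive_is_a("Ski resorts", "Skiing"))       # the naive reading accepts this
print(strict_is_a("Ski resorts", "Skiing"))      # the strict reading rejects it
print(strict_is_a("Downhill skiing", "Skiing"))  # a genuine subclass still passes
```

The naive reading concludes that “Ski resorts IS A Skiing,” which is exactly the kind of false inference a mixed Topic and Concept hierarchy invites.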

So taxonomies intended for, say, relating interests of users (“people interested in Skiing are interested in Ski Equipment”) for recommendation systems or searching for products (or services, or content) related to skiing might be organized in this way with little problem. But taxonomies used for reasoning, or as the basis for ontologies—taxonomies that try to define the semantic relationship between concepts more strictly—require a more considered approach to concept-relationship modeling.

The ambiguity between Topic and Concept is often found nearer the top level (Top Terms, or Terms with no parents). Top Terms are often used as general Topical buckets; they are frequently excluded from indexing (not attached to content or products) and serve as navigational starting points. While, strictly speaking, this does not exempt Top Terms from the requirements for taxonomy relationships, it’s extremely common for Top Terms to be used as Topics and not Concepts.

Further, I posit that the closer a term is to the Top level of a taxonomy the more likely it is that the term represents a Topic, and that lower-level terms are more likely to be treated as Concepts. This is certainly not universally true, and many counterexamples can easily be produced.

Regardless, there are options for modeling Concepts more strictly and avoiding Topical buckets to create a more ontological structure. Each approach has upsides and downsides, of course.

Option 1: Separate Terms into Concept branches and relate them using RTs

The structure shown above separates the Subtopics that were under Skiing (Resorts, Equipment, Apparel) into their own conceptual branches and uses Related Terms (RTs) to show how they relate to one another.

Advantages: Each concept is properly and strictly delimited within branches.

Disadvantages: RTs fail to explicate the relationship between the terms in a useful way, and in a larger structure than my small example the volume of RTs could become cumbersome; even in my tiny structure (9 terms) many terms are RT’d to four or five other terms, as shown. (In fairness, it’s actually no less explicit than the original structure in terms of modeling the relationships between concepts.)
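As a rough illustration of Option 1, the Python sketch below keeps each concept in its own branch and links branches with symmetric RTs. The branch names (“Equipment”, “Resorts”, “Apparel”), the specific RT links, and the helper `add_rt` are assumptions made for this example; the links in the original diagram may differ.

```python
from collections import defaultdict

# Hypothetical concept branches: every narrower term is a kind of its broader term.
BROADER = {
    "Cross-country skiing": "Skiing",
    "Downhill skiing": "Skiing",
    "Ski equipment": "Equipment",
    "Ski resorts": "Resorts",
    "Ski apparel": "Apparel",
}

# Related Term links, stored per term.
related = defaultdict(set)


def add_rt(a, b):
    """Record a Related Term link; RT is symmetric, so store both directions."""
    related[a].add(b)
    related[b].add(a)


# Assumed links: each ski-specific term is related to each skiing concept.
for term in ("Ski equipment", "Ski resorts", "Ski apparel"):
    for skiing_term in ("Skiing", "Cross-country skiing", "Downhill skiing"):
        add_rt(term, skiing_term)

# Number of RTs each term carries.
rt_counts = {term: len(others) for term, others in related.items()}
for term in sorted(rt_counts):
    print(term, rt_counts[term])
```

Even in this toy version every linked term carries several RTs, and nothing in the data says what kind of relationship each RT stands for.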

Option 2: As above, but use a Facet or Collection to group concepts across the hierarchy by Topic.

 

Ironically, in the land of controlled vocabularies, “Facet” is used to denote several kinds of structures and structural components (which is why we call this “Collections” in our software). What I mean, here, is a categorical indication (essentially a flag) that can be used to label concepts across a hierarchy. In this case, I can create a Collection “Skiing” to flag all concepts in my vocabulary that are related to skiing regardless of their position in the hierarchy.

In essence, I am using the Collection to create a subsection of my taxonomy based on Topic (instead of Concept) which does not affect (and is not reflected by) the hierarchy; further, I can now easily select and view all terms in this Collection.

Advantages: This is an extremely useful way to model vocabularies, as it allows me to have a strict hierarchy organized by Concept and still group things according to Topic.

Disadvantages: Relationships between concepts not in the same branch are still vague.
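The Collection idea can be sketched as a label stored separately from the hierarchy. This is only an illustration under assumed names: the `COLLECTIONS` mapping and the two helper functions are hypothetical and do not represent any specific product's data model.

```python
# Strict concept hierarchy (branch names are illustrative assumptions).
BROADER = {
    "Cross-country skiing": "Skiing",
    "Downhill skiing": "Skiing",
    "Ski equipment": "Equipment",
    "Ski resorts": "Resorts",
    "Ski apparel": "Apparel",
}

# A Collection is a topical flag kept apart from the hierarchy.
COLLECTIONS = {
    "Skiing": {
        "Skiing",
        "Cross-country skiing",
        "Downhill skiing",
        "Ski equipment",
        "Ski resorts",
        "Ski apparel",
    },
}


def terms_in_collection(name):
    """Return every term flagged with the Collection, wherever it sits."""
    return sorted(COLLECTIONS.get(name, set()))


def collections_for(term):
    """Return the names of all Collections a term has been flagged with."""
    return sorted(name for name, members in COLLECTIONS.items() if term in members)


print(terms_in_collection("Skiing"))
print(collections_for("Ski resorts"))
print(BROADER["Ski resorts"])  # the hierarchy is unchanged by the flag
```

Selecting by Collection gathers the topical group, while `BROADER` still says that Ski resorts are a kind of Resort.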

Option 3: As above, but also use custom RTs to model the specific relationships between terms (build an ontology).

 

Ideally, we can model the relationship between, say, “Skiing” and “Ski equipment” in some useful way (which will depend on the use case); in addition, it would be nice to be able to use that relationship elsewhere (between “Golf” and “Golf equipment,” for example). So a pair of inverse predicates like “Is Used In” and “Uses Equipment” might do the job: “Ski equipment Is Used In Skiing,” and “Skiing Uses Equipment Ski equipment.”

While some taxonomy applications offer options for creating custom relationships, this option is essentially ontological. The disadvantage here, which is not trivial, is that in a large vocabulary the design, modeling, and execution of specific relationships is much more labor-intensive than building out a taxonomy with only standard thesaural BT, NT, and RT relations.
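One way to picture the “Is Used In” / “Uses Equipment” pair is as subject-predicate-object triples with a declared inverse. The sketch below is a minimal illustration, assuming a plain in-memory set of triples; `INVERSE`, `relate`, and `objects_of` are hypothetical names, not the API of any ontology tool.

```python
# Each custom predicate is declared together with its inverse.
INVERSE = {
    "uses equipment": "is used in",
    "is used in": "uses equipment",
}

# The store: a set of (subject, predicate, object) triples.
triples = set()


def relate(subject, predicate, obj):
    """Store a typed relationship and its inverse in one step."""
    triples.add((subject, predicate, obj))
    triples.add((obj, INVERSE[predicate], subject))


# The same predicate is reused across unrelated branches.
relate("Skiing", "uses equipment", "Ski equipment")
relate("Golf", "uses equipment", "Golf equipment")


def objects_of(subject, predicate):
    """Return the terms linked from subject by the given predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects_of("Skiing", "uses equipment"))
print(objects_of("Golf equipment", "is used in"))
```

Unlike a bare RT, each link here names the relationship it stands for, at the cost of having to design and maintain the predicate list.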

Lastly, it may be the case that the user-facing taxonomy is structured very much like the one we began with (in which conflating Topics and Concepts serves purposes of navigation and discovery) while the underlying ontology has more complex and specific relationships. This is, surely, a topic (or concept) for another post.