In a recent blog, I created a typology of types, a taxonomy of taxonomists, as a humorous guide to the wonderful ways of working employed by taxonomists. My taxonomy persona, The Artist, may approach taxonomy construction as an attempt to create a work of art. One way of working for an artist is to use a model to create his or her piece: a living, physical model; objects positioned as a still-life; a scale model for a much larger work, etc. The act of modeling is essential to provide a view of the finished state before several false starts and path corrections slow the process. Just as a writer creates an outline before writing or the artist creates a model before working on the piece itself, so too should we consider the taxonomist and the art of ontology modeling.
Ontologies can be as simple or as complex as necessary to use in an organization for navigation, supporting content tagging and search, surfacing related content, or use in a knowledge graph. Frequently, ontology models become more complex as additional use cases and parameters are added, so doing whatever is possible to model the ontology well from the start sets a clear path for its growth. Ontology modeling is really no different than laying out the foundations for a project or producing a work of art, especially as there is no one right way to model an ontology.
Before diving into ontology creation, a taxonomist should conduct modeling sessions in order to plan what his or her finished (though never finished) product will look like. An ontology modeling session can include a variety of methods for collecting information and gathering input from stakeholders including one-on-one interviews, group brainstorming workshops, and input solicited from across the organization using forms or emails.
Workshops including the taxonomist or taxonomy team, business end users and stakeholders, and information technology development and support can be an effective way to conduct ontology modeling. In my experience, workshops provide the most input and ideas in the shortest amount of time. Getting everyone in the same place (whether together or remotely) at the same time can prove challenging, but a session of two hours or more can provide as much information as many one-on-one interviews. As recent events have proven, workshops do not need to be held in person. While there is benefit to gathering the group in a single location for discussions, interactive activities, and lunch, an ontology modeling session can also be conducted using various conferencing and shared activity tools.
Such a workshop will need to carefully lay out the intended known uses of the ontology and the structural elements so the participants understand the logic behind an ontology model. While not everyone needs to be a taxonomist/ontologist, understanding the underpinnings helps users consider what content the ontology should cover and how it can be surfaced in various applications.
Stanford published Ontology Development 101: A Guide to Creating Your First Ontology many years ago, but it covers the fundamentals of ontologies well. Regardless of what technique you use to include others in the ontology modeling, having a basic understanding of the structural components of an ontology may help participants.
At Synaptica, the components of an ontology adhere to standards such as those published by the W3C. While some of our terms may differ slightly, the concepts are the same.
With only a very basic mental model of how I want the finished product to look, I have been intentionally building an ontology from the ground up rather than importing pre-existing information. The exercise casts a light on what must be considered in ontology modeling and provides a practical way to test our ontology management software, Graphite. I’m going to use this example to illustrate the structural components and modeling considerations for an ontology.
Starting with the most granular structural component, the concept is an individual instance. It is usually described with a single preferred label in the language of the users, additional preferred and alternate labels in other languages as needed, alternate labels to cover acronyms and other variants, and associated attributes, properties, and predicates. An instance has a uniform resource identifier (URI) which identifies it uniquely regardless of label and property changes.
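To make the anatomy of a concept concrete, here is a minimal Python sketch. The class name, field names, and the example.org URI are all assumptions made for illustration; this is not how Graphite or any particular tool stores concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One ontology concept; all field names here are illustrative assumptions."""
    uri: str                      # stable identifier, never derived from a label
    pref_labels: dict             # language code -> the single preferred label
    alt_labels: dict = field(default_factory=dict)  # language code -> list of variants

    def label(self, lang: str = "en") -> str:
        """Return the preferred label for a language, falling back to English."""
        return self.pref_labels.get(lang, self.pref_labels["en"])


# Hypothetical URI under example.org; not a real Bondtology identifier.
bond = Concept(
    uri="https://example.org/bondtology/concept/0001",
    pref_labels={"en": "James Bond"},
    alt_labels={"en": ["007"]},
)

# Changing the label leaves the URI alone, so references to the concept still hold.
bond.pref_labels["en"] = "Bond, James"
```

The point of the sketch is the last line: labels can change freely because nothing identifies the concept by its label.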
I started with a simple idea: an ontology covering the domain of James Bond. Brainstorming a flat list of concepts was relatively simple. My mind first went to films and then quickly to characters, and I developed a flat list which grew into a two-level hierarchy.
Almost immediately, I was confronted with modeling decisions. Should the characters remain a flat list with relationships to their roles? Could I use a property to define the nature of each character? Or, as I ultimately did, should I create a hierarchy to group the characters? Even this choice made it necessary to use polyhierarchy, as it is common for a character to be in multiple categories: many Bond henchmen turned and became Bond allies, if only briefly.
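A polyhierarchy simply means a concept may have more than one broader concept. The sketch below is one possible way to represent that in Python; the function names and the grouping labels (Henchmen, Allies) are my own assumptions, and Jaws is used only as a familiar example of a henchman who ends up helping Bond.

```python
from collections import defaultdict

# child -> set of broader (parent) concepts; a set allows more than one parent
broader = defaultdict(set)


def add_broader(child: str, parent: str) -> None:
    """Record that `parent` is a broader concept of `child`."""
    broader[child].add(parent)


def ancestors(concept: str) -> set:
    """Collect every concept reachable by following broader links upward."""
    found = set()
    stack = [concept]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


add_broader("Henchmen", "Characters")
add_broader("Allies", "Characters")
add_broader("Jaws", "Henchmen")   # first parent
add_broader("Jaws", "Allies")     # second parent: this is the polyhierarchy
```

With a flat list plus a role property, the same information would live in a property value instead of in the tree, which is the trade-off the questions above are weighing.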
Properties & Relationships
A property defines the object end of a triple: one instance is the subject, the defining relationship is the predicate (which can be thought of as the verb), and the other instance is the object. Any instance, or concept, has a relationship to another instance. For example, a concept can have a status, have a broader, narrower, or related relationship to another concept, or have any other defined relationship to another instance. For the purposes of a general workshop, I’ve found it’s helpful to speak in terms of concepts and their relationships to other concepts. Even though a concept also has relationships to its other attributes, it’s helpful to think of these attributes as metadata associated with each concept.
For the Bondtology, some properties began to take shape when people, including actors and directors, became part of the structure. Things like Birth Name and Star Sign were interesting properties to associate with people. Similarly, an essential piece of information for films is the Release Date. Some relationships were obvious, like Broader and Narrower, but others only became apparent as the different concepts and their relationships became clearer. Many of these relationships are reciprocal while some are in only one direction.
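The subject, predicate, object pattern and the difference between reciprocal and one-way relationships can be sketched in a few lines of Python. The store is just a set, and the predicate name actedIn is an assumption of mine; only hasActor and hasReleaseDate come from the relationship names used in this post.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Reciprocal predicates map to their inverse; one-way predicates are simply absent.
INVERSE = {
    "broader": "narrower",
    "narrower": "broader",
    "hasActor": "actedIn",   # "actedIn" is an assumed name for the inverse
    "actedIn": "hasActor",
}


def assert_triple(store: set, subject: str, predicate: str, obj: str) -> None:
    """Add a triple and, when the predicate is reciprocal, its mirror image."""
    store.add(Triple(subject, predicate, obj))
    inverse = INVERSE.get(predicate)
    if inverse is not None:
        store.add(Triple(obj, inverse, subject))


store = set()
assert_triple(store, "Goldfinger", "hasActor", "Sean Connery")  # reciprocal
assert_triple(store, "Goldfinger", "hasReleaseDate", "1964")    # one direction only
```

After these two calls the store holds three triples: the actor relationship in both directions, and the release date in one direction only.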
While it is convenient to think of ontology modeling and development as a linear path from beginning to end, it doesn’t always progress that way. Despite doing some loose modeling at the beginning of the Bondtology, many things, such as classes, only took shape as concepts, their relationships, and their placement in separate schemes became defined.
A class includes instances (concepts), their properties and relationships, and the rules and restrictions on what can be included in the class. It’s oversimplifying the matter, but thinking of classes in terms of the simple “is a…” rule is helpful in expressing the concept to a general audience. For example, in a class including Films, we would expect to see the names of films and properties and predicates which make sense for films: release date, actors, director, synopsis, filming locations, and the like. The relationships reflect these properties: hasReleaseDate, hasActor, hasDirector, hasSynopsis, filmedIn, and so on. If another class were about people, these properties and relationships would not carry over; they apply only to the Films class. An actor has a release date of sorts, for example, but we would typically call this the date of birth. However, an actor does appear in a film, so there is a relationship pointing to those concepts even if the properties are not shared.
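One way to picture the "rules and restrictions" of a class is as a list of properties its instances are allowed to carry. The following is a rough sketch under that assumption; the Film properties are the ones named above, while the Person properties and the function name are hypothetical.

```python
# Each class lists the properties its instances may carry.
# Film properties come from the text; the Person property names are assumed.
CLASS_PROPERTIES = {
    "Film": {"hasReleaseDate", "hasActor", "hasDirector", "hasSynopsis", "filmedIn"},
    "Person": {"hasBirthName", "hasDateOfBirth", "hasStarSign", "appearsIn"},
}


def invalid_properties(class_name: str, properties: dict) -> list:
    """Return, in sorted order, the property names the class does not permit."""
    allowed = CLASS_PROPERTIES[class_name]
    return sorted(name for name in properties if name not in allowed)


# A person record that mistakenly borrows a film property.
person_record = {"hasDateOfBirth": "1930-08-25", "hasReleaseDate": "1964"}
problems = invalid_properties("Person", person_record)
```

Here `problems` contains only hasReleaseDate, which mirrors the point that a release date belongs to films while people get a date of birth.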
In the Bondtology, classes are defined in a one-to-one relationship with the schemes as described below. This was done for simplicity’s sake and became a guiding principle in the ontology development.
The next largest structural unit is the scheme. The scheme is a vocabulary and can have a one-to-one relationship with classes, but does not necessarily need to. For example, a People scheme could list actors, directors, and other types of people associated with film-making in keeping with our example. This scheme could have a single class covering all people and include the appropriate properties or the classes could be more granular. Actors are people, but they may have different properties than directors as people or any other role associated with making films.
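The choice between a single class for a scheme and more granular classes can be illustrated with a small sketch. Everything here is hypothetical, including the property names playsCharacter and directed and the idea of modeling granularity through a parent class; it is only one of several ways the choice could be expressed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    name: str
    properties: set                          # properties declared on this class itself
    parent: Optional["OntologyClass"] = None

    def all_properties(self) -> set:
        """Own properties plus everything inherited from broader classes."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


@dataclass
class Scheme:
    name: str
    classes: list = field(default_factory=list)


person = OntologyClass("Person", {"hasBirthName", "hasStarSign"})
actor = OntologyClass("Actor", {"playsCharacter"}, parent=person)
director = OntologyClass("Director", {"directed"}, parent=person)

one_to_one = Scheme("People", [person])                 # one class covers everyone
granular = Scheme("People", [person, actor, director])  # roles get their own classes
```

In the granular version an actor still carries the shared people properties but does not pick up properties that only make sense for directors.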
For the Bondtology, I made a clean and simple separation between schemes, with each vocabulary being constrained by classes which make sense for the concepts included. At first, the Bondtology was three flat lists: Films, People, and Genres.
In this first stage, it was clear there was a lot of information missing and there would need to be more consideration of what else to include. As work progressed, the ontology developed iteratively and grew to incorporate my initial Characters list and Themes as well as many more properties and relationships. In addition, the ontology is used to tag concepts in Confluence which includes a scheme of its own.
So far, the Bondtology includes five schemes and an additional scheme representing the tagging of Confluence pages. Together, these schemes are an ontology. An ontology may cover a single domain or many domains as needed. At this level, we really see the need for modeling as different schemes with different types of concepts need to interact and inter-relate.
We can also include the notion of a collection or category, which is similar to a floating property. These collections can be assigned to any concept in any class or scheme and provide a way of grouping concepts across schemes. For example, a collection could be something like 1990s and be applied to Films and People active in that decade. For the Bondtology, rather than building an Organizations scheme, I used collections to represent the organizations in the James Bond books and films.
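Because a collection cuts across schemes, it can be sketched as a simple mapping kept separately from the schemes themselves. The names below are assumptions for illustration, and the sample entries are just familiar examples for the 1990s collection mentioned above.

```python
from collections import defaultdict

# concept -> the scheme it lives in (one scheme per concept in this sketch)
scheme_of = {
    "GoldenEye": "Films",
    "Goldfinger": "Films",
    "Pierce Brosnan": "People",
}

# collection name -> concepts tagged with it, regardless of scheme
collections = defaultdict(set)


def add_to_collection(collection: str, concept: str) -> None:
    """Tag a known concept with a collection; unknown concepts are rejected."""
    if concept not in scheme_of:
        raise KeyError(f"unknown concept: {concept}")
    collections[collection].add(concept)


def schemes_spanned(collection: str) -> set:
    """Which schemes does this collection reach into?"""
    return {scheme_of[concept] for concept in collections.get(collection, ())}


add_to_collection("1990s", "GoldenEye")
add_to_collection("1990s", "Pierce Brosnan")
```

The 1990s collection now spans both the Films and People schemes without either scheme needing to know about it.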
In summary, concepts have metadata and relationships to their metadata and other concepts. Concepts with similar properties and relationships share classes. Classes can have a one-to-one relationship with faceted vocabularies but many classes can feasibly be included within a single vocabulary. One or more inter-related schemes defining a domain of knowledge is an ontology.
What has been built of the Bondtology so far has been through trial and error with very little outside input. Hence, there have been many new directions and revisions of prior choices. There will be future choices to be made as well. For instance, the organizations are currently applied as collections, but they could easily be expanded and turned into a scheme requiring relationships back to the characters belonging to each. Likewise, a Locations scheme and an Equipment or Gadgets scheme could be developed and inter-related with the current structure.
Defining the structural concepts for workshop participants shows exactly why ontology modeling is a necessary exercise.
Your Next Top Model
It’s easy to see how modeling an ontology prior to the build can save time and prevent wrong directions. For example, I added musical artists and theme songs later in the development. While musical artists are people, they don’t share the same properties and relationships as actors and directors. In addition, the People scheme was originally built around the notion of individuals, while many musical artists are collections of people in a band. Therefore, some additional work needs to be done to differentiate the properties of an artist from those of other people.
What I have been doing through trial and error could have gone much farther much faster if I had enlisted the help of others in a workshop. For instance, the possible schemes (Characters, Equipment, Films, Genres, Locations, People, Themes) could have been brainstormed and set from the beginning, allowing a clear understanding of which relationships between concepts should exist and whether collections should span many schemes or not.
I could have also made the choice to find existing data and import it rather than create it from scratch. Importing existing vocabularies is still an option, especially for a general subject like geographic locations.
While it’s been possible for me to slowly add to the overall Bondtology and even reverse decisions that no longer made sense as the schemes grew in scope, the work of undoing prior decisions and going in new directions takes time from the overall development.
There are always known and unknown unknowns, including changes in scope and what applications will use your ontology. However, developing Your Company’s Next Top Model ontology can get farther faster with planned modeling activities.