In my last blog post I discussed the “rise of the knowledge graph” and hinted at why, in some situations, it’s useful to go beyond the standard thesaural relationships (BT, NT, RT) between concepts and model their relationships more carefully and specifically. At this point I think it would be helpful to define some terms (note: I am certain not everyone will agree with my definitions).
It seems to me that (as ironically as possible) there are a lot of opinions about the meanings of, and relationship between, the terms taxonomy, ontology, and knowledge graph. Oh, and thesaurus. And Linked Data.
I think that:
A Taxonomy is a hierarchically organized set of terms (describing concepts or things) with defined broader and narrower relationship types.
Oftentimes people (including me) say Taxonomy when we mean Thesaurus, as in practice most “taxonomies” have associative relationships as well as other information about each term: alternative versions of terms, definitions, scope notes, and so on. Really, a term can have as much information (that is: as many fields (properties) containing additional types of information) as you like. Some of these fields may require values drawn from another (usually smaller) controlled list, or values that meet some other constraint (such as Yes/No or Australia or 5 or 12%), which starts to edge towards a knowledge graph; it’s also possible to equate a term in a Taxonomy (Thesaurus) to some outside resource (say, a human-readable Wikipedia page or a machine-readable data source), which starts to edge towards Linked Data.
N.B.: It’s already obvious why there is some confusion, and also why taxonomists (thesaurus…ologists?) are interested in Ontologies and Linked Data and graphs and so on.
Ontology (formal structure)
An Ontology is a formal structure for modeling knowledge organization systems (including Taxonomies and Thesauri). SKOS and OWL, for example, are schemas for modeling (and storing and transmitting and publishing and sharing) vocabularies: terms and their properties and relationships. You can model a Taxonomy (Thesaurus) in SKOS or OWL, but not all SKOS and OWL vocabularies are Taxonomies (or Thesauri). This is because SKOS and OWL permit richer descriptions of the relationships between concepts (both with each other and with their properties) than are required to model Taxono-thesauri.
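To make the idea of modeling a Thesaurus in SKOS a little more concrete, here is a minimal sketch in plain Python. It is not a real RDF store: statements are just (subject, predicate, object) tuples. The predicate names (skos:prefLabel, skos:altLabel, skos:broader, skos:related) are real SKOS properties, but every "ex:" term and the helper functions are made up for illustration.

```python
# Triples are (subject, predicate, object) tuples. The predicates are real SKOS
# property names; the "ex:" concepts are hypothetical examples.
TRIPLES = {
    ("ex:Optics", "skos:prefLabel", "Optics"),
    ("ex:Optics", "skos:broader", "ex:Physics"),
    ("ex:FiberOptics", "skos:prefLabel", "Fiber optics"),
    ("ex:FiberOptics", "skos:altLabel", "Fibre optics"),
    ("ex:FiberOptics", "skos:broader", "ex:Optics"),
    ("ex:FiberOptics", "skos:related", "ex:Telecommunications"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object stated for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def narrower(concept, triples=TRIPLES):
    """Derive NT by inverting the stated BT (skos:broader) statements."""
    return sorted(s for s, p, o in triples if p == "skos:broader" and o == concept)


def ancestors(concept, triples=TRIPLES):
    """Walk skos:broader upward to list all broader concepts, nearest first."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop(0)
        for parent in objects(current, "skos:broader", triples):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found
```

BT, NT, and RT map onto skos:broader, its inverse, and skos:related. The same triple shape could carry any other predicate, which is where the richer descriptions come in.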
Ontology (vocabulary) [Ontologically Modeled Structure(?)]
Unhelpfully, vocabularies modeled in this way are also commonly called Ontologies; what they really are is [something like] Ontologically Modeled Structures (which no one says, since I just made it up).
Therefore SKOS and OWL are examples of Ontologies that are models, while FIBO and CABI are examples of vocabulary structures modeled in this way, and are also called Ontologies (but which I am recommending that we call Ontologically Modeled Structures).
In an extremely recursive and unhelpful (and ambiguous, I might add) way, we in the industry use the same word to refer to both the model and the modeled.
Great job, everyone.
For clarity (and to avoid excessive parentheticals) I will refer to the modeled vocabulary as an Ontologically Modeled Structure (or with the abbreviation OMS) for the duration of this post.
Taxonomy vs. OMS
In essence, a Taxonomy (Thesaurus) is a type of lightly specified OMS*, which may or may not be modeled using an Ontology (formal structure). Therefore, in my taxonomy, all Taxonomies (Thesauri) are OMSes, but not all OMSes are Taxonomies.
An OMS may contain one or more Taxonomies (Thesauri), but does not have to; you can certainly build an OMS with no hierarchical relationships (like a flat authority file, such as a list of countries modeled in SKOS). However, many OMSes comprise a network of interrelated Taxonomies (Thesauri); further, Thesauri with many term properties and relationships (including Linked Data) are edging up to being OMSes, although not always modeled as such.
All of the above are also referred to, as a class, as Knowledge Organization Systems (KOS).
Clear so far?
A Knowledge Graph is, I think, a specific type of OMS that features:
(1) a bunch of concepts (and/or things), and
(2) their specified formal relationships, and
(3) information (properties) about each term, specifically including
(4) Linked Data, or other kinds of links to external data resources, and
(5) storage in RDF to support queries and inferencing.
Information may also be added to your Knowledge Graph over time, whether by manual curation or automatically via some pipeline, for example by pulling information from Linked Data sources.
Importantly, a Knowledge Graph also features a way to infer (or “reason”) new information not expressly stated based on the information in the Graph using an inference engine (or “reasoner”). One definition of Knowledge Graphs marks this as the critical piece:
A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.**
The de rigueur example goes like this. Given two explicit triple statements:
John lives in London
London is in England
…we can query, or infer, that:
John lives in England
…since we know, based on our knowledge model, that anyone who lives in London must also live in England. Although simplistic (and obvious), the example above illustrates the basic principle of inference.
In order for inference or reasoning to work properly, great care (that is: specificity) must be taken when building Knowledge Graphs.
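The John/London example can be sketched as a tiny forward-chaining loop. This is an illustration only, not how a production reasoner works: the predicate names livesIn and isIn and the single hard-coded rule are assumptions made for this sketch.

```python
# Explicit statements, written as (subject, predicate, object) triples.
# The predicate names "livesIn" and "isIn" are hypothetical.
FACTS = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def infer(facts):
    """Apply one rule repeatedly until no new triples appear.

    Rule: if X livesIn Y and Y isIn Z, then X livesIn Z.
    """
    known = set(facts)
    while True:
        derived = {
            (x, "livesIn", z)
            for (x, p1, y) in known if p1 == "livesIn"
            for (y2, p2, z) in known if p2 == "isIn" and y2 == y
        }
        if derived <= known:  # nothing new was derived, so stop
            return known
        known |= derived


everything = infer(FACTS)
print(("John", "livesIn", "England") in everything)
```

The rule only fires because the predicates are specific. Had the first statement used a vaguer relationship (say, "John isAssociatedWith London"), nothing could safely be inferred, which is why care in modeling matters.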
According to linkeddata.org:
“Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as ‘a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.’”***
One of the main ideas here is that there are a whole lot of vocabularies out there, many of which describe the same concepts. It is useful to be able to share data about these concepts; so, rather than everyone trying to link to every available page (or term record, or both) about Optics or Sean Connery, we instead link to centralized, authoritative data sources.
So if I have a term in my Taxonomy or OMS about Sean Connery and I want to add a field with a Definition (or abstract, brief bio, or summary), instead of researching and copy-pasting it into my vocabulary, I can access the DBpedia (a commonly used Linked Data source) page, extract the information, and add it to my own knowledge organization structure.
This (essentially semantic) technology is well defined and straightforward to use.
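As a rough sketch of what that extraction could look like, the Python below builds a SPARQL query for a resource's dbo:abstract and reads the value out of a standard SPARQL JSON result. The endpoint URL and the dbo:abstract property are DBpedia's own; the function names are mine, and fetch_abstract needs network access and a reachable endpoint, so treat it as an assumption-laden illustration rather than a tested integration.

```python
import json
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint; availability is not guaranteed.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def abstract_query(resource_name, lang="en"):
    """Build a SPARQL query for the dbo:abstract of one DBpedia resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource_name}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def parse_abstract(sparql_json):
    """Pull the first ?abstract value out of a SPARQL JSON result, or None."""
    bindings = sparql_json.get("results", {}).get("bindings", [])
    return bindings[0]["abstract"]["value"] if bindings else None


def fetch_abstract(resource_name, lang="en"):
    """Query DBpedia over HTTP (requires network access; hypothetical helper)."""
    params = urllib.parse.urlencode({"query": abstract_query(resource_name, lang)})
    request = urllib.request.Request(
        f"{DBPEDIA_SPARQL}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_abstract(json.load(response))
```

With these in place, filling the Definition field for my own term would be something like my_term["definition"] = fetch_abstract("Sean_Connery"), where my_term is whatever record structure the vocabulary tool uses.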
What’s the Point of All This? (Why Knowledge Graphs?)
Because a Knowledge Graph is an OMS, you can define any relationship (predicate) you want/need to model your data. Many such relationships already exist (in shared public namespaces), so you can refer to them and reuse them; further, this shared understanding of predicates allows information to be shared (and understood) by machines across various OMSes.
This makes relationships generally understandable so they can be used to tie together ontological knowledge models. So it’s fairly easy to link up and combine datasets (expressed as OMSes), and reuse existing datasets to enrich your own data.
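Here is a small sketch of that kind of linking, again using plain tuples instead of a real triple store. The predicates skos:prefLabel, owl:sameAs, and dbo:birthPlace are real shared predicates; the "ex:" identifier, the two toy datasets, and the enrich function are assumptions made for illustration.

```python
# Two independently maintained datasets. The "ex:" identifier is invented;
# "dbr:" stands in for DBpedia resource URIs.
MY_VOCABULARY = {
    ("ex:SeanConnery", "skos:prefLabel", "Sean Connery"),
    ("ex:SeanConnery", "owl:sameAs", "dbr:Sean_Connery"),
}
EXTERNAL_DATA = {
    ("dbr:Sean_Connery", "dbo:birthPlace", "dbr:Edinburgh"),
}


def enrich(local, external):
    """Copy external statements onto local terms linked by owl:sameAs."""
    # Map each external identifier to the local term that claims to be the same thing.
    links = {o: s for s, p, o in local if p == "owl:sameAs"}
    borrowed = {(links[s], p, o) for s, p, o in external if s in links}
    return local | borrowed


for triple in sorted(enrich(MY_VOCABULARY, EXTERNAL_DATA)):
    print(triple)
```

Because both sides agree on what owl:sameAs means, the combination needs no custom mapping table. A real reasoner would treat owl:sameAs more thoroughly (it is symmetric and transitive), which this sketch does not attempt.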
The idea, then, is that we can build Knowledge Graphs to store and structure information in a machine-readable and -shareable way to provide information, including information from external Linked Data sources.
The most famous example is still the Google Knowledge Graph and the information it provides for searches.
As Part 2 of this topic turned out to be largely a digression of definitions, I will return to use cases for Knowledge Graphs in a subsequent post.
*In the same way, I suppose, a Taxonomy is a lightly specified Thesaurus. This relationship is usually described the other way around: the idea being that a Thesaurus is a Taxonomy with “extra stuff” in it.
**Ehrlinger, Lisa, and Wolfram Wöß. “Towards a Definition of Knowledge Graphs.” SEMANTICS 2016: Posters and Demos Track, Leipzig, Germany, September 13-14, 2016. Johannes Kepler University Linz, http://ceur-ws.org/Vol-1695/paper4.pdf.
***from http://linkeddata.org/ accessed October 3, 2019.