
In previous posts I have described how a knowledge graph is an ontologically modeled structure that is attached to some kind of data.

At this point, the graph is just sitting there; it doesn’t do anything until you give (1) users and/or (2) other systems a way to either (a) retrieve information from your graph or (b) add (or remove or change) information in the graph.

We can therefore categorize these requirements along two axes:

[Chart: people vs. systems, retrieving vs. editing information]

Access to information in your graph: People

If you want people to be able to access your graph you need to provide some mechanism or interface. This could be as simple as exposing a SPARQL endpoint that users can query…if they know SPARQL. This is necessarily limiting, but if the audience for your graph consists only of people comfortable with a query language, that’s fine.

If you want your information to reach a broader audience (say, you have a graph to keep track of information about movies) it might make sense to offer some kind of user interface. This can be as simple as a search box or as complicated as a visual interface.

The genius of the Google Knowledge Graph, of course, is that you access it by accident, simply by using Google. In addition to the “traditional” Google results (a list of relevant websites), searches that ping the graph return additional information in a sidebar, as we’ve all seen.

Many other ways of offering interaction with graphs without requiring the user to know SPARQL are possible, and I think interesting solutions (like query “wizards” that allow the user to perform queries in a graphical interface without learning SPARQL) are on the way.
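To make the “wizard” idea a little more concrete, here is a minimal Python sketch of what might sit behind a two-field search form: the user picks a field and types a name, and the code assembles the SPARQL for them. Everything specific here is an assumption for illustration: the `ex:` namespace, the `ex:director` and `ex:actor` predicates, and the function names are all hypothetical, and a real tool would do far more.

```python
def escape_literal(text: str) -> str:
    """Escape user input for use inside a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


# Hypothetical mapping from the labels shown in the form to predicates
# in an equally hypothetical movie ontology.
FIELDS = {
    "directed by": "ex:director",
    "starring": "ex:actor",
}


def build_movie_query(field: str, name: str, limit: int = 10) -> str:
    """Turn a form selection plus a typed-in name into a SPARQL SELECT query."""
    if field not in FIELDS:
        raise ValueError(f"unsupported field: {field!r}")
    predicate = FIELDS[field]
    return f"""PREFIX ex: <http://example.org/movies/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?title WHERE {{
  ?movie {predicate} ?person ;
         rdfs:label ?title .
  ?person rdfs:label "{escape_literal(name)}" .
}}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    print(build_movie_query("starring", "Sean Connery"))
```

The user only ever sees the two form fields. Restricting the predicate to a fixed list and escaping the typed-in name keeps the generated query well-formed, whatever the user enters.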

Access to information in your graph: Systems

Again depending on your use case, you may want to make the information in your graph available for querying by other systems. Since RDF is an accepted standard for representing and exchanging data (although it is of course extensible, which is a whole ‘nother topic), other systems can query the graph to discover information.

This could be as simple as using a Linked Data URI to extract information: if I want to know Sean Connery’s birthday, I can write a query to get that information from DBpedia; if I have a list of all actors who have ever (say) played a villain in a Bond movie, I can write a program to get all of their birthdates (and other information).
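As a rough sketch of the “write a program” option, the Python below builds one SPARQL query for a list of DBpedia resources and flattens the JSON that comes back. It assumes DBpedia’s public endpoint at `https://dbpedia.org/sparql` and the `dbo:birthDate` property; the function names are mine, and the resource names are assumed to be valid as-is (no escaping is attempted).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def birthdate_query(resource_names):
    """Build one query that asks for the birth date of every listed resource."""
    values = " ".join(
        f"<http://dbpedia.org/resource/{name}>" for name in resource_names
    )
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?person ?birthDate WHERE {\n"
        f"  VALUES ?person {{ {values} }}\n"
        "  ?person dbo:birthDate ?birthDate .\n"
        "}"
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON result into {person URI: birth date string}.

    If the endpoint returns several dates for one person, the last one wins.
    """
    rows = payload["results"]["bindings"]
    return {row["person"]["value"]: row["birthDate"]["value"] for row in rows}


def fetch_birthdates(resource_names, timeout=30):
    """Send the query to DBpedia over HTTP (needs network access)."""
    params = urlencode({"query": birthdate_query(resource_names)})
    request = Request(
        f"{DBPEDIA_ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_birthdates(["Sean_Connery"])` would ask DBpedia for the one actor; passing a longer list of resource names (the Bond villains, say) covers everyone in a single request.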

In an alternative scenario, let’s say I keep a record of (logged-in) visitors to my website and which content they read; maybe I also have metadata about which topics they often read about, and which of my authors write on which topics. If I store all of this information as triples in my graph database:

[Image: triples]

…perhaps I have an external system that periodically (daily, or hourly, or whatever) gathers this information for analysis in another application. Such a program could query not only the literal information stored this way (“which papers did Visitor123 read?”) but, since we’re dealing with RDF and SPARQL, it could also query inferred information (“which topic did Visitor123 read about most?”) by connecting the dots in my triples.

[Image: graphsample]
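The difference between the two kinds of question can be sketched in a few lines of plain Python, with tuples standing in for triples. The `ex:` names and the data are invented for illustration, and in a real triple store this would be a SPARQL query (a join plus a `GROUP BY`) rather than list comprehensions.

```python
from collections import Counter

# Hypothetical triples: (subject, predicate, object)
TRIPLES = [
    ("ex:Visitor123", "ex:read", "ex:Paper1"),
    ("ex:Visitor123", "ex:read", "ex:Paper2"),
    ("ex:Visitor123", "ex:read", "ex:Paper3"),
    ("ex:Paper1", "ex:hasTopic", "ex:KnowledgeGraphs"),
    ("ex:Paper2", "ex:hasTopic", "ex:KnowledgeGraphs"),
    ("ex:Paper3", "ex:hasTopic", "ex:Taxonomies"),
    ("ex:Paper1", "ex:writtenBy", "ex:AuthorA"),
]


def objects(triples, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def papers_read(triples, visitor):
    """The literal question: which papers did this visitor read?"""
    return objects(triples, visitor, "ex:read")


def most_read_topic(triples, visitor):
    """The connect-the-dots question: follow read -> hasTopic and count."""
    counts = Counter(
        topic
        for paper in papers_read(triples, visitor)
        for topic in objects(triples, paper, "ex:hasTopic")
    )
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    print(papers_read(TRIPLES, "ex:Visitor123"))
    print(most_read_topic(TRIPLES, "ex:Visitor123"))
```

No triple says “Visitor123 mostly reads about knowledge graphs”; that answer only appears once the read triples are joined to the topic triples.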

Editing (Adding or otherwise changing) information in your graph: Systems

In addition to letting users and systems access your graph, you need to have automated and/or manual processes to update it. Depending on your graph, it could already be out of date. This is probably not true if your graph describes something relatively stable like geological eras, but if you’re tracking, say, which customers look at what items (products, content), or who’s publishing papers, or any number of similar things, you need to continually add new data.

In either of the scenarios in the previous section, I may want to offer the same types of access points to allow other systems to add (or delete, or change, whatever) data in my graph: to update records when Bond villains die (well, the actors who play them, anyway) or to add information about which customers viewed what content on which topics by whom.

In either case, data has to be gathered (somehow) and, if not already in RDF, converted to triples (which obviously must conform to the ontology schema behind my ontologically modeled structure) and added to my graph.

This is essentially the reverse of the process described above, as SPARQL can also be used to add information to a graph database, or to edit or remove what is already there.
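Here is a minimal sketch of that gather, convert, add pipeline in Python. It turns a (hypothetical) web-log record into triples and wraps them in a SPARQL Update `INSERT DATA` or `DELETE DATA` request. The `http://example.org/site/` namespace, the record keys, and the predicate names are assumptions for illustration; how the resulting string is sent to the store depends on the product and is not shown.

```python
EX = "http://example.org/site/"  # hypothetical namespace


def iri(local_name):
    """Wrap a local name in the example namespace as a full IRI."""
    return f"<{EX}{local_name}>"


def log_record_to_triples(record):
    """Convert one non-RDF log record (a dict) into triples.

    Expected keys (hypothetical): "visitor", "paper", "topic".
    """
    return [
        (iri(record["visitor"]), iri("read"), iri(record["paper"])),
        (iri(record["paper"]), iri("hasTopic"), iri(record["topic"])),
    ]


def sparql_update(operation, triples):
    """Wrap triples in a SPARQL Update INSERT DATA or DELETE DATA request."""
    if operation not in ("INSERT DATA", "DELETE DATA"):
        raise ValueError(f"unsupported operation: {operation!r}")
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return f"{operation} {{\n{body}\n}}"


if __name__ == "__main__":
    record = {"visitor": "Visitor123", "paper": "Paper1", "topic": "KnowledgeGraphs"}
    print(sparql_update("INSERT DATA", log_record_to_triples(record)))
```

Swapping `"INSERT DATA"` for `"DELETE DATA"` removes the same triples, which is one simple way to think about an edit: delete the old statement, insert the new one.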

Editing (Adding or otherwise changing) information in your graph: People

Again, users who know SPARQL (and have the proper access privileges) can write information directly to your graph database. For other users to do the same, you need an interface to (a) offer an easy way to interact with the data, and (b) constrain the kinds of information that can be entered. This second bit is key, and I think somewhat underdiscussed.

Interfaces

When a human (not a system) wants to add, remove, edit, or otherwise change information in the graph, the most logical tool is some kind of ontology management tool (or other vocabulary management tool that uses native RDF as the back-end technology) that is also amenable to Linked Data and other kinds of data-linking capabilities.

Typically, ontology management tools offer a graphical interface for constructing, maintaining, and publishing (via export or API to other systems) RDF-based ontologies without writing triples (e.g., in SKOS or OWL) by hand, managing URIs directly, or using SPARQL; that work is done on the back end and perhaps during schema development.

With the obvious caveat that I’m biased, as Synaptica publishes just such a tool, I’ll note that I’m also allowed to take and publish screenshots of it (which may not be the case with other software) as examples here:

[Screenshot: graphite1]

Constraints

Triple stores are on the whole very permissive; a tool allows you to constrain the user to adding information only when it can be validated (using some kind of schema and/or enforcing business rules). Constraints can include, for example, a set of commonly used predicates and classes as well as the capacity to invent (and assign URIs to, and publish) custom classes and relationships as needed, all within valid (and shared) RDF frameworks.
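As a toy illustration of the kind of check such a tool might run before accepting a triple, the Python below allows only known predicates and requires the subject and object to belong to the expected classes. The schema, the class assignments, and the `ex:` names are all hypothetical; a real tool would more likely rely on a standard such as SHACL or on its own rule engine than on a dictionary lookup.

```python
# Hypothetical schema: predicate -> (expected subject class, expected object class)
SCHEMA = {
    "ex:read": ("ex:Visitor", "ex:Paper"),
    "ex:hasTopic": ("ex:Paper", "ex:Topic"),
    "ex:writtenBy": ("ex:Paper", "ex:Author"),
}

# Hypothetical class membership for the resources already in the graph
TYPES = {
    "ex:Visitor123": "ex:Visitor",
    "ex:Paper1": "ex:Paper",
    "ex:KnowledgeGraphs": "ex:Topic",
    "ex:AuthorA": "ex:Author",
}


def validate(triple, schema=SCHEMA, types=TYPES):
    """Return a list of problems; an empty list means the triple is acceptable."""
    subject, predicate, obj = triple
    if predicate not in schema:
        return [f"unknown predicate {predicate}"]
    subject_class, object_class = schema[predicate]
    problems = []
    if types.get(subject) != subject_class:
        problems.append(f"subject {subject} is not a {subject_class}")
    if types.get(obj) != object_class:
        problems.append(f"object {obj} is not a {object_class}")
    return problems


if __name__ == "__main__":
    print(validate(("ex:Visitor123", "ex:read", "ex:Paper1")))
    print(validate(("ex:Paper1", "ex:read", "ex:Visitor123")))
```

A bare triple store would happily accept the second, backwards triple; the point of the interface is to catch it before it gets that far.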

[Screenshot: graphite2]

This topic is, I think, too rarely discussed and merits a longer treatment (but not now).

Visualization

Lastly, and briefly, such tools also often offer various options for visualization of your graph; these vary from simple to complex (and from static to functional/editable). The image below is from Ontotext GraphDB.

[Image: beamsuntoryviz]

NB: Since it seems that April 2020 is basically cancelled (including, very likely, two conferences at which I was scheduled to discuss knowledge graphs and such), I’ll be providing some of that material in upcoming blogs.