Bob Kasenchak and Ahren Lehnert, Synaptica
The theme of the fifth installment was “How to evolve a knowledge graph?” and discussed issues of graph maintenance and change. Video of this session is available here.
Héctor Pérez-Urbina, Google Knowledge Graphs
Knowledge Graph Evolution and Maintenance
Pérez-Urbina’s excellent talk focused on the challenges of keeping a very large graph up to date without disrupting the downstream processes (systems as well as users) that already use and ingest information from the graph.
Pérez-Urbina framed the problems as “modeling in a dynamic world” and noted that over time assumptions, data, sources, and needs will evolve. It was notable that at several points he mentioned how much human mediation (humans in the loop – HITL) is involved in both the curation and testing cycles.
Noting that changes are easier in a graph than in a traditional relational database (but not easy!), he outlined several factors to be accounted for:
- What’s changing?
- Is there data there already?
- Who is using this data? And how?
“What’s changing?” applies, crucially, to both data and schemas; at one point, the Google team decided that anything shown in a theatre was of the class “Movie” but ran into problems since sporting events and operas (for example) do not fit this assumption. This broke downstream applications that were using the data. Other very illustrative examples included the assumption that all musical artists are people (they’re not, as fictional characters are sometimes included in this class) and that all books are publications.
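To make the “Movie” example concrete, here is a minimal sketch of how a team might audit such a rule against existing data before rolling it out. This is not Google’s tooling: the triples, the property names (`shownIn`, `hasDirector`, `hasRuntimeMinutes`), and the list of properties a downstream consumer expects are all assumptions invented for illustration.

```python
# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("Casablanca", "shownIn", "Theatre"),
    ("Casablanca", "hasRuntimeMinutes", 102),
    ("Casablanca", "hasDirector", "Michael Curtiz"),
    ("La Traviata (live broadcast)", "shownIn", "Theatre"),
    ("La Traviata (live broadcast)", "hasComposer", "Giuseppe Verdi"),
    ("Cup Final (live screening)", "shownIn", "Theatre"),
]

# Assumption: properties a downstream application expects on every "Movie".
MOVIE_EXPECTED = {"hasDirector", "hasRuntimeMinutes"}


def properties_of(entity, triples):
    """Return the set of predicates used with the given subject."""
    return {p for s, p, _ in triples if s == entity}


def audit_rule(triples):
    """Find entities that the rule 'shown in a theatre => Movie' would
    classify as Movie even though they lack properties consumers expect."""
    candidates = sorted(
        {s for s, p, o in triples if p == "shownIn" and o == "Theatre"}
    )
    report = {}
    for entity in candidates:
        missing = MOVIE_EXPECTED - properties_of(entity, triples)
        if missing:
            report[entity] = sorted(missing)
    return report


if __name__ == "__main__":
    for entity, missing in audit_rule(TRIPLES).items():
        print(f"{entity}: would become a Movie but is missing {missing}")
```

In this toy data the opera broadcast and the sports screening are flagged while the film is not, which is the kind of early warning that could surface a bad modeling assumption before it reaches downstream applications.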
The complexity of the ecosystem and the size of the graph are also considerations; a single SPARQL endpoint supporting a small graph is pretty stable, but a multi-system ecosystem supporting dynamic ingestion, reconciliation, and inference (for example) is much more complicated.
The concern “Who is using the data?” means considering whether consumers’ applications will switch to the new data, whether their UIs will still function, and whether any AI or ML systems are using your model for inferencing and support.
He also noted that “All issues get worse at scale” and that changes can affect performance, correctness, consistency, testing, experiments, and evaluations.
Lastly, in discussing remedies and solutions (“easing the pain of KG changes”) Pérez-Urbina urged practitioners to “know your knowledge graph”: Who contributes to it? Who uses it? And how? And how is the data modeled?
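One lightweight way to act on “know your knowledge graph” is to keep a record of which consumers depend on which classes and properties, and to consult it before a change. The sketch below is purely illustrative: the consumer names, the terms, and the `affected_consumers` helper are hypothetical and were not described in the talk.

```python
from collections import defaultdict

# Hypothetical registry: consumer name -> schema terms it reads.
USAGE = {
    "search-panel-ui": {"Movie", "hasDirector"},
    "recommendation-model": {"Movie", "MusicalArtist"},
    "bibliography-export": {"Book", "Publication"},
}


def affected_consumers(changed_terms, usage=USAGE):
    """Map each consumer to the changed terms it depends on.

    Consumers that use none of the changed terms are omitted.
    """
    changed = set(changed_terms)
    hits = defaultdict(set)
    for consumer, terms in usage.items():
        for term in terms & changed:
            hits[consumer].add(term)
    return {consumer: sorted(terms) for consumer, terms in sorted(hits.items())}


if __name__ == "__main__":
    # Who needs to be told before the definition of "Movie" changes?
    print(affected_consumers(["Movie"]))
```

A real registry would have to be populated from query logs or declared dependencies; the point is only that the questions “who uses it, and how?” can be answered from data rather than memory.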
Knowledge Graphs for Natural Language Processing
Gómez-Pérez, who works for the semantic intelligence company Expert System, demonstrated his platform with two main questions in mind: Why do we need KGs for NLP? And how can we extract information from text and KGs to build better NLP?
The platform features a searchable, browsable graph of words and concepts (which are separate but connected) with word proximities, parts of speech, and relations; it’s multilingual, but with certain limitations (the correspondence between the Spanish and English graphs is about 28% at this time).
The system uses a knowledge-based approach in which a domain expert works with the platform to create a synthesis of a domain model with highly curated sources, which can be used to extract concepts, entities, classes, and events in text.
Notable, again, is that the system “can be rigid or brittle,” so it depends on the strength of the model as well as on humans in the loop.
Obliterate Silos with Knowledge Graphs
Uschold started with the premise that “silos are the bane of enterprises” and that graphs can help connect systems and build applications to solve business problems not easily addressable using traditional databases.
Uschold advocates an iterative, essentially Agile approach to Knowledge Graph construction for enterprises comprising:
- Identify questions you want your graph to answer
- Build an ontology and triple store to meet these requirements. Uschold emphasized that a schema is a necessity and should always be used, the earlier the better
- Build applications to use that data
- Broaden the scope of the graph by identifying another set of questions
- Extend the ontology to meet these new requirements
- Coordinate with other ontology authors in the enterprise
- Make the data and ontology available as triples
- Extend and build out applications
- Iterate as above
This process leads to a flexible triple store and graph reusable by various departments or teams. It does require the use of a single ontology and triple store, which makes sense as part of the “no silos” ethos Uschold describes. He encapsulated this vision as “you cannot reuse what you don’t understand.”
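The iterative loop described above can be sketched by treating each round’s questions as checks against a single shared set of triples. This is our own minimal illustration rather than Uschold’s method or tooling: the `ex:` identifiers, the questions, and the helper functions are assumptions, and a real project would use an actual triple store and query language.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def customers_with_orders(graph):
    return sorted({o for _, _, o in match(graph, p="ex:placedBy")})


def suppliers_of_ordered_products(graph):
    ordered = {o for _, _, o in match(graph, p="ex:contains")}
    return sorted(
        {o for s, _, o in match(graph, p="ex:suppliedBy") if s in ordered}
    )


# Iteration 1: one question, and just enough data to answer it.
QUESTIONS_V1 = {"Which customers have placed orders?": customers_with_orders}
GRAPH_V1 = {
    ("ex:Order42", "ex:placedBy", "ex:CustomerA"),
}

# Iteration 2: broaden the scope and extend the same graph.
QUESTIONS_V2 = dict(QUESTIONS_V1)
QUESTIONS_V2["Which suppliers provide ordered products?"] = (
    suppliers_of_ordered_products
)
GRAPH_V2 = GRAPH_V1 | {
    ("ex:Order42", "ex:contains", "ex:Widget"),
    ("ex:Widget", "ex:suppliedBy", "ex:SupplierX"),
}


def unanswered(graph, questions):
    """List the questions for which the graph yields no answer."""
    return [text for text, query in questions.items() if not query(graph)]


if __name__ == "__main__":
    print(unanswered(GRAPH_V1, QUESTIONS_V2))  # gaps to fill in iteration 2
    print(unanswered(GRAPH_V2, QUESTIONS_V2))
```

Because the second iteration extends the same graph instead of creating a new one, the first round’s question keeps working, which is the reuse benefit the single-store approach is aiming for.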
Also of note was the observation that “Information-Providing Companies” is an emerging sector encompassing companies that already have a strong focus on metadata and understand faceted search.
Finally, he noted that “semantic technology is going mainstream” which, we think, is good news.
This was another useful set of talks with much food for thought as the course continues to explore various aspects of knowledge graphs from many perspectives.