Bob Kasenchak and Ahren Lehnert, Synaptica
The theme of the final class in the ongoing, free-to-attend Stanford Knowledge Graph course was “What are some open research questions on knowledge graphs?” and it featured useful talks by speakers from Salesforce, Stanford, and Google Data Commons.
Video of this session is available here.
Multi-Hop KG Reasoning with Deep Reinforcement Learning
Socher started his talk with some real-world examples of knowledge graphs, including WordNet and those built by Google and Amazon, among others.
Socher then expressed reservations about how well KGs can really capture the complexity of knowledge in the world, as they are often incomplete, overly specific, out of date, or noisy. Knowledge graphs may not be the final answer for representing all knowledge. However, they are very useful for structuring many kinds of concrete knowledge, for example in medicine (such as COVID-19 datasets). KGs make information computable and actionable, and in a lot of domains they are a sensible way to represent information.
Chatbots are good examples of KG-driven applications with real-world uses: when deployed correctly, they can reason over discrete entities such as topics. They can answer questions such as “Where is my order?” and “When will it arrive?” Likewise, a chatbot can answer questions like “Which directors has Tom Hanks collaborated with?” by finding entities and the relations between them, but those answers could be incomplete or noisy. These systems need to be more robust in the way we query them, and they need to compensate for edges that are missing but should be in the data.
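The Tom Hanks question is a two-hop query: actor, then the movies they acted in, then those movies’ directors. The minimal sketch below runs that query over a tiny set of triples; the relation names `acted_in` and `directed_by` are our own assumptions, not any particular KG’s schema. The director edge for Apollo 13 is deliberately left out to show how an incomplete graph silently returns an incomplete answer.

```python
# Tiny illustrative KG as (subject, predicate, object) triples.
# Relation names are assumptions for this sketch.
TRIPLES = {
    ("Tom Hanks", "acted_in", "Forrest Gump"),
    ("Tom Hanks", "acted_in", "Saving Private Ryan"),
    ("Tom Hanks", "acted_in", "Apollo 13"),
    ("Forrest Gump", "directed_by", "Robert Zemeckis"),
    ("Saving Private Ryan", "directed_by", "Steven Spielberg"),
    # ("Apollo 13", "directed_by", "Ron Howard") is intentionally missing,
    # mimicking the incompleteness of real-world KGs.
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def collaborating_directors(actor, triples=TRIPLES):
    """Two-hop query: actor -acted_in-> movie -directed_by-> director."""
    directors = set()
    for movie in objects(actor, "acted_in", triples):
        directors |= objects(movie, "directed_by", triples)
    return directors


print(sorted(collaborating_directors("Tom Hanks")))
# ['Robert Zemeckis', 'Steven Spielberg']  (Ron Howard is missed)
```

Nothing in the result signals that an answer is missing, which is why the talk turns next to models that can predict edges the graph should contain.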
We need models that are both interpretable and highly accurate, and that work over incomplete knowledge bases. Knowledge graph embeddings (learned with neural networks) can address problems such as predicting whether two entities are related, by embedding every entity as a vector and then training the embeddings together with the neural network.
Neural Tensor Networks (NTNs) can be used to classify relations, given training data. Every entity and relationship in the graph is embedded in the neural network, and the existing facts essentially serve as a training set; the trained model can then identify, with high accuracy, relationships that should exist but don’t. It is a fairly simple and very efficient approach. It is also possible to bootstrap a knowledge base from word-vector proximity.
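To make the NTN idea concrete, here is a minimal, pure-Python sketch of the scoring function as we understand it from the published NTN formulation: for a relation R, score(e1, e2) = uᵀ tanh(e1ᵀ W[1:k] e2 + V [e1; e2] + b), with k bilinear tensor slices. The embedding size, slice count, random initialization, and entity names are our own assumptions; the parameters here are untrained, so the score itself is meaningless until training pushes true triples above corrupted ones (sketched with a max-margin loss).

```python
import math
import random


def bilinear(e1, W, e2):
    """Compute e1^T W e2 for vectors e1, e2 and a square matrix W."""
    return sum(e1[i] * W[i][j] * e2[j] for i in range(len(e1)) for j in range(len(e2)))


def ntn_score(e1, e2, params):
    """NTN-style score u^T tanh(e1^T W^[1:k] e2 + V [e1; e2] + b) for one relation."""
    W, V, b, u = params["W"], params["V"], params["b"], params["u"]
    concat = e1 + e2  # [e1; e2] via list concatenation
    hidden = []
    for s in range(len(W)):  # one hidden unit per tensor slice
        linear = sum(V[s][i] * concat[i] for i in range(len(concat)))
        hidden.append(math.tanh(bilinear(e1, W[s], e2) + linear + b[s]))
    return sum(u_s * h for u_s, h in zip(u, hidden))


def init_relation_params(d, k, rng):
    """Random (untrained) parameters for one relation: d = embedding size, k = slices."""
    def rand():
        return rng.uniform(-0.5, 0.5)
    return {
        "W": [[[rand() for _ in range(d)] for _ in range(d)] for _ in range(k)],
        "V": [[rand() for _ in range(2 * d)] for _ in range(k)],
        "b": [rand() for _ in range(k)],
        "u": [rand() for _ in range(k)],
    }


def margin_loss(score_true, score_corrupt, margin=1.0):
    """Hinge loss: zero once a true triple outscores a corrupted one by the margin."""
    return max(0.0, margin - score_true + score_corrupt)


rng = random.Random(0)
d, k = 4, 2
emb = {name: [rng.uniform(-1, 1) for _ in range(d)] for name in ("cat", "animal")}
params = init_relation_params(d, k, rng)  # parameters for a hypothetical "is_a" relation
print(ntn_score(emb["cat"], emb["animal"], params))
```

A real implementation would use a tensor library and gradient descent; the loops here just make each term of the formula visible.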
The problem with this approach is that its output is hard to explain: for example, which chain of inference led to a given judgment? You can’t recover a discrete chain of reasoning. What’s needed is a model that does sequential (explainable) decision-making. A multi-hop reasoning model “hops” around the graph from a starting node to “discover” what’s nearby and relevant, and it is interpretable: you can find the particular actions that led to a conclusion or classification.
The result is a larger, more powerful neural network with readable, probabilistic results. It is trained with reinforcement learning, receiving “rewards” for correct reasoning. Entities are still embedded in the network as vectors, but the discrete, reinforcement-trained reasoning algorithm consistently matches the performance of embedding models while producing interpretable results.
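The sketch below shows why a multi-hop walk is interpretable: the agent starts at a node, picks one outgoing edge per step, and the sequence of chosen (relation, entity) actions is itself the explanation. The toy graph, the `NO_OP` self-loop (so the agent can stay put once it arrives), and the hand-written `first_action` policy are assumptions for illustration; in the approach Socher described, the policy is a neural network trained with reinforcement learning using the terminal reward.

```python
# Toy graph: entity -> list of outgoing (relation, target) edges (illustrative).
GRAPH = {
    "Tom Hanks": [("acted_in", "Forrest Gump"), ("acted_in", "Apollo 13")],
    "Forrest Gump": [("directed_by", "Robert Zemeckis"), ("stars", "Tom Hanks")],
    "Apollo 13": [("directed_by", "Ron Howard"), ("stars", "Tom Hanks")],
    "Robert Zemeckis": [],
    "Ron Howard": [],
}


def actions(entity, graph):
    """Outgoing edges plus a self-loop so the agent can stop once it has arrived."""
    return graph.get(entity, []) + [("NO_OP", entity)]


def rollout(start, policy, graph, max_hops=3):
    """Take exactly max_hops steps; return the final entity and the action path."""
    entity, path = start, []
    for _ in range(max_hops):
        relation, entity = policy(entity, actions(entity, graph))
        path.append((relation, entity))
    return entity, path


def terminal_reward(entity, answers):
    """Binary reward: 1 if the walk ends on a known answer, else 0."""
    return 1.0 if entity in answers else 0.0


def first_action(entity, acts):
    """Hypothetical stand-in for a learned policy: always take the first action."""
    return acts[0]


end, path = rollout("Tom Hanks", first_action, GRAPH)
print(path)
# [('acted_in', 'Forrest Gump'), ('directed_by', 'Robert Zemeckis'), ('NO_OP', 'Robert Zemeckis')]
print(terminal_reward(end, {"Robert Zemeckis"}))  # 1.0
```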
Socher concluded with some key take-aways:
- Practical KGs are incomplete and require automatic completion
- KG embeddings are effective approaches for recovering missing facts but lack interpretability
- Multi-hop KG inference is interpretable but less accurate
- Multi-hop KG inference with embedding-based reward shaping combines the best of both approaches
- The method can potentially be extended
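The reward-shaping takeaway can be sketched in a few lines. As we understand the published formulation, a walk that ends on a known answer gets full reward, and any other end entity gets partial credit equal to a pretrained embedding model’s belief that the triple is true: R = R_b + (1 − R_b) · f(e_s, r_q, e_T). This softens the penalty for reaching answers that are correct but missing from the incomplete graph. We use a sigmoid DistMult score as a stand-in for the embedding model f, and the vectors are toy values; both are assumptions.

```python
import math


def distmult_prob(h, r, t):
    """Stand-in embedding scorer: sigmoid of the DistMult score sum(h_i * r_i * t_i)."""
    score = sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))
    return 1.0 / (1.0 + math.exp(-score))


def shaped_reward(end_entity, answers, soft_score):
    """R = R_b + (1 - R_b) * f: full credit for a known answer,
    otherwise the embedding model's probability that the end entity is correct."""
    r_b = 1.0 if end_entity in answers else 0.0
    return r_b + (1.0 - r_b) * soft_score


# Toy embeddings (illustrative values, not trained).
h, r, t = [0.5, 1.0], [1.0, 0.5], [1.0, 1.0]
soft = distmult_prob(h, r, t)
print(shaped_reward("Ron Howard", {"Robert Zemeckis"}, soft))  # partial credit in (0, 1)
print(shaped_reward("Robert Zemeckis", {"Robert Zemeckis"}, soft))  # 1.0
```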
Mark Musen, Director of the Stanford Center for Biomedical Informatics Research (BMIR)
What do Knowledge Graphs Really Know?
Musen looked back at the past in order to place knowledge graphs in context, both present and future. Much of what has been going on in AI over the last 50 years is relevant to KGs but has not been realized in them.
Musen first reviewed the kinds of KGs discussed so far in this seminar and suggested that we all know what a graph is, but asked: “What do we mean by knowledge? And what does it mean to say we’re putting knowledge in a graph?”
In the 1970s, semantic network representations in AI, which used vertices and edges for concepts and relations, were very similar to KGs. The expert-system craze started at Stanford in the 1970s, when researchers there wanted to help doctors make better decisions about the empirical treatment of patients with severe infections. They considered using semantic networks but wanted more advanced reasoning.
The result was MYCIN, which was based on production rules. MYCIN used hundreds of IF/THEN rules (of the form “IF the patient has X and Y, THEN …”) to check conditions and answer questions. MYCIN would then make recommendations, some of which were human-curated. MYCIN made an “enormous splash” and influenced many similar systems built on rule-based reasoning chains. An NIH article from 1979 pointed out that symbolic AI was seen as the future, and a 1984 Newsweek article on AI proclaimed, “It’s Here!”
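A production-rule system of this kind can be sketched with goal-directed (backward) chaining, which is how MYCIN worked through its rules. This sketch leaves out MYCIN’s certainty factors and its actual rule base; the rule and fact names (`has_x`, `condition_a`, `recommend_drug_d`, and so on) are hypothetical placeholders echoing the “X and Y” form above.

```python
# Each rule is (premises, conclusion). All symbols are hypothetical placeholders.
RULES = [
    (["has_x", "has_y"], "condition_a"),
    (["condition_a", "no_allergy_to_drug_d"], "recommend_drug_d"),
    (["has_z"], "condition_b"),
]


def prove(goal, facts, rules, seen=None):
    """Backward chaining: a goal holds if it is a known fact, or if every
    premise of some rule that concludes it can itself be proved."""
    seen = set() if seen is None else seen
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rule chains
        return False
    seen = seen | {goal}
    return any(
        conclusion == goal and all(prove(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
    )


patient_facts = {"has_x", "has_y", "no_allergy_to_drug_d"}
print(prove("recommend_drug_d", patient_facts, RULES))  # True
print(prove("recommend_drug_d", {"has_x"}, RULES))      # False
```

Even this toy shows the maintenance problem raised next: the behavior depends on how every rule interacts with every other, and nothing in the rule list records why a rule exists.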
There was lots of excitement, but systems built on a soup of rules are not maintainable, and this raised a question: do semantic networks (essentially KGs) support any kind of inference beyond complex lookups?
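The difference between a lookup and the simplest kind of inference a semantic network supports, inheritance along is-a links, can be sketched as follows. The tiny drug taxonomy is illustrative; a lookup only finds edges that were explicitly stored, while following is-a links transitively answers questions no single edge states.

```python
# Illustrative is-a edges as (child, parent) pairs.
IS_A = {
    ("penicillin", "beta_lactam"),
    ("beta_lactam", "antibiotic"),
    ("antibiotic", "drug"),
}


def lookup(child, parent, edges=IS_A):
    """Pure lookup: true only if the edge is stored explicitly."""
    return (child, parent) in edges


def infer_is_a(child, parent, edges=IS_A):
    """Transitive inference: depth-first search along is-a edges."""
    stack, visited = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(p for (c, p) in edges if c == node)
    return False


print(lookup("penicillin", "drug"))      # False: no such stored edge
print(infer_is_a("penicillin", "drug"))  # True: penicillin -> beta_lactam -> antibiotic -> drug
```

Musen’s point is that this is still close to a lookup: it traverses the graph but does not pursue goals or choose actions.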
In 1980, Allen Newell’s AAAI Presidential Address posed some premises and questions. Knowledge is what an observer attributes to an agent to allow the observer to call that agent intelligent. Knowledge is not something you put into rules or a graph; it is attributed to an agent on the basis of its behavior.
The result was a change in thinking: knowledge came to be seen as a competence for problem-solving. We never “see” knowledge or write it down, and we can never know what an agent knows; we can only attribute knowledge to an agent that appears to have goals and to select actions rationally.
Knowledge-level analysis offers the ability to understand intelligent behavior in terms of actions, goals, and how actions are selected to achieve goals. It requires a language for talking about goals and actions. We all understand what it means to have a graph, but what does this tell us about what an agent knows or selects as an action?
Knowledge representation is like a musical score; it’s not music, it’s a representation. You have to apply a process to the symbols to make music. Analogously, we can represent information in a graph, but we need to apply a process (that can perform inference) or we won’t have the experience of “knowledge”.
If we apply heuristic classification, we can look at knowledge not only as rules but also as behaviors. Feature abstraction allows us to generate abstractions about the data and to think about things at the knowledge level. If we were to recreate MYCIN now, it would be based on an ontology of diseases, patient characteristics, drugs, etc., used to reason about which drugs to apply in a given situation.
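Heuristic classification, a term usually credited to William Clancey, describes problem solving in three steps: abstract the raw data into qualitative features, heuristically match those features to a general solution class, then refine that class into specific candidates. The sketch below shows the three steps; every threshold, feature, and label is hypothetical and chosen only to illustrate the structure (it is not clinical guidance).

```python
# Hypothetical thresholds and labels, for illustration only.
def abstract_data(patient):
    """Step 1, data abstraction: raw measurements -> qualitative features."""
    features = set()
    if patient["temp_c"] >= 38.0:  # hypothetical fever threshold
        features.add("fever")
    if patient["wbc"] < 2.5:  # hypothetical low white-cell-count threshold
        features.add("immunosuppressed")
    return features


# Step 2, heuristic match: feature sets -> general solution classes.
# More specific heuristics are listed first so they win.
HEURISTICS = [
    ({"fever", "immunosuppressed"}, "compromised_host_infection"),
    ({"fever"}, "general_infection"),
]

# Step 3, refinement: general class -> more specific (hypothetical) candidates.
REFINEMENTS = {
    "compromised_host_infection": ["candidate_organism_1", "candidate_organism_2"],
    "general_infection": ["candidate_organism_3"],
}


def classify(patient):
    """Run abstraction, heuristic match, and refinement in sequence."""
    features = abstract_data(patient)
    for required, solution_class in HEURISTICS:
        if required <= features:
            return solution_class, REFINEMENTS[solution_class]
    return None, []


print(classify({"temp_c": 38.5, "wbc": 2.0}))
```

Separating the steps is what lets the same knowledge be described at the knowledge level: each step names what is being inferred, not just which rule fired.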
In the 1990s, the knowledge-based systems community had vast plans for libraries of reusable problem-solvers for classification, fault diagnosis, constraint satisfaction, planning, design, scheduling, sequence alignment, etc. Tasks are solved by Problem Solving Methods (PSMs), which might entail subtasks. This was construed as a growing decomposition hierarchy of tasks and subtasks, with PSMs that could address each subtask.
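The task/PSM decomposition hierarchy can be sketched as a small data structure: a task lists the alternative methods that can solve it, and each method decomposes into subtasks until primitive tasks remain. The `Task`/`Method` classes and the “propose, verify, revise” decomposition (loosely modeled on the propose-and-revise method) are illustrative assumptions, not any particular PSM library’s design.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    """A problem-solving method: a named way of decomposing a task."""
    name: str
    subtasks: list = field(default_factory=list)  # list of Task


@dataclass
class Task:
    """A task with zero or more alternative methods; no methods means primitive."""
    name: str
    methods: list = field(default_factory=list)  # list of Method


def primitive_tasks(task, choose=lambda methods: methods[0]):
    """Expand a task through one chosen method per level; return the leaf tasks."""
    if not task.methods:
        return [task.name]
    method = choose(task.methods)
    leaves = []
    for sub in method.subtasks:
        leaves.extend(primitive_tasks(sub, choose))
    return leaves


# Illustrative decomposition, loosely modeled on propose-and-revise.
design = Task("design", [Method("propose_and_revise", [
    Task("propose"), Task("verify"), Task("revise"),
])])
print(primitive_tasks(design))  # ['propose', 'verify', 'revise']
```

The `choose` parameter is where a real system would pick among alternative PSMs for a task; here it simply takes the first one.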
In the 2000s, the Semantic Web was sold on the promise of problem-solving at Web scale. The idea was that personalized agents could book flights, shop for clothes, manage our smart homes, update our medical records, and on and on. It was supposed to grow from the grassroots, like the Web. However, the world kept shifting. Knowledge-based systems communities jumped on the Semantic Web, and the Semantic Web community began to downplay agents and ontologies in favor of Linked Data. Linked Data morphed into KGs, and now we have come full circle to a world of KGs.
KGs today are not all that different from the Semantic Networks of the 1970s: the emphasis is on ontology and representation, and we are ignoring four decades of work on systems that represent agents. Today’s KGs do wondrous things, but we are not yet taking on problem-solving the way systems from the 1970s to the 1990s did.
After 50 years, we’ve come full circle and are back to knowledge as graphs. The good news is that graphs are enormous and much more interesting. The disappointing news is that by themselves they still don’t do anything.
Musen closed by discussing knowledge as behavior. Graphs must support the interpretation of behavior aimed at reaching goals. Classes and instances relate to each other, and we understand the edges between nodes, but we need to think of graphs as generating behaviors. We need to bring back not only the graphical representation of knowledge but also problem-solving and reasoning.
Guha started by explaining that data powers everything: policy, journalism, health, science, etc. How do we make data easier to use? The problem is not a shortage of data. We have loads of data on demographics, economics, health, climate, and genomics, but it is spread across too many formats and schemas.
The current model for using data is to forage for data sources, track down assumptions, clean the data, compile the sources, figure out storage, etc. The problem is high upfront costs, sparse ecosystems, and few tools to address these issues. The situation is analogous to satellite imagery in 2004: Landsat had tons of images on the web, but they were impractical for most people to use until Google Maps made them very easy to use.
At Data Commons, the goal is to do the same for data: to be able to search, download, join, clean, and normalize data just by asking Google. So they started by taking a wide range of data sources and connecting them into a single aggregated knowledge graph: they downloaded, cleaned, and normalized the data and applied a common vocabulary. The resulting data was far too big to download, so they provided access via APIs on which to build applications. Collections of datasets still leave users to find, clean, join, and normalize the data themselves, whereas Data Commons is a single KG already built by cleaning, normalizing, and joining datasets.
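The clean/normalize/join step can be sketched at toy scale: two sources describe the same places with different column names, spellings, and number formats, and mapping both onto shared place identifiers and property names turns them into one set of triples. The sources, place IDs, property names, and numbers below are all made up for illustration; they are not Data Commons’ actual schema or data.

```python
# Two hypothetical sources with different schemas (made-up values).
SOURCE_A = [  # e.g. a census-style table
    {"City": "Alpha City", "Pop.": "120,000"},
    {"City": "Beta Town", "Pop.": "45,500"},
]
SOURCE_B = [  # e.g. a health survey with its own naming conventions
    {"place_name": "ALPHA CITY", "obesity_rate_pct": 31.0},
    {"place_name": "beta town ", "obesity_rate_pct": 27.5},
]

# Hypothetical shared identifiers: the "vocabulary" both sources map onto.
PLACE_IDS = {"alpha city": "place/alpha", "beta town": "place/beta"}


def place_id(raw_name):
    """Normalize a place name (trim, lowercase) and resolve it to a shared ID."""
    return PLACE_IDS[raw_name.strip().lower()]


def to_triples():
    """Clean each source and emit (place, property, value) triples in one graph."""
    triples = set()
    for row in SOURCE_A:
        population = int(row["Pop."].replace(",", ""))  # "120,000" -> 120000
        triples.add((place_id(row["City"]), "count_person", population))
    for row in SOURCE_B:
        triples.add((place_id(row["place_name"]), "percent_obesity", float(row["obesity_rate_pct"])))
    return triples


for triple in sorted(to_triples()):
    print(triple)
```

Once both sources share identifiers, “join” is just looking up different properties of the same node.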
The first version, v0.9, is about people and places. It includes many data sources and offers APIs in REST, Python, SPARQL, SQL, and Google Sheets for building applications. There are applications for four categories of users: researchers, students, journalists, and Google users. Data Commons can answer questions such as the prevalence of obesity in 500 U.S. cities, using data aggregated from many sources already in the platform, which allows one to see correlations. A lot of data is already aggregated, with 5K attributes/variables normalized and cleaned up for querying, visualization, etc.
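Seeing correlations across places becomes straightforward once variables live in one graph: collect two variables for the places that have both and compute a correlation. This sketch uses made-up triples and hypothetical property names, and computes a plain Pearson correlation; it does not use the Data Commons APIs.

```python
import math

# Made-up (place, variable, value) triples; place/e lacks an obesity value.
TRIPLES = {
    ("place/a", "percent_inactive", 20.0), ("place/a", "percent_obesity", 25.0),
    ("place/b", "percent_inactive", 24.0), ("place/b", "percent_obesity", 27.0),
    ("place/c", "percent_inactive", 30.0), ("place/c", "percent_obesity", 30.0),
    ("place/d", "percent_inactive", 36.0), ("place/d", "percent_obesity", 33.0),
    ("place/e", "percent_inactive", 22.0),
}


def values(triples, predicate):
    """Map each place to its value for one variable."""
    return {s: o for (s, p, o) in triples if p == predicate}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return cov / (sx * sy)


def correlate(triples, pred_x, pred_y):
    """Correlate two variables over the places that have values for both."""
    x_by_place, y_by_place = values(triples, pred_x), values(triples, pred_y)
    places = sorted(x_by_place.keys() & y_by_place.keys())
    return pearson([x_by_place[p] for p in places], [y_by_place[p] for p in places])


print(correlate(TRIPLES, "percent_inactive", "percent_obesity"))
```

The toy values were chosen to lie on a straight line, so the result is (up to rounding) 1.0; real data would of course be noisier.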
The Biomedical Data Commons, the second area built in Data Commons, includes about 30 data sources. It can be searched, its data can be downloaded for analysis, and it can be queried via a SPARQL endpoint.
There are still representational issues remaining, such as
- Representing metadata for the web
- Structured data on the web
- Issues with representing time, geography, etc.
- Problem of long predicates, and
- Real systems need more expressiveness
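Representing time is one of the issues above, and it underlies the difficulty with temporal questions mentioned below: a plain (subject, predicate, object) triple has no slot for when a fact held. One common workaround, sketched here with made-up data and hypothetical identifiers, is to attach an observation date to each statement and answer “as of” questions by taking the latest value on or before the requested date.

```python
from datetime import date

# Hypothetical time-stamped statements: (entity, variable, observation_date, value).
OBSERVATIONS = [
    ("place/alpha", "count_person", date(2010, 4, 1), 110_000),
    ("place/alpha", "count_person", date(2020, 4, 1), 120_000),
    ("place/alpha", "mayor", date(2015, 1, 1), "person/x"),
    ("place/alpha", "mayor", date(2019, 1, 1), "person/y"),
]


def value_as_of(entity, variable, when, observations=OBSERVATIONS):
    """Return the most recent value observed on or before `when`, or None."""
    candidates = [
        (d, v) for (e, var, d, v) in observations
        if e == entity and var == variable and d <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


print(value_as_of("place/alpha", "mayor", date(2017, 6, 1)))  # person/x
print(value_as_of("place/alpha", "count_person", date(2009, 1, 1)))  # None
```

This handles “as of” lookups, but richer temporal questions (durations, overlaps, change over time) need more expressive representation and inference, which is the gap Guha points to.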
They are working on Data Commons for economics, energy, COVID, and more.
Guha’s final thoughts concerned vocabulary creep: different sources use many tens of thousands of schema terms, which requires bringing the compositionality of natural language to knowledge representation. In addition, the problems of old AI have still not been solved; for example, the classes of inference supported are limited, which makes it difficult to ask temporal questions.