Bob Kasenchak and Ahren Lehnert, Synaptica
The latest installment of the Stanford class on Knowledge Graphs was entitled “How do users interact with a Knowledge Graph?” and again featured three interesting speakers from diverse parts of the field. Video of the session is available here.
You can read the full series of blogs on the Stanford University Knowledge Graphs course here.
Search and AI-Driven Analytics
Prakash’s company, ThoughtSpot, provides “search & AI-driven analytics” for enterprises to allow users to get business insights from data using a novel interface and back-end technology. [N.B. as mentioned previously, since it’s not our IP, we do not think it’s appropriate to include screenshots of presenter slides in these blogs.]
The problem to be solved, as Prakash puts it, is that “static reports and dashboards are fundamentally broken”: while they are great for monitoring metrics, they are not useful for asking the “next questions,” which are hard to get answered when data is structured and reported in an ossified way.
To illustrate the problem, Prakash cited a study that found that a “data expert” takes about 4.8 days on average to build a report for a knowledge worker (to answer a “new” question) and, since there are far more knowledge workers than data experts, this translates into queues and backlogs: often up to a month. Knowledge workers therefore often give up on the data that they need since they can’t get it in real time (or something approximating real time).
ThoughtSpot’s solution is an interface that aggregates data from across an enterprise. The system holds everything in memory (!) and indexes billions of tokens across hundreds of tables; the index is refreshed regularly and accessed via an interface that:
- Is simple, allowing fast and easy data access for everyone
- Is smart, including automated AI-driven data discovery
- Performs quickly at enterprise scale, with 2-3 millisecond response times
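The talk did not describe how ThoughtSpot builds its index, so the following is only a hedged sketch of the general idea of "indexing tokens across tables": a toy in-memory inverted index that maps each token (a column name or a cell value) to the places it occurs. The table data and the names `build_token_index` and `lookup` are hypothetical, chosen for illustration; they are not ThoughtSpot's API.

```python
from collections import defaultdict

# Hypothetical toy data standing in for enterprise tables.
TABLES = {
    "sales": {
        "columns": ["revenue", "state", "year"],
        "rows": [
            {"revenue": 120, "state": "california", "year": 2019},
            {"revenue": 95, "state": "texas", "year": 2019},
            {"revenue": 140, "state": "california", "year": 2020},
        ],
    },
}


def build_token_index(tables):
    """Map each lowercase token (a column name or a cell value) to the set of
    (table, column, kind) locations where it occurs."""
    index = defaultdict(set)
    for table_name, table in tables.items():
        # Column names are searchable tokens ("revenue", "state", ...).
        for column in table["columns"]:
            index[column.lower()].add((table_name, column, "column"))
        # Cell values are searchable tokens too ("texas", "2019", ...).
        for row in table["rows"]:
            for column, value in row.items():
                index[str(value).lower()].add((table_name, column, "value"))
    return index


def lookup(index, token):
    """Return the sorted locations for a token; unknown tokens give []."""
    return sorted(index.get(token.lower(), set()))
```

For example, `lookup(build_token_index(TABLES), "Texas")` returns `[("sales", "state", "value")]`, which is enough for a query layer to know that "texas" is a value of the `state` column. Keeping the whole index in a dictionary mirrors the "everything in memory" design in spirit only; a production system would need compression and sharding at billions of tokens.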
The interface accepts semantic-search-style, pseudo-natural-language queries: entering something like “revenue last 2 years california vs texas” immediately brings up a chart and offers drill-down options that seem to be based on frequently suggested or highest-level subcategories. The AI-driven feature, SpotIQ, surfaces additional curatable data to explore.
Prakash acknowledged that the search interface does not truly accept natural language queries; rather, it accepts keyword/token-based queries in a simple, intuitive language that is concise and expressive. There is demand for a natural-language version, which is in development for a voice-command-driven interface that he demonstrated.
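To make the idea of a token-based query language over an SQL back end more concrete, here is a minimal sketch, assuming a single hypothetical `sales` table with `revenue`, `state`, and `year` columns. The schema, the `parse_query` function, and the rule that "last N years" means the N most recent years including `current_year` are all assumptions for illustration; this is not how ThoughtSpot actually translates queries.

```python
import re

# Hypothetical schema metadata: which tokens name measures and which
# tokens are known values of which attribute column.
MEASURES = {"revenue"}
VALUE_TO_COLUMN = {"california": "state", "texas": "state"}


def parse_query(query, current_year, table="sales"):
    """Translate a keyword query into a SQL string (illustrative only)."""
    text = query.lower()
    tokens = text.split()

    # Tokens that name a measure become aggregated SELECT columns.
    measures = [t for t in tokens if t in MEASURES]

    # Tokens that are known attribute values become IN (...) filters,
    # grouped by the column they belong to.
    filters = {}
    for t in tokens:
        if t in VALUE_TO_COLUMN:
            filters.setdefault(VALUE_TO_COLUMN[t], []).append(t)

    where = []
    # Assumption: "last N years" = the N most recent years, including current_year.
    match = re.search(r"\blast (\d+) years?\b", text)
    if match:
        where.append(f"year > {current_year - int(match.group(1))}")
    for column, values in filters.items():
        quoted = ", ".join(f"'{v}'" for v in values)
        where.append(f"{column} IN ({quoted})")

    # "vs" asks for a side-by-side comparison: group by the compared column(s).
    group_by = list(filters) if "vs" in tokens else []
    select = group_by + [f"SUM({m}) AS {m}" for m in measures]
    if not select:
        select = ["COUNT(*) AS n"]  # fallback when no measure was named

    sql = f"SELECT {', '.join(select)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql
```

With `current_year=2020`, the example query from the talk becomes `SELECT state, SUM(revenue) AS revenue FROM sales WHERE year > 2018 AND state IN ('california', 'texas') GROUP BY state`, which is the kind of result a chart comparing the two states would be drawn from. The SQL is built by string concatenation only because every value comes from a fixed whitelist; a real system should use parameterized queries.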
Interestingly, given the focus of the course, the system is not graph-powered; it relies instead on an SQL back end. However, the talk was still relevant as it provided useful insight into building knowledge systems in general. Another key takeaway is Prakash’s insistence that investing in UI (including user research) is critical and, in the end, time- and cost-saving to do early rather than as an afterthought.
Making Sense of a Field of Research
Chen approaches the subject from the perspective of information science (as opposed to computer science), a perspective closer to our own. His project involves building a graph based on structured information derived from published scholarly papers, including citation data, authors, and journal information, to provide a browsable visual “map” of a field of research.
The system, called CiteSpace, has a website and is available to try out.
The first example Chen showed was, of course, a search for “knowledge graphs,” which provided a map of the space based on keywords and text found in abstracts, clustered by topic and linked to other frequently appearing keywords co-occurring in the same papers. The resulting visual interface, based on what he calls “cascading citation expansion,” essentially produces a network of related papers, journals, topics, and authors that the user can browse to discover works.
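The "linked to other frequently appearing keywords co-occurring in the same papers" step can be sketched as a weighted co-occurrence network. This is a simplified illustration under stated assumptions, not CiteSpace's code: the paper keyword lists are invented, and CiteSpace's clustering and cascading citation expansion are not reproduced here. The functions `cooccurrence_network` and `neighbors` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(papers):
    """Return a Counter mapping sorted keyword pairs to the number of
    papers in which both keywords appear."""
    edges = Counter()
    for keywords in papers:
        # De-duplicate and sort so each pair is counted once, in one order.
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, keyword):
    """Keywords linked to `keyword`, strongest links first (ties by name)."""
    keyword = keyword.lower()
    result = []
    for (a, b), weight in edges.items():
        if a == keyword:
            result.append((b, weight))
        elif b == keyword:
            result.append((a, weight))
    return sorted(result, key=lambda item: (-item[1], item[0]))


# Hypothetical keyword lists, one per paper.
papers = [
    ["knowledge graphs", "ontology", "embedding"],
    ["knowledge graphs", "embedding", "link prediction"],
    ["knowledge graphs", "ontology"],
]
edges = cooccurrence_network(papers)
```

Here `neighbors(edges, "knowledge graphs")` gives `[("embedding", 2), ("ontology", 2), ("link prediction", 1)]`: the edge weights are what lets a map draw frequently co-occurring keywords closer together or more prominently.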
The second example focused on COVID-19 and presented clusters of research mentioning MERS, SARS, and the novel coronavirus, showing how the focus of research shifted toward the novel coronavirus between 2003 and 2020. As research shifts and more information becomes available, topics are automatically generated from the publications and can be drilled into to find more specific topics and links between concepts.
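One simple way to quantify a shift in research focus over time is to track, per year, the share of papers that mention each term. The sketch below swaps in a deliberately simpler technique than CiteSpace's automatic topic generation: it uses a fixed term list and naive case-insensitive substring matching, and the paper titles are invented for illustration. `term_share_by_year` is a hypothetical name.

```python
from collections import defaultdict


def term_share_by_year(papers, terms):
    """For each year, the fraction of that year's papers mentioning each term.

    Matching is naive substring search, so e.g. "SARS" would also match
    "SARS-CoV-2"; a real system would use proper term extraction.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, text in papers:
        totals[year] += 1
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                hits[year][term] += 1
    return {
        year: {term: hits[year][term] / totals[year] for term in terms}
        for year in sorted(totals)
    }


# Hypothetical (year, title) records.
papers = [
    (2003, "SARS coronavirus transmission"),
    (2003, "Clinical features of SARS"),
    (2013, "MERS coronavirus in camels"),
    (2020, "Novel coronavirus outbreak"),
    (2020, "Comparing SARS and novel coronavirus"),
]
shares = term_share_by_year(papers, ["SARS", "MERS", "novel coronavirus"])
```

On this toy data, the share of papers mentioning "novel coronavirus" goes from 0.0 in 2003 to 1.0 in 2020, which is the kind of trend the COVID-19 map visualizes with much richer, automatically derived topics.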
The website linked above has quite a bit of information about this interesting open-source project.
Gilpin’s talk was about “Explanatory AI.” Although she began from a rather philosophical angle (“What is an explanation?”), the focus was on the need to explain what happened when complex systems (specifically deep neural network-type AI-driven systems) fail: “When complex systems break they fail in complex ways, and we need to know why.”
Gilpin’s framework builds on the idea that explainability is not the same as interpretability:
- Interpretability describes the internals of a system in a way that is understandable to humans
- Completeness describes a system’s operation accurately
- An “explanation” needs both
She emphasized that AI systems are rules-based and have no “common sense,” and gave the example of an AI system that determined that a person pictured on a bus billboard was jaywalking: “even a toddler” would know that this is nonsense. So how can we figure out where the system went wrong?
The problem is compounded because many “black box” machine learning applications are built on a simple principle: if it’s working, do more of whatever you’re doing [applying algorithms] to increase accuracy. This leads to unexplainable decision-making that is hard to diagnose.
Gilpin explained that self-driving cars, for example, are a case where this need is acute: it’s hard to determine why errors (ones that lead to, say, crashes) happen. Essentially, deep neural nets are opaque and lack explanation capabilities; further, the “deeper” the neural network, the blacker the box.
To return to the example of self-driving cars, liability is obviously a huge issue.
How can we evaluate explainability? Current metrics are fuzzy, and user-based evaluations are not always appropriate…or correct.
Gilpin’s research focuses on “explanatory anomaly detection”: the idea that, instead of being static and applied post facto, architectures should be designed to be dynamic, adaptable, and self-explaining. The idea is that when a failure occurs, a story can be constructed (by synthesizing explanations) to reconcile the inconsistencies (failures).
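The talk did not go into implementation details, so the following is only a hypothetical sketch in the spirit of a self-explaining system: a small rule-based check that sits beside a perception component, flags outputs that violate commonsense constraints, and reports why, using the bus-billboard jaywalking example from earlier in this post. The `Detection` fields, the rules, and the 15 km/h walking-speed threshold are all invented for illustration and are not Gilpin's method.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str                 # what the perception system thinks it saw
    speed_kmh: float           # estimated speed of the detected object
    attached_to_vehicle: bool  # e.g. printed on the side of a bus


MAX_WALKING_SPEED_KMH = 15.0  # assumed threshold, for illustration only


# Hypothetical commonsense rules: each returns an explanation string when
# the detection violates it, or None when the detection is consistent.
def rule_not_part_of_vehicle(d):
    if d.label == "jaywalker" and d.attached_to_vehicle:
        return "a person pictured on a vehicle is an image, not a pedestrian"
    return None


def rule_plausible_walking_speed(d):
    if d.label == "jaywalker" and d.speed_kmh > MAX_WALKING_SPEED_KMH:
        return f"pedestrians do not move at {d.speed_kmh:g} km/h"
    return None


RULES = [rule_not_part_of_vehicle, rule_plausible_walking_speed]


def monitor(detection):
    """Return (is_reasonable, explanations) for one detection."""
    explanations = [r for rule in RULES if (r := rule(detection)) is not None]
    return (not explanations, explanations)


billboard = Detection("jaywalker", speed_kmh=40.0, attached_to_vehicle=True)
```

Calling `monitor(billboard)` returns `False` together with both explanation strings, so the failure comes with a human-readable account of which assumptions it violated rather than just a wrong label. Hand-written rules like these do not scale on their own; the point is only to show what "reconciling inconsistencies with synthesized explanations" can look like at small scale.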
Although this set of talks was occasionally orthogonal to graphs per se, the issues discussed were certainly relevant and absolutely fascinating.