Bob Kasenchak and Ahren Lehnert, Synaptica
The second-to-last session in this series was entitled High-value use cases of knowledge graphs and featured useful talks by speakers from Intuit, JPMorgan Chase, and Wells Fargo outlining some real-world examples of knowledge graphs in the financial space. Video of this session is available here.
Knowledge Graph Use Cases @ Intuit
Intuit produces a range of financially oriented software including QuickBooks, TurboTax, and Mint. Yu’s talk focused on the use of knowledge graphs to power TurboTax, including a chatbot/virtual assistant in the application.
Yu describes Intuit as using graphs to pivot from thinking of itself primarily as a product company to an “AI-driven expert platform” that delivers personalized experiences. He defines a knowledge graph as representing data (facts and information) and logic (conditions) as concepts (vertices) and relationships between concepts (edges). Further, these graphs should be created from experience and learning and used for understanding and reasoning by both humans and machines. In Yu’s view, therefore, a knowledge graph is very much a data graph plus a logic graph.
TurboTax Assistant is a text-based (that is, not voice-based) chatbot designed to help people online while in the TurboTax application. The Assistant parses user queries and presents possible interpretations and answers. The answers include drill-down options to more granular questions and answers based on related and more granular topics. It is an AI-driven platform: the conversational UI is powered by a tax knowledge graph to provide personalized information and answers to in-application questions.
The same graph (or part of the same graph) is used to represent the Tax Logic Graph, which interprets the U.S. Tax Code (which now spans some 80,000 pages of documentation) into logic to drive the tax software. Of course, since the tax code is always changing, the Tax Logic Graph requires constant updating to reflect changes in the annual software release.
So the challenge at Intuit is “to scale the development of programmatic tax logic for a highly personalized experience to get taxes done with minimal effort, high accuracy, and high confidence.” The conventional approach, as described by Yu, was to use procedural programming to translate tax form logic into code, but due to the complexity and malleability of the tax code, this approach led to “spaghetti code” which is hard to test and maintain. This approach is also essentially predicated on top-down, sequential execution of code and calculations: all inputs must be collected to perform calculations, and any implicit explainability of the process is hidden in the code as programmatic formulas, making the user experience necessarily very linear.
The goal of the Intuit “Tax Knowledge Engine” is to be able, at any moment in the application and given any partial user data, to tell what’s missing and what’s wrong, and to explain back how a result was reached, in order to complete a tax return.
The knowledge graph-based approach now employed at Intuit solves some of these problems. The Tax Logic Graph is based on extracting patterns found in the tax formulas into generic patterns which can be deployed as a concept. For example, the pattern “ADD [some values] to get an output” is expressed in, essentially, an ontological structure. The software then uses the patterns in a declarative way using inputs and outputs to perform calculations, expressing the entire online tax form logic as a graph of stitched-together calculations.
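The declarative idea described above can be illustrated with a minimal sketch. This is not Intuit's implementation: the node names, the pattern names other than ADD, and the numbers are all hypothetical and do not represent real tax logic. Each calculation is a node that names a generic pattern and its inputs, so the same graph can be evaluated with partial data and asked which inputs are still missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calc:
    """A calculation node: a generic pattern applied to named inputs."""
    op: str
    inputs: tuple


# Generic, reusable patterns (only ADD is named in the talk; the rest are assumed).
PATTERNS = {
    "ADD": lambda vals: sum(vals),
    "SUBTRACT": lambda vals: vals[0] - sum(vals[1:]),
    "MAX_ZERO": lambda vals: max(0, vals[0]),
}

# Hypothetical calculation graph: output node -> calculation over other nodes.
GRAPH = {
    "total_income": Calc("ADD", ("wages", "interest")),
    "taxable_income_raw": Calc("SUBTRACT", ("total_income", "deduction")),
    "taxable_income": Calc("MAX_ZERO", ("taxable_income_raw",)),
}


def evaluate(node, facts):
    """Return the value of a node, or None if a required input is missing."""
    if node in facts:
        return facts[node]
    calc = GRAPH.get(node)
    if calc is None:
        return None  # a leaf input the user has not supplied yet
    vals = [evaluate(i, facts) for i in calc.inputs]
    if any(v is None for v in vals):
        return None
    return PATTERNS[calc.op](vals)


def missing_inputs(node, facts):
    """Return the leaf inputs still needed before the node can be computed."""
    if node in facts:
        return set()
    calc = GRAPH.get(node)
    if calc is None:
        return {node}
    needed = set()
    for i in calc.inputs:
        needed |= missing_inputs(i, facts)
    return needed


partial = {"wages": 50000, "interest": 200}
print(evaluate("total_income", partial))          # 50200
print(evaluate("taxable_income", partial))        # None (deduction not yet known)
print(missing_inputs("taxable_income", partial))  # {'deduction'}
```

Because the dependencies are data rather than control flow, nothing forces the inputs to be collected in a fixed order, which is the contrast with the procedural approach drawn in the talk.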
Yu remarks that the benefits of this “declarative programming” approach include:
- Granular, incremental composition,
- Visible calculation dependencies and data flow, and
- Built-in and intuitive explainability.
By “explainability”, Intuit means that it’s easy to analyze the graph to find (for example) a calculation node that caused a result, which makes it easy to identify processes, inputs, and results (basically by traversing the graph backwards!) and translate them to a natural language explanation for the chatbot to deliver to users. The result is a data-driven, personalized experience that minimizes questions for data collection.
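As a rough illustration of traversing the graph backwards to produce an explanation, here is a small self-contained sketch. The graph, the field names, and the sentence wording are assumptions made for the example; the talk did not show how Intuit generates its explanations.

```python
# Hypothetical calculation graph: node -> (pattern, input nodes).
GRAPH = {
    "total_income": ("ADD", ["wages", "interest"]),
    "taxable_income": ("SUBTRACT", ["total_income", "deduction"]),
}


def compute(node, facts):
    """Compute a node's value from user-supplied facts (assumed complete)."""
    if node in facts:
        return facts[node]
    op, inputs = GRAPH[node]
    vals = [compute(i, facts) for i in inputs]
    return sum(vals) if op == "ADD" else vals[0] - sum(vals[1:])


def explain(node, facts):
    """Walk from a result back toward its inputs, one sentence per calculation."""
    if node not in GRAPH:
        return []  # a user-supplied input needs no explanation
    op, inputs = GRAPH[node]
    parts = [f"{i} ({compute(i, facts)})" for i in inputs]
    joiner = " plus " if op == "ADD" else " minus "
    lines = [f"{node} is {compute(node, facts)}: {joiner.join(parts)}."]
    for i in inputs:
        lines.extend(explain(i, facts))
    return lines


facts = {"wages": 50000, "interest": 200, "deduction": 12000}
for sentence in explain("taxable_income", facts):
    print(sentence)
# taxable_income is 38200: total_income (50200) minus deduction (12000).
# total_income is 50200: wages (50000) plus interest (200).
```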
In the section of his talk about building a tax knowledge graph, Yu mentioned that, even using a declarative graph and pattern structure, the process still requires human expertise (this humans-in-the-loop concept is one we’ve encountered many times in previous talks), and as such the target “developers” for the project are actually domain experts, not developers (!). The automated piece of constructing the calculation graph leverages NLP-based analyses on actual tax documents to find and match patterns to begin creating the calculation logic, which is then refined by experts. Yu emphasized the need for feedback and mechanisms to allow users to provide this feedback.
Yu reports that deploying the graph-driven “TurboTax Explain Why” feature resulted in better reported problem resolution and more completed forms for users.
Yu’s talk concluded with three key takeaways:
- Knowledge graphs are a natural fit for many use cases
- Knowledge graphs can be used to model logic beyond data
- Knowledge graphs can be, in part, automatically created using pattern-matching machine learning techniques
He concluded by saying that “KGs are the core of the third era of computing.”
Applications of Knowledge Graphs
Saxena notes (as we see across all of this session’s talks) that the financial world is investing heavily in AI and semantic technologies.
For background into the data challenges he’s up against, Saxena began by noting that JP Morgan is the largest bank in the US and number five globally, and that 50% of US households have an account, card, or other service with JPMC. Financial activities and records generate an enormous amount of information: over 100 TB of data logged daily and over 320 PB of stored data.
Saxena explained that most of this data is tabular data stored in relational tables and databases, and that “traditionally” the use of graphs in the enterprise was limited to fraud and risk management and customer MDM use cases. However, knowledge graph adoption is growing significantly, driven by the buildout of new Knowledge Systems for:
- AI-driven customer care
- Event-driven KG from alternative data
- Expanded use in fraud detection and risk management
- Security – data exfiltrations looking at network traffic
Saxena then described three use cases including the background of the development of the JPMC graph.
Use Case 1: Building out the Company Knowledge Graph
JPMC developed their corporate graph by combining internal client data with third-party licensed data; this enables multiple use cases for the firm that depend on relationships between customers (especially parent companies, customers, and suppliers), which are useful for identifying things like loan eligibility and risk. This structure was enhanced by identifying types of questions to be answered, such as:
- If Boeing gets in financial trouble, who are their vendors and suppliers? Which are already clients? Are any of them applying for a loan? How much of their revenue derives from Boeing? (Link traversal)
- Which startups have attracted the most influential investors? (PageRank application)
- Which company is a weak link in the airline supply industry chain? (Weak link detection)
- Which nodes are most similar to [some company]? (Graph embedding/node similarity)
- Which companies might have relationships in the future? (Who might be our/their next customer?) (Link prediction)
- Which set of investors co-invest with a high degree? (Community detection)
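The first question type in the list, link traversal, can be sketched in a few lines. The company names, the client set, and the revenue shares below are invented for illustration (they are not JPMC data), and a production system would run this kind of query in a graph database rather than over Python dictionaries.

```python
from collections import deque

# Hypothetical supply-chain edges: company -> its direct suppliers.
SUPPLIERS_OF = {
    "AeroCo": ["EngineWorks", "SeatMakers"],
    "EngineWorks": ["AlloyForge"],
    "SeatMakers": [],
    "AlloyForge": [],
}
# Hypothetical set of companies that are already clients of the bank.
CLIENTS = {"EngineWorks", "AlloyForge"}
# Hypothetical share of a supplier's revenue that comes from a given customer.
REVENUE_SHARE = {
    ("EngineWorks", "AeroCo"): 0.6,
    ("SeatMakers", "AeroCo"): 0.3,
    ("AlloyForge", "EngineWorks"): 0.8,
}


def upstream_suppliers(company, max_hops=2):
    """Breadth-first traversal over supplier edges; returns {supplier: hops}."""
    seen = {company: 0}
    queue = deque([company])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for supplier in SUPPLIERS_OF.get(current, []):
            if supplier not in seen:
                seen[supplier] = seen[current] + 1
                queue.append(supplier)
    del seen[company]
    return seen


exposed = upstream_suppliers("AeroCo")
exposed_clients = sorted(s for s in exposed if s in CLIENTS)
highly_dependent = sorted(
    s for s in exposed if REVENUE_SHARE.get((s, "AeroCo"), 0) >= 0.5
)
print(exposed)           # {'EngineWorks': 1, 'SeatMakers': 1, 'AlloyForge': 2}
print(exposed_clients)   # ['AlloyForge', 'EngineWorks']
print(highly_dependent)  # ['EngineWorks']
```

The other question types in the list (PageRank, link prediction, community detection) are standard graph algorithms that operate on the same kind of edge data.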
Use case 2: Identifying fraud clusters that grow over time
Multiple credit cards opened with the same email address or phone number can be identified as a cluster when yet more accounts are added over time; fraud clusters grow quickly, and they are identified using the graph and flagged.
Use case 3: Adding customer attributes to find suspicious credit card applicants
Beginning with data attributes like name, SSN, phone, and email, additional linkages are added using the graph to find larger clusters (for example, people with multiple requests for new accounts attached to the same email). This process is used to find clusters of fraud for risk detection.
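One common way to find such clusters is to treat applications that share an attribute value as connected and then compute connected components. The sketch below uses a simple union-find structure; it is an illustration under assumed field names and made-up records, not a description of JPMC's system.

```python
from collections import defaultdict

# Hypothetical applications; the values are invented for the example.
APPLICATIONS = [
    {"id": "A1", "email": "x@example.com", "phone": "555-0100"},
    {"id": "A2", "email": "x@example.com", "phone": "555-0101"},
    {"id": "A3", "email": "y@example.com", "phone": "555-0101"},
    {"id": "A4", "email": "z@example.com", "phone": "555-0199"},
]


def find(parent, x):
    """Return the cluster representative of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def clusters(applications, keys=("email", "phone")):
    """Group applications that are linked, directly or transitively, by a shared attribute."""
    parent = {a["id"]: a["id"] for a in applications}
    first_seen = {}  # (attribute name, value) -> first application that used it
    for a in applications:
        for k in keys:
            attr = (k, a[k])
            if attr in first_seen:
                root_a = find(parent, a["id"])
                root_b = find(parent, first_seen[attr])
                parent[root_a] = root_b  # merge the two clusters
            else:
                first_seen[attr] = a["id"]
    groups = defaultdict(set)
    for a in applications:
        groups[find(parent, a["id"])].add(a["id"])
    # Largest clusters first, then alphabetical for a stable order.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


print(clusters(APPLICATIONS))  # [['A1', 'A2', 'A3'], ['A4']]
```

A1 and A3 share no attribute directly, but both link to A2, so all three land in one cluster. Re-running the grouping as new applications arrive shows a cluster growing over time, and adding more attributes to `keys` (an assumed parameter) corresponds to adding further linkages.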
Planned future applications for the graph include:
- Natural language understanding
- Questions and answers
- Recommender systems for better marketing
- Reasoning across the graph
As with many other talks, Saxena provided some lessons learned in building graphs:
- Knowledge is with the business and domain owners and not the data scientists and engineers, so you need to build tools for the domain expert, who is often non-technical
- Knowledge capture is an iterative process and not a one-time step
- Come up with 1-2 key metrics to optimize knowledge system performance
- Building good interoperability and easy feedback is key to building trust in the system
Saxena concluded by saying that “good training data beats better models; reducing training data cost is key to long-term success.”
Chair, FIBO initiative
High-Value Use Cases for Knowledge Graphs: The Case for a Semantic Data Catalog
About FIBO (Newman chairs the council that governs it), he says that the “EDM Council is developing a free and open source knowledge graph model for finance that is emerging as a de facto industry standard.”
The EDM Council and OpenFIBO “offer upper-level structures as open-source standardized knowledge graph starting points which leverage information already developed and increase operability…as FIBO provides conceptual scaffolding for enterprise ontologists who wish to extend FIBO or use it as a reference model.” FIBO is already in widespread use in the industry and is still gaining adoption.
Newman began his talk by asserting that “knowledge graphs are a paradigm shift for enterprise data” and the rest of his talk outlined his case.
Large enterprises, says Newman, have major concerns with data quality and quantity. How can organizations:
- Significantly improve the accuracy of metadata?
- Reduce the tremendous overhead involved?
- Increase simplicity when searching for data?
- Get a better understanding of our data inventory?
- Reverse the rising costs of data governance?
Like Yu, above, Newman notes that there are many data management challenges faced when leveraging conventional data technologies: silos, aging data paradigms, the explosion of data, incongruent models, old databases with unclear labels and purposes, and insufficient metadata. This situation causes high costs, inefficiencies, and massive errors.
Knowledge graphs, Newman says, provide a strategic solution to mediate challenges, the benefits of which include:
- A common semantic model
- Human and machine readable data and structures
- Standardization in validating data
- Smart, reusable data quality rules
- Opportunity to create a layer of knowledge across data assets
- Opportunity to create holistic data and linkage across disparate data sources
- Opportunity to realign disparate protocols between dynamic and at-rest data
- Opportunity for improved search, data governance, better regulatory compliance, future-proofing against evolving challenges, and risk management
- A bridge to Machine Learning
A crucial point is that as data models get more expressive, data changes from being mainly human-understandable representations of information to also being machine-understandable. In this way, Newman says, utilizing a graph will reduce costs, increase efficiency, and accelerate time to knowledge.
In a graph, versus a traditional relational database, introducing new concepts takes less effort over time. So although the initial investment takes some effort (time, money, productivity), as the graph project develops, its high reusability reduces the costs of data governance and management.
Speaking more generally about the enterprise knowledge graph environment, Newman asserted that “the core, foundational building block for a knowledge graph is an ontology” and enterprise ontologies:
- Should be viewable by users
- Should be connected to enterprise data lakes and legacy data stores
- Should create a layer of knowledge over data assets
Newman described the need for an enterprise “Semantic Data Catalog” that maps source data to an ontology for search and queries, further enhanced by machine learning. This requires a “semantic curation process” to leverage relationships found in the Semantic Data Catalog to identify varieties of data assets for ingestion and harmonization. An operational enterprise knowledge graph (very similar to the description in the previous talk) needs to be able to predict new relationships, inferences, encodings, and embeddings to enable a “consumption layer” for consumers, data scientists, and analysts to ask questions and query the operational graph (which now includes the data underneath the graph).
Newman says that “A Semantic Data Catalog is an Important Capability within an Enterprise Knowledge Graph Environment” because it includes:
- A common standard reusable information model
- Expressive metadata (including privacy and information security classifications)
- An inventory of data assets/elements linked to a core information model for semantic search
- Accurate data provenance and lineage, also integrated
- Executable data quality rules integrated with the Core Information Model that can also provide metrics for monitoring data quality
- Integrated, linked, and harmonized data from disparate sources available for easy consumption
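To make the idea of an inventory of data assets linked to a core information model more concrete, here is a toy sketch of a catalog lookup. The concept names, asset paths, and classifications are hypothetical; they are not taken from FIBO or from Newman's slides.

```python
# Hypothetical concept hierarchy: concept -> narrower concepts.
NARROWER = {
    "Party": ["Customer", "Supplier"],
    "Customer": [],
    "Supplier": [],
    "Account": [],
}

# Hypothetical catalog entries mapping physical data elements to concepts.
CATALOG = [
    {"asset": "crm.clients.client_nm", "concept": "Customer", "classification": "confidential"},
    {"asset": "erp.vendors.vendor_name", "concept": "Supplier", "classification": "internal"},
    {"asset": "core.accts.acct_no", "concept": "Account", "classification": "confidential"},
]


def concept_closure(concept):
    """Return the concept together with all of its narrower concepts."""
    result = [concept]
    for child in NARROWER.get(concept, []):
        result.extend(concept_closure(child))
    return result


def semantic_search(concept):
    """Find data assets mapped to a concept or to any narrower concept."""
    wanted = set(concept_closure(concept))
    return [entry["asset"] for entry in CATALOG if entry["concept"] in wanted]


print(semantic_search("Party"))    # ['crm.clients.client_nm', 'erp.vendors.vendor_name']
print(semantic_search("Account"))  # ['core.accts.acct_no']
```

Searching for the broad concept returns columns from two differently named source systems, which is the kind of result a keyword search over cryptic column labels would miss.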
Whereas conventional business glossaries have, essentially, dictionary-type definitions, a graph-based concept model defines things (in an ontology) as nodes with attributes (which can include a dictionary-type definition) but, importantly, also includes relationships to related concepts to form a useful multidimensional view into the data. Importantly, Newman says, this can be delivered as a visual model for understanding and consumption by users.
Newman also said that “knowledge graphs will become the enterprise System of Record for Business Concepts and Definitions” because they create a layer of knowledge over physical assets. “A Knowledge graph captures tribal knowledge and creates institutional knowledge.”
We note for our readers that in their respective presentations speakers mentioned that both Intuit and JPMorgan Chase are hiring!
Next week: the final session of the Stanford class, which we will summarize and publish shortly thereafter.