I grew up in a small town in Connecticut where you needed to drive “everywhere.” There were no sidewalks in my neighborhood, no stores in easy walking distance, no public transportation network, no local taxi company let alone Lyft or Uber (back in those pre-smartphone days). Getting around required convincing your parents or older siblings or friends’ families to drive you somewhere (and back again).
I still remember the thrill of freedom I experienced when I finally got my driver’s license – suddenly I could go wherever I wanted, when I wanted, without having to take other people’s time, attention, or resources. Everyday routines were easier and the possibilities for adventure expanded exponentially.
Twenty-five years later, I am reminded of this awesome sense of freedom as I experience a new broadening of my horizons thanks to Synaptica’s Graphite Knowledge Studio (GKS) and its ability to bootstrap enterprise taxonomies for autocategorization. Yes, I know this sounds hyperbolic, but bear with me.
How many times have you, as a taxonomist or ontologist without data science or engineering training, known the question you want to answer, if only you could access the data, which you know exists somewhere? Without sophisticated coding skills, though, you can’t query the data you need to answer your question. You may have to spend hours on YouTube tutorials teaching yourself how to script code (the equivalent of walking miles to your destination), or you have to convince a data science colleague to write the query you need (the equivalent of convincing someone else with a driver’s license to drive you somewhere). And once you’ve answered your question with data, you probably want to improve your taxonomy or ontology based on this insight. But to do this, you need to convince an engineering colleague to retrain or regenerate the model to incorporate the new design (the equivalent of arranging another ride so you can get back home again). I know I have faced this dilemma many times, and the result has either been slow progress or settling for a taxonomy or ontology that I know could be better, if only I could prioritize my request within the data science or engineering backlog. It’s a frustrating feeling to know how good things could be if only you were less constrained by resources or tooling.
When it comes to autocategorization, Graphite Knowledge Studio puts you, the taxonomist, in the driver’s seat, allowing you to quickly annotate your enterprise content with the concepts within your taxonomies or ontologies, inspect the results, improve the tuning of your taxonomy for the task of autocategorization, and then rerun the autocategorization. With a continuous cycle of improvement, entirely within your control, both your taxonomy and your autocategorization text analytics service get better, at speed, in real time.
If you want to learn more, keep reading, or skip to the bottom to request a test drive.
The powers of information science and data science combined
Synaptica has developed Graphite Knowledge Studio in partnership with leading graph database and NLP developer Ontotext, bringing together the strengths of information science with the power of data science, in service of empowering taxonomists and ontologists to bootstrap their enterprise taxonomies for autocategorization of enterprise content.
As taxonomists, we are well-versed in the methods of information science. Our human-curated, human readable description and definition of content helps to ensure quality and relevance for our organizations’ business needs. But these methods are limited in their ability to scale without data science methods, which are computational and machine readable and enable inference across large data or content sets.
Graphite Knowledge Studio links the controlled vocabularies maintained by taxonomists to natural language processing (NLP) text analytics algorithms that can label content at scale and store content labels in an RDF graph database. Through its interfaces, Studio then makes transparent the work of the NLP text analytics service, so taxonomists can inspect results, write refining rules for how the taxonomies should be applied to content, and otherwise evolve the taxonomy in light of the text analytics results.
Graphite Knowledge Studio enables a simple but powerful workflow that seamlessly integrates the taxonomy and annotation views:
Annotation Service Configuration:
- Collect text documents for autocategorization
- Configure the taxonomies and ontologies already stored in Graphite for annotation with just a few clicks
- Assign human “gold standard” benchmarks at the document level to establish document ground-truth
Annotation Creation & Review:
- Run the annotations service
- Inspect the annotation results:
  - Measure the quality of human-to-machine translation by counting true positives, false negatives, and false positives
  - Review inline annotations (i.e., the many concepts found in the document) and document-level annotations (i.e., the fewer concepts that the document is “about”) to see the algorithm at work and identify specific gaps between human and machine reading of content
- Easily toggle between annotation results and taxonomy/ontology concepts to:
  - Write annotation-specific rules (context keywords or regular expressions)
  - Enrich or revise existing taxonomy concepts through revised alternative labels, definitions, etc.
  - Add or remove taxonomy concepts based on insight from your enterprise content
- Rerun the annotations service for real-time updates to your annotations results based on changes in taxonomy/ontology data.
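To make the idea of an annotation-specific rule concrete, here is a minimal Python sketch of how a regular expression and a set of context keywords might work together to disambiguate a concept. This is an illustration only and not Graphite Knowledge Studio's actual rule syntax or engine: the `AnnotationRule` class, the `apply_rule` function, and the character window are all hypothetical names and choices made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AnnotationRule:
    """Hypothetical rule: tag `concept` when `pattern` matches near a context keyword."""
    concept: str
    pattern: str                                   # regular expression for the concept label
    context_keywords: set[str] = field(default_factory=set)
    window: int = 60                               # characters of context on each side (assumed)


def apply_rule(rule: AnnotationRule, text: str) -> list[str]:
    """Return the matched strings that pass the rule's context-keyword check."""
    hits = []
    for m in re.finditer(rule.pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - rule.window)
        context = text[start:m.end() + rule.window].lower()
        # With no keywords configured, every regex match counts as a hit.
        if not rule.context_keywords or any(k in context for k in rule.context_keywords):
            hits.append(m.group(0))
    return hits


rule = AnnotationRule(
    concept="Python (programming language)",
    pattern=r"\bpython\b",
    context_keywords={"programming", "developer", "code"},
)

job_ad = "We are hiring a developer with strong Python skills."
zoo_note = "The zoo's python was fed on Tuesday."

print(apply_rule(rule, job_ad))    # the match is kept: "developer" appears nearby
print(apply_rule(rule, zoo_note))  # no context keyword nearby, so nothing is tagged
```

The point of the sketch is the division of labor: the taxonomy concept supplies the label, while the rule narrows when that label should fire on real content.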
Document-level and inline annotations are created from taxonomy concepts. The work of the text analytics service is transparent to the reviewing taxonomist.
Taxonomist in the driver’s seat:
It’s this experience of being the human-in-the-loop at the interface of human and machine understanding that triggered my nostalgia for those early days of driving and new-found freedom.
With just a few clicks of the mouse, I can annotate documents based on my curated taxonomies, inspect both the inline tagging and the document-level classification, enrich the metadata of my taxonomy to improve tagger performance through specific tagging-related rules, change my taxonomy design to reflect my new understanding of the content, and then run the entire process again, in a matter of minutes.
And all through this process, no requests for data science or engineering support, no black-box algorithms I can’t understand, no time delays while I wait for model retraining or new human labelling cycles. I am alone behind the wheel, and I am loving the ride.
Throughout this process, I can rely on the easy reporting interface of Graphite Knowledge Studio to report on annotation quality. The reporting interface tracks multiple measures of human-to-machine accordance:
- Raw counts: True positives, False positives, False negatives
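- Calculated metrics: Precision, Recall, F1 score (harmonic mean of precision and recall), Jaccard index (true positives as a share of all distinct annotations made by either the human or the machine, i.e., true positives plus false positives plus false negatives)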
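For readers who like to see the arithmetic, the measures above can be sketched in a few lines of Python. This uses the standard textbook definitions and treats the human "gold standard" and the machine output for one document as two sets of concept labels; the function name and the sample labels are made up for illustration, and this is not a description of how Graphite Knowledge Studio computes its reports internally.

```python
def annotation_metrics(human: set[str], machine: set[str]) -> dict[str, float]:
    """Compare machine annotations against human gold-standard annotations."""
    tp = len(human & machine)   # concepts both human and machine assigned
    fp = len(machine - human)   # concepts only the machine assigned
    fn = len(human - machine)   # concepts only the human assigned

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    # Jaccard index: overlap divided by everything either side assigned.
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0

    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard,
    }


# Illustrative labels only.
gold = {"Python", "SQL", "Data Visualization", "Statistics"}
predicted = {"Python", "SQL", "Java"}

metrics = annotation_metrics(gold, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

In this toy case the machine found two of the four gold-standard concepts and added one the human did not, so precision is 2/3, recall is 1/2, and the Jaccard index is 2/5.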
Using these metrics, I can measure progress in narrowing the gap between machine reading and human understanding of the content. These reporting metrics are my road signs – keeping me moving in the right direction so I can arrive at my destination.
Graphite Knowledge Studio Reporting Interface
And while I’m having all this fun, whizzing between my scheme concepts and annotations and reporting and back to taxonomy metadata and design, I am actually paving the road for further insight.
Because Graphite Knowledge Studio stores all content annotations as triples in an RDF database powered by Ontotext’s GraphDB technology, each action taken by the human-in-the-loop enriches the enterprise knowledge graph for further business value. This allows for data visualization for better content understanding and exploration, graph-driven business analytics, and similarity matching algorithms to power recommendations for further insight.
Simple visualizations can be configured in the UI, with no coding required. Simply select the concept, property, document, project, or corpus you’d like to explore, and voilà, a dynamic visualization appears.
A wider range of data visualization can be tailored to specific questions or needs with simple SPARQL queries. Further refinement of these queries is simple even for those not well-versed in SPARQL, and exploration of the data requires no coding skills at all.
Visualization of skills required for specific jobs, based on annotations of collected job descriptions
In the same way, similarity mapping can be defined through SPARQL queries, but then updated and applied through a simple UI by the taxonomist or ontologist. Recommendations powered by this similarity mapping then appear in the Graphite Knowledge Studio interface for easy review and validation.