Enterprise Taxonomy Management
Clarifying Ontology and Taxonomy Terminology
Knowledge Organization Systems (KOS) is a generic term that embraces taxonomies, thesauri, ontologies, classification schemes, name authorities, topic maps, and other structured terminologies.
Controlled Vocabularies is a term often used synonymously with taxonomies and thesauri. In a controlled vocabulary, every entity must be disambiguated with a unique label. Controlled vocabularies are also characterized by formal policies and procedures governing the curation of the vocabulary.
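The unique-label requirement can be checked mechanically. The following is a minimal sketch, assuming a vocabulary held as simple (ID, label) pairs; the function name `find_ambiguous_labels` and the sample entities are hypothetical and not part of any particular tool.

```python
from collections import defaultdict


def find_ambiguous_labels(entities):
    """Return labels shared by more than one entity ID (case-insensitive)."""
    ids_by_label = defaultdict(set)
    for entity_id, label in entities:
        ids_by_label[label.strip().casefold()].add(entity_id)
    return {
        label: sorted(ids)
        for label, ids in ids_by_label.items()
        if len(ids) > 1
    }


# Hypothetical vocabulary in which two entities share the label "Mercury".
vocabulary = [("c1", "Mercury"), ("c2", "Mercury"), ("c3", "Venus")]
print(find_ambiguous_labels(vocabulary))  # {'mercury': ['c1', 'c2']}

# Adding qualifiers is one common way to disambiguate the labels.
fixed = [("c1", "Mercury (planet)"), ("c2", "Mercury (element)"), ("c3", "Venus")]
print(find_ambiguous_labels(fixed))  # {}
```

In practice a check like this would sit inside the curation workflow, so that a duplicate label is flagged before a new term is approved.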
Taxonomies are sets of specific concepts, classes, and individuals enumerated within an ontology. The entities in a taxonomy may be ordered within a hierarchical structure, linked by associative relationships, or held as unstructured lists.
Schemas are the set of class, property, and relationship types that define the structure and the semantic model for an ontology.
Ontologies are a form of KOS comprising a Schema that defines all the class, property, and relationship types used in a KOS, plus a Taxonomy that contains all the specific named instances of concepts, classes, and individuals. Throughout this guide, we generally use the word “taxonomy” to describe the organization of concepts, classes, and individuals, whether that taxonomy lives within an ontology KOS or stands alone as an information architecture.
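The split between schema and taxonomy can be pictured as a small data structure. This is only an illustrative sketch under simplifying assumptions: the class names (`Schema`, `Taxonomy`, `Ontology`), the `validate` method, and the sample data are all hypothetical, and a real KOS would normally be stored as RDF rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    class_types: set          # e.g. {"Product", "Brand"}
    relationship_types: set   # e.g. {"hasBrand"}


@dataclass
class Taxonomy:
    entities: dict = field(default_factory=dict)   # entity name -> class type
    relations: list = field(default_factory=list)  # (subject, relationship, object)


@dataclass
class Ontology:
    schema: Schema
    taxonomy: Taxonomy

    def validate(self):
        """List taxonomy entries that use types the schema does not define."""
        problems = []
        for name, cls in self.taxonomy.entities.items():
            if cls not in self.schema.class_types:
                problems.append(f"{name}: unknown class type {cls!r}")
        for subj, rel, obj in self.taxonomy.relations:
            if rel not in self.schema.relationship_types:
                problems.append(f"{subj} -> {obj}: unknown relationship type {rel!r}")
        return problems


ontology = Ontology(
    schema=Schema({"Product", "Brand"}, {"hasBrand"}),
    taxonomy=Taxonomy(
        entities={"Laptop X": "Product", "Acme": "Brand"},
        relations=[("Laptop X", "hasBrand", "Acme"), ("Laptop X", "madeIn", "Ruritania")],
    ),
)
print(ontology.validate())
# ["Laptop X -> Ruritania: unknown relationship type 'madeIn'"]
```

The point of the sketch is that the schema constrains what the taxonomy may say: the `madeIn` relationship is reported because no such relationship type has been defined.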
Schemes are discrete ontologies (schema and taxonomy) compartmentalized by knowledge domain (e.g., topics, products, markets, brands), or by usage and ownership (e.g., corporate, department X, division Y). Discrete schemes can stand alone or be interconnected hierarchically and associatively to form a cohesive multi-domain KOS.
Transitioning from taxonomies to ontologies can involve a steep learning curve, with complex data models and query languages to master. When selecting a KOS management tool, look for an intuitive user interface, re-usable templates, and configurable schemes that speed up development by hiding much of the complexity of working with RDF and Linked Data.
Such a tool can help to:
- Start building RDF taxonomies, ontologies, and graphs in minutes with re-usable templates and schemes
- Reduce project costs and fast-track deliverables with plug-and-play libraries of public domain ontologies and taxonomies
- Build smarter search and discovery applications that leverage the business logic and semantics of well-defined schema
- Simplify systems integrations by adopting industry standard data models and portable data formats
Ontologies and taxonomies are the foundation for building smart search and discovery applications. Semantic schema, unambiguous terminology, and accurately tagged metadata enable enterprises to deliver precision search, rich browse experiences, content recommendations, and the discovery of inferred facts and knowledge.
Graphite is Synaptica’s solution for developing and managing enterprise taxonomies and ontologies. Graphite combines taxonomy and ontology management into one seamlessly integrated user experience. Graphite Knowledge Studio is our latest solution for autocategorization of text-based enterprise content according to enterprise taxonomies.
GraphDB is a highly scalable RDF graph database engine embedded with Graphite. Together, Graphite and GraphDB provide the essential tools to develop enterprise knowledge graphs, manage controlled vocabularies and metadata, and provide data analytics and business insights.
Single Source of Truth (SSOT) is the key to successful metadata management. Centralizing and standardizing enterprise terminology involves: knowledge modelling; role-based collaboration; governance and workflow; reporting mechanisms; editorial tools to build, enrich, crosswalk, and review taxonomy schemes; and APIs and connectors to publish controlled vocabularies to content, metadata, and search systems.
Annotation: tagging content with named entities from taxonomies.
Aboutness: identifying the few highest relevancy-ranked concepts that best describe the overall ‘aboutness’ of a document.
Context: concept labels may match words and phrases found in documents without matching their semantic context. To eliminate false matches and improve tagging precision, taxonomists need to be able to add positive and negative context rules.
Corpora: any set of content/documents used for a tagging process.
Inline Tagging: identifying the many concepts mentioned anywhere within the body of a document.
Explainability: the ease with which a taxonomist can identify why a concept was tagged to a document, and refine the taxonomy-tagging rules if required.
Autocategorization: text content may be categorized using natural language processing (NLP) techniques or tuned queries to large language models (LLMs). These techniques form text analytics services that power machine annotation of content. Autocategorization of non-text content may rely upon computer vision techniques, audio signal processing, or other machine learning models.
Knowledge Graphs: Enterprise Knowledge Graphs (EKGs) are graph representations of the knowledge domains of a specific enterprise. While EKGs encompass KOSs (ontologies and taxonomies), they typically also contain reference data sets, enterprise data, and metadata which is linked to and described by the KOS.
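Several of the tagging terms defined above (inline tagging, aboutness, positive and negative context rules, and explainability) can be illustrated together. The sketch below uses plain string matching rather than a real NLP or LLM service, and the `CONCEPTS` structure, the rule keys `require_any` and `reject_any`, and the function `tag_document` are hypothetical names chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical taxonomy: concept -> labels, plus optional context rules.
CONCEPTS = {
    "Apple Inc.": {
        "labels": ["apple"],
        "require_any": ["iphone", "mac", "shares"],  # positive context
        "reject_any": ["orchard", "pie"],            # negative context
    },
    "Smartphones": {"labels": ["iphone", "smartphone"]},
    "Stock markets": {"labels": ["shares", "stock market"]},
}


def tag_document(text, concepts=CONCEPTS, top_n=2):
    """Return (inline tag counts, top-N aboutness concepts, explanations)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    normalized = " ".join(words)
    vocabulary = set(words)
    inline, explanations = Counter(), {}
    for concept, rule in concepts.items():
        # Count whole-word (or whole-phrase) occurrences of each label.
        hits = {}
        for label in rule["labels"]:
            count = len(re.findall(rf"\b{re.escape(label)}\b", normalized))
            if count:
                hits[label] = count
        if not hits:
            continue
        blocked = vocabulary & set(rule.get("reject_any", []))
        if blocked:
            explanations[concept] = f"rejected: negative context {sorted(blocked)}"
            continue
        required = rule.get("require_any")
        if required and not vocabulary & set(required):
            explanations[concept] = "rejected: no positive context word found"
            continue
        inline[concept] = sum(hits.values())
        explanations[concept] = f"tagged: matched {hits}"
    aboutness = [concept for concept, _ in inline.most_common(top_n)]
    return inline, aboutness, explanations


news = ("Apple shares rose after the iPhone launch. Analysts say the iPhone "
        "drives Apple revenue, and Apple agrees.")
inline, aboutness, why = tag_document(news)
print(dict(inline))  # {'Apple Inc.': 3, 'Smartphones': 2, 'Stock markets': 1}
print(aboutness)     # ['Apple Inc.', 'Smartphones']

_, _, why_recipe = tag_document("Bake the apple pie until golden.")
print(why_recipe)    # {'Apple Inc.': "rejected: negative context ['pie']"}
```

Here the inline tags are every concept mentioned in the document, aboutness is the short relevancy-ranked subset, and the explanations record why each concept was or was not tagged, which is the information a taxonomist would need in order to refine the rules.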
Semantic Web & W3C Models
The World Wide Web Consortium (W3C) maintains a large set of standards and guidelines governing basic and advanced semantic web principles. The W3C’s Semantic Web technology stack includes RDF, OWL, SPARQL, and the SKOS standard.
Linked Data: a set of best practices for publishing structured data on the Web.
RDF: a W3C standard data model for the description and exchange of graph data, which facilitates data exchange even across differing schemas. RDF data can be serialized in multiple syntaxes (e.g., RDF/XML, Turtle, JSON-LD). See: RDF – Semantic Web Standards (w3.org)
SKOS: a W3C standard RDF vocabulary for modeling and storing taxonomies, thesauri, and other controlled vocabularies; in effect, an ontology schema for creating taxonomies in RDF. See: SKOS – Semantic Web Standards (w3.org)
SPARQL: the standard query language for RDF graphs.
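To make the RDF, SKOS, and SPARQL entries more concrete, the sketch below models a tiny SKOS taxonomy as subject-predicate-object triples in plain Python. It is an illustration of the graph data model only: it does not use an RDF library, literals are simplified to plain strings, the `http://example.org/taxonomy/` namespace and the `match` and `ancestors` helpers are hypothetical, and a production system would use a triple store and real SPARQL queries instead.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/taxonomy/"  # hypothetical namespace

# Each RDF statement is a (subject, predicate, object) triple.
triples = {
    (EX + "animals", RDF_TYPE, SKOS + "Concept"),
    (EX + "mammals", RDF_TYPE, SKOS + "Concept"),
    (EX + "dogs", RDF_TYPE, SKOS + "Concept"),
    (EX + "mammals", SKOS + "broader", EX + "animals"),
    (EX + "dogs", SKOS + "broader", EX + "mammals"),
    (EX + "dogs", SKOS + "prefLabel", "Dogs"),
}


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def ancestors(concept):
    """Follow skos:broader transitively, comparable to the SPARQL path skos:broader+."""
    found, frontier = [], [concept]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(s=current, p=SKOS + "broader"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


print(len(match(p=RDF_TYPE, o=SKOS + "Concept")))  # 3
print(ancestors(EX + "dogs"))
# ['http://example.org/taxonomy/mammals', 'http://example.org/taxonomy/animals']
```

Because every concept and property is identified by a URI, triples from different schemes can be merged into one graph without renaming anything, which is the portability benefit that the RDF entry describes.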
Many KOS use a mixture of classes and properties from different open standard vocabularies, the most common being OWL and SKOS. If a KOS is primarily based on OWL, then the whole KOS (schema and taxonomy) will typically be referred to as an ontology. If a KOS is primarily based on SKOS, the KOS will typically be referred to as a taxonomy. Appendix 2 provides further detail of the distinctions across models.
Standards are documented, established procedures and practices promoting consistency and interoperability for specific tasks. The common standards governing the construction and governance of controlled vocabularies include:
- ANSI/NISO Z39.19-2005 (R2010)
- ISO 25964-1:2011
Enterprises benefit from reviewing relevant industry standards describing the principles of controlled vocabulary management before preparing internal editorial guidelines.
Download the full Synaptica Guide to Developing Enterprise Ontologies, Taxonomies, and Knowledge Graphs.
This guide covers governance, development, and how to transition from taxonomies to ontologies, and includes the Ontotext 10-step method for creating knowledge graphs.