Taxonomies for Content Management

February 13, 2024

Angela DaSilva

Content Strategist

In the business world, taxonomies have always played a crucial role in organizing and making sense of vast amounts of data. These days, taxonomies are prized for what they can do for content management: in a nutshell, they categorize content assets into a hierarchical framework so that documents can be searched for more precisely.

What is a taxonomy?

In simple terms, a taxonomy is a method of organizing similar content into relevant groups. It is a way of classifying things and is commonly used in various fields, including biology, where it is used to organize the animal kingdom. For instance, mammals, birds, and reptiles are all different categories within the animal kingdom, and within each category, there are further subcategories. This hierarchical structure of classification is what makes a taxonomy so effective in organizing complex systems.
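To make the hierarchy concrete, here is a minimal Python sketch of a taxonomy as a nested structure. The category names below the top level (Primates, Owls, and so on) and the helper `path_to` are illustrative assumptions, not part of any real classification tool.

```python
# Hypothetical toy taxonomy: each key is a category, each value holds its subcategories.
ANIMAL_KINGDOM = {
    "Mammals": {"Primates": {}, "Rodents": {}},
    "Birds": {"Owls": {}, "Parrots": {}},
    "Reptiles": {"Snakes": {}, "Turtles": {}},
}


def path_to(term, tree, trail=()):
    """Return the chain of categories leading to `term`, or None if it is absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == term:
            return list(here)
        found = path_to(term, children, here)
        if found:
            return found
    return None


print(path_to("Owls", ANIMAL_KINGDOM))
```

Because every category knows its parent chain, a search for a narrow term can always be widened to the broader category above it.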

A complex system that we all experience in the office day-to-day is a content management system.

The typical content management setup

Content management often looks similar across organizations, regardless of their industry. A company usually purchases some sort of platform or drive that employees can use to store their documents, each individual department organizes those documents in a way that best fits the team’s needs, and at the individual employee level, even more folders are created.

Though a company or team may try to set a standard for keeping track of its assets, people often just name documents however they like (I think many of us are guilty of having files called “image (1)” or “export (2)”). While this is certainly not best practice, individuals can get away with such vague names because they created or downloaded the file themselves, so they know it exists and where to find it.

In a company, something like “image (1)” has no chance of being found by fellow teammates. Colleagues may know that a specific document exists, but unless they know its exact name, they have to enter related keywords into the drive’s search field and dig around until the document surfaces.

In short, a content management system is great for storage but less effective for managing and creating content. These platforms are only as good as the way users organize them and the strength of their search engine.

Reasons to use a taxonomy for content management

Taxonomy plays a key role in content management by supporting the following activities:

Standardizing different labelling conventions

One of the significant challenges in content management is the use of different terminology across departments and roles. Taxonomies tackle this issue by bundling synonymous terms together under one concept.

In other words, if one employee refers to a financial report as Quarter Two Expenses and another calls it Q2 Expenses, both will be able to find the report because the document has been tagged with both “quarter two” and “Q2.” The taxonomy is a controlled vocabulary, organized by information professionals who set the preferred label of each concept (in this case, Quarter Two), but the varying language of users is still accounted for through synonyms.

In this way, the metadata surrounding the documents follows a company standard and automatically expands to include the synonymous terms used in other parts of the organization, making data more accessible and findable across all departments.
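The idea of bundling synonyms under one preferred label can be sketched in a few lines of Python. The `Concept` class, the `resolve` helper, and the extra synonym “second quarter” are assumptions made for illustration; they do not describe how any particular taxonomy product stores its labels.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One taxonomy concept: a preferred label plus the synonyms users actually type."""
    pref_label: str
    alt_labels: set = field(default_factory=set)

    def matches(self, term: str) -> bool:
        # Compare case-insensitively against the preferred label and every synonym.
        wanted = term.strip().lower()
        labels = {self.pref_label, *self.alt_labels}
        return wanted in {label.lower() for label in labels}


# Hypothetical concept; "second quarter" is an invented extra synonym.
QUARTER_TWO = Concept("Quarter Two", {"Q2", "second quarter"})


def resolve(term, concepts):
    """Map whatever the user typed to the preferred label, or None if unknown."""
    for concept in concepts:
        if concept.matches(term):
            return concept.pref_label
    return None


print(resolve("q2", [QUARTER_TWO]))
```

Whichever variant a colleague searches for, the lookup lands on the same concept, which is what lets one tag serve every department.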

This infographic illustrates the general workflow of auto classification.

Document classification

An entity extractor tool pulls keywords and labels out of documents that are synced to the taxonomy. These tags can be sorted automatically into their corresponding classes and concept schemes through predefined rules set up in the thesaurus structure, or refined after manual review.

Using machine learning algorithms and the semantic knowledge model, the PoolParty Extractor can automatically classify content into its correct knowledge domain. With a small set of training data, you can have an extractor that is dedicated to each of your sectors (legal vs. HR vs. marketing) so that when a new document is connected to the thesaurus, it will be sorted to the correct domain. Even if a document does not name its domain explicitly, its concepts are still assigned to the appropriate classes, so tagging is more precise.

This helps organize your content by topic and improves search performance.
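As a rough illustration of rule-based tagging, the Python sketch below matches known labels in a text and groups the resulting concepts by class. It is plain string matching under invented labels and classes, not the PoolParty Extractor or its machine learning; the table `LABELS` and the function `tag_document` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical labels mapped to (concept, class); not taken from any real thesaurus.
LABELS = {
    "q2": ("Quarter Two", "Finance"),
    "quarter two": ("Quarter Two", "Finance"),
    "invoice": ("Invoice", "Finance"),
    "onboarding": ("Onboarding", "HR"),
    "nda": ("Non-Disclosure Agreement", "Legal"),
}


def tag_document(text):
    """Return {class: sorted concepts} for every known label found in the text."""
    lowered = text.lower()
    tags = defaultdict(set)
    for label, (concept, cls) in LABELS.items():
        # Word boundaries keep "nda" from matching inside a word like "agenda".
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            tags[cls].add(concept)
    return {cls: sorted(concepts) for cls, concepts in tags.items()}


print(tag_document("Q2 invoice totals and the signed NDA"))
```

A real extractor does far more (linguistic normalization, scoring, trained models), but the output has the same shape: concepts attached to a document and sorted into the classes of the taxonomy.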

Extracting meaning from text

Humans usually understand what is meant by words that have multiple meanings, but how can we teach machines the semantics of ambiguous words and phrases? Consider apple the fruit and Apple the tech company: without help from a semantic tool, the machine may not know which one a text is talking about, because the two are written as the same word.

When a taxonomy is paired with text mining capabilities, the machine’s ability to understand the text within documents increases dramatically. Only when a machine can reliably annotate and disambiguate the meaning of a text in an explainable way can it be used for further automation, including in the content management process.

PoolParty natural language processing is based on the fundamental principle of not simply extracting terms that can be read on the surface of texts, but linking them to the concepts and entities behind them, and thus to a semantic knowledge model, a.k.a. the semantic layer. The semantic layer helps ensure that the “things” found within text, including their relationships and attributes such as their various names, are mapped to a knowledge graph. The knowledge graph (whose concepts and relations are initially built in the underlying taxonomy) provides context to the text and helps clear up language obstacles such as ambiguous words.
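One simple way to picture disambiguation is to score each candidate meaning by how many of its related terms appear in the surrounding text. The Python sketch below does exactly that for the apple example; the cue words and the `disambiguate` function are invented for illustration and are a deliberately naive stand-in for what a knowledge graph provides.

```python
import re

# Hypothetical context cues for two senses of the word "apple".
SENSES = {
    "Apple (fruit)": {"orchard", "pie", "juice", "tree", "harvest"},
    "Apple (company)": {"iphone", "mac", "shares", "software", "ceo"},
}


def disambiguate(text):
    """Pick the sense whose cue words overlap most with the text, or None if none do."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)  # on a tie, the first sense listed wins
    return best if scores[best] > 0 else None


print(disambiguate("Apple shares rose after the new iPhone launch"))
```

The cue sets play the role that related concepts play in a taxonomy: the neighbours of a concept are what tell the machine which meaning is in play.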

Intelligent search

Another essential aspect of taxonomy is its impact on search capabilities. By structuring data and using standardized terminology, taxonomy management enables semantic search, where the search engine understands the context and connections between different terms. This allows for more accurate and relevant search results, making it easier for users to find the information they need.

Semantic search uses the bundling of synonyms in a concept to ensure that a user gets all the information they seek, plus other helpful content. The taxonomy is also a perfect precursor to faceted search, which helps narrow down the results of a search query. As a user ticks through the different facets (or filters), they can specify the scope of the search; these facets are the classes of the taxonomy.
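Faceted filtering can be sketched in Python as follows. The documents, facet names, and the `faceted_search` function are hypothetical examples; a real search engine would work against an index rather than a list.

```python
# Hypothetical documents, each tagged with one concept per facet (taxonomy class).
DOCS = [
    {"title": "Q2 Expenses", "facets": {"Department": "Finance", "Period": "Quarter Two"}},
    {"title": "Q2 Hiring Plan", "facets": {"Department": "HR", "Period": "Quarter Two"}},
    {"title": "Annual Audit", "facets": {"Department": "Finance", "Period": "Full Year"}},
]


def faceted_search(docs, **selected):
    """Keep only the documents that match every facet value the user has ticked."""
    return [
        doc["title"]
        for doc in docs
        if all(doc["facets"].get(facet) == value for facet, value in selected.items())
    ]


print(faceted_search(DOCS, Period="Quarter Two"))
print(faceted_search(DOCS, Period="Quarter Two", Department="Finance"))
```

Each additional facet narrows the result list, which mirrors a user ticking one filter after another.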

Content reuse

Employees spend a lot of time planning and creating content only to use it once. That time would be better spent if the content could be reused across various channels and formats and for different stakeholders.

Taxonomies are the first step in achieving structured knowledge and data management based on a semantic framework. Creating a taxonomy based on a standard like SKOS is a great way to make accumulated knowledge more accessible and reusable. Content that has been tagged with descriptive metadata is sorted into its appropriate taxonomy categories. This categorization helps locate documents so that existing content can be reused for future articles and channels.
