
Graph-Based Text Mining

The most precise method to create knowledge from unstructured data

For decades, we have stored the majority of our data and knowledge in natural language, and we want machines to help us discover, reuse and utilize this knowledge. In fact, the sheer volume of data to be processed can only be handled with the help of machines. But natural language is not made for machines. It can be hard for them to interpret, because it is full of ambiguity (the same word can mean different things), contextual knowledge that lies between the lines, and implicit knowledge hidden in how a fact is expressed.

Machines that understand your content

With Graph-Based Text Mining, natural language documents are deconstructed into their components to identify, extract and structure facts, relationships and assertions so that machines can process them further.

Graph-Based Text Mining uses knowledge graphs and semantic standards to process the context of the text to be analyzed, which can then be embedded in an even broader context. It is an advanced methodology for automatic text understanding, based on a number of technologies that are being fused together:

Understand syntax by grammatical processing

In Graph-Based Text Mining, rule-based pre-processing supports the extraction of terms and phrases based on word order, punctuation and grammar.
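As a rough illustration of rule-based pre-processing, the sketch below splits a sentence into candidate phrases at punctuation marks and at function words, keeping the original word order. The stopword list and the function name `candidate_phrases` are hypothetical; real grammatical processing would more likely rely on part-of-speech patterns than on a fixed word list.

```python
import re

# Hypothetical, deliberately tiny list of function words that end a phrase.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "are", "for", "to", "with"}


def candidate_phrases(sentence: str) -> list[str]:
    """Split a sentence into candidate term phrases.

    Punctuation closes a phrase, so no phrase crosses a comma or clause
    boundary; stopwords close a phrase as well. Word order is preserved.
    """
    phrases = []
    for chunk in re.split(r"[.,;:!?()]", sentence):
        current = []
        for word in chunk.split():
            if word.lower() in STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


print(candidate_phrases("Semantic search for legal documents, with knowledge graphs and text classification."))
# ['Semantic search', 'legal documents', 'knowledge graphs', 'text classification']
```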

Morphology computed with NLP techniques

Stemming and lemmatization are among the NLP techniques applied in Graph-Based Text Mining to analyze word forms and word structure.
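To show what stemming does, here is a minimal suffix-stripping stemmer. It is a toy written for this illustration, not the Porter or Snowball algorithm, and the suffix list is a hypothetical choice. It reduces several inflected forms to a shared stem; a lemmatizer would instead look words up in a dictionary and return a proper base form such as "mine" rather than "min".

```python
# Hypothetical suffix list, checked longest first so "ations" wins over "s".
SUFFIXES = ["ations", "ation", "ings", "ing", "ies", "es", "ed", "s"]


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


print([crude_stem(w) for w in ["mining", "mined", "mines", "documents", "document"]])
# ['min', 'min', 'min', 'document', 'document']
```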

Semantics represented by a graph

The meaning of entities and their relations to other entities in a piece of text is extracted and reproduced in a neutral format. Named entity recognition and text classification are based on graph-enhanced machine learning.
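In graph form, extracted meaning becomes a set of subject-predicate-object statements. The sketch below stores a few hypothetical statements in a plain adjacency list and follows edges from one entity; a production system would more likely use RDF and a triple store, and the predicate names here are invented for illustration.

```python
from collections import defaultdict

# Hypothetical statements an extractor might derive from
# "Puma sells sneakers and sportswear."
TRIPLES = [
    ("Puma", "isA", "Brand"),
    ("Puma", "sells", "Sneaker"),
    ("Puma", "sells", "Sportswear"),
]


def build_graph(triples):
    """Index (subject, predicate, object) triples as subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph


def objects(graph, subj: str, pred: str) -> list[str]:
    """Follow all edges labelled `pred` that leave node `subj`."""
    return [obj for p, obj in graph.get(subj, []) if p == pred]


graph = build_graph(TRIPLES)
print(objects(graph, "Puma", "sells"))  # ['Sneaker', 'Sportswear']
```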

Usage context embedded into language pragmatics

The graph-based approach makes it possible to embed the usage context directly into the knowledge model. The result is machines that are better prepared for the pragmatic specifics of the natural language application at hand.

Applying these technologies allows machines to deconstruct natural language and transform it into computable metadata to interpret and classify words, sentences and even entire paragraphs.

The value of text mining

Once the machine has understood the text you provide, whether it comes from service portals, reports, other information sources or your own legacy content, you can leverage the value of text mining.

Reduce transformation costs

The preparation of content from natural language for automated processing normally requires human interpreters. Text mining can take over here and significantly reduce personnel costs and processing time.

Controlled automatic tagging
Categorisation of documents or paragraphs
Sentiment and sense identification

Effective processes by non-ambiguous text understanding

If processes are to be controlled by natural language, ambiguities and misunderstandings must be eliminated. Different interpretations of the same facts can have costly consequences in process control. Text mining ensures that unambiguous conclusions are drawn from natural language, without the blurring that human interpretation introduces.

Trigger-word or trigger-phrase detection
Intent identification
Fact extraction
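Trigger-phrase detection and intent identification can be pictured as a lookup from phrases to intents. The sketch below uses a hypothetical, hand-written mapping and whole-word matching; in a graph-based setup the triggers would more plausibly be concepts, so that synonyms of a trigger fire as well.

```python
import re

# Hypothetical trigger phrases and the intents they signal.
TRIGGERS = {
    "cancel my contract": "cancellation",
    "terminate the agreement": "cancellation",
    "change my address": "address_change",
    "refund": "refund_request",
}


def detect_intents(text: str) -> set[str]:
    """Return every intent whose trigger phrase occurs in the text as whole words."""
    lowered = text.lower()
    return {
        intent
        for phrase, intent in TRIGGERS.items()
        # \b word boundaries stop "refund" from firing inside "refunded".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }


print(sorted(detect_intents("Please cancel my contract and send a refund.")))
# ['cancellation', 'refund_request']
```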

Gaining an overview and maintaining focus

Text mining helps to keep track of large volumes of content such as social media, customer support tickets, the voice of the employee and the voice of the customer and to react to those signals that are essential. Text mining is a filter that can be set to detect threats and risks.

Sieve relevant data
Threat and risk alerting
Rating and clustering

Access to hidden knowledge by bridging terminological barriers

Graph-Based Text Mining breaks down natural language into understandable concepts instead of merely readable terms. This makes content accessible that would otherwise remain undiscoverable because it uses different jargon or the vocabulary of a different field, which excludes it from search.

Detection of synonyms and variants 
Entity matching

For more information about the specific features and benefits of the PoolParty Extractor, refer to our product page.

Experience the major benefits of the PoolParty Extractor.

Advanced text mining using concepts

The PoolParty Extractor can extract meaningful phrases and entities as concepts instead of simple terms. Unique to semantic technologies, a concept allows a user to “package” a word together with all its synonyms, alternative labels, and multilingual labels.

With concepts, a Sneaker is not just a sneaker: it is also a Trainer, a Tennis Shoe, Sportschuhe and zapatillas.
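A concept can be modeled as one preferred label plus a list of alternative and multilingual labels, with every label pointing back to the same concept. SKOS, the standard commonly used for thesauri of this kind, expresses these as `skos:prefLabel` and `skos:altLabel` with language tags; the sketch below drops the language tags for brevity, and the class name and URI are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept bundles one preferred label with alternative and multilingual labels."""
    uri: str
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)

    def labels(self) -> list[str]:
        return [self.pref_label, *self.alt_labels]


# Hypothetical concept; the URI is illustrative only.
sneaker = Concept(
    uri="http://example.org/concepts/sneaker",
    pref_label="Sneaker",
    alt_labels=["Trainer", "Tennis Shoe", "Sportschuhe", "zapatillas"],
)


def build_label_index(concepts):
    """Map every lowercased label to the concept that owns it."""
    return {label.lower(): c for c in concepts for label in c.labels()}


index = build_label_index([sneaker])
for term in ["trainer", "Zapatillas", "Tennis Shoe"]:
    print(term, "->", index[term.lower()].pref_label)  # each line ends with "-> Sneaker"
```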

More than an annotation

A principal advantage of PoolParty over other extractors on the market is that extracted concepts are connected to a thesaurus (taxonomy). The hierarchical structure of the taxonomy records the “place” of a concept in relation to other concepts; in other words, it contextualizes them. By leveraging this structure, users can overcome language discrepancies that would otherwise get in the way.

Based on the concepts around it (e.g., backpack, sportswear), the computer understands that Puma refers to the sneaker brand and not to the animal.
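One simple way to picture this kind of disambiguation: each candidate concept for an ambiguous label has related concepts in the taxonomy, and the candidate whose neighbors overlap most with the concepts found nearby in the text wins. The overlap count below is a deliberately simple stand-in chosen for illustration, not a description of how PoolParty scores candidates, and the related-concept sets are hypothetical.

```python
# Hypothetical candidate concepts for the ambiguous label "Puma",
# each with a few related concepts taken from the taxonomy.
CANDIDATES = {
    "Puma (brand)": {"sneaker", "sportswear", "backpack", "apparel"},
    "Puma (animal)": {"cat", "predator", "mountains", "wildlife"},
}


def disambiguate(candidates: dict[str, set[str]], context: set[str]) -> str:
    """Pick the candidate whose related concepts overlap most with the context concepts."""
    return max(candidates, key=lambda name: len(candidates[name] & context))


print(disambiguate(CANDIDATES, {"backpack", "sportswear", "running"}))  # Puma (brand)
```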

Document intelligence

Typically, graph-based knowledge extraction is executed against rules expressed in the SHACL language. With this approach, complex constraints and relevant relations between business objects can be formulated and used to identify and extract the important paragraphs in large text documents.
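Real SHACL shapes are written in RDF (usually Turtle) and evaluated by a SHACL processor; that is not shown here. The plain-Python sketch below only mimics the idea: a hypothetical constraint requires a paragraph to mention at least one concept of each required type, and only paragraphs that satisfy it are kept. The concept types and the substring matching are assumptions made for illustration.

```python
# Hypothetical concept-to-type assignments, as a taxonomy might provide them.
CONCEPT_TYPES = {
    "supplier": "Party",
    "customer": "Party",
    "penalty": "Obligation",
    "delivery deadline": "Obligation",
}

# A constraint in the spirit of a shape: every listed type must occur in the paragraph.
REQUIRED_TYPES = {"Party", "Obligation"}


def types_in(paragraph: str) -> set[str]:
    """Collect the types of all known concepts mentioned in the paragraph."""
    text = paragraph.lower()
    return {t for concept, t in CONCEPT_TYPES.items() if concept in text}


def relevant_paragraphs(paragraphs: list[str]) -> list[str]:
    """Keep only paragraphs whose concept types satisfy the constraint."""
    return [p for p in paragraphs if REQUIRED_TYPES <= types_in(p)]


doc = [
    "The supplier pays a penalty if the delivery deadline is missed.",
    "This agreement is governed by German law.",
    "The customer receives monthly reports.",
]
print(relevant_paragraphs(doc))
# ['The supplier pays a penalty if the delivery deadline is missed.']
```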

Automatic classification of content

Using machine learning algorithms and the semantic knowledge model, the PoolParty Extractor can automatically classify content into the correct knowledge domain. With a small set of training data, you can set up an extractor dedicated to each of your sectors (e.g., legal, HR or marketing), so that a new document connected to the thesaurus is sorted into the correct domain. Even if a document does not state its domain explicitly, its concepts are classified into the appropriate classes, which makes tagging more precise.
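To make the idea of classifying from a small training set concrete, the sketch below represents each document by the concepts extracted from it, builds one concept-count profile per domain, and assigns a new document to the domain with the highest cosine similarity. This nearest-centroid approach is a swapped-in, textbook technique chosen for brevity, not PoolParty's actual classifier, and the training data is hypothetical.

```python
import math
from collections import Counter


def centroid(docs: list[list[str]]) -> Counter:
    """Sum concept counts over all training documents of one domain."""
    total = Counter()
    for concepts in docs:
        total.update(concepts)
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical training data: each document is reduced to its extracted concepts.
TRAINING = {
    "legal": [["contract", "liability", "clause"], ["clause", "court", "contract"]],
    "HR": [["employee", "salary", "onboarding"], ["employee", "vacation"]],
    "marketing": [["campaign", "brand", "customer"], ["brand", "social media"]],
}
PROFILES = {domain: centroid(docs) for domain, docs in TRAINING.items()}


def classify(concepts: list[str]) -> str:
    """Return the domain whose profile is most similar to the document's concepts."""
    doc = Counter(concepts)
    return max(PROFILES, key=lambda d: cosine(doc, PROFILES[d]))


print(classify(["employee", "contract", "salary"]))  # HR
```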

Useful Resources

HR Recommender Demo:

Connect employees to others and their relevant projects.


User Manual:

Read our Help documentation to see how the Extractor works.


Named Entity Recognition Demo:

Automatically extract concepts and terms from text.


PoolParty Text Mining in a nutshell

In PoolParty, Graph-Based Text Mining is implemented as a chain of interconnected API calls.

Corpus

A collection of documents (PDF, DOC, PowerPoint, TXT, etc.) serves as a reference for your knowledge area. The corpus works hand in hand with your thesaurus: it is both the source for creating the thesaurus and your reference for the specific terms used in your field and how often they occur.

Extraction Model

An indexed data structure built from the thesaurus is called the Extraction Model. It enables the extractor to match concepts quickly across the whole text.
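One plausible way to get that speed, sketched here as an assumption rather than a description of PoolParty's internal format, is to index every thesaurus label by its first token. A single pass over the text then only tries the labels that can start at each position. The labels and concept IDs below are hypothetical, and the function returns all matches, including overlapping ones.

```python
from collections import defaultdict


def build_extraction_model(labels: dict[str, str]) -> dict[str, list[tuple[tuple[str, ...], str]]]:
    """Index labels by their first token, longest label first.

    `labels` maps a surface label to a concept ID (both hypothetical here).
    """
    model = defaultdict(list)
    for label, concept_id in labels.items():
        tokens = tuple(label.lower().split())
        model[tokens[0]].append((tokens, concept_id))
    for entries in model.values():
        entries.sort(key=lambda e: len(e[0]), reverse=True)
    return dict(model)


def find_concepts(text: str, model) -> list[tuple[int, int, str]]:
    """Scan the tokens once; at each position, try only labels starting with that token.

    Returns (start_token, end_token_exclusive, concept_id) for every match.
    """
    tokens = text.lower().split()
    hits = []
    for i, tok in enumerate(tokens):
        for label_tokens, concept_id in model.get(tok, []):
            if tuple(tokens[i : i + len(label_tokens)]) == label_tokens:
                hits.append((i, i + len(label_tokens), concept_id))
    return hits


model = build_extraction_model({"knowledge graph": "C1", "graph": "C2", "text mining": "C3"})
print(find_concepts("graph based text mining with a knowledge graph", model))
# [(0, 1, 'C2'), (2, 4, 'C3'), (6, 8, 'C1'), (7, 8, 'C2')]
```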

Thesaurus

The thesaurus essentially contains the terms together with synonyms and alternative terms that are identified by the extractor in a given text.

Extract Terms

The Term Extractor detects specific pieces of the text that are characterized as potential term candidates.

Match Terms

The Term Matcher then matches the candidates against the thesaurus model and resolves conflicting matches.
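A common strategy for resolving conflicting matches is greedy longest-match: when two matches overlap, keep the one covering more tokens. The text does not say which rules PoolParty applies, so the sketch below is an assumed, generic strategy operating on hypothetical (start, end, concept ID) token spans.

```python
def resolve_conflicts(matches: list[tuple[int, int, str]]) -> list[tuple[int, int, str]]:
    """Keep a non-overlapping subset of matches, preferring longer spans.

    Each match is (start_token, end_token_exclusive, concept_id). Ties in
    length are broken in favor of the earlier start position.
    """
    chosen = []
    taken = set()
    for start, end, concept in sorted(matches, key=lambda m: (-(m[1] - m[0]), m[0])):
        span = set(range(start, end))
        if span & taken:
            continue  # overlaps a longer match that was already accepted
        chosen.append((start, end, concept))
        taken |= span
    return sorted(chosen)


matches = [(0, 1, "C2"), (2, 4, "C3"), (6, 8, "C1"), (7, 8, "C2")]
print(resolve_conflicts(matches))
# [(0, 1, 'C2'), (2, 4, 'C3'), (6, 8, 'C1')]
```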

Annotate

The results are cleaned up and provided in a machine-readable format for further use.
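As an example of a machine-readable result, the sketch below serializes resolved matches as JSON. The field names (`concept`, `prefLabel`, `matchedText`, `tokenSpan`) are a hypothetical layout chosen for illustration, not the Extractor's actual response format.

```python
import json


def to_annotations(tokens: list[str], matches: list[tuple[int, int, str]], pref_labels: dict[str, str]) -> str:
    """Serialize resolved (start, end, concept_id) matches as a JSON document."""
    annotations = [
        {
            "concept": concept_id,
            "prefLabel": pref_labels.get(concept_id, concept_id),
            "matchedText": " ".join(tokens[start:end]),
            "tokenSpan": [start, end],
        }
        for start, end, concept_id in matches
    ]
    return json.dumps({"annotations": annotations}, ensure_ascii=False, indent=2)


tokens = ["graph", "based", "text", "mining"]
print(to_annotations(tokens, [(2, 4, "C3")], {"C3": "Text Mining"}))
```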

Prepare

Input text for the extractor is pre-processed. A set of rules cleans, normalizes and formats the text.
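The clean, normalize and format rules might look like the sketch below, which handles typical residue from PDF or Office exports. The specific rules are assumptions chosen for illustration; `unicodedata.normalize("NFKC", ...)` is the standard-library call that folds compatibility characters such as ligatures and non-breaking spaces.

```python
import re
import unicodedata


def prepare(raw: str) -> str:
    """Clean, normalize and format raw text before extraction (illustrative rules only)."""
    text = unicodedata.normalize("NFKC", raw)     # fold ligatures, non-breaking spaces, etc.
    text = text.replace("\u00ad", "")             # drop soft hyphens left over from exports
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # re-join words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)              # collapse runs of whitespace and newlines
    return text.strip()


print(prepare("Graph-based text min-\ning\u00a0 uses \ufb01les"))
# Graph-based text mining uses files
```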

To learn more about graph-based text mining and NLP, download our free white paper.