Graph-Based Text Mining
The most precise method to create knowledge from unstructured data
For decades, we have stored the majority of our data and knowledge in natural language, and we want machines to help us discover, reuse and utilize this knowledge. Indeed, we can only cope with the sheer volume of data to be processed with the help of machines. But natural language is not made for machines. It is tricky for them to interpret because it contains ambiguity (the same word can carry several meanings), contextual knowledge that sits between the lines, and implicit knowledge that is hidden in how a fact is expressed.
Machines that understand your content
With Graph-Based Text Mining, natural language documents are deconstructed into their components to identify, extract and structure facts, relationships and assertions so that they can be further processed as text data by machines.
Graph-Based Text Mining uses knowledge graphs and semantic standards to process the context of the text to be analyzed, which can then be embedded in an even broader context. It is an advanced methodology for automatic text understanding, based on a number of technologies that are being fused together:
Understand syntax by grammatical processing
In Graph-Based Text Mining, rule-based pre-processing supports the extraction of terms and phrases based on word order, punctuation and grammar.
Morphology computed with NLP techniques
Stemming and lemmatization are some of the NLP techniques applied in Graph-Based Text Mining to compute word form and word structure.
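As a rough illustration of what stemming does, the sketch below strips a few common English suffixes so that different word forms collapse to one stem. It is a toy: the suffix list and the function name `naive_stem` are assumptions made for this example, not part of any product, and production systems typically rely on an established stemmer or a dictionary-based lemmatizer instead.

```python
# Toy suffix-stripping stemmer. The suffix list is a hypothetical choice
# for illustration only; it is not the Porter or Snowball algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word: str) -> str:
    """Lower-case a word and strip the first matching suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Different surface forms end up with the same stem.
stems = [naive_stem(w) for w in ["Sneakers", "sneaker", "Extracted", "extracts"]]
```

With this reduction, "Sneakers" and "sneaker" can be matched against the same entry in a vocabulary.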
Semantics represented by a graph
The meaning of entities and their relations to other entities in a piece of text is extracted and reproduced in a neural format. Recognition of named entities and text classification are based on graph-enhanced machine learning.
Usage context embedded into language pragmatics
The graph-based approach allows the usage context to be embedded directly in the knowledge model. You get educated machines that are better prepared for the specifics of the pragmatic natural language application.
The value of text mining
Once the machine has understood the text you provide, whether from service portals, reports, information or your own legacy content, you can leverage the value of text mining.
Reduce transformation costs
The preparation of content from natural language for automated processing normally requires human interpreters. Text mining can take over here and significantly reduce personnel costs and processing time.
Effective processes by non-ambiguous text understanding
If processes are to be controlled by natural language, then ambiguities and misunderstandings should be eliminated. Different interpretations of the same facts can have costly consequences in process control. Text mining ensures that unambiguous conclusions are drawn from natural language, without the blurring introduced by human interpretation.
Gaining an overview and maintaining focus
Text mining helps to keep track of large volumes of content, such as social media, customer support tickets, the voice of the employee and the voice of the customer, and to react to the signals that are essential. Text mining is a filter that can be set to detect threats and risks.
Access to hidden knowledge by bridging terminological barriers
Graph-Based Text Mining breaks down natural language into understandable concepts instead of just readable terms. This makes content accessible that would otherwise remain undiscoverable because a different jargon, or the vocabulary of a different field, excludes it from search.
Experience the major benefits of the PoolParty Extractor.
Advanced text mining using concepts
With concepts, a Sneaker is not just a sneaker, it is a Trainer, Tennis Shoe, Sportschuhe, zapatillas.
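One way to picture a concept is as a single identifier that carries a preferred label plus any number of alternative labels, including labels in other languages. The sketch below is a minimal, hypothetical data structure along those lines; the field names are loosely modeled on SKOS preferred and alternative labels, and the URI and the helper `build_label_index` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept with one preferred label and any number of alternatives."""
    uri: str  # hypothetical identifier
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)


def build_label_index(concepts):
    """Map every label (case-insensitively) to the concept it names."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept
    return index


sneaker = Concept(
    uri="https://example.org/concept/sneaker",  # placeholder URI
    pref_label="Sneaker",
    alt_labels=["Trainer", "Tennis Shoe", "Sportschuhe", "zapatillas"],
)
index = build_label_index([sneaker])

# Every label resolves to the same concept.
same_concept = index["trainer"] is index["zapatillas"]
```

Because all labels point to one concept, a search for "Trainer" can surface content that only ever mentions "Sportschuhe".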
More than an annotation
Based on the concepts around it (e.g. backpack, sportswear, etc.), the computer will understand that Puma refers to the sneaker brand and not puma, the animal.
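The idea of using surrounding concepts to pick the right meaning can be sketched as a simple overlap count: each candidate sense lists concepts it is related to, and the sense sharing the most concepts with the context wins. The sense names, the related-concept sets and the function `disambiguate` are all assumptions made for this example; a real system would draw these relations from the knowledge graph and would likely use a more refined scoring.

```python
# Hypothetical senses of "Puma", each with concepts it is related to.
SENSES = {
    "puma (brand)": {"sneaker", "backpack", "sportswear"},
    "puma (animal)": {"predator", "habitat", "wildlife"},
}


def disambiguate(context_terms, senses=SENSES):
    """Return the sense whose related concepts overlap most with the context.

    Returns None when no sense shares any concept with the context.
    On a tie, the sense listed first wins.
    """
    context = {term.lower() for term in context_terms}
    scores = {sense: len(related & context) for sense, related in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


chosen = disambiguate(["Backpack", "sportswear", "sale"])
```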
Automatic classification of content
PoolParty Text Mining in a nutshell
In PoolParty, Graph-Based Text Mining is realized as a chain of interconnected API calls.
A series of documents (PDF, DOC, PowerPoint, TXT, etc.) serves as a reference for your knowledge area. The corpus is in dialog with your thesaurus and is both the source for your thesaurus creation and your reference when it comes to the specific list of terms and their occurrence in your field.
An indexed data structure of the thesaurus is called the Extraction Model. It enables the extractor to do fast concept matching over the whole text.
The thesaurus essentially contains the terms together with synonyms and alternative terms that are identified by the extractor in a given text.
The Term Extractor detects specific pieces of the text that are characterized as potential term candidates.
The Term Matcher then has the task of matching the candidates to the thesaurus model and resolving conflicting matches.
The results are cleaned up and provided in a machine-readable format for further use.
Input text for the extractor is pre-processed. A set of rules cleans, normalizes and formats the text.
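The components described above can be chained roughly as follows: pre-process the text, detect term candidates, match them against an index built from the thesaurus, and emit a machine-readable result. This is a minimal sketch under stated assumptions and not the PoolParty API: the thesaurus contents, all function names and the JSON output shape are hypothetical, candidates are plain word n-grams, and resolution of conflicting matches is left out.

```python
import json
import re

# Hypothetical thesaurus: concept -> labels (synonyms and alternative terms).
THESAURUS = {
    "Sneaker": ["sneaker", "trainer", "tennis shoe"],
    "Backpack": ["backpack", "rucksack"],
}


def build_extraction_model(thesaurus):
    """Index every label by its lower-cased form for fast lookup."""
    return {
        label.lower(): concept
        for concept, labels in thesaurus.items()
        for label in labels
    }


def preprocess(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def extract_candidates(text, max_len=2):
    """Yield every word n-gram of up to max_len words as a term candidate."""
    words = text.split()
    for n in range(max_len, 0, -1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def match_terms(candidates, model):
    """Count how often each concept is hit by a candidate."""
    counts = {}
    for candidate in candidates:
        concept = model.get(candidate)
        if concept is not None:
            counts[concept] = counts.get(concept, 0) + 1
    return counts


def extract(text, thesaurus=THESAURUS):
    """Run the whole chain and return the result as a JSON string."""
    model = build_extraction_model(thesaurus)
    counts = match_terms(extract_candidates(preprocess(text)), model)
    # Post-processing: sort by concept name for stable output.
    rows = [{"concept": c, "frequency": n} for c, n in sorted(counts.items())]
    return json.dumps(rows)


result = extract("One Tennis Shoe, a sneaker and a rucksack.")
```

Building the index once and reusing it across documents is what makes the lookup step cheap; in this sketch it is rebuilt on every call only to keep the example short.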