Graph-Based Text Mining
The most precise method to create knowledge from unstructured data
About 80 to 90 percent of the information companies generate is extremely diverse and unstructured. It is stored in text files, e-mails, or similar documents, which makes it difficult to search and analyze. Without intelligent technologies like graph-based text mining, companies often struggle to exploit their data because extracting relevant knowledge from unstructured information is simply too time-consuming.
PoolParty’s graph-based text mining uses knowledge graphs and semantic standards to process the context of the text to be analyzed, which can then be embedded in an even broader context. It combines machine learning and NLP techniques with knowledge graphs to enable algorithms to better analyze text by not only processing words, but understanding the underlying concepts and their context.
Forrester says that
“The requirement to digitize and automate business processes, especially document-based processes, will continue to be a high priority.”
The Document-Oriented Text Analytics Platforms Landscape, Q1 2022, Forrester Research Inc. (Boris Evelson et al., March 11, 2022)
The biggest challenges for organizations not using graph-based text mining.
Content is often made up of natural language, which can be tricky to interpret because it is ambiguous: the same word can mean different things, e.g. apple the fruit and Apple the tech company. This leads machines to misinterpret the meaning, a common problem with virtual assistants, for example, which often take information at face value and cannot read between the lines.
Lack of background knowledge
Because organizations produce significant amounts of unstructured data, the links between pieces of knowledge are easily lost. Without context, many words float around a database seemingly unconnected, because nothing explicitly links them together.
Using concepts and a thesaurus, the PoolParty Extractor serves as a premium text mining tool.
To solve the problems associated with too much text and just as much language ambiguity, organizations can use a text mining tool that employs machine learning algorithms and natural language processing techniques.
The PoolParty Extractor stands out among other text mining tools because it is paired with a taxonomy, providing a hierarchical structure to the extracted concepts. The concepts can then be used for automated semantic concept tagging, which significantly improves the search capabilities of a website, CMS, DXP, etc.
For more information about specific features and benefits of the PoolParty Extractor, refer to our product page.
Experience the major benefits of the PoolParty Extractor.
Advanced text mining using concepts
The PoolParty Extractor can extract meaningful phrases and entities as concepts instead of simple terms. Unique to semantic technologies, a concept allows a user to “package” a word together with all its synonyms, alternative labels, and multilingual labels.
With concepts, a Sneaker is not just a sneaker, it is a Trainer, Tennis Shoe, Sportschuhe, zapatillas.
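The idea of "packaging" a word with its synonyms and multilingual labels can be illustrated with a minimal sketch. This is not the PoolParty Extractor's API: the Concept class, the extract_concepts function, and the simple whole-word matching are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept: one preferred label plus alternative labels."""
    pref_label: str
    alt_labels: tuple = ()

    def labels(self):
        return (self.pref_label, *self.alt_labels)


def extract_concepts(text, concepts):
    """Return the preferred label of every concept with any label in the text."""
    found = []
    for concept in concepts:
        for label in concept.labels():
            # Whole-word, case-insensitive match on each label.
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.append(concept.pref_label)
                break  # one hit per concept is enough
    return found


# Labels taken from the sneaker example above.
SNEAKER = Concept("Sneaker", ("Trainer", "Tennis Shoe", "Sportschuhe", "zapatillas"))

print(extract_concepts("Neue Sportschuhe im Angebot", [SNEAKER]))
print(extract_concepts("These zapatillas are on sale", [SNEAKER]))
```

Both sample texts resolve to the same concept, Sneaker, even though neither contains that word. A production extractor would add linguistic processing such as lemmatization, which this sketch leaves out.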
More than an annotation
A principal advantage of PoolParty over other extractors on the market is that extracted concepts are connected to a thesaurus (taxonomy). The hierarchical structure of the taxonomy records the “place” of a concept in relation to other concepts; in other words, it contextualizes them. By leveraging this structure, users can overcome the language ambiguities that would otherwise remain.
Based on the concepts around it (e.g. backpack, sportswear), the computer will understand that Puma refers to the sneaker brand and not the puma, the animal.
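The Puma example can be sketched as a simple overlap score between the words in a text and the neighbouring concepts of each candidate sense. The SENSES dictionary and the disambiguate function are hypothetical and only illustrate the principle. They do not show how the PoolParty Extractor is implemented.

```python
import re

# Hypothetical mini-taxonomy: each candidate sense of "puma" lists the
# labels of neighbouring concepts (broader, narrower, or related).
SENSES = {
    "Puma (brand)": {"backpack", "sportswear", "sneaker", "shoe"},
    "Puma (animal)": {"wildlife", "predator", "habitat", "mountain"},
}


def disambiguate(text, senses):
    """Pick the sense whose neighbouring concepts overlap most with the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(tokens & neighbours) for sense, neighbours in senses.items()}
    best = max(scores, key=scores.get)
    # Return None when no neighbouring concept appears in the text at all.
    return best if scores[best] > 0 else None


print(disambiguate("The new Puma backpack and sportswear line", SENSES))
print(disambiguate("A puma is a predator that roams its mountain habitat", SENSES))
```

The first text shares "backpack" and "sportswear" with the brand sense, the second shares "predator", "mountain", and "habitat" with the animal sense. With no contextual evidence, the sketch declines to choose rather than guess.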
Typically, graph-based knowledge extraction is executed against rules expressed in the Shapes Constraint Language (SHACL). With this approach, complex constraints and relevant relations between business objects can be formulated and used to identify and extract important paragraphs from large text documents.
Automatic classification of content
Using machine learning algorithms and the semantic knowledge model, the PoolParty Extractor can automatically classify content into its correct knowledge domain. With a small set of training data, you can have an extractor that is dedicated to each of your sectors (legal vs. HR vs. marketing) so that when a new document is connected to the thesaurus, it will be sorted into the correct domain. Even if a document does not explicitly name its domain, its concepts are assigned to the appropriate classes, which makes tagging the content more precise.
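As a rough illustration of classifying documents by their extracted concepts, the sketch below trains a tiny naive Bayes-style scorer on a handful of labelled examples. The training data, the concept names, and the train and classify functions are all invented for this example. The actual algorithms used by the PoolParty Extractor are not described here and may differ.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: (domain, [concept, ...]) pairs -> per-domain concept counts."""
    counts = defaultdict(Counter)
    for domain, concepts in examples:
        counts[domain].update(concepts)
    return counts


def classify(concepts, counts):
    """Score each domain by the add-one smoothed log-likelihood of the concepts."""
    vocab = {c for counter in counts.values() for c in counter}
    best_domain, best_score = None, -math.inf
    for domain, counter in counts.items():
        total = sum(counter.values()) + len(vocab)
        score = sum(math.log((counter[c] + 1) / total) for c in concepts)
        if score > best_score:
            best_domain, best_score = domain, score
    return best_domain


# Hypothetical training set: concepts already extracted from a few documents.
TRAINING = [
    ("legal", ["Contract", "Liability", "Compliance"]),
    ("legal", ["Contract", "Data protection"]),
    ("hr", ["Recruiting", "Onboarding", "Payroll"]),
    ("hr", ["Payroll", "Recruiting"]),
    ("marketing", ["Campaign", "Brand", "Lead generation"]),
]

model = train(TRAINING)
print(classify(["Contract", "Liability"], model))
print(classify(["Payroll"], model))
```

Because the features are concepts rather than raw words, a document tagged with Contract is routed the same way whichever synonym or language the original text used. The sketch assumes equal prior probability for every domain.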