
Graph-based Text Mining

The best approach to turn unstructured data into knowledge

About 80 to 90 percent of the information companies generate is extremely diverse and unstructured. It is stored in text files, e-mails or similar documents, which makes it difficult to search and analyze. Therefore, the ability to process large amounts of text, gain insight from it, organize it, connect it, understand it, and use it to answer questions is of paramount importance.

PoolParty’s graph-based text mining uses knowledge graphs and semantic standards to process the context of the text to be analyzed, which can then be embedded in an even broader context. It combines machine learning and NLP techniques with knowledge graphs to enable algorithms to better analyze text by not only processing words, but understanding the underlying concepts and their context.

Among other benefits, this approach leads to highly accurate automated tagging, which not only saves time by avoiding the manual tagging of large numbers of documents, but also improves knowledge discovery and decision making by making information easier to find and analyze.

Gartner predicts that

“by 2024, companies using graphs and semantic approaches for natural language technology (NLT) projects will have 75% less AI technical debt than those that don’t.”

Gartner, Inc: ‘Predicts 2020: Artificial Intelligence — the Road to Production’ (Anthony Mullen et al, December 2019)

Useful Resources

HR Recommender Demo: connects employees, shows them relevant projects, and much more.

Named Entity Recognition Demo: automatically extracts concepts and terms from text.


Case Study: Knowledge Graphs in the Banking and Insurance Sectors

Graph-based text mining is an advanced methodology for automatic text understanding that combines several technologies:

  • Text structure analysis
  • Extraction of entities from text based on knowledge graphs
  • Extraction of terms and phrases based on text corpus statistics
  • NLP techniques such as stemming or lemmatization
  • Recognition of named entities and text classification based on machine learning enhanced by semantic knowledge models
  • Optionally also the extraction of facts from text
  • Automated sense extraction of whole sentences, which is based on the extraction of data and entities and validation against a set of conditions using knowledge graphs
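As a rough illustration of two items from the list above (entity extraction based on a knowledge graph, plus a simple normalization step), the following sketch tags a sentence with concepts from a tiny, made-up graph. Everything in it is an assumption for illustration: the `Concept` class, the `ex:` identifiers, the labels, and the crude plural stripping that stands in for real stemming or lemmatization. It is not PoolParty's API or implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hypothetical, SKOS-like concept with labels and broader links."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)


# Made-up mini knowledge graph, for illustration only.
GRAPH = {
    "ex:loan": Concept("ex:loan", "loan", ["credit"], ["ex:financial-product"]),
    "ex:mortgage": Concept("ex:mortgage", "mortgage", ["home loan"], ["ex:loan"]),
    "ex:financial-product": Concept("ex:financial-product", "financial product"),
}


def normalize(token):
    # Crude plural stripping; a stand-in for real stemming or lemmatization.
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def tokens(text):
    return [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_index(graph):
    # Map every normalized label (preferred or alternative) to its concept.
    index = {}
    for concept in graph.values():
        for label in [concept.pref_label, *concept.alt_labels]:
            index[tuple(tokens(label))] = concept.uri
    return index


def tag(text, graph=GRAPH):
    """Return concept URIs found in the text, preferring the longest label match."""
    index = build_index(graph)
    toks = tokens(text)
    max_len = max(len(key) for key in index)
    found = []
    i = 0
    while i < len(toks):
        for n in range(min(max_len, len(toks) - i), 0, -1):
            uri = index.get(tuple(toks[i:i + n]))
            if uri:
                if uri not in found:
                    found.append(uri)
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return found


def expand(uris, graph=GRAPH):
    """Add broader concepts transitively, so a document about mortgages is also findable under loans."""
    seen = list(uris)
    queue = list(uris)
    while queue:
        uri = queue.pop(0)
        for parent in graph[uri].broader:
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


tags = tag("Customers asked about home loans and credits.")
print(tags)          # concepts matched directly in the text
print(expand(tags))  # the same tags plus their broader concepts
```

Matching against concept labels rather than raw strings is what lets "home loans" and "credits" resolve to graph concepts, and the broader links then place those tags in a wider context.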

Graph-based text mining ultimately gives machines access to the relevant background knowledge to interpret and classify words, sentences and even entire paragraphs more precisely. This knowledge is made available as a knowledge graph based on W3C standards, so that problems of natural language, such as ambiguity, can be resolved more precisely. This approach helps to avoid misinterpretations, a common problem with, for example, virtual assistants, which often take information at face value and cannot read between the lines.
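To make the role of background knowledge more concrete, here is a minimal sketch of how an ambiguous word might be resolved by comparing the surrounding text with the neighbouring concepts of each candidate meaning. The `SENSES` table, the `ex:` identifiers and the overlap-counting heuristic are all assumptions chosen for illustration; production systems use far richer graphs and scoring.

```python
import re

# Hypothetical background knowledge: for each candidate meaning of a term,
# the labels of concepts that are its neighbours in the knowledge graph.
SENSES = {
    "bank": {
        "ex:bank-institution": {"loan", "account", "interest", "credit"},
        "ex:river-bank": {"river", "water", "shore", "flood"},
    }
}


def disambiguate(term, text, senses=SENSES):
    """Pick the meaning whose graph neighbours overlap most with the text.

    Returns None when the context offers no evidence for any meaning.
    """
    context = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        uri: len(neighbours & context)
        for uri, neighbours in senses[term].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("bank", "The bank raised the interest on my loan account."))
print(disambiguate("bank", "The river burst its bank after the flood."))
print(disambiguate("bank", "I walked to the bank."))
```

Returning `None` when there is no supporting context, instead of guessing, is a deliberate choice in this sketch: it mirrors the idea of not taking a word at face value.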

Gartner recommends:

“Ensure that Natural Language (NL) data and metadata can be used across different applications, and that NL technology projects are designed to create a systemic consistency that allows more complex, multimodal and context-enriched experiences.”

Gartner, Inc: ‘2021 Strategic Roadmap for Enterprise AI: Natural Language Architecture’ (Anthony Mullen et al, December 2020)
