What is Deep Text Analytics?

Extract Valuable Information from Unstructured Data

Deep Text Analytics (DTA) is a method for extracting information from unstructured data. It combines machine learning and NLP techniques with knowledge graphs to enable algorithms to better analyze text by not only processing words, but understanding the underlying concepts and their context.

Gartner predicts that “by 2024, companies using graphs and semantic approaches for natural language technology (NLT) projects will have 75% less AI technical debt than those that don’t.”

Gartner, Inc: ‘Predicts 2020: Artificial Intelligence — the Road to Production’ (Anthony Mullen et al, December 2019)

The Challenge Posed by Text

Human language ability is unique. It is at the heart of our social and business interactions. Language enables us to communicate, cooperate, negotiate and interact with each other. It is the tool we use to capture our experiences and share our knowledge. Our laws, regulations, messages, disputes and business transactions are all documented in text form.

The amount of text produced by humanity is growing exponentially, and much of it is valuable business information. Therefore, the ability to process large amounts of text, gain insight from it, organize it, connect it, understand it, and use it to answer questions is of paramount importance. 

This is where our approach to Deep Text Analytics comes in. Using semantic knowledge graphs, natural language processing and machine learning, we can effectively extract and classify information from large amounts of data. This method overcomes many of the limitations of other approaches that have similar goals but remain imprecise because they cannot process the context of a text.

The Limitations of Text Mining 

You have probably heard of text mining, the process of extracting information from text using natural language processing (NLP) techniques. With text mining, typically large collections of documents are processed to discover new and relevant information or to help answer specific questions. The aim is to identify, extract and structure facts, relationships and assertions so that they can be further processed as text data by machines.

The structured data generated by text mining can be reused in semantic data fabrics, data catalogs or business intelligence dashboards for data analysis.

Missing context: However, most text mining methods are based primarily on statistical procedures and lack the background knowledge that ontologies, taxonomies or knowledge graphs could provide. In simple terms, the system searches for words in texts, counts how often they occur, derives from these counts how relevant each word is (e.g., by calculating the TF-IDF score), and observes which other words appear in their neighborhood. This approach has many limitations, the largest of which is that machines cannot understand the semantic context in which the words are embedded.
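To make the statistical approach concrete, here is a minimal sketch of TF-IDF scoring in plain Python. It assumes whitespace tokenization and the common variant tf × log(N / df); the function name `tf_idf` and the sample documents are illustrative and not taken from any particular text mining product.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Illustrative documents (assumption): two different senses of "jaguar".
docs = [
    "the jaguar is a fast car",
    "the jaguar is a big cat",
    "the car needs fuel",
]
scores = tf_idf(docs)
print(scores[0]["jaguar"], scores[1]["jaguar"])
```

In this toy corpus, "jaguar" receives exactly the same score in the car document and in the cat document. The statistics say how distinctive the word is, but nothing about which sense is meant.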

Ambiguity: Missing context causes major problems when it comes to disambiguation, for example. If you, as a human being, read a text about cars and see the word “Jaguar”, you understand from the context that it must refer to the car brand and not the animal.
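The following sketch shows one simple way background knowledge could support this kind of disambiguation: each candidate concept carries a set of related labels, and the concept whose labels overlap most with the surrounding text wins. The dictionary `KNOWLEDGE_GRAPH`, its entries and the function `disambiguate` are hypothetical stand-ins for a real knowledge graph, which would hold far richer relations.

```python
import re

# Hypothetical mini knowledge graph: each concept lists related labels.
KNOWLEDGE_GRAPH = {
    "Jaguar (car brand)": {"car", "engine", "dealer", "sedan", "drive"},
    "Jaguar (animal)": {"cat", "rainforest", "prey", "predator", "habitat"},
}


def disambiguate(text, graph=KNOWLEDGE_GRAPH):
    """Pick the concept whose related labels overlap most with the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    overlap = {concept: len(tokens & related) for concept, related in graph.items()}
    best = max(overlap, key=overlap.get)
    # Without any contextual evidence, refuse to guess.
    return best if overlap[best] > 0 else None


print(disambiguate("The new Jaguar has a powerful engine and is a joy to drive."))
print(disambiguate("A jaguar stalks its prey in the rainforest."))
```

Counting label overlap is a deliberately naive scoring scheme; it only illustrates that the decision depends on knowledge that lies outside the text itself.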

Lack of standards: Another disadvantage of traditional text mining tools is that the structured data objects they produce are not based on any standard and therefore cannot easily be processed together with other data streams, e.g., linked and matched with structured data.

Such limitations make clear that we have to teach our AI applications how to understand words and context. That is the only way for them to determine their meaning more accurately.

Deep Text Analytics Goes Beyond Simple Text Mining

Deep Text Analytics uses knowledge graphs and semantic standards to process the context of the text to be analyzed, which can then be embedded in an even broader context. It is a very advanced methodology for automatic text understanding, based on a number of technologies that are being fused together: 

  • Text structure analysis
  • Extraction of entities from text based on knowledge graphs
  • Extraction of terms and phrases based on text corpus statistics
  • NLP techniques such as stemming or lemmatization
  • Recognition of named entities and text classification based on machine learning enhanced by semantic knowledge models
  • Optionally also the extraction of facts from text
  • Automated sense extraction of whole sentences, which is based on the extraction of data and entities and validation against a set of conditions using knowledge graphs

This gives machines access to the relevant background knowledge to interpret and classify words, sentences and even entire paragraphs more precisely. This knowledge is made available as a knowledge graph based on W3C standards, which helps to resolve the ambiguities of natural language. The approach helps to avoid misinterpretations, a common problem, e.g., with virtual assistants that often take information at face value and cannot read between the lines.
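As a rough illustration of entity extraction based on a knowledge graph, the sketch below maps preferred and alternative labels to concept identifiers and tags every label occurrence in a text with its concept. The `ex:` identifiers, the labels in `CONCEPTS` and the function names are assumptions made for this example; a production system would load labels from a standards-based graph and add linguistic normalization such as lemmatization.

```python
import re

# Hypothetical concepts with a preferred label followed by alternative labels.
CONCEPTS = {
    "ex:NonDisclosureAgreement": [
        "non-disclosure agreement", "nda", "confidentiality agreement",
    ],
    "ex:MachineLearning": ["machine learning", "ml"],
}


def build_label_index(concepts):
    """Map every lowercase label to the identifier of its concept."""
    return {label: uri for uri, labels in concepts.items() for label in labels}


def extract_entities(text, concepts=CONCEPTS):
    """Return (matched text, concept id, character offset) for each label hit."""
    index = build_label_index(concepts)
    # Try longer labels first so multi-word labels win over shorter ones.
    labels = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), index[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


for hit in extract_entities("The NDA covers our machine learning models."):
    print(hit)
```

Because "NDA" and "confidentiality agreement" resolve to the same identifier, downstream steps can work with concepts rather than surface strings.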

All Benefits At a Glance

Here is a list of the advantages that Deep Text Analytics brings compared to traditional text analysis methods:

  • Instead of developing semantic knowledge models per application, DTA relies on a knowledge graph infrastructure, and thus on more reliable and shared resources, to efficiently develop Semantic AI applications embedded in specific contexts.
  • It merges several disciplines, such as computational linguistics and semantic knowledge modelling, to help computers understand human communication (for example, to create chatbots that actually work).
  • Human communication generates a large amount of unstructured data, mostly hidden in textual form. Deep Text Analytics helps to resolve the ambiguity of unstructured data and makes it processable by machines.
  • It performs extraction and analysis tasks more precisely and transforms natural language into useful data.
  • The technology is used for more precise intent recognition of human communication in the context of so-called natural language understanding (NLU). The basis for this is automatic sense extraction and classification of larger text units, e.g., entire sentences.
  • Deep Text Analytics is text mining based on background knowledge, and therefore on additional context information. This increases the precision in extracting relevant data points from unstructured content.

Two Sample Applications of Deep Text Analytics

Contract Intelligence

Contracts are often difficult to administer; they are filed and forgotten until a problem arises. The manual management of contracts, including the creation of new agreements and the tracking of expiration dates, is very time-consuming. Existing contracts can also contain risks that are difficult to detect using manual review methods.

There are many contract intelligence solutions aiming to give better access and control over legal contracts by making them interpretable and searchable in an intelligent way. This is a perfect use case for making use of knowledge graphs supporting Deep Text Analytics to make the information within large volumes of contracts easier to find and access.

The first step in this process is to make contracts more accessible by bringing them into a meaningful structure. Based on this, a first semantic analysis can be performed using the knowledge graph to determine which sections of the contract should be analyzed further by entity extraction, categorization and classification. In this step, the generic structure is converted into a semantically meaningful structure.

Now that you know exactly which parts of the contract relate to which subjects (e.g., confidentiality, guarantees, financial conditions), an in-depth analysis of each subject can be carried out by applying rules and tests defined on the basis of the knowledge graph. This gives you better insight into your contracts and allows you to check their compliance with your own guidelines through automated sense extraction of entire sentences.
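The two steps described above, assigning sections to subjects and then testing them against guidelines, can be sketched as follows. Everything here is a simplified assumption for illustration: the vocabulary in `SUBJECT_LABELS`, the 30-day limit in `MAX_PAYMENT_DAYS`, the "within N days" pattern and the function names are hypothetical, and real rules would be derived from the knowledge graph rather than hard-coded.

```python
import re

# Hypothetical subject vocabulary that a knowledge graph might provide.
SUBJECT_LABELS = {
    "confidentiality": {"confidential", "non-disclosure", "disclose"},
    "financial conditions": {"payment", "invoice", "fee"},
}

# Hypothetical company guideline: payment terms must not exceed 30 days.
MAX_PAYMENT_DAYS = 30


def classify_section(text):
    """Return the subjects whose labels occur as words in the section."""
    tokens = set(re.findall(r"[a-z-]+", text.lower()))
    return sorted(s for s, labels in SUBJECT_LABELS.items() if labels & tokens)


def check_payment_terms(text):
    """Flag every 'within N days' phrase that exceeds the guideline."""
    issues = []
    for match in re.finditer(r"within (\d+) days", text.lower()):
        days = int(match.group(1))
        if days > MAX_PAYMENT_DAYS:
            issues.append(f"payment term of {days} days exceeds {MAX_PAYMENT_DAYS}")
    return issues


def review(sections):
    """Classify each section, then run the rule that fits its subject."""
    report = {}
    for title, text in sections.items():
        subjects = classify_section(text)
        issues = check_payment_terms(text) if "financial conditions" in subjects else []
        report[title] = {"subjects": subjects, "issues": issues}
    return report


# Illustrative contract sections (assumption).
sections = {
    "5": "The Client shall pay each invoice within 60 days.",
    "7": "Each party shall keep all information confidential.",
}
report = review(sections)
print(report)
```

The payment rule only runs on sections classified as financial conditions, which mirrors the idea that the semantic structure decides which checks apply where.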

Intelligent Robotic Process Automation

With the introduction of robotic process automation (RPA), organizations are striving to use a noninvasive integration technology to eliminate tedious tasks so that the company’s employees can concentrate on higher-value work. However, RPA rarely uses any AI or ML techniques; rather, it consolidates a large number of rule-based business process automations and batch jobs and organizes them in a more intelligent way.

The next generation of RPA platforms is just around the corner, and they will contain much more AI than their predecessors, and much of it will be based on Deep Text Analytics. Therefore, RPA seems to be only a stopgap en route to intelligent automation (IA), which eventually automates higher-order tasks that in the past needed the perceptual and judgment capabilities of humans, for example:

 

  • On-boarding processes (new customers or employees)
  • Complaint and claims handling
  • Risk analysis (e.g. financial reports)
  • Optimization of helpdesk
  • Monitoring and verification of compliance
  • Due diligence processes
