PoolParty Lifesaver: LLMs and Knowledge Graphs: A Technological Waltz

February 22, 2024

Whether we like it or not, we’ve been exposed to generative AI (GenAI) in one way or another. 

Think of LLM chatbots that give us answers on the fly, or images generated from nothing but text and then regenerated over and over again to be “more” of the original prompt, i.e., “show me Vienna”, “give me Vienna times 10”, “I want to see Vienna from the heavens”.

As knowledge workers and content creators, the possibilities that GenAI can provide are coupled with a lot of skepticism. We have seen enough of what it can and cannot do to understand it is not a stand-alone solution. One thing is for certain: GenAI is here to stay, and we had better learn which tools will help us keep up with it in ¾ time.

[Figure: three AI-generated images of Vienna, captioned “Vienna”, “Vienna 10x”, and “Heavenly Vienna”. Source: DeepAI]

Let’s start this dance, and learn how to get what we want from GenAI, with some basic definitions of Large Language Models and Knowledge Graphs:

What is an LLM?

A large language model (LLM) refers to a type of artificial intelligence model designed to understand and generate human-like text based on the input it receives. These models are trained on vast amounts of text data and use sophisticated algorithms to learn patterns, relationships, and structures within language. Some are so large that they contain billions of parameters and have been trained on diverse text sources from the internet, books, articles, and more, allowing them to generate coherent and contextually relevant responses to a wide range of prompts or queries. LLMs can be used for a number of natural language processing (NLP) tasks such as text generation, text summarization, language translation, question answering, and more.

Some of the most recognizable LLMs on the market include:

GPT-3 (Generative Pre-trained Transformer 3): Developed by OpenAI, GPT-3 was one of the largest and most advanced language models at its release in 2020. It consists of 175 billion parameters and is capable of generating highly coherent and contextually relevant text across a wide range of topics. ChatGPT was launched on GPT-3.5, a successor from the same model family.

BERT (Bidirectional Encoder Representations from Transformers): Developed by Google, BERT is a transformer-based model designed for natural language understanding tasks. It has been pre-trained on vast amounts of text data and is widely used for tasks such as text classification, named entity recognition, question answering, and more.

RoBERTa (Robustly optimized BERT approach): Developed by Facebook AI, RoBERTa is a variant of BERT that is trained using larger mini-batches, more data, and longer training times. It achieved state-of-the-art performance on various natural language understanding tasks when it was released.

What is a Knowledge Graph?

A knowledge graph represents a knowledge domain with the help of a knowledge model that is created by subject matter experts. It provides a structure and common interface for all of your data and enables the creation of smart multilateral relations throughout your databases. Representing an additional virtual data layer, the knowledge graph lies on top of your existing databases or data sets to link all your data together at scale – be it structured or unstructured. 

An enterprise knowledge graph contains business objects and topics that are linked, classified, semantically enriched, and connected to existing data and documents within an organization. As siloed data is an increasing issue in and across organizations, knowledge graphs are becoming especially helpful in creating a single source of truth for all your data.

Does that make sense? Sometimes it’s just easier to see it. The graphic below shows a simplified version of a knowledge model that connects various objects together so that you may see the logical relations between those concepts.

An example of a graph database in an easy-to-understand knowledge graph of the Mona Lisa.

Source: Let the Machines Learn, Yashu Seth 2019
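To make the idea concrete, a knowledge graph can be sketched as a set of subject-predicate-object triples. The snippet below is a minimal illustration built around the Mona Lisa; the predicate names and the `neighbors` helper are assumptions made for this sketch and are not taken from the graphic or from any particular graph database.

```python
# A tiny knowledge graph stored as (subject, predicate, object) triples.
# Predicate names are illustrative, not a standard vocabulary.
TRIPLES = {
    ("Mona Lisa", "is_a", "Painting"),
    ("Mona Lisa", "created_by", "Leonardo da Vinci"),
    ("Mona Lisa", "located_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
}


def neighbors(entity, triples=TRIPLES):
    """Return sorted (predicate, object) pairs for facts whose subject is `entity`."""
    return sorted((p, o) for s, p, o in triples if s == entity)


mona_lisa_facts = neighbors("Mona Lisa")
print(mona_lisa_facts)
```

Each triple is one edge in the graph, so following edges from “Mona Lisa” to “Louvre” to “Paris” is what lets a person or a machine see the logical relations between concepts.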

So, now that we know what an LLM is and what a Knowledge Graph is, how do these two come together, and why is it important that they do?

In a recent LinkedIn post entitled LLMs have revolutionized AI. Do we still need knowledge models and taxonomies, and why?, CEO and Co-founder of Semantic Web Company, Andreas Blumauer, has stated 10 arguments (we’re sure he has more) for “why state-of-the-art AI systems should be based on a hybrid architecture consisting not only of LLMs but also of semantic knowledge models such as taxonomies and ontologies.” 

These arguments include:

Organizing and Structuring Knowledge via knowledge models and taxonomies to organize knowledge in a logical structure, making it easier for humans and machines to understand complex data. Taxonomies guide LLMs in generating and retrieving information more accurately, such as classifying internal company documents independently.

Enhancing Search and Knowledge Discovery with the help of taxonomies by organizing information into categories and subcategories, making it easier to extract relevant information. They are useful in large databases and content management systems for accurate search results and content recommendations. This is the basis of Retrieval Augmented Generation (RAG).

Improving Data Interoperability and comparability of data across different systems with taxonomies and ontologies. This is essential in today’s interconnected world where data needs to flow seamlessly between different applications and services.

Promoting Ethical and Responsible AI Use as users grow concerned about the ethical implications of AI. Taxonomies and knowledge models can help ensure responsible and transparent use of LLMs and explainable AI architecture.

LLMs are not a knowledge database. Once again, LLMs are not knowledge databases and should not be used in use cases where a human cannot verify the output. However, they can complement any IT system and provide a natural language interface to expert technology.
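As a rough sketch of the first two arguments, a taxonomy can be used to tag a document with concepts and their broader categories, which a search index or an LLM prompt can then rely on. The mini-taxonomy, its labels, and the `classify` function below are hypothetical, and the plain substring matching is only a stand-in for what a real semantic tagger does.

```python
# Hypothetical mini-taxonomy: each concept has a broader concept and a list of labels.
TAXONOMY = {
    "Knowledge Graph": {
        "broader": "Semantic Technology",
        "labels": ["knowledge graph", "enterprise knowledge graph"],
    },
    "Taxonomy": {
        "broader": "Semantic Technology",
        "labels": ["taxonomy", "taxonomies"],
    },
    "Large Language Model": {
        "broader": "Generative AI",
        "labels": ["large language model", "llm"],
    },
}


def classify(text, taxonomy=TAXONOMY):
    """Tag `text` with every matching concept plus its broader concept."""
    lowered = text.lower()
    tags = set()
    for concept, entry in taxonomy.items():
        # Naive substring match; short labels such as "llm" could match inside other words.
        if any(label in lowered for label in entry["labels"]):
            tags.add(concept)
            tags.add(entry["broader"])
    return sorted(tags)


doc = "Our LLM answers questions using the enterprise knowledge graph."
tags = classify(doc)
print(tags)
```

Because every concept carries its broader category, a search for “Semantic Technology” would also surface this document even though the phrase never appears in it.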

It should be clear now how the waltz between LLMs and Knowledge Graphs should take shape. Although an LLM has the capability to be extremely powerful and contain a lot of information, it might not be exactly what you or your organization need. 

Keeping with the dancing theme, we can think of an LLM as a music festival where tens of thousands (or hundreds of billions in the LLM’s case) of participants attend. As things progress, more and more participants fill the limited space, things start to get crowded, water starts to run out, lines run rampant, and then you have some punk rockers devote a song to a rock legend and all chaos breaks loose (this is a reference to Woodstock ‘99, if you’re still reading) – the outcome is essentially a mosh pit where it’s every person for themselves. This may be a bit of an extreme picture, but essentially this is what an LLM is without bounds. 

Now, let’s compare this to a formal ball, where there are still thousands of participants, but they all (or at least most) play by the rules, wait their turn, clink their punch glasses responsibly, and waltz orderly in clean lines to ¾ time – it’s a dream for those of us who love organization. Complete order. 

So, now imagine if we could take that order of the formal ball and apply it to the music festival – wouldn’t it be the best of both worlds?

Not convinced yet? Don’t worry, we’re once again adding some Gartner insights to win you over!

In the Emerging Tech Impact Radar: Generative AI, published by Gartner on November 16, 2023, an exploration into Generative AI (GenAI) uncovers that the connection between GenAI and grounded models, such as a Knowledge Graph, has shifted from “hype” to “grounded reality.” The main reason for this shift is the simple fact that GenAI alone has reached its “tipping point in effectiveness and accuracy.” This has opened the floodgates for embedded GenAI applications to take center stage and start organizing the waltz.

Based on 25 emerging GenAI related technologies, Gartner has identified the following four overarching themes:

      • Model-Related Innovations are at the core of GenAI offerings, with knowledge graphs expected to improve the performance of GenAI-enabled applications. LLMs have reached a tipping point in accuracy and effectiveness, attracting large investments and R&D efforts, but hallucinations are a concern.
      • Model Performance and AI Safety is crucial for responsible GenAI management, with a focus on hallucination management to improve model performance.
      • Build- and Data-Related Advancements covers the critical steps involved in building a GenAI model. Synthetic data is expected to play a critical role in the near term.
      • The Next Generation of AI-Enabled Applications, such as GenAI-enabled virtual assistants, workflow tools, and advanced simulation techniques, is expected to emerge in the next three years, but some of these applications may have negative consequences for society.

In Gartner’s classic bullseye graph, the above themes are placed on four quadrants, with the 25 emerging GenAI technologies plotted as data points whose position shows the expected time range and whose size shows the expected impact. For simplicity, we’ll be focusing on the Model Performance and AI Safety and the Build- and Data-Related Advancements quadrants, as they relate directly to LLMs and Knowledge Graphs.

Open-Source LLMs (1 to 3 Years)

What is it?

These LLMs are open-source deep-learning models that allow anyone to access, use, modify, and distribute the source code, typically under the terms of a permissive license (think of the previously mentioned music festival analogy).

Why is it important?

Open-source LLMs offer better customization, privacy, and security controls, collaborative development, model transparency, and reduced vendor lock-in. They are more flexible to customize and more transparent than proprietary LLMs. This allows for continuous development and makes applications harder for competitors to imitate. That being said, this also makes open-source LLMs more volatile, which is why it is suggested to use them in conjunction with a model that is focused on AI safety and accuracy, an approach we have seen succeed with Linked Open Data (LOD).

Hallucination Management (1 to 3 Years)

What is it?

Hallucinations are outputs that sound plausible but are factually incorrect or unsupported by the source material. In LLMs they can be caused by various factors, including training data quality, insufficient training, an overly complex or overly simplified model, and inadequate prompts. There are two approaches to managing hallucinations:

      • “After-the-fact mechanisms” such as human-in-the-loop or prompt engineering
      • The vendor of the model can address the root cause in the model or training

Why is it important?

LLM-based enterprise search engines and knowledge mining are delivering good results in terms of productivity gains, improved customer experience, faster decision-making, and cost savings. However, accurate outputs are critical to maintain a high level of service experience. Around 25% of LLM-generated summaries contain hallucinations. Mitigation strategies include:

      • Prompt Engineering Tools
      • Retrieval Augmented Generation (RAG)
      • User-in-the-Loop Workflows

It is important to note that the selection of a mitigation strategy depends on the use case and root cause of the hallucination.
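One after-the-fact mechanism can be sketched as a simple check against a knowledge graph: claims that the graph confirms pass through, and everything else is routed to a human reviewer. This is only an illustration under strong assumptions: the facts and the `review_claims` helper are hypothetical, and the claims are assumed to have already been extracted from the model's answer as triples, which is the hard part in practice and is not shown here.

```python
# Facts the organization trusts, stored as (subject, predicate, object) triples.
KNOWN_FACTS = {
    ("Mona Lisa", "created_by", "Leonardo da Vinci"),
    ("Mona Lisa", "located_in", "Louvre"),
}


def review_claims(claims, known_facts=KNOWN_FACTS):
    """Split claims into those backed by the graph and those needing human review."""
    supported = [claim for claim in claims if claim in known_facts]
    needs_review = [claim for claim in claims if claim not in known_facts]
    return supported, needs_review


# Claims assumed to be extracted from a model's answer (extraction not shown).
claims = [
    ("Mona Lisa", "created_by", "Leonardo da Vinci"),
    ("Mona Lisa", "located_in", "Uffizi"),
]
supported, needs_review = review_claims(claims)
print("Supported:", supported)
print("Needs human review:", needs_review)
```

A claim that is missing from the graph is not necessarily wrong, so the sketch flags it for review rather than rejecting it outright.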

Prompt Engineering Tools (3 to 6 Years)

What is it?

Prompt engineering is the process of crafting the inputs given to GenAI models in order to steer and constrain their responses without updating model weights. It is closely related to in-context learning, in which examples included in the prompt guide the model.
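A common form of this is a few-shot prompt, where a handful of worked examples precede the actual question. The sketch below only assembles the prompt text; the function name, the “Text/Category” layout, and the example documents are assumptions chosen for illustration, and no model is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and the new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The prompt ends with an open slot for the model to fill in.
    lines.append(f"Text: {query}")
    lines.append("Category:")
    return "\n".join(lines)


examples = [
    ("Payment is due within 30 days of receipt.", "Invoice"),
    ("The parties agree to the following terms.", "Contract"),
]
prompt = build_few_shot_prompt(
    "Classify each text as Invoice or Contract.",
    examples,
    "Please remit the outstanding balance by Friday.",
)
print(prompt)
```

The examples narrow the model's output to the expected labels, and the model's weights stay untouched.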

Why is it important?

In the future, prompt engineering tools will be used across multiple industries. Alternative options like building a model from scratch or fine-tuning will be more complex and expensive. Domain-specific models could mitigate some of the need for prompt engineering.

Retrieval Augmented Generation – RAG (1 to 3 Years)

What is it?

RAG is an architecture pattern that combines search and generative capabilities for content consumption. Retrieval is used to inform and augment the content of prompts in the LLM’s generative process. The generated output can then provide sources and citations, and the model is informed by the most recent information.
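The pattern can be sketched in a few lines: retrieve the most relevant passages, then place them in the prompt together with the question. Everything below is a simplified assumption made for illustration: the documents are invented, word overlap stands in for a real search engine or vector index, and the final call to an LLM is left out.

```python
import re

# Hypothetical document store; in practice this would be a search index.
DOCUMENTS = {
    "doc-1": "A knowledge graph links business objects and topics across data silos.",
    "doc-2": "The Viennese waltz is danced in three-quarter time.",
    "doc-3": "Taxonomies organize information into categories and subcategories.",
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents=DOCUMENTS, top_k=1):
    """Rank documents by word overlap with the question and return the best ones."""
    question_words = tokenize(question)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages):
    """Augment the question with retrieved passages, tagged with ids for citation."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


question = "What does a knowledge graph link?"
passages = retrieve(question)
prompt = build_prompt(question, passages)
print(prompt)
```

Tagging each passage with its id is what allows the generated answer to cite its sources, and swapping the documents for fresher ones updates the answers without retraining the model.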

Why is it important?

How quickly enterprises adopt RAG for content consumption depends on learning curves and on their ability to fund efforts to improve knowledge activation and retrieval. Technology vendors are perfecting RAG elements for productivity tools, while custom-built applications are needed when content consumption requires the aggregation of multiple knowledge bases. Measuring productivity gains is challenging for enterprises, as knowledge workers spend 20-30% of their time looking for information.

User-in-the-Loop (6 to 8 Years)

What is it?

User-in-the-Loop (UITL) AI requires users to be looped into any stage of the development pipeline. Traffic flows both ways, and finding a “lingua franca” between users and AI will take some time. UITL solutions ensure sustained model effectiveness and help in the responsible use of AI.

Why is it important?

UITL technology will have a high impact across markets and industries, making it critical for limiting failure when implementing AI solutions. It has already seen adoption in training virtual customer assistants and shaping chatbot behavior. The potential for UITL is vast; some of its forms include improving the accuracy of datasets and providing feedback via physical demonstration or manipulation.

Knowledge Graphs (Now)

What is it?

As previously stated, but here’s a Gartner spin on it – a Knowledge Graph is a machine-readable data structure that describes the relationship between heterogeneous data via a network of nodes and links. It consists of ontology, taxonomy, vocabulary, graph databases, semantic-mapping tools, and inferencing to discover new relations between existing nodes.
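The inferencing part of that definition can be illustrated with one simple rule: if a relation is transitive, new links follow from existing ones. The sketch below assumes a hypothetical `located_in` predicate that is treated as transitive; real knowledge graphs express such rules in an ontology and leave the work to a reasoner.

```python
def infer_transitive(triples, predicate):
    """Return the new triples implied by treating `predicate` as transitive."""
    known = set(triples)
    changed = True
    while changed:  # repeat until no new link can be derived
        changed = False
        for s1, p1, o1 in list(known):
            if p1 != predicate:
                continue
            for s2, p2, o2 in list(known):
                if p2 == predicate and s2 == o1:
                    candidate = (s1, predicate, o2)
                    if candidate not in known:
                        known.add(candidate)
                        changed = True
    return known - set(triples)


facts = {
    ("Mona Lisa", "located_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
    ("Paris", "located_in", "France"),
}
new_links = infer_transitive(facts, "located_in")
print(sorted(new_links))
```

None of the three inferred links was entered by hand; each follows from the existing nodes and the single rule.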

Why is it important?

Knowledge Graphs are crucial for GenAI-enabled applications as they capture complex relationships and improve performance. They are used in search and recommendation engines, data and analytics engines, enterprise decision and knowledge management solutions, and virtual assistants. Knowledge Graphs drive business impact in various settings including digital workplace, automation, machine learning, investigative analysis, digital commerce, and data management.

By 2027, foundation models will underpin 70% of natural language processing (NLP) use cases, up from less than 5% in 2022.
Gartner, 2023

With the speed at which GenAI is evolving, organizations that are hoping to benefit from it will need a strong base in place to hone everything the LLM can provide. This is why it is imperative that organizations take the steps now to implement Knowledge Graphs within their ecosystems if they have not already done so.

Knowledge models represent a means to guide and shape the creative outputs of AI.
Blumauer, 2024

Are we waltzing with LLMs and Knowledge Graphs in perfect unison yet? Well, not quite. This is something that will take a bit of practice, but we are hopeful. With LLMs being the free-spirited wealth of information and Knowledge Graphs helping keep that information within the lines, there is a bright future for the two working together.

Want to explore how users can harness the powers of GenAI?
Register now for the PoolParty Summit 2024 where over 20 speakers will discuss how to unlock the full potential of LLMs and Knowledge Graphs with the PoolParty Semantic Suite and various other partner technologies. 

We hope you’ve enjoyed this installment of the PoolParty Lifesaver series emphasizing the importance of Knowledge Graphs in the ever-changing world of LLMs. Keep your eyes peeled for these and other emerging technologies in the media, in your inbox, and around the office water cooler – you never know when they will come in handy.

If you liked this blog, and want to keep up to date, click the join mailing list button below and you’ll get a notification right in your inbox when a new installment of the PoolParty Lifesaver is available. 

Sources:

Emerging Tech Impact Radar: Generative AI, Gartner 2023
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

LLMs have revolutionized AI. Do we still need knowledge models and taxonomies, and why? (LinkedIn Post)
Andreas Blumauer, February 2024
