Advancing Pharmaceutical Research with PoolParty’s Robust Graph-Based Recommender Systems

Boehringer Ingelheim Success Story

Our customer

As one of the largest pharmaceutical companies in the world, Boehringer Ingelheim is one organization that must constantly innovate in order to keep up with rapidly changing health demands.

Driven by research and development, the company produces innovative drugs for common health problems, including cardiovascular diseases, cancer, respiratory diseases, and viral infections. Beyond human health, Boehringer Ingelheim also provides preventive vaccines and drugs for livestock and domestic pets.

The challenge

With health crises and medical advances constantly reshaping their work, Boehringer Ingelheim needs systems that are as agile as that work requires. Overwhelmingly complex data and unintelligent search platforms are major pain points for organizations in the pharmaceutical industry, and switching to smarter technology is the pain reliever. Enter PoolParty Semantic Suite.

Transforming R&D with linked data and auto classification

As a company that relies heavily on research to innovate, Boehringer Ingelheim spent countless hours crawling the web to collect information pertaining to health issues. Their researchers were stuck with collecting, annotating, and analyzing content pulled from millions of sources on the web, which proved to be a tiresome task.

Since they were wasting time and effort manually editing their databases, Boehringer Ingelheim hoped to automate the process with semantic technologies. Combining their web-crawling methods with semantic linked data, taxonomy management, auto classification (automated tagging), and graph-based recommender systems proved to be the solution.

With PoolParty Semantic Suite, Boehringer Ingelheim could easily integrate these semantic capabilities into their existing workflow. They could still crawl the web as before, but no longer had to do so manually every week. Using advanced language processing techniques, they could continuously extract meaningful text from the websites (e.g. important names and figures) and add it to their corpus for automated tagging and classification. In PoolParty, content pulled from the web was added to Boehringer’s corpus along with content from notable sources such as DBpedia and MeSH. As a result, a strong vocabulary specific to the health domain could be built to annotate texts and sort them into categories via a taxonomy, ultimately cleaning up their databases and content management systems (CMS).
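To make the idea of automated tagging concrete, here is a minimal sketch in Python. It is not PoolParty's implementation or API; it only illustrates the general technique of matching a controlled vocabulary against text. The concept IDs, the labels, and the function name `tag_text` are all hypothetical.

```python
import re

# Hypothetical mini-vocabulary: concept ID -> preferred and alternative labels.
# A real vocabulary would be maintained in a taxonomy, not hard-coded.
VOCABULARY = {
    "concept:hypertension": ["hypertension", "high blood pressure"],
    "concept:myocardial_infarction": ["myocardial infarction", "heart attack"],
    "concept:brca1": ["BRCA1"],
}

def tag_text(text, vocabulary=VOCABULARY):
    """Return the sorted concept IDs whose labels occur in the text."""
    tags = set()
    for concept, labels in vocabulary.items():
        for label in labels:
            # Whole-word, case-insensitive match on the label.
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.add(concept)
                break  # one matching label is enough for this concept
    return sorted(tags)

if __name__ == "__main__":
    sample = "Patients with high blood pressure face a higher risk of heart attack."
    print(tag_text(sample))
```

Because alternative labels map to the same concept, a text that says "heart attack" and one that says "myocardial infarction" receive the same tag, which is what allows them to be sorted into the same category.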

Enriching Boehringer’s CMS with a graph-based recommender system

Boehringer’s newly organized databases and CMS could be used to build powerful recommender systems. While many companies are accustomed to standard search capabilities for their customer-facing sites or their internal document drives, these standard searches often do not support complex queries. This means that the results users receive are often limited to a small range of filters or specific keywords.

During the internal research process, Boehringer’s employees struggled to find documents that already existed in their CMS because the topics of these documents could only be expressed in one or two keywords, though the content of these documents was far more complex.

Boehringer Ingelheim

“You may be looking for documents that apply to a topic you have in mind, but the topic cannot be expressed in only one keyword. Search engines, however, are designed in such a way that we have to break down this topic in 2-3 keywords in the hope that these terms appear in documents. In web search this may work partially, but in companies this approach is not effective, because often these exact terms are not part of the documents, which could be relevant.”

If, for example, a researcher was working on a study of a newly discovered disease and found that a particular gene was connected to it, none of the earlier documents about that gene would mention the disease, so a search for the disease would not find them. Although the topics are related, a standard search cannot retrieve the documents about the gene because it relies on keywords, which are absent from the text, rather than on implicit context or relationships. Without a taxonomy or tags that define and connect related concepts (e.g. this gene is connected to this gene mutation, which is connected to this disorder and finally to this disease), the researcher would have to make these connections manually and keep searching for different keywords until the documents about the gene turned up.
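The chain described here (gene, mutation, disorder, disease) can be sketched as a path search over a small concept graph. The example below is an illustrative assumption, not Boehringer's data model: the identifiers in `RELATIONS` and the function `find_path` are made up, and breadth-first search is just one simple way to follow such links.

```python
from collections import deque

# Hypothetical concept relations, written as directed edges.
RELATIONS = {
    "gene:G1": ["mutation:M1"],
    "mutation:M1": ["disorder:D1"],
    "disorder:D1": ["disease:X"],
    "gene:G2": [],
}

def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of concepts, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

if __name__ == "__main__":
    print(find_path(RELATIONS, "gene:G1", "disease:X"))
```

With the relations stored explicitly, the connection between the gene and the disease is found by the machine rather than guessed by the researcher through repeated keyword searches.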

Semantic graph-based recommender systems are a powerful alternative to standard search because they can suggest smarter results based on the user’s interactions with a platform and on an understanding of context and meaning.

Accessing “next steps” or further reading with intelligent concept recommendations

In PoolParty Semantic Suite, users can build every step of a graph-based recommender system with dedicated tools. The process begins with defining concepts and labels in the Taxonomy & Thesaurus Server and identifying relationships between the specified concept types in the Ontology Server. Terms and concepts can then be text mined using the Entity Extractor to further enrich the content, while tags are added to describe the content and make it more readable for the recommender system.

In their CMS, Boehringer used PoolParty’s auto classification tools to tag internal documents, much as you would in a library database or index. Descriptors were bundled together with different keywords and topics, so that a user searching for one disease or drug could also be recommended documents that never mention those specific words, because the relationships were already mapped.

This is the most notable strength of a graph-based recommender system, and one where a standard keyword search cannot compete. A keyword search is only as strong as the exact words a user enters in the search field, and thus will typically retrieve only the obvious results. If a user searches for “heart rate,” they will only be given results that explicitly talk about heart rate. With a graph-based recommender system, the user gets the obvious results as well as intelligent “further reading” suggestions. For example, typing “heart rate” also returns documents on heart diseases, abnormalities, and other topics related to heart rate; in this case, the recommender system understands that one thing affects the other.
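The contrast between the two approaches can be shown with a toy comparison. This is a sketch under stated assumptions: the documents, their tags, the `RELATED` links, and both function names are invented for illustration, and the "graph" is reduced to a single hop of related concepts.

```python
# Hypothetical documents: free text plus the concept tags assigned to them.
DOCUMENTS = {
    "doc1": {"text": "Resting heart rate in adults.", "tags": {"heart rate"}},
    "doc2": {"text": "Managing atrial fibrillation.", "tags": {"arrhythmia"}},
    "doc3": {"text": "Seasonal allergy treatment.", "tags": {"allergy"}},
}

# Hypothetical "related to" links between concepts.
RELATED = {
    "heart rate": {"arrhythmia", "heart disease"},
}

def keyword_search(query, documents=DOCUMENTS):
    """Return IDs of documents whose text contains the query string."""
    q = query.lower()
    return sorted(d for d, doc in documents.items() if q in doc["text"].lower())

def graph_recommend(query, documents=DOCUMENTS, related=RELATED):
    """Return IDs of documents tagged with the query concept or a directly related one."""
    concepts = {query} | related.get(query, set())
    return sorted(d for d, doc in documents.items() if doc["tags"] & concepts)

if __name__ == "__main__":
    print(keyword_search("heart rate"))
    print(graph_recommend("heart rate"))
```

Here the keyword search finds only the document that literally contains "heart rate", while the concept-based lookup also surfaces the arrhythmia document, even though its text never uses the query words.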

In order for Boehringer to achieve this, whenever a document was tagged in the CMS, the result of the tagging was written not only into the CMS but also into the graph database (RDF database), establishing a network that connects all the documents. In practice, a search for information on the X chromosome returns documents relating to gender as well as X chromosome-related disorders and diseases that you may not have thought of, but that the machine had already been trained to relate through the tagging and the knowledge graph.
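A rough sketch of this double bookkeeping follows. It models the graph database as a plain Python set of (subject, predicate, object) triples rather than a real RDF store, and the predicate names `hasTag` and `related`, the document IDs, and all three functions are assumptions made for illustration only.

```python
from collections import Counter

def record_tags(store, doc_id, concepts):
    """Store each tagging result as a (document, "hasTag", concept) triple."""
    for concept in concepts:
        store.add((doc_id, "hasTag", concept))

def link_concepts(store, a, b):
    """Record a symmetric "related" link between two concepts."""
    store.add((a, "related", b))
    store.add((b, "related", a))

def recommend(store, doc_id):
    """Rank other documents by how many of their tags match or relate to this document's tags."""
    own = {o for s, p, o in store if s == doc_id and p == "hasTag"}
    neighbours = {o for s, p, o in store if p == "related" and s in own}
    targets = own | neighbours
    scores = Counter(
        s for s, p, o in store
        if p == "hasTag" and o in targets and s != doc_id
    )
    # Highest score first; ties broken alphabetically for a stable order.
    return [doc for doc, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

if __name__ == "__main__":
    store = set()
    record_tags(store, "docA", ["x_chromosome"])
    record_tags(store, "docB", ["turner_syndrome"])
    record_tags(store, "docC", ["x_chromosome", "turner_syndrome"])
    link_concepts(store, "x_chromosome", "turner_syndrome")
    print(recommend(store, "docA"))
```

Since every tag becomes an edge between a document and a concept, documents end up connected through the concepts they share or that are linked to one another, which is the "network" the recommendations are drawn from.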

Since every building block of the graph-based recommender system could be constructed in PoolParty Semantic Suite, Boehringer could have a recommender system that is highly tailored to their needs and use cases for the pharmaceutical industry.

Today, Boehringer’s graph-based recommender system is being used to deliver precise and helpful document search. In an industry where time and precision matter most, Boehringer has been able to speed up their crucial R&D processes so that they can safely develop drugs and solutions with more accuracy and agility.

Want to take a glance at another Boehringer Ingelheim use case? Download the success story to see how Boehringer used PoolParty to integrate and enrich their taxonomies with global research data.
