Managing Inconsistencies in Data Processing for Enterprise Knowledge Graphs
Albin Ahmeti, Data & Knowledge Engineer at Semantic Web Company
Yavor Nenov, Chief Science Officer at Oxford Semantic Technologies
Robert David, CTO at Semantic Web Company
The Semantic Web provides a graph-based organisation of knowledge that has become popular in enterprises under the name enterprise knowledge graphs. It offers considerable flexibility both for modeling and for combining data sets by linking graphs together, which has the potential to address enterprise data heterogeneity in a flexible, bottom-up manner.
When enterprises process and link graph data at large scale, they automate the work with ETL (Extract-Transform-Load) processes that support Semantic Web standards. Although these processes can handle heterogeneity, they must cover many different complex cases, so inconsistencies and incompleteness can arise in the data.
To keep data quality under control, we analyze and adapt the data so that detected inconsistencies can either be taken into account in further processing steps or, where possible, be repaired automatically.
The Semantic Web provides SHACL, the Shapes Constraint Language, for validating graph data: it detects violations of defined constraints and reports them for further processing.
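As a rough illustration of what a SHACL cardinality check does, the following Python sketch evaluates a constraint in the spirit of `sh:minCount`/`sh:maxCount` on a property shape over a toy set of triples and collects validation results. It is not a SHACL engine and not the system described in this talk: the `PropertyShape` and `ValidationResult` classes, the `ex:` names and the sample data are all hypothetical, and only single-predicate paths are supported.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Triples are plain (subject, predicate, object) tuples of prefixed names;
# nothing is parsed as a real IRI here.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class PropertyShape:
    """Hypothetical, simplified stand-in for a SHACL node shape with one
    property constraint (sh:targetClass, sh:path, sh:minCount, sh:maxCount)."""
    target_class: str
    path: str
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded


@dataclass(frozen=True)
class ValidationResult:
    """Loosely mirrors the focus node, path and message of a SHACL result."""
    focus_node: str
    path: str
    message: str


def validate(data: Set[Triple], shapes: List[PropertyShape]) -> List[ValidationResult]:
    results: List[ValidationResult] = []
    for shape in shapes:
        # Focus nodes are the instances of the target class, sorted for a stable report.
        focus_nodes = sorted(s for s, p, o in data if p == "rdf:type" and o == shape.target_class)
        for node in focus_nodes:
            # Count the values reachable from the focus node via the single-step path.
            count = sum(1 for s, p, _ in data if s == node and p == shape.path)
            if count < shape.min_count:
                results.append(ValidationResult(
                    node, shape.path, f"expected at least {shape.min_count} value(s), found {count}"))
            if shape.max_count is not None and count > shape.max_count:
                results.append(ValidationResult(
                    node, shape.path, f"expected at most {shape.max_count} value(s), found {count}"))
    return results


data = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:department", "ex:Sales"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:carol", "rdf:type", "ex:Employee"),
    ("ex:carol", "ex:department", "ex:Sales"),
    ("ex:carol", "ex:department", "ex:HR"),
}
shapes = [PropertyShape("ex:Employee", "ex:department", min_count=1, max_count=1)]

report = validate(data, shapes)
for r in report:
    print(r.focus_node, r.path, r.message)
```

Here `ex:bob` violates the minimum and `ex:carol` the maximum cardinality, while `ex:alice` conforms; a real pipeline would pass such results on to later processing steps.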
In this talk, we present an approach for validating and processing inconsistencies to improve the quality of large knowledge graphs. We also show a prototype that implements this approach in a high-performance setup based on the RDFox triple store, combining SHACL validation with Datalog rules to demonstrate inconsistency management in practical use cases.
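The abstract does not spell out the rules used in the prototype, so the sketch below is only a hypothetical illustration of how Datalog rules can complement validation: a recursive rule repairs a missing department by inheriting it from an employee's manager, and a rule in a later stratum flags employees that still have no department. It uses naive bottom-up evaluation with stratified negation in plain Python rather than RDFox, and the predicates (`ex:manager`, `ex:effectiveDepartment`, `ex:violation`) and data are invented for the example.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]


def copy_asserted(facts: Set[Triple]) -> Set[Triple]:
    # effectiveDepartment(E, D) :- department(E, D).
    return {(s, "ex:effectiveDepartment", o) for s, p, o in facts if p == "ex:department"}


def inherit_from_manager(facts: Set[Triple]) -> Set[Triple]:
    # effectiveDepartment(E, D) :- manager(E, M), effectiveDepartment(M, D), NOT department(E, _).
    # The negation only touches asserted ex:department facts, which no rule derives.
    has_dept = {s for s, p, _ in facts if p == "ex:department"}
    eff = {(s, o) for s, p, o in facts if p == "ex:effectiveDepartment"}
    return {
        (e, "ex:effectiveDepartment", d)
        for e, p, m in facts
        if p == "ex:manager" and e not in has_dept
        for m2, d in eff
        if m2 == m
    }


def flag_missing(facts: Set[Triple]) -> Set[Triple]:
    # violation(E, MissingDepartment) :- Employee(E), NOT effectiveDepartment(E, _).
    covered = {s for s, p, _ in facts if p == "ex:effectiveDepartment"}
    return {
        (s, "ex:violation", "ex:MissingDepartment")
        for s, p, o in facts
        if p == "rdf:type" and o == "ex:Employee" and s not in covered
    }


def fixpoint(facts: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Naive bottom-up evaluation: apply all rules until nothing new is derived."""
    facts = set(facts)
    while True:
        new: Set[Triple] = set()
        for rule in rules:
            new |= rule(facts)
        new -= facts
        if not new:
            return facts
        facts |= new


def evaluate(facts: Set[Triple], strata: List[List[Rule]]) -> Set[Triple]:
    """Run each stratum to its fixpoint before the next, so negation sees complete inputs."""
    for rules in strata:
        facts = fixpoint(facts, rules)
    return facts


data = {
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:department", "ex:Sales"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:bob", "ex:manager", "ex:alice"),
    ("ex:dave", "rdf:type", "ex:Employee"),
    ("ex:dave", "ex:manager", "ex:bob"),
    ("ex:erin", "rdf:type", "ex:Employee"),
}

result = evaluate(data, [[copy_asserted, inherit_from_manager], [flag_missing]])
repaired = sorted((s, o) for s, p, o in result if p == "ex:effectiveDepartment")
violations = sorted(s for s, p, _ in result if p == "ex:violation")
print(repaired)
print(violations)
```

Keeping repaired values in a separate derived predicate (`ex:effectiveDepartment`) rather than writing back to `ex:department` keeps the negation stratified and leaves the asserted data untouched, so every repair stays traceable; `ex:erin`, who has neither a department nor a manager, remains flagged for further action.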