Use Cases for Data Integration across the Linked Data Lifecycle
Our team of Data & Knowledge Engineers was invited to give a talk at the Universitat Politècnica de Catalunya for Semantic Web students of the Database Technologies and Information Management Group (DTIM research group), led by Alberto Abello. Alberto is a renowned lecturer in the areas of advanced data management, big data management, business intelligence, metadata management, and ontologies.
On the 2nd and 3rd of May, our ontology and computational intelligence specialist Albin Ahmeti gave a lecture to about 50 Master’s students as part of a Semantic Web course taught by Oscar Romero and Besim Bilalli.
During the lecture, students received an overview of PoolParty Semantic Suite as a preferred tool for “Data Integration across the Linked Data Lifecycle.” The lecture explained various use cases and demos and how they are built across all phases of the linked data lifecycle: data ingestion, cleaning, authoring, linking, enrichment, provisioning and analysis.
We presented the following use cases:
- “HR demo” - a semantic search on top of structured data, unstructured data, and other barometric data;
- “Increasing the traffic of the website using semantics - Cross-component data wrangling”, where a website is first crawled using Corpora search and the data is then wrangled to present it beautifully in GraphSearch;
- “SHACL-based data lifecycle management - contract validation”, which showcases a semantic application that uses SPARQL/SHACL to check whether different rules are satisfied in a given contract; and
- “Video games” - leveraging Linked Data Harvesting in PoolParty to build a taxonomy of video games, and then building a search over different structured and unstructured data.
On top of this, Albin Ahmeti gave a talk about “The Interplay of SPARQL/Updates and Entailment Regimes”, derived from his Ph.D. thesis “Updates in the context of Ontology-based data access.” During the talk, Ph.D. students and other members of the DTIM group (about 10 people) learned how the interplay of updates and reasoning is treated in the paradigm of Ontology-based data access (OBDA).
Feel free to read the abstracts of these two talks and have a look at the slide deck.
Lecture Title: “Data Integration across the Linked Data Lifecycle.”
Abstract: When integrating data in the wild, we encounter different kinds of data: unstructured data such as text, semi-structured data such as XML and JSON, and structured data residing in legacy relational databases. In the context of Big Data, it is always challenging to make sense of such a volume of heterogeneous data. The Semantic Web tackles the problem of data integration by placing an RDF layer on top, either by converting the data, i.e., re-materializing it in a triple store, or by merely mapping it. RDF data is represented as a graph, which is closer to how humans think and more agile with respect to data changes than other data models. Taxonomies and ontologies, as two pillars, play a central role in this process of data integration in RDF. We explain the iterative model of data integration across the Linked Data Lifecycle, where we dissect its different phases: data ingestion, cleaning, authoring, linking, enrichment, provisioning and analysis. We motivate each of the phases by showing real examples, demos or use cases, leveraging components of PoolParty Semantic Suite such as Taxonomy and Ontology management, PoolParty Extractor, PoolParty GraphEditor, PoolParty GraphSearch, and PoolParty UnifiedViews.
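To make the idea of an RDF layer on top of heterogeneous sources more concrete, here is a minimal sketch of the re-materializing approach. It is not how PoolParty does it: the namespace `http://example.org/`, the `assignment` table, the sample JSON document, and all function names are hypothetical, and triples are modelled as plain Python tuples rather than with an RDF library.

```python
import json
import sqlite3

EX = "http://example.org/"  # hypothetical namespace, for illustration only


def triples_from_json(doc: str) -> set:
    """Turn one semi-structured JSON record into (subject, predicate, object) triples."""
    record = json.loads(doc)
    subject = EX + "person/" + str(record["id"])
    return {(subject, EX + key, str(value))
            for key, value in record.items() if key != "id"}


def triples_from_rows(conn: sqlite3.Connection) -> set:
    """Turn rows of a (hypothetical) relational table into triples."""
    triples = set()
    query = "SELECT person_id, department FROM assignment"
    for person_id, department in conn.execute(query):
        subject = EX + "person/" + str(person_id)
        triples.add((subject, EX + "department", department))
    return triples


def build_graph() -> set:
    """Materialize both sources into one set of triples (a tiny 'triple store')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assignment (person_id INTEGER, department TEXT)")
    conn.execute("INSERT INTO assignment VALUES (1, 'Research')")
    json_doc = '{"id": 1, "name": "Ada", "skill": "SPARQL"}'
    graph = triples_from_json(json_doc) | triples_from_rows(conn)
    conn.close()
    return graph


def describe(graph: set, subject: str) -> list:
    """List the (property, value) pairs known about one subject, sorted."""
    return sorted((p.rsplit("/", 1)[-1], o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    for prop, value in describe(build_graph(), EX + "person/1"):
        print(prop, value)
```

Because both sources use the same subject identifier, the facts from the JSON document and from the relational table end up attached to the same node of the graph, which is the essence of the integration step.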
Talk Title: “The Interplay of SPARQL/Updates and Entailment Regimes”
Abstract: Ontology-based data access (OBDA) is a data integration framework where data is accessed using queries on the ontology layer, and in turn, a set of mappings guides the data access on the underlying relational layer. This process is usually done via query rewriting with respect to the ontology and query unfolding with respect to the mappings, respectively. In this way, the data need not be re-materialized in a triple store; rather, in an orthogonal way, the data is exposed as a view via the defined ontology concepts and properties. Nevertheless, ontology-based data management, i.e., updates in OBDA, is challenging because the view update problem is inherently not solved in the field of database theory. We discuss the problem and possible solutions in various settings, as well as the relation to triple stores. In the end, we show two components of PoolParty Semantic Suite, namely GraphSearch and GraphEditor, as candidates for ontology-based data management.
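The two steps named in the abstract, rewriting with respect to the ontology and unfolding with respect to the mappings, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the ontology contains only subclass axioms, the class names, table names, and mappings are hypothetical, and real OBDA systems handle far richer ontologies and queries.

```python
import sqlite3

# Hypothetical ontology: each class lists its direct subclasses.
SUBCLASSES = {"Employee": ["Manager", "Engineer"], "Manager": [], "Engineer": []}

# Hypothetical mappings: each class is populated by a SQL query over the sources.
MAPPINGS = {
    "Employee": "SELECT id FROM staff",
    "Manager": "SELECT id FROM managers",
    "Engineer": "SELECT id FROM engineers",
}


def rewrite(cls: str) -> list:
    """Query rewriting: expand a class into itself plus all its subclasses."""
    result = [cls]
    for sub in SUBCLASSES.get(cls, []):
        result.extend(rewrite(sub))
    return result


def unfold(classes: list) -> str:
    """Query unfolding: replace each class by its mapping and union the results."""
    return " UNION ".join(MAPPINGS[c] for c in classes if c in MAPPINGS)


def answer(conn: sqlite3.Connection, cls: str) -> list:
    """Answer 'all instances of cls' directly on the relational data."""
    sql = unfold(rewrite(cls))
    return sorted(row[0] for row in conn.execute(sql))


def demo_database() -> sqlite3.Connection:
    """Create a small in-memory relational source for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staff (id INTEGER);
        CREATE TABLE managers (id INTEGER);
        CREATE TABLE engineers (id INTEGER);
        INSERT INTO staff VALUES (1);
        INSERT INTO managers VALUES (2);
        INSERT INTO engineers VALUES (3);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_database()
    print(unfold(rewrite("Employee")))
    print(answer(conn, "Employee"))
```

Note that no triples are stored anywhere: the ontology-level query is translated into SQL and evaluated on the relational tables, so the ontology acts as a view. This also hints at why updates are hard, since an insertion of an `Employee` at the ontology level does not say which underlying table should receive the new row.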