How does PoolParty Semantic Suite integrate with Graph Databases? – Part 2
This post continues the series 'How does PoolParty Semantic Suite integrate with Graph Databases?'. Part one gave an overview of the integration architecture, followed by a description of the components and their data storage requirements. In this part, we look at other triple store solutions that are not integrated into PoolParty directly. We also cover how PoolParty Semantic Suite is used as a middleware platform and the functional requirements for successfully integrating a graph database with PoolParty.
PoolParty Semantic Suite as a Middleware
PoolParty Semantic Suite is a middleware platform that provides Semantic AI solutions as a service to build and manage Knowledge Graphs. In most cases, users develop RDF-based applications that run as layers on PoolParty APIs. You decouple the application from PoolParty by virtualizing application data in a separate graph database (or multiple stores at the same time) and connecting the application to it. Such an architecture separates all tasks cleanly and frees PoolParty from having to act like a database. The application scales better without affecting the other core features of PoolParty. This also creates a staging environment in which PoolParty's knowledge models can be continuously evolved, while a stable release version in the graph database will serve the application in production. Additionally, PoolParty provides APIs that export the project data to external graph databases and keep them in sync continuously, or on demand when necessary.
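To make the idea of keeping a production store in sync with a staging project more concrete, here is a minimal sketch. It is not PoolParty's export API. It only assumes that both sides can be read as sets of triples whose terms are already serialized in N-Triples syntax and contain no blank nodes. The function name `sync_update` and the graph IRI are hypothetical. The output is a standard SPARQL 1.1 Update string that removes triples no longer present in staging and adds the new ones.

```python
from typing import Iterable, Tuple

# A triple whose terms are already serialized in N-Triples syntax,
# e.g. ('<http://ex/a>', '<http://ex/p>', '"label"@en').
Triple = Tuple[str, str, str]


def sync_update(staging: Iterable[Triple],
                production: Iterable[Triple],
                graph: str) -> str:
    """Build a SPARQL 1.1 Update that makes `production` match `staging`.

    Hypothetical helper: assumes no blank nodes, because DELETE DATA
    does not allow them.
    """
    staging_set, production_set = set(staging), set(production)
    to_delete = sorted(production_set - staging_set)
    to_insert = sorted(staging_set - production_set)

    def block(keyword: str, triples) -> str:
        body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
        return f"{keyword} {{ GRAPH <{graph}> {{\n{body}\n}} }}"

    operations = []
    if to_delete:
        operations.append(block("DELETE DATA", to_delete))
    if to_insert:
        operations.append(block("INSERT DATA", to_insert))
    # SPARQL Update separates multiple operations with ';'.
    return " ;\n".join(operations)
```

Sending only the delta keeps the write load on the production store small, which matters when that store is tuned for querying rather than for frequent modification. An empty result means the two sides already agree.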
For components of PoolParty Semantic Suite, we can identify these functional requirements:
- RDF4J implementation
- Transactional storage of large data sets (ACID compliance)
- SPARQL 1.1 query and update engine
For enterprise scenarios we support:
- Fault tolerance
- High throughput
While most RDF graph databases provide adequate performance for querying and data analysis, fast storage operations are not their main focus. Their typical use cases are based on data publishing, as in a data warehouse context, where fast querying is the priority and data is rarely modified after an initial bulk upload. In contrast, PoolParty components work as applications on top of the graph database and modify data regularly, so they have different requirements.
Further Requirements for Integration Scenarios
These requirements are not directly covered by PoolParty features but are provided by supported graph databases for integration scenarios. Although they are not required for generic features, they are nonetheless important for some use cases and situations.
- Built-in reasoning support
- Machine Learning algorithms
- Security at graph level or triple level
- GeoSPARQL support
- Non-native graph databases for providing a relational view
- Internal textual index for fast literal queries
- Access to heterogeneous data sources (RDB, XML, JSON)
- Easy deployment based on standard technologies (e.g., Java)
- Visualization tools for RDF
- Usability of store management tools
Graph Database Integration with PoolParty Semantic Suite
RDF4J provides its own graph database for data storage. PoolParty supports both local stores and remote stores. Local stores are mounted from a local drive directly into the web application and therefore provide fast operations on the server without any network transmission. Of course, keeping the data accessible only locally has drawbacks: it is not exposed as a service and can therefore only be accessed through the local PoolParty instance. As an alternative, PoolParty also supports RDF4J Server as a remote store, which runs as a network service. All PoolParty components can run on this store. However, it is not recommended for very large data sets or high-performance scenarios, and it does not provide a high-availability or high-throughput architecture.
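As an illustration of what "running as a network service" means in practice, the sketch below builds HTTP requests against the RDF4J Server REST API, where a repository is queried at `/repositories/<id>` and updated at `/repositories/<id>/statements`. The server URL and the repository ID `thesaurus` are assumptions for the example, and the helper names are hypothetical. The requests are only constructed, not sent, and this is not how PoolParty itself talks to the store.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

FORM = "application/x-www-form-urlencoded"


def repository_url(server_url: str, repository_id: str) -> str:
    """Return the repository endpoint of an RDF4J Server instance."""
    return f"{server_url.rstrip('/')}/repositories/{quote(repository_id, safe='')}"


def query_request(server_url: str, repository_id: str, sparql: str) -> Request:
    """Build a SPARQL query request (form-encoded POST, JSON results)."""
    return Request(
        repository_url(server_url, repository_id),
        data=urlencode({"query": sparql}).encode("utf-8"),
        headers={"Content-Type": FORM,
                 "Accept": "application/sparql-results+json"},
        method="POST",
    )


def update_request(server_url: str, repository_id: str, sparql: str) -> Request:
    """Build a SPARQL update request against the statements endpoint."""
    return Request(
        repository_url(server_url, repository_id) + "/statements",
        data=urlencode({"update": sparql}).encode("utf-8"),
        headers={"Content-Type": FORM},
        method="POST",
    )


# Assumed values for illustration only.
SERVER = "http://localhost:8080/rdf4j-server"
ask = query_request(SERVER, "thesaurus", "ASK {}")
```

With a server actually running at that address, a request object could be passed to `urllib.request.urlopen`. Every such call crosses the network, which is the cost a local store avoids.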
Stardog is the second remote store that currently supports all PoolParty components. In contrast to RDF4J, Stardog is recommended for high-performance scenarios and also supports a cluster architecture for high availability. Stardog provides a diverse set of features like virtual graphs for data integration and machine learning on graph data, which can be used in combination with PoolParty components in various use-case scenarios.
MarkLogic, AllegroGraph and GraphDB
MarkLogic, AllegroGraph and GraphDB, as well as RDF4J and Stardog, are supported for the export of thesaurus and ontology data. PoolParty components transfer the RDF directly to these remote stores. GraphDB (as well as the external RDF4J) can also serve as an external working store for PoolParty UnifiedViews. GraphDB, AllegroGraph and MarkLogic are supported for the PoolParty GraphSearch component.
Virtuoso is a high-performance store, providing fast query evaluation and data storage. PoolParty integrates Virtuoso for fast storage of PoolParty’s Corpus Analysis NLP results.
Although Neo4j is not based on W3C standards and does not support RDF data, PoolParty provides an export that transforms from RDF to the Neo4j data model so users can take advantage of the Neo4j server features like visualization.
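The general idea behind transforming RDF into a property graph can be sketched as follows. This is one common mapping, not necessarily the one PoolParty's export uses: each IRI becomes a node, a triple with a literal object becomes a property on the subject node, and a triple with an IRI object becomes a relationship. The `Literal` and `PropertyGraph` types and the function names are hypothetical, and the sketch ignores language tags, datatypes and blank nodes.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A literal value; plain strings in a triple are treated as IRIs."""
    value: str


Term = Union[str, Literal]
Triple = Tuple[str, str, Term]


@dataclass
class PropertyGraph:
    # node IRI -> {property name -> list of values}
    nodes: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)
    # (source IRI, relationship type, target IRI)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def local_name(iri: str) -> str:
    """Use the part after the last '#' or '/' as a short name."""
    return re.split(r"[#/]", iri.rstrip("#/"))[-1]


def to_property_graph(triples: Iterable[Triple]) -> PropertyGraph:
    graph = PropertyGraph()
    for subject, predicate, obj in triples:
        properties = graph.nodes.setdefault(subject, {})
        if isinstance(obj, Literal):
            # Literal object: store as a (multi-valued) node property.
            properties.setdefault(local_name(predicate), []).append(obj.value)
        else:
            # IRI object: make sure the target node exists, then link it.
            graph.nodes.setdefault(obj, {})
            graph.edges.append((subject, local_name(predicate), obj))
    return graph
```

For example, a SKOS concept with a `prefLabel` literal and a `broader` link would become one node carrying a `prefLabel` property plus a `broader` relationship to a second node. Shortening predicate IRIs to local names is a lossy choice made here for readability; a real export would need a strategy for name collisions across vocabularies.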
Oracle Database supports Semantic Web standards as an RDF triple store via the Spatial and Graph feature. PoolParty is working toward an integration of all components for scalable and failsafe high-performance scenarios. Oracle Database can be used in the cloud or on premises, and we also want to support the PoolParty Suite in Oracle’s cloud infrastructure OCI for a complete enterprise-ready setup. In the future, PoolParty will benefit from Oracle Database’s analytics, recommendation and machine learning features for various use-case scenarios.
In this series of blog posts, we have described how the components of PoolParty Semantic Suite integrate with graph databases and which requirements they have in order to provide a high level of performance and user experience to customers. Part 1 gave an overview of PoolParty components and their graph database requirements. As the leading semantic middleware platform, PoolParty integrates on a generic level and supports a broad range of graph databases, which gives customers flexibility. In the future, this will be expanded with even more store options to support different scenarios as well as enterprise-ready architectures.