Dominic Oldman: “SKOS is the obvious choice for representing our thesauri in semantic form”

Dominic Oldman is Deputy Head of the Information Systems department at the British Museum. He is Principal Investigator of the ResearchSpace project, a project funded by the Andrew W. Mellon Foundation aiming to develop a semantic research environment for the culture and heritage sector.

PoolParty Team had the chance to talk with Dominic about the importance of semantic technologies and thesauri (SKOS) in the cultural heritage sector and the plans of the British Museum to integrate these technologies into their information systems.

What is the purpose of your thesaurus project?

The British Museum already uses thesauri as part of its collection record system. They include:

  • Object type (e.g. pin, cup)
  • Material (e.g. paper, stone)
  • Technique of manufacture (e.g. carved, incised)
  • Material Culture/Period (e.g. 13th dynasty, Late Minoan)
  • Ware (specialised thesaurus for pottery, e.g. Black Glaze Ware, Samian)
  • School (used for artworks, e.g. Italian, Aesthetic Movement)
  • Escapement type (specialist thesaurus for clocks and watches)
  • Subject (e.g. animal, acupuncture)
  • Ethnic Name (e.g. Aztec, Yoruba)
  • Place (with modern and archaic types)

These examples are typical for the cultural and heritage sector, but many organisations build their own vocabularies. This means that different terms can be used to describe the same type of object. The British Museum leads a cross-organisational project, ResearchSpace (www.researchspace.org), which aims to harmonise cultural data supplied by different organisations using the semantic Resource Description Framework (RDF) standard. The project will use a high-level ontology, the Conceptual Reference Model (CIDOC-CRM), to apply a framework for all the imported data, but it also requires that terminology is harmonised. This means mapping links between different thesauri terms supplied by the users of ResearchSpace.

Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?

Thesauri allow museums to quickly locate and use the correct and precise terms for object records. The number of terms held within different thesauri means it would otherwise be difficult for staff documenting the Museum’s collection to locate the correct terms efficiently and accurately. The thesauri are used both to control data entry and to allow narrower-term searching of our data (so, for example, we can do searches such as ‘find all vessels’ without the word ‘vessel’ having to be present in the object descriptions). We can also retrieve correctly using synonym or near-synonym search terms. ResearchSpace will import collection records which are supported by controlled thesauri terms, so although thesauri management is not a key objective of the project, linking between different terms within the thesauri is.
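As an illustration of the narrower-term and synonym searching described above, here is a minimal Python sketch. The thesaurus fragment, the synonym table, the record identifiers, and the function names are all invented for this example; it does not reflect the Museum's actual data or software.

```python
# Hypothetical fragment of an object-type thesaurus:
# each broader term maps to its direct narrower terms.
NARROWER = {
    "vessel": ["cup", "bowl", "jar"],
    "cup": ["kylix"],
}

# Hypothetical synonyms: non-preferred label -> preferred term.
PREFERRED = {"beaker": "cup", "container": "vessel"}

# Hypothetical collection records: record id -> controlled object-type term.
RECORDS = {
    "obj-1": "kylix",
    "obj-2": "bowl",
    "obj-3": "pin",
    "obj-4": "cup",
}


def expand(term):
    """Return the preferred form of `term` plus all of its narrower terms."""
    seen = set()
    stack = [PREFERRED.get(term, term)]  # resolve a synonym first
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def search(term):
    """Find records indexed with `term` or any term narrower than it."""
    wanted = expand(term)
    return sorted(rid for rid, t in RECORDS.items() if t in wanted)


print(search("vessel"))  # ['obj-1', 'obj-2', 'obj-4']
print(search("beaker"))  # ['obj-1', 'obj-4']
```

None of the matched records contains the word "vessel"; they are found because their controlled terms sit below "vessel" in the hierarchy.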

What role do SKOS and/or Linked Data play in achieving your goals?

SKOS is the only well-established semantic standard for thesauri, and it is the obvious choice for representing our thesauri in semantic form. The use of the RDF schema to store data means that data can be easily linked. This principle can be applied to the controlled terms that have been embedded into the different datasets. Mappings between different terms in different thesauri can be used to enhance the connections between data supplied by different organisations. If a search understands that different terminology means the same, or is similar, then this improves the relationships that can be established and allows scholars to find interesting and new pathways, or stories, through the data. We are also interested in extending our collection vocabularies to other internal information systems so that connections can be made between collection data and, say, the events data that we publish on our web site. Establishing a consistent vocabulary for internal systems should improve the integration that can be achieved when publishing to the web and therefore can improve our service to the visitor.
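The idea of a search that "understands" mapped terminology can be sketched as follows. skos:exactMatch and skos:closeMatch are mapping properties defined by SKOS; everything else here (the "bm:" and "other:" term identifiers, the records, and the function names) is hypothetical and only illustrates the principle, not ResearchSpace's implementation. The sketch follows a single mapping hop and treats both properties as symmetric.

```python
# Mapping statements between two hypothetical thesauri, written as
# (subject, property, object) triples. The term identifiers are invented.
MAPPINGS = [
    ("bm:cup", "skos:exactMatch", "other:drinking-cup"),
    ("bm:pin", "skos:closeMatch", "other:fastener"),
]

# Hypothetical records from two organisations, each indexed with a term
# from that organisation's own thesaurus.
RECORDS = [
    {"id": "bm-001", "type": "bm:cup"},
    {"id": "ot-042", "type": "other:drinking-cup"},
    {"id": "ot-077", "type": "other:fastener"},
]


def matching_terms(term, include_close=False):
    """Return `term` plus terms mapped to it (one hop, either direction)."""
    properties = {"skos:exactMatch"}
    if include_close:
        properties.add("skos:closeMatch")
    terms = {term}
    for subject, prop, obj in MAPPINGS:
        if prop not in properties:
            continue
        if subject == term:
            terms.add(obj)
        elif obj == term:
            terms.add(subject)
    return terms


def search(term, include_close=False):
    """Find records from any source whose term matches `term` via a mapping."""
    terms = matching_terms(term, include_close)
    return [record["id"] for record in RECORDS if record["type"] in terms]


print(search("bm:cup"))                      # ['bm-001', 'ot-042']
print(search("bm:pin", include_close=True))  # ['ot-077']
```

Keeping "means the same" (exact) separate from "is similar" (close) lets a researcher choose how loosely to follow the mappings for a particular query.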

What are the most important values you generate for your stakeholders?

  1. Eventually, as we and other museums and galleries expose our data in semantic form, to allow structured searching across multiple heritage-related data repositories.
  2. To provide enhanced data exploration and visualisation facilities for the curatorial community, by linking with other semantic data repositories, such as GeoNames.

What are the most important arguments for using Semantic Web standards and Linked Data, especially in the cultural heritage domain?

The British Museum is a museum of the world and collaborates with people and organisations to enhance our understanding of history and culture through objects. Bringing together data from different organisations can be expensive and take a long time. Even when projects deliver, they can have limited scope if the data standards used are not accessible to all. The Semantic Web provides a framework for making data more accessible and easier to harmonise. It has the potential to unlock information that would be difficult to uncover using traditional data technologies. It allows more people and organisations across the world to put the data to more uses than the Museum could alone.

What kind of applications can be built or have been built on top of your thesauri?

The ResearchSpace project aims to build terminology mapping tools that allow researchers to build mapping profiles to support searching for their particular research projects. These profiles will support a ResearchSpace semantic search tool. However, these tools would be independent of PoolParty.

Why did you choose PoolParty to manage your thesauri?

PoolParty is not currently used as the primary way to manage our thesauri but is part of the research and development being undertaken as the Museum moves towards semantic data. It is currently used for examining and experimenting with our thesauri and investigating ways of utilising semantic technology further.

How do you manage to get your thesauri used? How are you going to build an “eco-system” around your work?

This is still in planning.

Do you plan to publish your thesauri or parts of them on the LOD cloud? Under which licenses?

Yes, we hope to publish semantically. The licence has yet to be agreed.

What are your future plans and next steps?

Hopefully, the British Museum will publish its collection records and thesauri as linked data. The ResearchSpace project will be developed over the next year into a working prototype with a view to then providing a full production system for the research community.