PoolParty Team asked Les a couple of questions about thesaurus management, linked data and the semantic web:
1. What is the purpose of your thesaurus project in general?
ESA manages a number of vocabularies used in Australian curriculum organization and discovery. The largest vocabulary project is the Schools Online Thesaurus (ScOT), a subject thesaurus that covers the breadth of topics learned in Australian schools. ScOT is central to resource discovery: it enhances search and supports browse navigation in education portals. ESA also compiles the Australian Curriculum Framework on behalf of the Australian Curriculum, Assessment and Reporting Authority (ACARA), which is used to tag resources within a broad curriculum organization. These and other vocabularies are used to create metadata and provide advanced search filtering. The vocabularies support the structure of the Australian Curriculum and links from curriculum to learning resources.
All ESA vocabularies can be accessed at: http://vocabulary.curriculum.edu.au.
2. Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?
A thesaurus approach was chosen rather than a subject headings approach because we assumed (and continue to assume) that post-coordinate indexing will drive vocabulary-assisted discovery. Our main thesaurus (ScOT) is post-coordinate in structure and contains high levels of granularity. Tagging learning resources with thesaurus terms future-proofs their discoverability; if the curriculum changes, the investment in tagging resources is protected.
Synonym and homonym control is essential for any domain but especially in education, where the domain is broad in scope. Thesaurus conventions for non-preferred terms, parenthetical qualifiers and scope notes are used to disambiguate education concepts derived from across the curriculum. Thesaurus ‘related terms’ allow associations between concepts that are in different taxonomies. This supports cross-curriculum resource discovery and allocation.
In addition to the well-known thesaurus standard (Z39.19), SKOS and the ASN vocabularies support matching concepts between vocabularies. The ASN standard can be used to link similar curriculum statements used in disparate education jurisdictions, which is a current requirement in Australia.
3. What role do SKOS and/or Linked Data play in achieving your goals?
ScOT concepts are now published as URIs. This approach solves the problem of different ScOT versions in disparate systems. URIs allow systems that harvest, share and enable discovery of education resources to validate concepts and encode them with the most current properties (especially labels and relationships).
4. What are the most important values you generate for your stakeholders? What kind of applications can be built or have been built on top of your thesauri?
ScOT is used as:
- controlled vocabulary for metadata creation
- search enhancement/ browse structure for discovery systems
The best implementation in a discovery system is Scootle (http://scootle.edu.au):
- Browse from the top ten terms on the homepage
- Search uses non-preferred terms to switch to preferred term search
- Search expanded using narrower terms
- A-Z browse based on ScOT
- User tag suggestions based on ScOT
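The two search behaviours listed above (switching a non-preferred term to its preferred term, then expanding the search with narrower terms) can be illustrated with a minimal sketch. It is not Scootle's implementation; the dictionaries, the function names `resolve` and `expand`, and the sample terms are all hypothetical and are not real ScOT data.

```python
# Hypothetical mini-thesaurus: the terms are illustrative, not real ScOT data.
PREFERRED = {  # non-preferred label -> preferred label
    "maths": "mathematics",
    "arithmetic operations": "number operations",
}
NARROWER = {  # preferred label -> directly narrower preferred labels
    "mathematics": ["algebra", "geometry"],
    "algebra": ["linear equations"],
}


def resolve(term):
    """Map a user-entered term to its preferred label, if one is recorded."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def expand(term):
    """Return the preferred term followed by all of its narrower terms."""
    start = resolve(term)
    seen = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)  # breadth-first walk down the hierarchy
        for child in NARROWER.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(expand("Maths"))
# ['mathematics', 'algebra', 'geometry', 'linear equations']
```

A discovery system would then search for resources tagged with any of the returned terms, so a query on a broad term also finds resources indexed at a finer level of granularity.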
Other implementations include a project that indexes curriculum statements with vocabulary terms. The Achievement Standards Network (ASN) provides a model for profiling curriculum statements and linking those statements to education resources using various RDF vocabularies. By profiling curriculum statements to learning resources, more precise matching is achieved. This work supports the Australian Curriculum Connect project (pilot).
5. What are the most important arguments to use Semantic Web standards and linked data, especially in education?
Version control has always been an issue for ScOT. Stakeholders update ScOT data in their systems at different times, resulting in disparity between concept properties (labels and relationships). When URIs are stored, RDF fragments can be validated even if the whole vocabulary has not been updated in a system.
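As a rough illustration of this kind of per-concept validation, the sketch below compares the properties a system has stored for one concept URI with the currently published properties for the same URI. Everything here is an assumption for illustration: the function name `stale_properties`, the example.org URIs and the sample labels are hypothetical, and a real system would fetch the published description from the vocabulary service rather than from an in-memory dictionary.

```python
def stale_properties(stored, current):
    """Compare a locally stored concept with the currently published one.

    Returns a dict mapping each differing property name to a
    (stored value, current value) pair.
    """
    differences = {}
    for prop in sorted(set(stored) | set(current)):
        if stored.get(prop) != current.get(prop):
            differences[prop] = (stored.get(prop), current.get(prop))
    return differences


# Hypothetical concept URI and property values, not real ScOT data.
URI = "http://example.org/vocab/concept/123"
BROADER = ("http://example.org/vocab/concept/1",)

local_copy = {URI: {"prefLabel": "Maths", "broader": BROADER}}
published = {URI: {"prefLabel": "Mathematics", "broader": BROADER}}

report = stale_properties(local_copy[URI], published[URI])
print(report)
# {'prefLabel': ('Maths', 'Mathematics')}
```

Because the check is keyed on the URI, a single concept can be brought up to date without reloading the whole vocabulary.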
The Australian education sector is characterized by many disparate systems in different education jurisdictions. Semantic web technologies are one solution to linking education data in Australia.
6. Why did you choose PoolParty to manage your thesauri?
We had already identified SKOS as an important standard for ScOT, so it was natural to select PoolParty as our new thesaurus management tool.
We started migrating our vocabulary data to SKOS RDF in 2008. Using last-generation thesaurus management tools, we had to write transformation scripts to produce RDF. In that time, significant vocabulary agencies have made their vocabularies available for use under RDF schemas, particularly SKOS. For example, the entire Library of Congress Subject Headings (LCSH) is now available as SKOS RDF (http://id.loc.gov/authorities/), and OCLC is doing the same with Dewey Classifications (http://dewey.info). This makes vocabularies accessible to the Semantic Web; it also lets vocabularies and vocabulary terms be treated as online resources themselves, through the unique identifiers RDF presupposes for them. This allows more granular and flexible management of vocabularies, and more straightforward association with metadata at the level of terms, satisfying several of the expectations already stated. Indeed, LCSH already includes relation links within the SKOS data to similar concepts in different vocabularies. We were impressed with the way that PoolParty lends itself to simple linking to other vocabulary services, including DBPedia.
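To make the idea of links between vocabularies concrete, here is a minimal sketch that serializes mapping links as N-Triples using the SKOS mapping properties (`skos:exactMatch`, `skos:closeMatch` and so on). It does not show PoolParty's own export or linking features. The function name `mapping_triples` and the source URI are hypothetical, and the DBPedia target is only an illustrative choice, not an actual ScOT mapping.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_triples(links):
    """Serialize (source URI, SKOS mapping property, target URI) tuples as N-Triples."""
    lines = []
    for source, relation, target in links:
        if relation not in MAPPING_PROPERTIES:
            raise ValueError(f"not a SKOS mapping property: {relation}")
        lines.append(f"<{source}> <{SKOS}{relation}> <{target}> .")
    return "\n".join(lines)


# Hypothetical source concept; the target is an illustrative DBPedia resource.
links = [
    (
        "http://example.org/vocab/concept/123",
        "closeMatch",
        "http://dbpedia.org/resource/Mathematics",
    ),
]
print(mapping_triples(links))
```

Each output line states that a concept in one vocabulary corresponds to a concept in another, which is the kind of relation link the LCSH SKOS data is described as carrying.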
7. What are your future plans and next steps? How do you manage to get your thesauri used, how are you going to build an “eco-system” around your work? (Do you plan to publish ScOT on the LOD cloud? Under which licenses?)
The ESA vocabulary roadmap includes:
- Refine and link our vocabularies with other regional vocabularies
- Support interoperation between education jurisdictions by linking curriculum framework vocabularies
- Support ACARA as it releases Australia’s (very first national) curriculum framework
- Engage the Australian National Data Service (ANDS), which is a significant program that advances the data cloud in our region
- Link to global vocabularies – significant global vocabularies include LCSH and DBPedia.
- Move from a monolingual to a multilingual thesaurus, especially for ScOT, which is used in regional systems with some non-English-speaking users
Our vocabularies are currently for non-commercial use and we don’t anticipate any change to the license at this stage. The ScOT license requires attribution, permits derivatives that must be shared, and is for non-commercial use.