Les Kneebone: “Semantic web technologies are one solution to linking education data in Australia”

Les Kneebone is Project Manager at Education Services Australia Ltd.
Among other projects, he is responsible for the Schools Online Thesaurus (ScOT).

PoolParty Team asked Les a couple of questions about thesaurus management, linked data and the semantic web:

1.    What is the purpose of your thesaurus project in general?

ESA manages a number of vocabularies used in Australian curriculum organization and discovery. The largest vocabulary project is the Schools Online Thesaurus (ScOT), a subject thesaurus that covers the breadth of topics learned in Australian schools. ScOT is central to resource discovery via enhancing search and supporting browse navigation in education portals. ESA also compiles the Australian Curriculum Framework on behalf of the Australian Curriculum, Assessment and Reporting Authority (ACARA), which is used to tag resources within a broad curriculum organization. These and other vocabularies are used to create metadata and provide advanced search filtering. The vocabularies support the structure of the Australian Curriculum and links from curriculum to learning resources.

All ESA vocabularies can be accessed at: http://vocabulary.curriculum.edu.au.

2.    Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?

A thesaurus approach was chosen rather than a subject headings approach because we assumed (and continue to assume) that post-coordinate indexing will drive vocabulary-assisted discovery. Our main thesaurus (ScOT) is post-coordinate in structure and contains high levels of granularity. Tagging learning resources with thesaurus terms future-proofs their discoverability; if the curriculum changes, the investment in tagging resources is protected.

Synonym and homonym control is essential for any domain but especially in education, where the domain is broad in scope. Thesaurus conventions for non-preferred terms, parenthetical qualifiers and scope notes are used to disambiguate education concepts derived from across the curriculum. Thesaurus ‘related terms’ allow associations between concepts that are in different taxonomies. This supports cross-curriculum resource discovery and allocation.

In addition to the well-known thesaurus standard (Z39.19), SKOS and the ASN vocabularies support matching concepts between vocabularies. The ASN standard can be used to link similar curriculum statements used in disparate education jurisdictions, a current requirement in Australia.
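Matching concepts between vocabularies can be illustrated with a small Python sketch. It assumes the links are recorded with the SKOS mapping properties skos:exactMatch and skos:closeMatch; the concept URIs and the `translate` helper are hypothetical and are not taken from ScOT or ASN data.

```python
# Hypothetical mapping statements: (source concept, SKOS mapping property, target concept).
# The example.org URIs are placeholders, not real vocabulary identifiers.
MAPPINGS = [
    ("http://example.org/vocabA/fractions", "skos:exactMatch", "http://example.org/vocabB/c42"),
    ("http://example.org/vocabA/fractions", "skos:closeMatch", "http://example.org/vocabB/c43"),
    ("http://example.org/vocabA/algebra", "skos:closeMatch", "http://example.org/vocabB/c77"),
]


def translate(uri, accepted=("skos:exactMatch",)):
    """Return target-vocabulary URIs linked to `uri` by an accepted mapping property."""
    return [
        target
        for source, prop, target in MAPPINGS
        if source == uri and prop in accepted
    ]


# Strict matching keeps only exact equivalents.
print(translate("http://example.org/vocabA/fractions"))
# Looser matching also accepts close matches.
print(translate("http://example.org/vocabA/fractions",
                accepted=("skos:exactMatch", "skos:closeMatch")))
```

Keeping the mapping property alongside each link lets a consuming system decide how strict it wants to be when carrying tags from one vocabulary to another.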

3.    What role do SKOS and/or Linked Data play in achieving your goals?

ScOT concepts are now published as URIs. This approach solves the problem of different ScOT versions in disparate systems. URIs enable validation and encoding of concepts with the most current properties (especially labels and relationships), for systems that harvest, share and enable discovery of education resources.

4.    What are the most important values you generate for your stakeholders? What kind of applications can be built or have been built on top of your thesauri?

ScOT is used as:

  • controlled vocabulary for metadata creation
  • search enhancement / browse structure for discovery systems

The best implementation in a discovery system is Scootle (http://scootle.edu.au):

  • Browse from the top ten terms on the homepage
  • Search uses non-preferred terms to switch to preferred term search
  • Search expanded using narrower terms
  • A-Z browse based on ScOT
  • User tag suggestions based on ScOT
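Two of the search behaviours listed above, switching from a non-preferred term to its preferred term and expanding a search with narrower terms, can be sketched in Python. This is a minimal illustration under stated assumptions, not Scootle's implementation: the terms, the two lookup tables and the `expand_query` helper are all hypothetical.

```python
# Hypothetical mini-thesaurus; the terms are illustrative, not real ScOT data.
PREFERRED = {  # non-preferred (entry) term, lower-cased -> preferred term
    "maths": "Mathematics",
    "sums": "Addition",
}
NARROWER = {  # preferred term -> directly narrower preferred terms
    "Mathematics": ["Arithmetic", "Geometry"],
    "Arithmetic": ["Addition", "Subtraction"],
}


def to_preferred(term):
    """Switch a non-preferred term to its preferred term; return other input unchanged."""
    cleaned = term.strip()
    return PREFERRED.get(cleaned.lower(), cleaned)


def expand_query(term):
    """Return the preferred term followed by all of its narrower terms, breadth first."""
    start = to_preferred(term)
    expanded, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        for child in NARROWER.get(current, []):
            if child not in expanded:  # guard against repeated terms and cycles
                expanded.append(child)
                queue.append(child)
    return expanded


print(expand_query("maths"))
```

A discovery system would then search for resources tagged with any of the returned terms, so a query for a broad topic also finds resources indexed at a more granular level.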

Other implementations include a project that indexes curriculum statements with vocabulary terms. The Achievement Standards Network (ASN) provides a model for profiling curriculum statements and linking those statements to education resources using various RDF vocabularies. By profiling curriculum statements to learning resources, more precise matching is achieved. This work supports the Australian Curriculum Connect project (pilot).

5.    What are the most important arguments for using Semantic Web standards and linked data, especially in education?

Version control has always been an issue for ScOT. Stakeholders update ScOT data in their systems at different times, resulting in disparity between concept properties (labels and relationships). When URIs are stored, RDF fragments can be validated even if the whole vocabulary has not been updated in a system.
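The idea of validating stored concepts by URI can be sketched as follows. The sketch assumes a local system keeps a copy of each concept's properties keyed by URI and can obtain the current properties for the same URI; here both are plain dictionaries with hypothetical example.org URIs, and `stale_properties` is an illustrative helper rather than part of any ScOT or PoolParty API.

```python
# Hypothetical current records, as the publishing service would describe them.
AUTHORITATIVE = {
    "http://example.org/vocab/1": {"prefLabel": "Mathematics", "broader": None},
    "http://example.org/vocab/2": {"prefLabel": "Arithmetic",
                                   "broader": "http://example.org/vocab/1"},
}

# What a local system stored when it last harvested the vocabulary.
LOCAL = {
    "http://example.org/vocab/1": {"prefLabel": "Maths", "broader": None},
    "http://example.org/vocab/2": {"prefLabel": "Arithmetic",
                                   "broader": "http://example.org/vocab/1"},
}


def stale_properties(local, current):
    """Report, per URI, the properties whose stored value differs from the current one."""
    report = {}
    for uri, stored in local.items():
        latest = current.get(uri)
        if latest is None:
            report[uri] = ["<unknown URI>"]  # URI no longer known to the publisher
            continue
        changed = sorted(key for key in latest if stored.get(key) != latest[key])
        if changed:
            report[uri] = changed
    return report


print(stale_properties(LOCAL, AUTHORITATIVE))
```

Because the comparison is made concept by concept, a system can refresh only the records that are out of date instead of reloading the whole vocabulary.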

The Australian education sector is characterized by many disparate systems in different education jurisdictions. Semantic web technologies are one solution to linking education data in Australia.

6.    Why did you choose PoolParty to manage your thesauri?

We had already identified SKOS as an important standard for ScOT, so it was natural to select PoolParty as our new thesaurus management tool.

We started migrating our vocabulary data to SKOS RDF in 2008. Using last-generation thesaurus management tools, we had to write transformation scripts to produce RDF. Since then, significant vocabulary agencies have made their vocabularies available for use under RDF schemas, particularly SKOS. For example, the entire Library of Congress Subject Headings (LCSH) is now available as SKOS RDF (http://id.loc.gov/authorities/), and OCLC is doing the same with Dewey Classifications (http://dewey.info). This makes vocabularies accessible to the Semantic Web; it also lets vocabularies and vocabulary terms be treated as online resources themselves, through the unique identifiers RDF presupposes for them. This allows more granular and flexible management of vocabularies, and more straightforward association with metadata at the level of terms, satisfying several of the expectations already stated. Indeed, LCSH already includes relation links within the SKOS data to similar concepts in different vocabularies. We were impressed with the way that PoolParty lends itself to simple linking to other vocabulary services, including DBpedia.

7. What are your future plans and next steps? How do you manage to get your thesauri used, how are you going to build an “eco-system” around your work? (Do you plan to publish ScOT on the LOD cloud? Under which licenses?)

The ESA vocabulary roadmap includes:

  • Refine and link our vocabularies with other regional vocabularies
  • Support interoperation between education jurisdictions by linking curriculum framework vocabularies
  • Support ACARA as it releases Australia’s first national curriculum framework
  • Engage the Australian National Data Service (ANDS), a significant program that advances the data cloud in our region
  • Link to global vocabularies – significant global vocabularies include LCSH and DBpedia
  • Move from mono-lingual to multi-lingual thesaurus – especially ScOT, which is used in regional systems with some non-English speaking users

Our vocabularies are currently for non-commercial use and we don’t anticipate any change to the license at this stage. The ScOT license requires attribution, permits derivatives that must be shared, and is for non-commercial use.