Norman Gray: “SKOS is just as complicated as we need it to be.”

Norman Gray is a researcher with the Astronomy Group at the University of Glasgow.

Among other projects, he is a member of the IVOA (International Virtual Observatory Alliance), which uses SKOS and PoolParty to develop and use vocabularies collaboratively.

Q: Norman, please tell us about your project and its purpose.

In 2000, the astronomy and astrophysics communities of the major countries launched an international program called the Virtual Observatory to develop an interoperability layer above data centres. The objective is to allow any scientist to get immediate and transparent access to any observation taken by any telescope or satellite. This relies on the definition of standards to describe data. These standards are developed by the IVOA, which is in some ways like a ‘W3C for Astronomy’.

Among the working groups at the IVOA, the Semantics Working Group aims at developing vocabularies. Now that the first standards for interoperability between data centres have been established, web semantics is going to become one of the major directions to describe data and allow scientists to discover them.

The most immediate project is a collaboration with colleagues at Paris Observatory, to develop a group of three or four related, small thesauri to support retrieval of simulation datasets. A group at the IVOA is developing a protocol which will allow users to search for and retrieve the results of pre-computed large-scale simulations. That means letting users ask questions like “I want to find simulations of objects of type X, using simulation techniques Y and Z, with results in the form W”, so one aspect of that is devising or adapting small thesauri from which the terms X, Y, Z and W can be drawn.
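To make the shape of such a query concrete, here is a minimal sketch in Python. It is not the IVOA protocol: the terms, the catalogue entries, and the names `OBJECT_TYPES`, `with_narrower`, `Simulation` and `search` are all hypothetical, and the sketch assumes that asking for type X should also match anything the thesaurus lists as narrower than X, and that "techniques Y and Z" means a simulation must use both.

```python
from dataclasses import dataclass

# Hypothetical mini-thesaurus of object types: term -> broader term
# (None marks a top-level concept). Not taken from any IVOA vocabulary.
OBJECT_TYPES = {
    "galaxy": None,
    "spiral galaxy": "galaxy",
    "elliptical galaxy": "galaxy",
    "star": None,
}


def with_narrower(thesaurus, term):
    """Return `term` together with every term transitively narrower than it."""
    if term not in thesaurus:
        raise KeyError(f"unknown term: {term!r}")
    found = {term}
    grew = True
    while grew:  # repeat until a pass adds no new narrower terms
        grew = False
        for candidate, broader in thesaurus.items():
            if broader in found and candidate not in found:
                found.add(candidate)
                grew = True
    return found


@dataclass(frozen=True)
class Simulation:
    name: str
    object_type: str       # the "X" in the question
    techniques: frozenset  # the "Y and Z"
    result_form: str       # the "W"


def search(catalogue, object_type, techniques, result_form):
    """Names of simulations matching the type (or a narrower one),
    using all requested techniques, with the requested result form."""
    wanted_types = with_narrower(OBJECT_TYPES, object_type)
    return [
        sim.name
        for sim in catalogue
        if sim.object_type in wanted_types
        and set(techniques) <= sim.techniques
        and sim.result_form == result_form
    ]


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    Simulation("run-001", "spiral galaxy", frozenset({"N-body", "SPH"}), "density cube"),
    Simulation("run-002", "elliptical galaxy", frozenset({"N-body"}), "density cube"),
    Simulation("run-003", "star", frozenset({"N-body", "SPH"}), "density cube"),
    Simulation("run-004", "spiral galaxy", frozenset({"N-body", "SPH"}), "merger tree"),
]

print(search(CATALOGUE, "galaxy", {"N-body", "SPH"}, "density cube"))
```

The point of the sketch is only that each of X, Y, Z and W is drawn from a small controlled list, so a search for the broad term "galaxy" can find datasets labelled with a narrower one.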

The IVOA had settled on SKOS as a way of addressing this problem, a little earlier in the year — therefore it was natural to use SKOS for this purpose. A thesaurus is something that the whole community has to buy in to, so it’s important to find a way in which the development can take place manifestly in the open.

The larger-scale project is nothing less than bringing the Semantic Web to astronomy (or do I mean that the other way round?). The astronomical community has been technically sophisticated for centuries: almost never interested in technology for its own sake, but always willing to take techniques which show themselves to be useful, and push them right to the edge of where they’ll go. Astronomy is also full of both structured data and structured information, so if I can persuade a few more people that the Semantic Web isn’t just a passing fad (isn’t just computing science’s current hobbyhorse), then we should see some very powerful applications.

Q: Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?

The important features that SKOS has are these:

  • It’s open: everyone in the community can see how it was built and who built it, and no-one can close up the results afterwards.
  • It’s a standard: others have bought into this, so we don’t have to provide all of the systems which will keep this technology useful in the coming decades.
  • It’s structured, but flexible: it’s just as complicated as we need it to be, so that we can be heroically restrained, and make a brutally simple thesaurus/vocabulary if that’s what serves a particular purpose; or unleash our inner systematizer, and build a wonderfully baroque set of descriptors, if we’ve a problem which can benefit from that. At that point, there’s an easy transition to the Semantic Web, so we can easily step over that threshold if it becomes useful later.
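As a rough illustration of how small a “brutally simple” SKOS vocabulary can be, the following Python sketch writes two concepts out as Turtle. The namespace in `SKOS_NS` is the real W3C one; everything else (the `http://example.org/astro/` base URI, the concept names, and the helper functions) is hypothetical and not drawn from any IVOA vocabulary. It only uses `skos:Concept`, `skos:prefLabel` and `skos:broader`, and builds the text by hand rather than with an RDF library.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"  # the W3C SKOS namespace
BASE = "http://example.org/astro/"                # hypothetical base URI


def _literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(slug, pref_label, broader=None):
    """Serialise one concept; `broader` is the slug of its parent, if any."""
    parts = ["a skos:Concept", f'skos:prefLabel "{_literal(pref_label)}"@en']
    if broader is not None:
        parts.append(f"skos:broader <{BASE}{broader}>")
    return f"<{BASE}{slug}> " + " ;\n    ".join(parts) + " ."


def vocabulary_to_turtle(concepts):
    """Serialise (slug, label[, broader]) tuples as one Turtle document."""
    header = f"@prefix skos: <{SKOS_NS}> ."
    blocks = [concept_to_turtle(*concept) for concept in concepts]
    return "\n\n".join([header] + blocks) + "\n"


# Hypothetical two-concept vocabulary.
VOCAB = [
    ("galaxy", "galaxy"),
    ("spiral-galaxy", "spiral galaxy", "galaxy"),
]

print(vocabulary_to_turtle(VOCAB))
```

A richer vocabulary would add further SKOS properties (alternative labels, definitions, related concepts) to the same concepts without changing the basic shape.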

Q: Which role do SKOS and/or Linked Data play in achieving your goals?

The W3C’s standardisation of SKOS was important for us, because it showed that this wasn’t just someone’s hobby, but something that would last. And the Linked Data idea is important, because it takes something that’s sort-of obvious in retrospect — that’s in the air — and pins it down, labels it, and gives it a journal reference for everyone to point to. That’s valuable when you’re trying to sell the idea to the sceptical: it’s massively helpful to be able to say “this is called Linked Data, and this is the DOI you should read”.

Q: What are the most important values you generate for your stakeholders? What kind of applications can be built or have been built on top of your thesauri?

Right now, I don’t really know! Myself, I’m still operating on the principle of ‘build it and they will come’. We’ve got to lay foundations which are pretty boring to most people, in order to let others build flashy structures on top.

Q: What are the most important arguments for using Semantic Web standards and Linked Data in scientific communities?

It depends on whom you’re arguing with. With an astronomer, the best argument is a working demo: it doesn’t have to be finished, it doesn’t have to be pretty, it doesn’t have to have a GUI (indeed, in some cases it’s better if it doesn’t), but it does have to do something they couldn’t easily do before; if you can use it from Python, that’s a bonus. I think it has to do something they could do before, and wanted to, but which was hard or annoying. With that demo, if you mention Linked Data or the Semantic Web, you lose: no-one’s interested in computing science buzzwords.

With astroinformatics developers, you’ve got a different argument (though the two communities overlap more than I’m perhaps suggesting). Semantic Web technologies are still a hard sell, because so many different things have to be lined up before you can start playing (this is the motivation behind Danny Ayers’ ‘Semantic Web in a Box’ idea and ongoing, and behind my own SKUA project). The Semantic Web is currently a good sell if you’re talking to someone with strong systematize-the-world instincts, but it still looks like a lot of work for too little fun to everyone else. When we can get some basic services up, though, and say “Linked Data? That’ll be this site, then…, with this API…”, it’ll be easier.

Q: Why did you choose PoolParty to manage your thesauri?

I had experimented in 2009 and 2010 with using Semantic MediaWiki to create a wiki-like experience for community curation of vocabularies. It mostly worked, in the sense that we had some pages which described a particular concept, and which could have SKOS RDF extracted from them. But then we took a second look at what we’d done, and realised that it had been fiddly work to get something which looked only reasonable, which we knew was rather fragile, and which we knew would be a pain to extend. That meant that when I heard about PoolParty, I saw the point pretty quickly.

After starting to experiment with PoolParty over the summer, we’ve had good experiences. We’ve fed some comments back to punkt. netServices — including some very firm requests! — but nothing’s gone wrong, and we haven’t hit the boundaries yet.

Q: What are your future plans and next steps? How do you manage to get your thesauri used, how are you going to build an “eco-system” around your work?

Our plans are still in the “see what happens” stage. The thesauri we’re building in this first stage are pretty small ones, so the goal is really just to get a few more people used to seeing “skos:Concept” in their editor. At some point in 2011 we’ll start to encourage members of the wider astronomical community to criticise and add to the thesauri, mediated by PoolParty. We don’t have a hard deadline for this project, so we’re currently willing to go with the flow and see where we end up.

Q: As your thesauri will be publicly available, which licenses do you plan to use?

It’ll definitely be some sort of open licence — I’d guess one of the Creative Commons ones — but we haven’t had to worry about it yet.

Thank you for this interview and your insights, Norman.