
Norman Gray: “SKOS is just as complicated as we need it to be.”

January 11, 2011


Norman Gray is a researcher with the Astronomy Group of the University of Glasgow.

Among other projects, he is a member of the IVOA (International Virtual Observatory Alliance), which uses SKOS and PoolParty to develop and use vocabularies collaboratively.

Q: Norman, please tell us about your project and its purpose.

In 2000, the astronomy and astrophysics communities of the major countries launched an international program called the Virtual Observatory to develop an interoperability layer above data centres. The objective is to allow any scientist to get access immediately and in a transparent way to any observation taken by any telescope or satellite. This relies on the definition of standards to describe data. These standards are developed by the IVOA, which is in some ways like a ‘W3C for Astronomy’.

Among the working groups at the IVOA, the Semantics Working Group aims at developing vocabularies. Now that the first standards for interoperability between data centres have been established, web semantics is going to become one of the major directions to describe data and allow scientists to discover them.

The most immediate project is a collaboration with colleagues at Paris Observatory, to develop a group of three or four related, small thesauri to support retrieval of simulation datasets. A group at the IVOA is developing a protocol which will allow users to search for and retrieve the results of pre-computed large-scale simulations. That means letting users ask questions like “I want to find simulations of objects of type X, using simulation techniques Y and Z, with results in the form W”, so one aspect of that is devising or adapting small thesauri from which the terms X, Y, Z and W can be drawn.
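Editor's sketch: the kind of query described above can be illustrated with a toy example in Python. Everything here is an assumption made for illustration: the concept names, the `NARROWER` table, the `DATASETS` records and the `find_simulations` function are hypothetical, and this is not the IVOA protocol or any real thesaurus. It only shows why drawing X, Y, Z and W from a thesaurus helps: a search for a broad term can also match datasets tagged with narrower terms.

```python
# Hypothetical toy thesaurus: each concept maps to its narrower concepts.
# Assumed to be a tree (no cycles), so the recursion below terminates.
NARROWER = {
    "galaxy": ["spiral-galaxy", "elliptical-galaxy"],
    "spiral-galaxy": [],
    "elliptical-galaxy": [],
    "n-body": [],
    "hydrodynamics": ["sph"],
    "sph": [],
}

# Hypothetical dataset records, each tagged with terms from the thesaurus.
DATASETS = [
    {"id": "sim-001", "object": "spiral-galaxy",
     "techniques": {"n-body", "sph"}, "form": "snapshot"},
    {"id": "sim-002", "object": "elliptical-galaxy",
     "techniques": {"n-body"}, "form": "snapshot"},
    {"id": "sim-003", "object": "spiral-galaxy",
     "techniques": {"n-body", "sph"}, "form": "merger-tree"},
]


def expand(concept):
    """Return the concept together with everything narrower than it."""
    found = {concept}
    for child in NARROWER.get(concept, []):
        found |= expand(child)
    return found


def find_simulations(object_type, techniques, form):
    """Find datasets of object type X, using all techniques Y, Z, in form W."""
    wanted_objects = expand(object_type)
    matches = []
    for dataset in DATASETS:
        if dataset["object"] not in wanted_objects:
            continue
        if dataset["form"] != form:
            continue
        # Each requested technique must be present, either directly
        # or through one of its narrower terms.
        if all(expand(t) & dataset["techniques"] for t in techniques):
            matches.append(dataset["id"])
    return matches


print(find_simulations("galaxy", ["n-body", "hydrodynamics"], "snapshot"))
```

In this sketch the search for "galaxy" with "hydrodynamics" finds a dataset tagged "spiral-galaxy" and "sph", because both are narrower terms in the toy thesaurus.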

The IVOA had settled on SKOS as a way of addressing this problem, a little earlier in the year — therefore it was natural to use SKOS for this purpose. A thesaurus is something that the whole community has to buy in to, so it’s important to find a way in which the development can take place manifestly in the open.

The larger-scale project is nothing less than bringing the Semantic Web to astronomy (or do I mean that the other way round?). The astronomical community has been technically sophisticated for centuries: almost never interested in technology for its own sake, but always willing to take techniques which show themselves to be useful, and push them right to the edge of where they’ll go. Astronomy is also full of both structured data and structured information, so if I can persuade a few more people that the Semantic Web isn’t just a passing fad — isn’t just computing science’s current hobbyhorse — then we should see some very powerful applications.

Q: Why did you choose thesauri to organize your information? What kind of problems are you able to solve with this approach?

The important features that SKOS has are these:

  • It’s open: everyone in the community can see how it was built and who built it, and no-one can close up the results afterwards.
  • It’s a standard: others have bought into this, so we don’t have to provide all of the systems which will keep this technology useful in the coming decades.
  • It’s structured, but flexible: it’s just as complicated as we need it to be, so that we can be heroically restrained, and make a brutally simple thesaurus/vocabulary if that’s what serves a particular purpose; or unleash our inner systematizer, and build a wonderfully baroque set of descriptors, if we’ve a problem which can benefit from that. At that point, there’s an easy transition to the Semantic Web, so we can easily step over that threshold if it becomes useful later.

Q: What role do SKOS and/or Linked Data play in achieving your goals?

The W3C’s standardisation of SKOS was important for us, because it showed that this wasn’t just someone’s hobby, but something that would last. And the Linked Data idea is important, because it takes something that’s sort-of obvious in retrospect — that’s in the air — and pins it down, labels it, and gives it a journal reference for everyone to point to. That’s valuable when you’re trying to sell the idea to the sceptical: it’s massively helpful to be able to say “this is called Linked Data, and this is the DOI you should read”.

Q: What are the most important values you generate for your stakeholders? What kind of applications can be built or have been built on top of your thesauri?

Right now, I don’t really know! Myself, I’m still operating on the principle of ‘build it and they will come’. We’ve got to lay foundations which are pretty boring to most people, in order to let others build flashy structures on top.

Q: What are the most important arguments for using Semantic Web standards and Linked Data in scientific communities?

It depends on whom you’re arguing with. With an astronomer, the best argument is a working demo: it doesn’t have to be finished, it doesn’t have to be pretty, it doesn’t have to have a GUI (indeed, in some cases it’s better if it doesn’t), but it does have to do something they couldn’t easily do before; if you can use it from Python, that’s a bonus. That is, I think it has to do something they could do before, and wanted to, but which was hard or annoying. With that demo, if you mention Linked Data or the Semantic Web, you lose: no-one’s interested in computing science buzzwords.

With astroinformatics developers, you’ve got a different argument (though the two communities overlap more than I’m perhaps suggesting). Semantic Web technologies are still a hard sell, because so many different things have to be lined up before you can start playing (this is the motivation behind Danny Ayers’ ‘Semantic Web in a Box’ idea and ongoing, and behind my own SKUA project). The Semantic Web is currently a good sell if you’re talking to someone with strong systematize-the-world instincts, but it still looks like a lot of work for too little fun to everyone else. When we can get some basic services up, though, and say “Linked Data? That’ll be this site, then…, with this API…”, it’ll be easier.

Q: Why did you choose PoolParty to manage your thesauri?

I had experimented in 2009 and 2010 with using Semantic MediaWiki to create a wiki-like experience for community curation of vocabularies. It mostly worked, in the sense that we had some pages which described a particular concept, and which could have SKOS RDF extracted from them. But then we took a second look at what we’d done, and realised that it had been fiddly work to get something which looked only reasonable, which we knew was rather fragile, and which we knew would be a pain to extend. That meant that when I heard about PoolParty, I saw the point pretty quickly.

After starting to experiment with PoolParty over the summer, we’ve had good experiences. We’ve fed some comments back to punkt. netServices — including some very firm requests! — but nothing’s gone wrong, and we haven’t hit the boundaries yet.

Q: What are your future plans and next steps? How do you manage to get your thesauri used, how are you going to build an “eco-system” around your work?

Our plans are still in the “see what happens” stage. The thesauri we’re building in this first stage are pretty small ones, so the goal is really just to get a few more people used to seeing “skos:Concept” in their editor. At some point in 2011 we’ll start to encourage members of the wider astronomical community to criticise and add to the thesauri, mediated by PoolParty. We don’t have a hard deadline for this project, so we’re currently willing to go with the flow and see where we end up.
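Editor's sketch: for readers who have not yet seen “skos:Concept” in their editor, the short Python example below writes out a two-concept vocabulary in Turtle syntax. The SKOS namespace and the `skos:Concept`, `skos:prefLabel` and `skos:broader` terms come from the W3C SKOS standard; the `http://example.org/vocab/` base URI, the concept names and the helper functions are hypothetical placeholders, not part of the IVOA vocabularies or of PoolParty. The sketch also assumes labels contain no double quotes, since it does no escaping.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # placeholder base URI (assumption)


def concept_to_turtle(local_name, pref_label, broader=None):
    """Render one concept as a Turtle statement block."""
    lines = [
        f"<{BASE}{local_name}> a skos:Concept ;",
        f'    skos:prefLabel "{pref_label}"@en',
    ]
    if broader:
        # Link the concept to its broader (more general) concept.
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{BASE}{broader}>")
    lines[-1] += " ."
    return "\n".join(lines)


def vocabulary_to_turtle(concepts):
    """Render a list of (local_name, pref_label, broader) tuples."""
    header = f"@prefix skos: <{SKOS_NS}> ."
    blocks = [concept_to_turtle(*concept) for concept in concepts]
    return "\n\n".join([header] + blocks)


turtle = vocabulary_to_turtle([
    ("galaxy", "Galaxy", None),
    ("spiralGalaxy", "Spiral galaxy", "galaxy"),
])
print(turtle)
```

A real vocabulary would normally be produced by a tool such as PoolParty or an RDF library rather than by string formatting; the point here is only what the resulting text looks like.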

Q: As your thesauri will be publicly available, which licences do you plan to use?

It’ll definitely be some sort of open licence — I’d guess one of the Creative Commons ones — but we haven’t had to worry about it yet.

Thank you for this interview and your insights, Norman.
