My data is your data

October 4, 2015

Academics in the UK are just coming to terms with an open access policy for publications, which ranges from paid ‘gold’ access to free ‘green’ deposits in university repositories.

What has received significantly less attention is the new UK research data policy. In my experience, raising this issue in conversation is met with blank expressions… what data policy?

Some key points from https://www.epsrc.ac.uk/about/standards/researchdata/. From 1st May 2015:

Published research papers should include a short statement describing how and on what terms any supporting research data may be accessed.

The metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it.
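
As a rough illustration of what these two requirements might look like in practice, here is a minimal Python sketch. The field names, the placeholder values and the `access_statement` helper are all my own assumptions for illustration; the policy does not prescribe any particular format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    what: str        # what research data exists
    why: str         # why it was generated
    when: str        # when it was generated (ISO 8601 date)
    how: str         # how it was generated
    access_url: str  # where the data can be accessed
    terms: str       # on what terms (e.g. a licence name)


def access_statement(record: DatasetRecord) -> str:
    """Build a short data access statement to include in a paper."""
    return (
        f"Data supporting this work ({record.what}) are available at "
        f"{record.access_url} under {record.terms}."
    )


# Placeholder values only; this is not a real dataset.
example = DatasetRecord(
    what="phonon dispersion calculations",
    why="to support the lattice dynamics analysis in the paper",
    when="2015-10-04",
    how="density functional theory calculations",
    access_url="https://example.org/dataset",
    terms="a CC BY 4.0 licence",
)

print(access_statement(example))
print(json.dumps(asdict(example), indent=2))
```

Keeping the record as plain JSON means it can sit alongside the data in whichever repository ends up hosting it.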

My university, like most others, has put together policy and guidance documents but they are quite generic and don't seem to have really filtered down to the researcher level.

In my field of computational materials science, there are now several options for sharing research data:

  • GitHub - my group has been using this a lot for research (DOIs can be generated via the EU-funded project Zenodo; 2 GB limit per repository). Instead of building separate repositories for each paper, we have been collecting related information in topical repositories, e.g. Phonons and Crystal Structures.
  • Mendeley Data - a nice clean interface for uploading data and generating DOIs, but I haven’t seen any clear policy for storage limits or guaranteed data lifetimes.
  • Figshare - this repository plays nice with raw datasets and multimedia (e.g. a hybrid perovskite MD video). The serious drawback is a 1 GB storage limit (per free account) with a 250 MB file size limit.
  • NoMaD - a new repository to “host, organize and share materials data”. I have great hopes for this one, but at the moment the website is a little jaded, and the interface is light years behind the Materials Project (which serves a different purpose, being a single-source database).

Ideally, the community would adopt a standard protocol, replacing the individual ‘data dumps’ that university repositories enable with a systematic and searchable community database. AiiDA allows one to do this at the research group or collaborator level, but I hope that NoMaD can build a critical mass of researchers (and sustained funding) to make this a reality.