Describe data by scripts for future reuse
Nearly 120 IT professionals, scientists and managers from the Photon and Neutron (PaN) community attended the 2nd European PaN EOSC Symposium organised jointly by PaNOSC and ExPaNDS on 26th October 2021. The second part of the first session focused on a selection of use cases relating to some of the tools and services developed in the EOSC projects, for FAIR data catalogues, data analysis and simulation.
This is the first use case. The presentation starts at 31:53 in the recording.
There is a long-running discussion in the PaN community about how to describe data properly and which metadata are useful. To fulfil the last letter in FAIR, data need to be reusable, which is often the most difficult task for users of large research infrastructures.
Petr Čermák presented an easy and convenient way of describing data through user scripts: taking publicly available data from PaNOSC ILL, treating them with open-source software and publishing the scripts in a GitHub repository. The repository was mirrored to Figshare to obtain a citable entity, and the talk showed how to use Binder to re-evaluate the data from any computer in the world “even after 100 years”.
This approach documents how the processed data were obtained, through a transparent evaluation. Referees of the upcoming publication can easily verify the data treatment process, other scientists can easily learn how the data can be treated and, most importantly, the data treatment process is intended to keep working indefinitely.
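As a rough illustration of the Binder step, the sketch below builds a mybinder.org launch link for a public GitHub repository. The owner, repository, tag and notebook names are hypothetical placeholders, not the repository shown in the talk; the sketch only assumes the public mybinder.org URL scheme for GitHub repositories.

```python
from typing import Optional
from urllib.parse import quote


def binder_url(owner: str, repo: str, ref: str = "HEAD",
               notebook: Optional[str] = None) -> str:
    """Build a mybinder.org launch URL for a public GitHub repository.

    All argument values used in this example are hypothetical placeholders.
    """
    # URL scheme: https://mybinder.org/v2/gh/<owner>/<repo>/<ref>
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    if notebook:
        # Open a specific notebook in JupyterLab once the session starts.
        url += "?labpath=" + quote(notebook)
    return url


# Hypothetical repository and tag, for illustration only.
print(binder_url("example-user", "example-data-scripts", "v1.0", "analysis.ipynb"))
```

Pointing the link at a tag or commit rather than a moving branch means that every launch rebuilds the same version of the scripts, which fits the aim of a citable, re-evaluable data treatment.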
Keywords: metadata, open data, FAIR, figshare, binder, data processing, wp5-ExPaNDS
Resource type: video, slides
Licence: Creative Commons Attribution 4.0