Open Data

In recent years, open access has been increasingly promoted at both the international and national level, not only for publications but also for research data. We use the term open data to refer to research data that are freely available online and can be reused, modified and shared for any purpose or in new contexts. Making research data openly accessible maximizes their use and helps ensure the quality of research results. Open data can also increase the transparency and efficiency of research: when data are shared, including data from studies with negative results, the same research does not need to be conducted again.

Open data may include non-textual material such as maps, genomes, chemical compounds, mathematical formulas, medical data, etc. In a broader sense, research data comprise primary data collected in the form of facts, observations, images, results of computer programs, records, measurements and experiences, as well as secondary analyses, visualizations, models, analytical tools, and collections of objects or products. Data may be in numerical, textual, pictorial or tangible form. They may be processed, cleaned or raw, and stored in any format and on any medium.

Open data are part of the broader concept of open science. The European Union has funded a two-year project called FOSTER to help scientists move towards open access. FOSTER is developing teaching materials and courses on open access.

When sharing data, we always follow the principle "As open as possible, as closed as necessary". It is not always possible to share all data. For example, some outputs may contain personal or other sensitive data whose disclosure could compromise security, trade secrets, patents or copyrights. However, even such data should be stored securely and made publicly available at least in the form of metadata. The National Technical Library has published a General Recommendation for Metadata Description of Research Outputs and Research Data, which can help authors describe their research data.

Data creators are often unaware that it is appropriate to set conditions for the ownership, licensing and reuse of data, and as a result the data often cannot be used for other purposes. One way to manage and protect data is through a Data Management Plan (DMP). This plan, which describes how research data will be generated, organized, shared and secured, is also required by the Horizon Europe programme. For more information on storing open research data in Horizon Europe, see the section Open Data in Horizon Europe Projects.

Publication of data is also required by some journals (e.g., Nature, The American Naturalist) or publishers (Public Library of Science). Journal data policies can be monitored through the Nature portfolio.

Open data are also promoted on Open Data Day.

DATA SHARING

Shared data should conform to the FAIR principles, which define four basic criteria: data should be findable, accessible, interoperable and reusable. These principles were first published in 2016 in the article The FAIR Guiding Principles for scientific data management and stewardship.

1. To be Findable

If the data are to be reusable, then we need to ensure that both humans and machines can find them. For this purpose, machine-readable metadata are key.

F1. (meta)data are assigned a globally unique and persistent identifier (e.g., DOI, handle)
F2. data are described with rich metadata
F3. metadata clearly and explicitly include the identifier of the data they describe
F4. (meta)data are registered or indexed in a searchable resource

2. To be Accessible

Data should be made openly accessible, ideally through a repository. If open access to scientific data is not possible, then at least the metadata should be freely available.

A1. (meta)data can be retrieved using their identifiers through a standardized communication protocol (e.g., HTTPS)
A1.1 the protocol is open, free to use and universally applicable
A1.2 the protocol allows authentication and authorization if necessary
A2. metadata are available even if the data themselves are no longer available
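
As an illustration of A1, a DOI can be resolved over ordinary HTTPS, and a client can use content negotiation to ask for machine-readable metadata instead of an HTML landing page. The following is a minimal sketch that only builds such a request without sending it. The helper name `build_metadata_request` is hypothetical, the DOI is that of the FAIR principles article cited above, and the CSL JSON media type is an assumption about what the resolving service supports.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def build_metadata_request(doi: str) -> Request:
    """Build (but do not send) an HTTPS request for machine-readable metadata."""
    # Keep the slash between DOI prefix and suffix; escape any unusual characters.
    url = DOI_RESOLVER + quote(doi, safe="/")
    # Content negotiation: ask for CSL JSON rather than the HTML landing page.
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}
    return Request(url, headers=headers)


request = build_metadata_request("10.1038/sdata.2016.18")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. Whether a particular repository honours this media type should be checked against its own documentation.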

3. To be Interoperable

So that data can be integrated with other datasets, they should be described using standardized terminology.

I1. (meta)data use a formal, accessible, shared and widely applicable language to represent knowledge
I2. (meta)data use vocabularies that follow the FAIR principles
I3. (meta)data contain links to other (meta)data

4. To be Reusable

The primary goal of the FAIR principles is to increase the reusability of scientific data. To achieve this, it is important that data are sufficiently described and shared under an open license (e.g. Creative Commons) so that data users know how the data were created, what they describe and how they can use them.

R1. (meta)data have a number of precise and relevant attributes
R1.1 (meta)data are published under a clear and accessible license
R1.2 (meta)data are linked to their origin
R1.3 (meta)data meet the standards of the scientific community in the given field

Source: Foster Open Science (Assessing the FAIRness of data | fosteropenscience.eu)

DATA MANAGEMENT PLAN

One way to manage and protect data is a Data Management Plan (DMP). This document covers the entire life cycle of research data: it describes how the data will be generated, organized, shared and secured. It is a continuously updated document that reflects what has actually happened and what will happen to the data, not only during the research but also after it has ended.

The basic questions addressed by the DMP are:

What data will be generated and subsequently collected and stored within the project?
Will these data be made available for verification and re-use, and if so, how? If the data cannot be made available, an explanation should be provided.
What standards will be used to store the data?
How and where will the data be managed and stored?
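
The four questions above can be tracked in a simple structure while a plan is being drafted. This is only a sketch: the class `DataManagementPlan`, its field names and the sample answers are illustrative placeholders and are not taken from any official DMP template.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """One free-text answer per basic DMP question (field names are illustrative)."""
    data_description: str = ""        # What data will be generated, collected and stored?
    access_and_reuse: str = ""        # Will the data be shared, and how? If not, why not?
    standards: str = ""               # What standards will be used to store the data?
    storage_and_management: str = ""  # How and where will the data be managed and stored?

    def unanswered(self) -> list[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Placeholder answers for a hypothetical project.
plan = DataManagementPlan(
    data_description="Survey responses (CSV) and interview transcripts (TXT).",
    standards="CSV with a codebook; UTF-8 plain text.",
)
print(plan.unanswered())  # ['access_and_reuse', 'storage_and_management']
```

Because a DMP is updated continuously, a check like `unanswered()` can be rerun whenever the plan is revised.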

Benefits of the DMP:

the ability to anticipate potential problems,
reducing the risk of duplicate work, data loss and security breaches,
ensuring data accuracy, completeness and reliability,
assistance in sharing data, improving communication with specific individuals responsible for particular tasks in the process of working with data,
timely assessment of required equipment and support,
ensuring continuity of long-term processes and ensuring increased research integrity in the event of staff changes, etc.

Nowadays, you can use recommended templates or one of the tools for creating a DMP:

General Data Management Template (a bilingual Data Management Plan template based on the Horizon Europe model template)
Data Stewardship Wizard (a tool from ELIXIR, helps researchers understand what is needed for data management oriented towards FAIR principles, and build their own Data Management Plan)
DMPonline (a tool to support the creation of project DMPs, including their storage)
ARGOS (an online tool for DMPs)

DATA REPOSITORIES

Scientific data are usually stored in data repositories or data journals. Data journals publish peer-reviewed, so-called data articles. Data articles focus on the description of certain freely available datasets, and unlike regular articles, they do not contain any form of interpretation or discussion.

There are disciplinary, institutional and universal data repositories. The best way to find a suitable repository is through the Re3data data repository registry or the OpenDOAR database. When selecting a repository for data storage, a discipline-specific repository should always be preferred, as it can offer discipline-specific metadata descriptions and additional features. If such a repository is not available, data can be deposited in an institutional repository or in one of the universal repositories such as Zenodo, Dryad or the National Repository. The National Repository is currently in pilot operation, but in the future it should function as one of the main repositories of research data in the Czech Republic.

It is useful to consider whether the selected repository meets the following criteria:

provides open access,
is trustworthy or certified,
assigns a persistent identifier (e.g. DOI),
provides a landing page with metadata for each dataset,
states the conditions under which the data can be used (i.e., assigns a license),
allows you to update dataset versions, etc.

Secure data handling

Authors should be cautious when storing, sharing and transferring data. When using portable media, authors should not lend a disk containing sensitive data to anyone, leave it freely accessible, or rely on it as the sole copy of the data. On workplace computers, it is advisable to restrict access to a narrow range of users. Caution should also be exercised when using email attachments, cloud environments, shared storage, etc.

It is advisable to secure the data with a strong password that the author does not disclose to unauthorized persons. The password should not be used for any other account or service, and it should be changed if the author believes it has been compromised.
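
One way to obtain a strong, unique password is to generate it randomly. The sketch below uses Python's standard `secrets` module. The function name `generate_password`, the default length of 20 and the minimum of 12 characters are assumptions made for illustration, not requirements stated by any policy referred to here.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password with lower case, upper case, digits and punctuation."""
    if length < 12:  # assumed minimum for this sketch
        raise ValueError("use at least 12 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is present.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate


print(generate_password())
```

A generated password is best kept in a password manager rather than in a file stored next to the data it protects.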

OPEN DATA IN HORIZON EUROPE PROJECTS

Horizon Europe (HE) encourages its beneficiaries to apply the widest possible range of open science principles and tools in their projects. The binding conditions that beneficiaries of HE projects must comply with are set out in Article 17 and Annex 5 of the Grant Agreement; they concern open access not only to peer-reviewed publications but also to research data. Costs associated with data management may be eligible costs of an HE project.

In addition to adhering to the FAIR Principles, grantees are also required to:

1. Create a data management plan (DMP):

The first version of the DMP is normally submitted by the researchers as a deliverable as early as the sixth month of project implementation.
The plan should describe, among other things, what type of data will be generated or used, their organization and management, as well as data access, sharing or possible deletion during and after the project.
The DMP template is available on the EU Funding & Tenders Portal (FTO portal) in the Reference documents section (subsection Templates & forms → Project reporting templates).

2. Store data in a trusted repository:

A suitable repository can be found e.g. through the Re3data data repository registry, the OpenDOAR database of open repositories, or through Repository Finder;
it is also possible to use the Zenodo universal repository;
the chosen repository should provide information or tools that are necessary for any validation of the research data.

3. Make the data available in the repository in open access mode under a CC BY (or equivalent) license:

As with publications, the metadata for research data must be published under the CC0 license or its equivalent. As a minimum, the metadata should include: information about the dataset (a description of the data, date and location of the data, authors and embargo); the Horizon Europe funding; the title of the grant, its acronym and number; the license terms; and persistent identifiers of the dataset and the authors and, where possible, of the organizations involved and related publications.
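
The minimum metadata elements listed above can be checked mechanically before a dataset is deposited. The following is a minimal sketch under stated assumptions: the key names in `REQUIRED_FIELDS`, the helper `missing_fields` and every value in the sample record are hypothetical placeholders and do not correspond to an official Horizon Europe or repository schema.

```python
import json

# Minimum metadata elements named in the text; the key names are illustrative only.
REQUIRED_FIELDS = (
    "description", "date", "location", "authors", "embargo",
    "funding", "grant_title", "grant_acronym", "grant_number",
    "license", "dataset_pid", "author_pids",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record for a hypothetical dataset.
record = {
    "description": "Placeholder description of the dataset",
    "date": "2024-01-31",
    "authors": ["A. Researcher"],
    "funding": "Horizon Europe",
    "grant_title": "Example Project",
    "grant_acronym": "EXAMPLE",
    "grant_number": "000000000",
    "license": "CC-BY-4.0",
    "metadata_license": "CC0-1.0",
    "dataset_pid": "10.1234/placeholder",
}

print(missing_fields(record))  # ['location', 'embargo', 'author_pids']
print(json.dumps(record, indent=2))
```

In practice the chosen repository's deposit form or API defines the actual field names, so a check like this would need to be adapted to that schema.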