When and what data should you share publicly?

Whether open data sharing is a requirement of your project or of the publisher you are submitting a publication to, you always need to consider what data must or can be released, when to open the data, and where and how to do it. The question of data sharing can arise at any time during your research, so information about when and what data you will share should be captured in your data management plan (DMP).

Although each publisher or provider may have its own specific conditions for sharing data openly, in general all data necessary to replicate the study results must be released without restriction at the time the scientific output is published. If there are legal or ethical barriers to such data sharing, authors must indicate this in the Data Availability Statement and mention where the data can be accessed.

When to share research data during research/publication

There are several time frames for open sharing of the research data:

1. Share real-time data as your research progresses.

Sharing data as it is being collected is not very common in scientific practice. This is due to concerns about giving competitors an advantage and the possibility of other institutions and researchers mining your research data. The exception is sudden events or global problems, such as pandemics, where shared data can greatly help the public.

2. Share data when you submit the associated manuscript for review in a peer-reviewed journal.

Making the data available to reviewers allows them to examine your work more deeply and demonstrates the richness of your research. Sharing data at this stage may be a publisher-specific requirement, so we always recommend that you check the individual data-sharing policies of academic publishers.

3. Share the data at the same time as the underlying book or article is published.

This is probably the most commonly used method. Again, this may be a requirement of the publisher you are publishing with. It may also be one of the conditions in your project. Alternatively, you may simply want to provide readers with background material on your work.

4. Share data after the time embargo period has expired.

A time embargo can be imposed on data for various reasons. You should always describe the reasons for the embargo in the DMP (even if the embargo is only intended), so that you can meet the open data sharing requirement while still protecting the data for a limited time. The time embargo must also be stated in the Data Availability Statement.
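As a rough illustration of how an embargo period translates into a release date, the sketch below adds a number of calendar months to a start date. The function names, the 12-month period and the publication date are assumptions made for the example only; the actual embargo length and its starting point depend on your funder, publisher and repository.

```python
import calendar
from datetime import date


def embargo_end(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month has fewer days than the start day
    (e.g. 31 January plus one month), the result is clamped
    to the last day of the target month.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def is_open(today: date, start: date, months: int) -> bool:
    """True once the embargo has expired and the data may be released."""
    return today >= embargo_end(start, months)


# Hypothetical publication date and embargo length.
publication = date(2024, 1, 31)
print(embargo_end(publication, 12))  # 2025-01-31
```

The computed date is the kind of information that belongs in both the DMP and the Data Availability Statement.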

Reasons for restricting access to data

Not all data can be shared openly, most often due to legal or ethical barriers. Therefore, there are acceptable situations for setting restrictions on data access. If anything prevents the release of data, or if data must be released in a restricted mode (e.g., time embargo), authors must clearly state these restrictions in the data availability statement at the time of submission of the publication/research results. Acceptable restrictions on the public sharing of data include:

  • Third-party data

For studies involving third-party data, it is recommended to share all data specific to the analyses performed in the research, as long as their dissemination is legally possible. If third-party data have been used that the researchers do not have the right to share, the authors must provide all the information necessary to allow interested researchers to request access to the data.

  • Data on human subjects and other sensitive data

For studies involving human subjects data or other sensitive data (health records, personal information, etc.), authors can publicly share only anonymised data. However, if the data cannot be fully anonymised or shared publicly, this barrier to open data sharing must be stated in the DMP as well as in the Data Availability Statement.

  • Protection of commercial intent, competitive advantage and patents

In a research collaboration with the private sector, releasing the research data behind a publication or other scientific output could harm the commercial and business intentions of the partner. This is a legitimate reason to publish the data with a restriction, e.g., a time embargo that is lifted once the potential threat has passed. The same applies where open sharing of research data could jeopardise, for example, the success of a patent application. In some cases, research data simply cannot be openly disclosed. Any obstacles to open data sharing must be stated in the DMP as well as in the Data Availability Statement.

What data needs to be shared

In most cases, whether publishing or releasing data after a project is completed, open data sharing is approached by sharing a 'minimum dataset'. The minimum dataset consists of the data needed to replicate all the results of your research or publication and includes associated metadata and methods. It is also good practice to follow discipline-specific standards for data preparation, documentation and storage.

Examples of data that should be included in a minimum dataset are:
  • values of means, standard deviations and other reported measures;
  • the values used to construct the graphs;
  • points extracted from images for analysis.

Raw or processed data?

As for the nature of openly shared data, it depends on established discipline practices as well as the conditions of the publisher. Authors do not have to submit the entire dataset if only part of the data was used in the reported study. Authors also do not have to submit raw data collected during the survey if it is standard in the discipline to share data that has been processed.

In other cases, publishers request the sharing of the raw data on which the scientific publication is based. This is because raw data includes the individual data points or smallest units of information on which the research is based. Already processed data, such as averages and percentages, can only be re-analysed using limited methods. Furthermore, outliers and missing data cannot be read from processed data, so only partial verification of scientific results can occur.
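The difference between raw and processed data can be illustrated with a small sketch. The two groups of measurements below are invented for the example, and `summarise` is a hypothetical helper, not part of any publisher's workflow. Both groups reduce to the same mean, yet only the raw values reveal the outlier and the missing measurement.

```python
from statistics import mean

# Hypothetical raw measurements; None marks a missing value.
group_a = [10.0, 10.0, 10.0, 10.0, 10.0]
group_b = [2.0, 3.0, None, 5.0, 30.0]


def summarise(values):
    """Reduce raw values to the kind of processed figure often reported."""
    present = [v for v in values if v is not None]
    return {"n": len(present), "mean": mean(present)}


def missing_count(values):
    """Count missing entries, which is only possible with the raw data."""
    return sum(v is None for v in values)


summary_a = summarise(group_a)
summary_b = summarise(group_b)

# The processed means are identical ...
print(summary_a["mean"], summary_b["mean"])  # 10.0 10.0

# ... but the raw data show one missing value and an extreme point in group_b.
print(missing_count(group_b))                     # 1
print(max(v for v in group_b if v is not None))   # 30.0
```

A reader who receives only the two means cannot tell these groups apart, which is why some publishers ask for the raw data.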

Data Availability Statement

Authors are encouraged to include a Data Availability Statement (DAS) in all articles that report results obtained from research data. The statement should say where the data supporting the results presented in the article can be found, including hyperlinks to publicly available datasets in a data repository and persistent identifiers where appropriate. If the research data are not publicly available, this must be stated, together with any reasons for restricting access and the conditions for accessing the data.

If you are submitting a manuscript to a peer-reviewed journal that has terms and conditions for sharing research data, you will likely be asked to include the data availability statement directly in your manuscript. These statements are intended to make the data more findable and accessible.

Text examples for the Data Availability Statement:

  • Data sets generated and/or analysed during this study are available in the data repository [NAME], [DATASET IDENTIFIER AND DATASET LINK].
  • Data sets generated and/or analysed during this study are not publicly available due to [REASON WHY DATA IS NOT PUBLIC] but are available from the corresponding author upon reasonable request.
  • Data sets generated and/or analysed during this study are available from the corresponding author upon reasonable request.
  • Data sharing does not apply to this article because no data sets were created or analysed during this study.
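If you prepare statements for several manuscripts, the text examples above can be filled in programmatically. The following is a minimal sketch only: the function name, its parameters and the repository name, identifier and link in the usage example are placeholders invented for illustration, and the wording should always be checked against your publisher's requirements.

```python
from typing import Optional


def data_availability_statement(
    repository: Optional[str] = None,
    identifier: Optional[str] = None,
    link: Optional[str] = None,
    restriction_reason: Optional[str] = None,
    data_created: bool = True,
) -> str:
    """Pick one of the four example statements and fill in its placeholders."""
    prefix = "Data sets generated and/or analysed during this study are "
    if not data_created:
        return (
            "Data sharing does not apply to this article because no data sets "
            "were created or analysed during this study."
        )
    if repository and identifier and link:
        return f"{prefix}available in the data repository {repository}, {identifier} ({link})."
    if restriction_reason:
        return (
            f"{prefix}not publicly available due to {restriction_reason} "
            "but are available from the corresponding author upon reasonable request."
        )
    return f"{prefix}available from the corresponding author upon reasonable request."


# Placeholder values, not a real dataset.
print(
    data_availability_statement(
        repository="Example Repository",
        identifier="doi:10.0000/placeholder",
        link="https://doi.org/10.0000/placeholder",
    )
)
```

Calling the function with `restriction_reason="participant confidentiality"` would produce the restricted-access wording instead.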

For the location of the Data Availability Statement section in the manuscript itself, we recommend following the terms and conditions of each publisher. In most cases, the Data Availability Statement (DAS) appears in the manuscript file under the heading "Data Availability Statement" as part of the final section of the manuscript, for example, before the "References" section.

Where to share your research data?

Save to a data repository (strongly recommended)

All data and associated metadata on which your research results are based should be deposited in an appropriate data repository. Repositories can be either discipline-specific or cross-disciplinary generic.

Data Note - article about data

If you have uploaded your research data to a data repository, your efforts can be rewarded not only by greater transparency and credibility of your research but also by possible citations. Additionally, you may choose to write a so-called Data Note.

What is a Data Note?

A Data Note refers to a short peer-reviewed article that briefly describes the research data stored in a data repository. It increases the visibility and transparency of your research, helps meet funders' requirements for open data sharing, and ensures that your data is FAIR (findable, accessible, interoperable, and reusable).

Data Notes typically do not contain any analysis or conclusions but can be linked to a research paper that includes an analysis of the published dataset as well as other research outputs. They may also highlight separate datasets stored in a data repository, for example if the dataset did not lead to a publication.

What are the benefits of publishing a Data Note?

✔ maximising the potential of your research data by improving its traceability, usability and reproducibility

✔ gaining appropriate recognition for your research data through a citable publication

✔ reaching new audiences for your research

✔ fostering new collaborations across disciplines by making your data accessible and well described

How do I write a Data Note?

Data Notes must describe the research data that the authors created and own and should include:

  • justification of the dataset, protocol and validation details
  • information about any limitations of the dataset
  • information on where and how to access the dataset, as part of the data availability statement
  • reference to the dataset by formal citation, persistent identifier, link
  • where appropriate, citations and summaries of any previous publications that use the published data

Data Notes templates

Here are some examples where you can download templates to create Data Notes:

Sharing research code

If you have created new code during your research, perhaps as a direct output of your work or as a tool to help you analyse the data you have collected, you can also share it openly in a data repository. You should include open code sharing in your data management plan, especially if the code you have created is needed for others to validate your results.

Why share code publicly?

It is increasingly common for researchers and developers to share code they have created during their research or in their projects. As with sharing other forms of data, there are many benefits to making your code available, including:

  • Gaining recognition and citations for a type of research output that often remains in the background.
  • Better discoverability of your research projects.
  • Building trust in your research by aiding its transparency and reproducibility.
  • Enabling other researchers to reuse your code and build on it.

The requirement to share research code is becoming more common

For example, the publisher Springer Nature has a unified open code policy to support the open sharing of science. An integral part of this policy is to encourage authors to publicly share code used in primary research or included in books and chapters, as well as newly developed code in original research articles. A Code Availability section will be included directly in the publication for all original research where the authors have declared the development of new code necessary to interpret and replicate the findings. Some journals will also require code sharing during the review process.

Research data sharing FAQs

Do I have to share all my research data?

Can I restrict access to my research data after it has been published?

How do I inform the funder of potential problems with data sharing?

What is a data availability statement?

Do you have any text templates for a data availability statement that I could include in a publication?

How do I submit a research data availability statement to the publisher?

How can I link my data to my published journal article?

What if someone uses my data in an unauthorised way?

How can I find out if I need to share data?

Will my data be automatically linked to my publication?

At what point in the submission process should the data be made public?

Do I have to share all the data? Raw and edited?

Research data sharing policies of individual academic publishers
