When and what data should you share publicly?
Whether open data sharing is a requirement of your project or of the publisher you are submitting to, you always need to consider what data must or can be released, when to open the data, and where and how to do it. The question of data sharing can arise at any time during your research, so information about when and what data you will share should be captured in your data management plan (DMP).
Although each publisher or provider may have its own conditions for sharing data, in general all data necessary to replicate the study results must be released without restriction at the time the scientific output is published. If there are legal or ethical barriers to such data sharing, authors must indicate this in the Data Availability Statement and state where the data can be accessed.
When to share research data during research/publication
There are several time frames for open sharing of the research data:
1. Share real-time data as your research progresses.
Sharing data as it is being collected is not very common in scientific practice. This is due to concerns about giving competitors an advantage and about other institutions and researchers mining your research data. The exception is sudden events or global problems, such as pandemics, where shared data can greatly help the public.
2. Share data when you submit the associated manuscript for review in a peer-reviewed journal.
Making the data available to reviewers allows them to examine your work more deeply and demonstrates the richness of your research. Sharing data at this stage may be a publisher-specific requirement, so we always recommend that you check the individual data-sharing policies of academic publishers.
3. Share the data at the same time as the underlying book or article is published.
This is probably the most commonly used method. Again, this may be a requirement of the publisher you are publishing with. It may also be one of the conditions in your project. Alternatively, you may simply want to provide readers with background material on your work.
4. Share data after the time embargo has expired.
A time embargo can be imposed on data for various reasons. You should always describe the reasons for the time embargo in the DMP (even if the embargo is only planned) so that you can meet the open data sharing requirement while protecting the data for a limited time. Similarly, the time embargo must be stated in the Data Availability Statement.
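As a rough illustration of the fourth option, the date on which embargoed data may be opened can be worked out from the deposit date and the embargo length. This is only a sketch: the function names, the whole-month embargo length and the dates are assumptions made for the example, not requirements of any publisher or repository.

```python
import calendar
from datetime import date


def embargo_end(deposit_date: date, embargo_months: int) -> date:
    """Return the first day on which the embargoed data may be opened."""
    # Move forward by whole calendar months.
    month_index = deposit_date.month - 1 + embargo_months
    year = deposit_date.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day to the length of the target month (31 Jan -> 28/29 Feb).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(deposit_date.day, last_day))


def is_open(deposit_date: date, embargo_months: int, today: date) -> bool:
    """True once the embargo has expired on the given day."""
    return today >= embargo_end(deposit_date, embargo_months)


# Hypothetical example: data deposited on 13 March 2024 with a 12-month embargo.
release_day = embargo_end(date(2024, 3, 13), 12)
```

The computed release date is the kind of detail that belongs in both the DMP and the Data Availability Statement.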
Reasons for restricting access to data
Not all data can be shared openly, most often due to legal or ethical barriers. Therefore, there are acceptable situations for setting restrictions on data access. If anything prevents the release of data, or if data must be released in a restricted mode (e.g., time embargo), authors must clearly state these restrictions in the data availability statement at the time of submission of the publication/research results. Acceptable restrictions on the public sharing of data include:
- Third-party data
For studies involving third-party data, it is recommended to share all data specific to the analyses performed in the research, as long as their dissemination is legally possible. If third-party data have been used that the researchers do not have the right to share, the authors must provide all the information necessary to allow interested researchers to request access to the data.
- Data on human subjects and other sensitive data
For studies involving human subjects data or other sensitive data (health records, personal information, etc.), authors can publicly share only anonymised data. If the data cannot be fully anonymised or shared publicly, this barrier to open data sharing must be stated in the DMP as well as in the Data Availability Statement.
- Protection of commercial intent, competitive advantage and patents
In a research collaboration with the private sector, releasing the research data behind a publication or other scientific output could harm the partner's commercial and business intentions. In that case it is appropriate to publish the research data with a restriction, e.g., a time embargo that is lifted once the potential threat has passed. The same applies where open sharing of research data could jeopardise, for example, the success of a patent application. In some cases, research data simply cannot be openly disclosed. Any obstacles to open data sharing must be stated in the DMP as well as in the Data Availability Statement.
What data needs to be shared
In most cases, whether publishing or releasing data after a project is completed, open data sharing is approached by sharing a 'minimum dataset'. The minimum dataset consists of the data needed to replicate all the results of your research or publication and includes associated metadata and methods. It is also good practice to follow discipline-specific standards for data preparation, documentation and storage.
A minimum dataset typically includes:
- the values behind the means, standard deviations and other reported measures;
- the values used to construct the graphs;
- the points extracted from images for analysis.
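One possible way to prepare such a minimum dataset is to store the values behind a figure as a plain CSV table and keep the associated metadata in a separate file next to it. The sketch below is only an illustration: the function name, the file names, the column names and the sample values are all hypothetical, and your discipline or repository may prescribe different formats.

```python
import csv
import io
import json


def package_minimum_dataset(rows, fieldnames, metadata):
    """Return file contents for a data table and its metadata description."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return {
        # Values used to construct a graph, one row per plotted point.
        "figure_values.csv": buffer.getvalue(),
        # Metadata and methods that make the table understandable to others.
        "metadata.json": json.dumps(metadata, indent=2, sort_keys=True),
    }


# Hypothetical values behind a bar chart of group means.
files = package_minimum_dataset(
    rows=[
        {"group": "control", "mean": 5.0, "sd": 0.4},
        {"group": "treated", "mean": 6.2, "sd": 0.5},
    ],
    fieldnames=["group", "mean", "sd"],
    metadata={"units": "mm", "method": "described in the methods section"},
)
```

Open, text-based formats such as CSV and JSON are used here because they can be read without special software, but this is a design choice for the example rather than a rule.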
Raw or processed data?
Whether to share raw or processed data depends on established practice in the discipline as well as on the publisher's conditions. Authors do not have to submit the entire dataset if only part of the data was used in the reported study. Authors also do not have to submit the raw data collected during the study if it is standard in the discipline to share processed data.
In other cases, publishers request the sharing of the raw data on which the scientific publication is based. This is because raw data includes the individual data points or smallest units of information on which the research is based. Already processed data, such as averages and percentages, can only be re-analysed using limited methods. Furthermore, outliers and missing data cannot be read from processed data, so only partial verification of scientific results can occur.
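The difference can be shown with a small sketch. The measurements below are invented for the example, and the outlier rule (distance from the median greater than three times the median absolute deviation) is just one simple convention chosen for illustration, not a recommended standard.

```python
import statistics

# Hypothetical raw measurements; None marks a missing value.
raw_values = [4.8, 5.1, 5.0, None, 4.9, 12.7]


def processed_summary(values):
    """What a reader sees if only processed data are shared."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": statistics.mean(observed),
        "stdev": statistics.stdev(observed),
    }


def inspect_raw(values, k=3.0):
    """Checks that are only possible when the raw data are shared."""
    observed = [v for v in values if v is not None]
    centre = statistics.median(observed)
    # Median absolute deviation: a spread measure that resists outliers.
    mad = statistics.median([abs(v - centre) for v in observed])
    outliers = [v for v in observed if abs(v - centre) > k * mad]
    return {"missing": values.count(None), "outliers": outliers}


summary = processed_summary(raw_values)
checks = inspect_raw(raw_values)
```

From `summary` alone a reader cannot tell that one value is missing or that a single extreme measurement pulls the mean upwards; `checks` recovers both facts, but only because the individual data points are available.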
Data Availability Statement
Authors are encouraged to include a Data Availability Statement (DAS) in all articles that report results obtained from research data. The statement should say where the data supporting the results presented in the article can be found, with hyperlinks to publicly available datasets in a data repository and persistent identifiers where appropriate. If the research data are not publicly available, this must be stated, together with the reasons for restricting access and the conditions for accessing the data.
If you are submitting a manuscript to a peer-reviewed journal that has terms and conditions for sharing research data, you will likely be asked to include the data availability statement directly in your manuscript. These statements are intended to make the data more findable and accessible.
Text examples for the Data Availability Statement:
- Datasets generated and/or analysed during this study are available in the data repository [NAME], [DATASET IDENTIFIER AND DATASET LINK].
- Datasets generated and/or analysed during this study are not publicly available due to [REASON WHY DATA IS NOT PUBLIC] but are available from the corresponding author upon reasonable request.
- Datasets generated and/or analysed during this study are available from the corresponding author upon reasonable request.
- Data sharing does not apply to this article because no datasets were created or analysed during this study.
For the placement of the Data Availability Statement in the manuscript itself, we recommend following each publisher's terms and conditions. In most cases, the DAS appears in the manuscript file under the heading "Data Availability Statement" as part of the final section of the manuscript, for example, before the "References" section.
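If you prepare statements for several manuscripts, the example wordings above can be kept as fill-in templates. The following is a minimal sketch under stated assumptions: the template keys, the function name, the repository name "ExampleRepo" and the DOI are all made up for illustration, and a publisher may require different wording.

```python
# Templates that follow the example wordings for a Data Availability Statement.
TEMPLATES = {
    "repository": (
        "Datasets generated and/or analysed during this study are available "
        "in the data repository {name}, {identifier}."
    ),
    "restricted": (
        "Datasets generated and/or analysed during this study are not publicly "
        "available due to {reason} but are available from the corresponding "
        "author upon reasonable request."
    ),
    "on_request": (
        "Datasets generated and/or analysed during this study are available "
        "from the corresponding author upon reasonable request."
    ),
    "not_applicable": (
        "Data sharing does not apply to this article because no datasets were "
        "created or analysed during this study."
    ),
}


def data_availability_statement(kind, **fields):
    """Fill in one of the templates; raise ValueError on bad input."""
    if kind not in TEMPLATES:
        raise ValueError(f"unknown statement type: {kind!r}")
    try:
        return TEMPLATES[kind].format(**fields)
    except KeyError as missing:
        raise ValueError(f"missing field for {kind!r}: {missing}") from None


# Hypothetical repository name and identifier.
statement = data_availability_statement(
    "repository",
    name="ExampleRepo",
    identifier="https://doi.org/10.1234/example",
)
```

Raising an error when a field is missing prevents a statement with an unfilled placeholder from ending up in the submitted manuscript.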
Where to share your research data?
Save to a data repository (strongly recommended)
All data and associated metadata on which your research results are based should be deposited in an appropriate data repository. Repositories can be either discipline-specific or generic (cross-disciplinary).
Data Note - article about data
If you have uploaded your research data to a data repository, your efforts can be rewarded not only by greater transparency and credibility of your research but also by possible citations. Additionally, you may choose to write a so-called Data Note.
What is a Data Note?
A Data Note is a short peer-reviewed article that briefly describes the research data stored in a data repository. It increases the visibility and transparency of your research, helps meet funders' requirements for open data sharing, and ensures that your data is FAIR (findable, accessible, interoperable, and reusable).
Data Notes typically do not contain any analysis or conclusions but can be linked to a research paper that includes an analysis of the published dataset as well as other research outputs. They may also highlight separate datasets stored in a data repository, for example if the dataset did not lead to a publication.
Publishing a Data Note helps you:
✔ maximise the potential of your research data by improving its traceability, usability and reproducibility
✔ gain appropriate recognition for your research data through a citable publication
✔ reach new audiences for your research
✔ foster new collaborations across disciplines by making your data accessible and well described
Data Notes must describe the research data that the authors created and own and should include:
- justification of the dataset, protocol and validation details
- information about any limitations of the dataset
- information on where and how to access the dataset, as part of the data availability statement
- reference to the dataset by formal citation, persistent identifier, link
- where appropriate, citations and summaries of any previous publications that use the published data
Sharing research code
If you have created new code during your research, perhaps as a direct output of your work or as a tool to help you analyse the data you have collected, you can also share it openly in a data repository. You should include open code sharing in your data management plan, especially if the code you have created is needed for others to validate your results.
It is increasingly common for researchers and developers to share code they have created during their research or in their projects. As with sharing other forms of data, there are many benefits to making your code available, including:
- Gaining recognition and citations for a type of research output that often remains in the background.
- Better discoverability of your research projects.
- Building trust in your research by aiding its transparency and reproducibility.
- Enabling other researchers to reuse your code and build on it.
For example, the publisher Springer Nature has a unified open code policy to support the open sharing of science. An integral part of this policy is to encourage authors to publicly share code used in primary research or included in books and chapters, as well as newly developed code in original research articles. A Code Availability section is included directly in the publication for all original research where the authors have declared the development of new code necessary to interpret and replicate the findings. Some journals also require code sharing during the review process.
Recommended
- Soranno, Patricia A. 2019. Six Simple Steps to Share Your Data When Publishing Research Articles. Limnology and Oceanography Bulletin. 28(2): 41-81. https://aslopubs.onlinelibrary.wiley.com/doi/full/10.1002/lob.10303