Documentation of research data

Documentation helps to understand the structure and content of the data itself, as well as the context in which the data was created. The documentation of the data takes place at several levels:

  • project-level documentation: main project information, research design, methods, data processing, conclusions and access to data.
  • data-level documentation: data type, format, size, how specific data was collected, individual measurements and variables.

Data documentation includes:

  • Project information: project/study title, people involved and their roles, etc.
  • Information on methods: methods of data collection and analysis, instruments and programs used, calibration of instruments, etc.
  • Information about the data itself: variable names and definitions, units of measure, etc.

Check out the Data Management Expert Guide from CESSDA.
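
Information about the data itself (variable names, definitions, units) is often kept in a small codebook next to the data. The following is only a minimal sketch of that idea in Python; the variable names, the column layout and both helper functions are hypothetical and are not prescribed by this guide.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
CODEBOOK = [
    {"variable": "plot_id", "definition": "Identifier of the sampling plot", "unit": "", "type": "string"},
    {"variable": "soil_temp", "definition": "Soil temperature at 10 cm depth", "unit": "degC", "type": "float"},
    {"variable": "sampled_on", "definition": "Sampling date (ISO 8601)", "unit": "", "type": "date"},
]


def codebook_to_csv(rows):
    """Serialize the codebook to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["variable", "definition", "unit", "type"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_columns(data_header, rows):
    """Return the data columns that have no entry in the codebook."""
    documented = {row["variable"] for row in rows}
    return [name for name in data_header if name not in documented]
```

Running undocumented_columns on the header of a data table before depositing it is one way to notice variables that still lack a definition.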

Data should be documented at all stages of the research data life cycle. Detailed documentation supports reproducibility and integrity of research. The documentation also includes metadata.

Dataedo. Available from: https://dataedo.com/cartoon/data-vs-metadata-2 Cartoons are licensed under a Creative Commons Attribution-NoDerivs 3.0 License.

 

Metadata
= data about data

Metadata is documentation that describes data in a standardized format and is intended for machine reading. Properly describing and documenting data allows research data to be found and reused.

The General Recommendation for Metadata Description of Research Outputs and Research Data from the National Technical Library summarizes basic information about metadata description.

Metadata of archived publications

In a repository, metadata records must be publicly accessible and machine-readable in accordance with FAIR principles and must contain at least the following:

  • document title,
  • the full names of the originators (i.e. authors and other contributors),
  • the date of issue or publication,
  • document type (e.g. article, book, etc.),
  • publisher,
  • the language of the document,
  • information on the availability of publications (e.g. time embargo, licences and other availability data).

 

In addition, it is also recommended to include other information such as:

  • permanent identifiers of the publication (e.g. ISBN, ISSN, DOI, etc.),
  • funding information (funder and project number),
  • permanent identifiers of persons (e.g. ORCID), organisations (e.g. ROR), etc.
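
To illustrate what "machine-readable" can mean for the fields above, here is a minimal sketch that maps them to Dublin Core elements (title, creator, date, type, publisher, language, rights, identifier) and serializes them as XML. The mapping, the wrapping "record" element and all values are assumptions for illustration; this guide does not state which schema the repository uses internally.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record):
    """Build an XML element with one dc:* child per metadata value."""
    root = ET.Element("record")  # hypothetical wrapper element
    for field, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = value
    return root


# Hypothetical publication; every value below is a placeholder.
publication = {
    "title": ["Example article title"],
    "creator": ["Novak, Jana", "Smith, John"],
    "date": ["2024-05-01"],
    "type": ["article"],
    "publisher": ["Example Publisher"],
    "language": ["en"],
    "rights": ["CC BY 4.0"],
    "identifier": ["https://doi.org/10.1234/example"],  # placeholder DOI
}

xml_text = ET.tostring(to_dublin_core(publication), encoding="unicode")
```

Each list may hold several values, which is how multiple authors or identifiers are expressed in this sketch.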

Records in the Open repository of research and development results of the Mendel University in Brno (https://repozitar.mendelu.cz/) contain not only the minimum metadata but also the metadata required by OpenAIRE. The publication metadata is automatically transferred to the repository from the OBD system.

Metadata of archived research data

In a repository, metadata records must be publicly accessible and machine-readable in accordance with FAIR principles and must contain at least the following:

  • dataset name,
  • the full names of the originators (i.e. authors and contributors),
  • the date of (planned) publication,
  • publisher,
  • description of the dataset,
  • information on the availability of data (e.g. time embargo, licences and other availability data).

In addition, it is also recommended to include other information such as:

  • persistent dataset identifiers,
  • funding information (funder and project number),
  • permanent identifiers of persons, organisations, etc.,
  • classification by discipline (ideally according to the Frascati classification or another scheme),
  • keywords.
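
Before uploading, a dataset record can be checked against the two lists above. The sketch below assumes the record is held as a Python dictionary; the key names and the check_metadata function are hypothetical and do not correspond to any particular repository's schema.

```python
# Hypothetical key names for the required and recommended fields listed above.
REQUIRED = ("name", "creators", "publication_date", "publisher",
            "description", "availability")
RECOMMENDED = ("identifier", "funding", "person_identifiers",
               "discipline", "keywords")


def check_metadata(record):
    """Return (missing_required, missing_recommended) lists of field names."""
    def is_missing(key):
        # A field counts as missing when it is absent or left empty.
        return record.get(key) in (None, "", [], {})

    missing_required = [key for key in REQUIRED if is_missing(key)]
    missing_recommended = [key for key in RECOMMENDED if is_missing(key)]
    return missing_required, missing_recommended


# Placeholder record for illustration only.
dataset = {
    "name": "Example soil temperature measurements",
    "creators": ["Novak, Jana"],
    "publication_date": "2025-01-15",
    "publisher": "Example University",
    "description": "",  # left empty on purpose
    "availability": {"licence": "CC BY 4.0", "embargo_until": None},
    "keywords": ["soil", "temperature"],
}

missing_required, missing_recommended = check_metadata(dataset)
```

For this placeholder record the check reports the empty description as a missing required field and lists the recommended fields that were not supplied.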

The researcher adds the metadata to the data repository when uploading the research data. In addition, we recommend that you also enter domain-specific metadata relevant to your field.

Metadata is often created according to domain standards. You can search for metadata standards for your domain at: https://www.dcc.ac.uk/guidance/standards/metadata

Dataedo. Available from: https://dataedo.com/cartoon/data-vs-metadata-8 Cartoons are licensed under a Creative Commons Attribution-NoDerivs 3.0 License.

How to document data?

What to document always depends on the specific research. Possible main points include:

  • description of the context and conditions of the experiment
  • a description of the method used to collect and process the data, including the tools (equipment and software) used
  • contents of test reports, field reports, laboratory books
  • information needed for data interpretation
  • a description of the data quality assurance measures implemented
  • information on technical standards and calibrations
  • documentation and explanation of parameters, variables, abbreviations and codes, including column headers in data tables
  • specification of the source when using existing data (reference, DOI, URL)
  • documentation of the persons involved and their tasks
  • documentation of the conditions for long-term storage and subsequent use of data (licences, possible restrictions on use of data, duration of embargo, rules for deletion of data)
  • a list of all associated files and folders and a description of their formats and contents
  • links to all publicly accessible data repositories
  • links to publications in which the data are used or cited
  • links to related documents and data files
  • recommended data citation
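
The list of associated files and their formats mentioned above can be partly generated automatically. The following is a minimal sketch, assuming the data sit in one folder on disk; the build_manifest function and the choice of SHA-256 checksums are illustrative assumptions, and the description of each file's contents still has to be written by hand.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """List every file under root with relative path, format, size and SHA-256."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            # File extension without the dot, or "unknown" if there is none.
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "size_bytes": path.stat().st_size,
            # Reads the whole file into memory; fine for a sketch,
            # but large files would be better hashed in chunks.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return entries
```

The checksum lets a later user verify that a downloaded file is identical to the documented one.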

Where to document the data?

There are several options for data documentation: README files, electronic lab journals, or GitHub. Which option you choose is up to you. The important thing is that the data is clearly described and the documentation is accessible together with the data.

  • README files

Create a separate README file that contains basic information about the research data. You can also create a README file for each dataset separately.

Fig. 54 Illustration about managing files in a repository. The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

Start the description at the beginning of the project. Rich and structured information helps others understand the dataset and assess its content and potential for reuse.

Instructions on how to create a README file.
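
As a rough illustration, a plain-text README can be filled in from a small template. The sketch below is only one possible layout; the section names, the render_readme function and all example values are hypothetical, so adapt them to the instructions your repository or discipline provides.

```python
# Hypothetical README layout; the section names are not prescribed by this guide.
README_TEMPLATE = """\
{title}

Authors: {authors}
Contact: {contact}
Date of data collection: {collected}
Licence: {licence}

Description
{description}

Files
{files}

Recommended citation
{citation}
"""


def render_readme(info, files):
    """Fill the template; files maps each file name to a short description."""
    file_lines = "\n".join(f"- {name}: {note}" for name, note in files.items())
    return README_TEMPLATE.format(files=file_lines, **info)


# Placeholder values for illustration only.
example_info = {
    "title": "Example soil survey dataset",
    "authors": "Jana Novak",
    "contact": "jana.novak@example.org",
    "collected": "2024-03 to 2024-09",
    "licence": "CC BY 4.0",
    "description": "Soil temperature measured at ten sampling plots.",
    "citation": "Novak, J. (2025). Example soil survey dataset.",
}
example_files = {
    "data.csv": "measurements, one row per plot and day",
    "codebook.csv": "variable names, definitions and units",
}

readme_text = render_readme(example_info, example_files)
```

The resulting text can be saved as README.txt in the same folder as the data so that the description travels with the files.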

TIP: if you describe your data in English, researchers abroad can reuse it as well.

  • Electronic laboratory journals

If you use an electronic lab journal, you can document your data directly in it. An example is eLabJournal.