Data management checklist

How to take care of your research data?

In order to make research more transparent, verifiable, replicable or reproducible, one should be able to provide the data supporting the findings. Many funding bodies now require a research proposal to include a data management plan, and in many cases the data must also be submitted to a trusted repository. However, not all data can be shared, in which case the best practice is to share at least the metadata – information about the data. Publishing research data is a relatively new addition to the research process and the practices are still developing. Services and tools that conform to the FAIR principles, such as the Fairdata services provided by the Finnish Ministry of Education and Culture, help bring transparency and reliability into the changing environment.

Data collection requires a lot of effort, and it is an invaluable part of the research process. Making data available can prevent duplication of effort and save time and resources. Furthermore, the value of data as a form of research output equal to publications is becoming better understood in academia. The shift can be observed for example in the DORA declaration, which calls for the development of more accurate and responsible metrics to measure the impact of scientific research outputs, including recognising and rewarding the contribution of data creators and curators. In Fairdata services, the datasets are assigned a persistent identifier, which facilitates findability and enables unambiguous data citation. Data reuse and citation increase the impact and visibility of the research, the researcher and their home institution; therefore, good research data management can bring added value to the whole research community.

Research data management (RDM) includes many aspects, e.g. the planning of data collection or generation, organising data, documentation and description, storage, version control, and decisions about archiving and preservation after the conclusion of the research project, such as what data should be discarded, what should be kept and what can be shared. It is beneficial to plan how you are going to manage your data at the very start of your research project, and the plan should cover the whole research data lifecycle:

Figure: the research data lifecycle. Source: UK Data Service

1. The planning phase

Develop a data management plan (DMP). There are tools to guide you through the planning process, such as DMPTuuli for the Finnish research community, which includes templates and guidance based on the requirements of various funding bodies and research institutions (e.g. Academy of Finland). Similar international tools include DMPonline and DMPTool.

The following questions can help you plan your research data management:

What type of data will you collect and what methods are you going to use? Could you maybe make use of existing published datasets?
Are you applying for funding? → Check the funding body requirements (e.g. in the Sherpa Juliet service)
Does your institution have a data policy? → Check your institution’s guidelines and requirements
Will you be collecting sensitive data? → Read the guidelines on Anonymisation and Personal Data (Data Management Guidelines by the Finnish Social Science Data Archive). This CSC webinar about sensitive data explains what sensitive data is and how it should be handled.
Does the nature of your project require risk assessment or ethical review? → Check the guidelines by the Finnish National Board on Research Integrity, or contact your institutional or discipline specific ethical advisory board. If applicable, also plan how you are going to inform the research participants and obtain their consent (for more information see for example the Data Management Guidelines).
Plan and agree upon the ownership, intellectual property rights and access rights. It is important to plan ahead with your collaborators and have a clear agreement on who owns the data, who controls access and privileges, and who will be allowed to access, manipulate and modify the data. Also make the necessary decisions about authorship and intellectual property rights. More practical tips can be found in the Data Management Guidelines or in these recommendations by the Finnish National Board on Research Integrity.
Identify the steps that will have to be taken to manage the data in each phase of its life cycle and who will be responsible for those steps.

To identify and plan the necessary steps, you can start by reading about the procedures and problems associated with each part of the data life cycle in parts 2-4 of this checklist: how are you going to handle, store, use or share the data? The most important questions have been compiled by DCC into this Data management checklist flyer. The How fair are your data? checklist by Sarah Jones and Marjan Grootveld will help you evaluate whether your data management plan conforms to the FAIR principles. It is normal to be unsure, and not every detail can be decided before the project has even started. It is therefore acceptable and advisable to review and update your data management plan throughout the course of your research project.

To learn more about the structure and the benefits of developing a data management plan, see this video titled “The what, why and how of research data management” created by Research Data Netherlands.

2. During the research

Data collection and management practices are discipline specific and depend largely on the type and specifications of the data. Some basic advice to follow is:

Create detailed documentation throughout the course of your research – having to go back and fill in the gaps in documentation afterward is laborious and complicated, if not impossible. Sufficient documentation of the data, information resources, and the methods and code used in the analyses makes the reuse and reproducibility of research possible. Also document the changes made while collecting, organising and analysing the data. The provenance (the record trail of the origin of the data and the changes made to them) should be transparent. Good documentation will benefit you too: you will have a detailed record of your procedure and a description of the data available if you want to reuse your data or need to answer any questions about it in the future.
Use the research support services available at your institution. At most universities, the library offers data support services, or there might even be a research data management specialist at your department. Some issues can be resolved with the help of your institution’s Legal Services, or you could contact the Research Integrity Adviser.
If you want to learn more about research data management, there are many online courses and materials available. Some examples are:
- Data Management Guidelines by the Finnish Social Science Data Archive
- MANTRA (Research Data Management Training): online course developed at the University of Edinburgh
- Responsible Research: Guide to Research Integrity, Research Ethics and Scientific Communication in Finland
- Introduction to data management (YouTube), short introduction to data management and FAIR data by CSC
- The “Love your data!” webinar series by CSC
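
The advice above on documenting changes and keeping the provenance transparent could be put into practice in many ways. The following is only a minimal sketch in Python, assuming that a plain text log with one JSON record per line is acceptable for your project. The function names, the field names and the idea of an append-only log file are illustrative assumptions, not part of any service or guideline mentioned in this checklist.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_entry(file_name, action, note, who):
    """Build one provenance record as a plain dictionary.

    The field names are hypothetical; adapt them to your own conventions.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "file": file_name,
        "action": action,
        "note": note,
        "who": who,
    }


def append_entry(log_path, entry):
    """Append one record to the log as a single JSON line.

    The file is opened in append mode, so earlier records are never rewritten.
    """
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_log(log_path):
    """Return all records in the order in which they were written."""
    with Path(log_path).open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

For example, calling append_entry("provenance.jsonl", make_entry("survey.csv", "recoded", "merged two answer categories", "A. Researcher")) after each change would leave a dated record of what was done to which file, by whom and why. The file and person names here are made up for illustration.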

More tips on good practice are available in this guide by CSC or this guide by UK Data Service. The following are some of the specific issues you should take into consideration when preparing your data management plan:

Are your data intelligible and reusable?

Metadata

Safe and secure storage and sharing during the research project

3. Data deposition

While drawing up your data management plan, consider also what happens to the data after the research is concluded and findings possibly published. What data should be retained and where are you going to deposit it? What can be deleted or what should be destroyed (perhaps some parts of sensitive personal data)? What should be prepared for long term digital preservation? These are some basic points to take into account:

The cost and amount of work required to prepare the data for deposition
Funder requirements
If the results have been published, familiarise yourself with the terms, recommendations or requirements of the publisher. Some publishers may place an embargo on data sharing, while others may require that the underlying data are also made available (check for example the data policy of PLOS journals).
Whenever possible, link the various outputs produced in the course of the research, for example if there are reasons to submit parts of the data output to another repository. It is also useful to link the publication record with the landing page of the underlying data. Persistent identifiers are invaluable in creating reliable links.

Even if your research leads to negative findings, consider making the data available anyway. Publishing these findings could help avoid duplication of effort, contribute to the discussion, or the data could be used in meta-analysis.

Where to deposit the data?

You can use a service that is suitable both for storage during the research and for subsequent publishing of the stable, immutable version of the data. You can also use one option for the active data storage where you can process and modify the data, and another option to deposit the immutable data afterward. In any case there are steps that need to be taken before the data can be made available: the data have to be logically and systematically organised and documented, and there needs to be sufficient metadata. Remove or destroy files that are not to be made available. Make sure that personal data are adequately anonymised.
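
One possible way to document exactly which files make up the stable version is to list them before depositing. The sketch below, in Python, walks a folder and records each file's relative path, size and SHA-256 checksum. It is an illustration under assumptions only: the function names are hypothetical, the use of checksums is a suggestion not taken from this checklist, and no repository mentioned here is known to require this particular format.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under root with its relative path, size and checksum."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append(
                {
                    "path": path.relative_to(root).as_posix(),
                    "bytes": path.stat().st_size,
                    "sha256": sha256_of(path),
                }
            )
    return rows


def format_manifest(rows):
    """Render the manifest as text, one 'checksum  path' line per file."""
    return "\n".join(f"{row['sha256']}  {row['path']}" for row in rows)
```

Reading through such a list is also a convenient moment to check that files which are not to be made available have really been removed. Running build_manifest again later and comparing the checksums shows whether any file has changed since the list was made.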

What type of service is suitable for your data depends on factors such as the requirements of the funding body or your institution, and common practice in your academic field. You can choose from institutional repositories, international general data repositories (e.g. Zenodo), subject or domain specific data archives (e.g. Dryad, GenBank), data type specific services (such as GitHub for software), or national repositories (Fairdata-IDA). Data journals are a new format of peer-reviewed academic publication that specialises in publishing research data in the form of data articles (see for example Brain and Behavior, or Geoscience Data Journal).

Research data storage service IDA offers collaborative storage space for a defined group of users. IDA is suitable for active storage during the research, for sharing among the group members as well as storing stable data in an immutable state. The user can select data to be frozen – i.e. stored in an immutable state, which enables publishing the data. Prior to publishing the data, the user adds metadata to their frozen data with the Qvain metadata tool. The data described with Qvain will get a persistent identifier and a landing page once they are published. The published dataset is discoverable in the Etsin research data finder. Access to the data can be set as open or restricted, and it is also possible to only publish the metadata. The Fairdata Services can be found in the Registry of Research Data Repositories, a database of trusted research data services. If you are considering using Fairdata services for data management, take into account the criteria mentioned above. Note that data protection issues also affect the selection of storage service: IDA has not been designed for the storage of sensitive personal data. For sensitive data, CSC provides a separate family of SD services.

Digital preservation

4. Data sharing and reuse

The data life cycle doesn’t end after the findings have been published. When you are drawing up your data management plan and considering whether to make your data available for reuse, the rule of thumb is to make the data “as open as possible, as closed as necessary”. You will need to weigh factors such as IPR and ethical issues, as well as the licences applied to the data or to the publications based on them.

Licences and access rights

Data citation and persistent identifier (PID)

Find data for reuse

More information for organisations

Do you work for an organisation such as a research institution, academic publisher or a funding body, and want to learn more about how your organisation can promote good data management? You might be interested in the report linked below, compiled by the Knowledge Exchange collaboration. Recommendations for various types of organisations involved in the research community can be found on pages A4-A7, and page A11 summarises the factors that encourage or hinder data sharing.

Incentives and motivations for sharing research data: a researcher’s perspective