Data management processes in research organizations in 2025

Authors: Milla Kortelainen (CSC), Kuisma Lehtonen (CSC), Katriina Karkimo (Haaga-Helia AMK), Maria Söderholm (Syke), Turkka Näppilä (Tampereen yliopisto)

CSC’s Fairdata services, together with experts from other CSC research data management services, organized a series of workshops for eight pilot organizations. The aim was to examine the current state and target state of the organizations’ data governance processes through the lens of the Data Governance Reference Architecture (DAHA), with a particular focus on the Fairdata services. DAHA is a framework established by the Ministry of Education and Culture’s scientific computing and data management cooperation forum. It provides guidance for achieving the target state of research data governance by 2030.

The pilot involved five universities, one university of applied sciences, and two research institutes. In addition to CSC experts, the workshops included Fairdata service contact persons from the organizations as well as specialists from data support and research services. Typically, the work consisted of one full‑day in‑person meeting and 1–2 two‑hour online meetings. Each organization was represented by 3–14 key participants, and CSC by 5–10 experts. The pilot organization and CSC jointly selected the DAHA processes to be examined in each workshop. Usually, fewer than ten processes were covered.

Main objectives of the pilot

  • Identify how the selected DAHA processes are currently implemented and define the target state in collaboration between Fairdata and the organizations.
  • Align service processes and clarify roles, responsibilities, and prerequisites for utilizing services.
  • Identify potential system integration and development needs.

Key insights from the workshops

The workshops revealed that research data governance processes in research organizations are still evolving. Organizations often lack sufficient visibility into ongoing research and related data management practices, partly because researchers may not be obligated to report their activities or data processing. In terms of data governance, research and RDI projects can operate quite independently: several different solutions are available for storing and processing data, and researchers’ choices are not always visible at the organizational level. As a result, research data may remain unused, may fail to fulfill the FAIR principles because it is not findable or reusable, and may be difficult to preserve after the research ends. Strengthening organizational governance through clearly defined processes supports controlled data lifecycles and eases researchers’ work by providing ready‑made solutions and operating models for different data governance use cases.

Key challenges

Central challenges in data management include the responsibility placed on researchers to manage the data lifecycle, a lack of leadership‑level support and guidance, and insufficient awareness within organizations of existing common national data management services. Researchers’ knowledge of data management principles is often limited, and resources are scarce, which means data management is not always addressed at the right time. This complicates long‑term preservation and reusability after a project ends, particularly when contractual issues arise or essential preservation‑related information is missing.

Increasing awareness of existing services would help organizations guide researchers toward the right tools at each stage of the research lifecycle. The pilots helped several organizations see the role of CSC services, and the integration of those services, as part of their own data management practices, which in turn will promote awareness of the services among researchers.

Processes, automation, and technical support

Clear processes, guidance, and incentives strengthen researchers’ ability to manage data appropriately. Technical solutions can then support and streamline workflows through automation. Key enablers include process automation, system‑to‑system integrations, smooth data flows, and highly automated workflows and decision‑making. As part of this, the machine‑actionable Data Management Plan (maDMP) is used to collect research information at decision points throughout the research lifecycle. Encouraging researchers to describe their data in accordance with common practices at an early stage helps ensure findability and interoperability with publishing systems.

Alongside technical solutions, practical training, support personnel, and clear instructions are crucial for smooth operations. Decision models and proactive measures enable transferring data into long‑term preservation (DPS) and simplify data evaluation. The workshops also highlighted that CSC’s Sensitive Data (SD) services are well suited for handling sensitive datasets but require further guidance, communication, and support. National services bring significant added value by enabling interoperability, shared data models, automation, and cost‑efficiency, and by supporting risk management, organizational control, visibility, and impact.

Role of leadership

Insufficient leadership‑level understanding of the current state of data management practices and of the target state can make it difficult to recognize benefits and allocate resources. Leadership support is essential for strengthening organizational control and embedding data management practices into the research process. Developing data management requires clear organizational roles and responsibilities so that researchers can be expected to provide the necessary information about research data at lifecycle decision points. This ensures that contractual obligations, dataset characteristics, openness, and evaluation are handled systematically, enabling preservation and reuse.

Well‑defined current and target states of data management processes provide an overview of research activities and future needs. They support foresight in skills development, staffing, and procurement, and help establish responsibilities and practices for different situations.

Systematic data management also gives organizations visibility into ongoing research, resource needs, and future requirements. With better predictability, organizations can prepare for capability building, skills development, staffing, investments, and funding.

Follow-up actions

Each organization received a bilateral report summarizing the workshop outcomes and development recommendations, as well as a public summary of the pilot (only in Finnish).

Recommended next steps for organizations include:

  • Commit to designing data management processes with leadership support.
  • Clarify and agree on internal roles and responsibilities in data management.
  • Strengthen data management training and support structures.
  • Promote cultural change so that data management becomes a supported part of research.

Concrete follow‑up actions include:

  • Automating data management processes.
  • Adopting and integrating maDMP.
  • Developing service integrations and machine‑readable service catalogs.
  • Establishing national guidelines for research data valuation.
  • Starting DPS usage with the “easiest” research datasets.

The workshops also suggested that CSC should continue supporting organizations in integrating services into their processes, describing the researcher service pathway, producing targeted documentation, strengthening service branding within organizational solutions, automating its own data management processes in collaboration with organizations, and facilitating national dialogue in data management.

Experiences from individual pilot organizations

The workshops accelerated the ongoing adoption of the DAHA reference architecture at Haaga‑Helia University of Applied Sciences, particularly by clarifying which services are needed in different data management processes. “With CSC’s support, we were able to define how Haaga‑Helia’s own data storage solutions integrate with CSC’s services and which service combinations best suit different use cases.” The pilot led Haaga‑Helia to adopt the Qvain tool for dataset descriptions, and progress was made towards adopting CSC’s Sensitive Data services. More detailed process descriptions and support services will help integrate Fairdata tools more effectively into project workflows in the future.

Tampere University emphasized that the in-person workshop provided a valuable opportunity for experts from the university and CSC to meet and build a stronger foundation for ongoing dialogue. The workshop offered a chance to explain and justify the university’s own data governance needs. Tampere has been documenting the current state of research data governance since autumn 2025, and the workshop supported this work by clarifying how the DAHA reference architecture can be applied. It also deepened understanding of CSC’s tools and offered practical guidance on their use.

At the Finnish Environment Institute (Syke), the timing of the development pilot was excellent, as work to reform the data policy and (meta)data management services was ongoing. Reviewing the DAHA reference architecture processes from Syke’s own starting points helped to form a more systematic overall picture of the current state and development needs of data governance, thus supporting the ongoing policy work. The pilot also provided detailed information on CSC’s services and the ways they can be used. This is particularly relevant to Syke, where a large part of the environmental data is produced not only in research projects but also in connection with monitoring activities. It was also important for Syke that the pilot promoted internal discussion about the role of CSC services as part of Syke’s own (meta)data distribution and publishing channels and directed attention to interoperability with national services when developing Syke’s services.

Looking ahead

National efforts to support data governance processes are already underway. The Fairdata co‑development group is working with research organizations to create national guidelines for data valuation and to develop the Metadata Harvester tool, which provides an organizational‑level overview of published research datasets worldwide.

More workshops will be organized in autumn 2026, with registration opening in spring. The goal is to finalize scheduling by late spring. Organizations without a representative in the Fairdata co‑development group, or those interested in joining the Fairdata network, are encouraged to contact the Fairdata services by email at fairdata@csc.fi.