- Fairdata Services in General
- IDA – research data storage
- Etsin – Research Data Finder
- Qvain – Research Dataset Metadata Tool
- Metax – Metadata Warehouse
- Digital Preservation Service for Research Data
1. Fairdata Services in General
Taking care of your research data is an essential part of good scientific practice. Fairdata services are made to ease this task.
The Fairdata Services are integrated services for storing, sharing and publishing research data. The Fairdata service components are IDA for storing research data, Qvain (and its Light version) for creating metadata descriptions for datasets, the metadata warehouse Metax, the research data finder Etsin and the Digital Preservation Service for Research Data. The services are offered free of charge to Finnish universities, universities of applied sciences and state research institutes, and to those funded by the Academy of Finland. The Digital Preservation Service is provided on the basis of an agreement.
The services are not tied to a specific field of science. The storage space in IDA can be shared among the project members, enabling e.g. collaborative projects to utilise a shared storage space not tied to a single organisation. Data stored in IDA can be linked to dataset descriptions using the Qvain metadata tools or Metax end user API and published in Etsin. Published data can be found by others as it has a persistent identifier enabling linking from e.g. related publications. Read more about the service components.
All data and information in the services are stored in Finland. Research funders such as the Academy of Finland recommend the Fairdata services and give instructions for using them. The Fairdata services are provided by the Ministry of Education and Culture and produced by CSC – IT Center for Science Ltd.
Read more about the benefits of the Fairdata services.
The renewed IDA, Metax and Etsin are already in production. Qvain deployment has been delayed, but Qvain will be put into production at the beginning of July 2019. The new Etsin and the Digital Preservation Service for Research Data can be fully utilised once Qvain is functional. During this transition phase, files stored in IDA cannot be linked to new datasets in the old Etsin, but they can be shared through temporary share links. Advanced users can also use the Metax end user API to create dataset descriptions.
The services are available to researchers in all higher education institutions and state research institutes in Finland. The services are free of charge for their users and are intended for research purposes. Read more about the use policies (in Finnish).
Dedicated contact persons in user institutes support their end users and act as contact points toward CSC Fairdata service support.
To use the IDA storage service, the end user needs to create, or be a member of, a CSC project with an IDA storage space allocation. Usually the IDA contact person of the researcher’s home organisation grants the project its IDA storage space, but IDA space can also be granted based on an Academy of Finland funding decision. Read more about how to become an IDA user.
The Digital Preservation Service for Research Data is meant for the digital preservation of research datasets (data, publications, code, learning materials etc.). To use the Digital Preservation Service, contact the Ministry of Education and Culture to obtain a quota. Read more about Digital Preservation Service for Research Data deployments (in Finnish).
In brief: If you want to use only Qvain or the Metax API to create metadata descriptions, self-registering a CSC customer account online is sufficient, and no additional steps are needed to start using the services. Self-registration takes a couple of minutes. In addition to a CSC customer account, IDA requires creating or joining a project group and submitting an online IDA storage space application. Etsin is open to everyone without registration.
Longer explanation: A CSC customer account is required to use IDA, Qvain and the Metax API. If your home organisation is a member of the Haka federation, you can simply register as a CSC customer in the CSC Customer Portal (SUI) with your own Haka ID. If you don’t have a Haka ID provided by your home organisation (e.g. some state research institute customers), you can apply for a CSC account. After registration, you can log in to the services with either your Haka ID or your CSC customer account.
To use IDA for storing or publishing research data (files), you need a CSC customer account as described above. In addition, you need to be invited to an existing CSC project, or create a new CSC project for which the IDA storage quota can be applied. All this can be done online in the CSC Customer Portal. The IDA storage quota is available after your application has been approved by your organisation’s IDA contact person (usually within a few working days). Detailed instructions can be found at https://www.fairdata.fi/en/ida/becoming-an-ida-user/. If your home organisation doesn’t have a named IDA contact person yet, please contact your home organisation’s research services before applying for IDA space.
The dataset metadata published in Etsin are open for all without registration. If you’re interested in the Digital Preservation Service for Research Data, please contact your home organisation’s research services.
Every project in IDA has a responsible Project Manager, who can apply for a project membership for a collaborator abroad: https://research.csc.fi/accounts-for-researchers-working-abroad. All members of a project that uses IDA can create dataset descriptions using Qvain or Metax API.
A persistent identifier, or PID, is a unique and unambiguous machine-readable name for an object, in this case a specific research dataset. It is also a permanent link that always takes you to the landing page of the dataset, where the description and, for example, the license of the dataset can be found. Usually a PID is either a DOI or a URN. These identifiers are provided by two different systems, and you can tell which kind a PID is from its first letters.
When your dataset has a PID, which is allocated by the Fairdata services, you can use it in data citation. All citations can be traced back to you, and the link will always take you to a landing page, even if the data is no longer available or the services have moved or changed over time. Using persistent identifiers is one of the cornerstones of the FAIR principles.
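The "first letters" rule can be illustrated with a small shell function that classifies an identifier by its prefix. This is a sketch only: the prefixes it checks (`doi:`, `10.`, `https://doi.org/`, `urn:`) are common ways of writing DOIs and URNs rather than a complete list, and the function name and sample identifiers are hypothetical.

```sh
# Classify a persistent identifier by its first characters.
# Sketch only: the prefixes below are common forms of DOIs and URNs,
# not a complete or authoritative list.
pid_type() {
    case "$1" in
        doi:* | DOI:* | 10.* | https://doi.org/*) echo "DOI" ;;
        urn:* | URN:*) echo "URN" ;;
        *) echo "unknown" ;;
    esac
}

# Hypothetical example identifiers
pid_type "doi:10.1234/example-dataset"   # prints DOI
pid_type "urn:nbn:fi:example-123"        # prints URN
```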
If the dataset is available from a source that is outside the Fairdata services, you can create a dataset using Qvain and get a URN for the landing page in Etsin. When the data is outside Fairdata services, you are responsible for the integrity of the data yourself.
Editing the metadata of a dataset that has already been published with Qvain does not create a new PID for the dataset. However, if you change the files or folders linked to the dataset with Qvain, a new version of the dataset with a new PID is created automatically. A new PID is allocated when the data linked to the dataset changes, to ensure data integrity. A PID is a promise: it should always give the user a way to access, or at least find information about, a specific dataset. Always consider reproducibility.
If you delete (or unfreeze) files linked to a published dataset in IDA, the published dataset is shown as deprecated in Etsin, because the files originally linked to the published dataset are no longer available. If you wish, you can create a new version of the dataset and link new files to it. The landing page of the new version and files linked to it will be shown in Etsin normally.
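The rules in the two paragraphs above, together with the exception described in the Qvain section for datasets first published without any files, can be summarised as a small decision function. It is only a model of the documented behaviour, not a Fairdata API: Qvain and Etsin apply these rules themselves, and the function name `pid_effect` and its argument values are hypothetical.

```sh
# Model of the documented PID behaviour (not a Fairdata API).
# Usage: pid_effect CHANGE FILES_IN_FIRST_VERSION
#   CHANGE: metadata | files | deleted-in-ida
#   FILES_IN_FIRST_VERSION: yes | no
pid_effect() {
    change=$1
    had_files=$2
    case "$change" in
        metadata)
            # Editing descriptive metadata keeps the same PID.
            echo "same PID" ;;
        files)
            # A dataset first published without files can get files later
            # without a new PID; otherwise changing files creates a new version.
            if [ "$had_files" = "no" ]; then
                echo "same PID"
            else
                echo "new version with new PID"
            fi ;;
        deleted-in-ida)
            # Deleting or unfreezing linked files in IDA deprecates the dataset.
            echo "dataset shown as deprecated in Etsin" ;;
        *)
            echo "unknown change: $change" >&2
            return 1 ;;
    esac
}
```

For example, `pid_effect files yes` prints `new version with new PID`, while `pid_effect metadata yes` prints `same PID`.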
2. IDA – research data storage
2.1. General questions about IDA service
IDA storage quota can be applied for by researchers from Finnish higher education institutions, Finnish research institutes and research funded by the Academy of Finland (Academy Projects, Centres of Excellence, research programmes, research infrastructures). Use of the service requires registering a CSC customer account and submitting an IDA storage space application. IDA storage space is applied for online in CSC’s Customer Portal (SUI). Applications are reviewed by your home organisation’s IDA contact person. After IDA access has been granted, you can log in to the IDA service with either a Haka or a CSC account. Read more about the IDA use policy and applying for IDA storage quota.
The Project Manager can apply for IDA storage space in CSC Customer Portal (SUI).
- Create a CSC customer account.
- Create a CSC project.
- Apply for IDA storage space with an application form in CSC Customer Portal (SUI).
All this can be done online and the storage quota is available after your application has been approved by your organisation’s IDA contact person. Other project members can be added by the Project Manager after they have registered their CSC customer account. Detailed instructions can be found at https://www.fairdata.fi/en/ida/becoming-an-ida-user/.
A project is a group of users who have the right to use a service and who share a storage space in the IDA service. The project must have a named person – a Project Manager – responsible for the group’s access, its data and its service usage (the technical term is CSC Project Manager; read more about the prerequisites and responsibilities). IDA storage quota is applied for by the Project Manager and granted to their project group via the CSC Customer Portal (SUI). The same project group can be used in other services produced by CSC. Read more about CSC projects.
The IDA storage quota is not tied to a single real-life research project or a specific research grant, except when it is applied for from the Academy of Finland quota, which requires that the research is funded by the Academy. In the Fairdata services, one project in IDA can produce and publish multiple datasets in Etsin, but the files linked to a specific published dataset can belong to only one project group.
The Project Manager has a certain number of responsibilities; he/she can e.g. add and remove project members. The Project Manager also serves as a contact person between the project and CSC. For these reasons the Project Manager also must have an active CSC customer account at all times. When creating a CSC project it is specifically asked in the application if the person applying can act as the Project Manager, i.e. he/she is an experienced researcher (e.g. post doc, team or group leader, professor) involved in the project. If you have any questions regarding this matter, please contact servicedesk (at) csc.fi
The Project Manager can add and remove members in the CSC Customer Portal (SUI). The instructions to add and remove project members can be found at https://www.fairdata.fi/en/ida/becoming-an-ida-user/#how-to-apply.
If your home organisation changes, please contact servicedesk (a) csc.fi and we’ll update your customer account or advise you on creating a new one.
If the Project Manager’s home organisation changes:
The project using IDA needs to have a Project Manager who is affiliated with an organisation that is entitled to use IDA. If the Project Manager’s home organisation changes, the options are to name a new Project Manager to CSC, or to consider whether the IDA space should also be moved under another organisation. In either case, please contact servicedesk (a) csc.fi.
If a user’s home organisation changes:
The Project Manager defines who is entitled to be a project member. All IDA users need a CSC account, but an account can also be registered without a Haka ID. Use of the project’s IDA storage space is not tied to a single organisation, so the Project Manager can also have people from other organisations as project members. If a user’s organisation changes, they should contact servicedesk (a) csc.fi to update their customer account or to create a new one.
A foreign research associate working in the Finnish research system can be given access to IDA as a project member, if the project is working under a Finnish higher education institute, state research institute or another entity entitled to use IDA. The manager of the IDA project can decide which users can access the project’s IDA space. Read more about CSC account policy for foreign research associates.
IDA is meant for research data. A student can be a member of a project using IDA, when the student produces research data or when access to research data is needed. The project using IDA needs to have a Project Manager who is an experienced researcher (e.g. post doc, team or group leader, professor) involved in the project or e.g. a project manager or a technical coordinator for a collaboration project. IDA is not meant for students, who produce and save data only for a thesis.
The new Project Manager must have a CSC account and they must be a member of the project.
The new and the old Project Manager should then together contact the CSC Service Desk (servicedesk (a) csc.fi) and ask for the Project Manager to be changed.
The owner of the data defines its access rights, and using IDA does not change the ownership of the data. Although the IDA storage space is granted by an organisation, agreements on the ownership of the data should be made separately; the organisation that granted the quota doesn’t automatically have rights to the stored data. It is recommended to agree on the rights to the data early on, e.g. on what happens when someone leaves the project.
2.2. Questions about using IDA and the data stored in IDA
In IDA it is possible to store all kinds of research data: new research data as well as published research data. IDA is meant for storing stable research data that can be structured and described as research datasets. The service is not optimised for data under heavy use (e.g. computing disk data, data used in web applications). Nor is it ideal to connect IDA directly to an instrument that constantly pushes data onto the disk. What matters is how IDA is used, not what kind of research data is being stored. However, IDA is not suitable for storing sensitive personal data or biometric identifiers.
IDA is not suitable for storing sensitive data. Once data has been correctly anonymised, it is also suitable for IDA. Read more on personal data and its anonymisation in the Finnish Social Science Data Archive’s data management guidelines: http://www.fsd.uta.fi/aineistonhallinta/en/anonymisation-and-identifiers.html. Please note that there are special requirements for handling personal data in research.
The most suitable user interface for you depends on your skills, needs and the computers you are using. The browser user interface at https://ida.fairdata.fi/ is easy to use and it doesn’t require installing new software.
The command line tools can be used on Linux and macOS. Note that freezing files is not possible with the command line tools: https://www.fairdata.fi/en/ida/user-guide/#command-line-tools
The full user guide is available at https://www.fairdata.fi/en/ida/user-guide/.
With the upcoming metadata tool Qvain, data stored in IDA can be described as a dataset, set as openly downloadable and published in Etsin. Your published dataset will get a persistent identifier and a landing page, enabling citations. Multiple files and folders can be linked to the dataset, but the files linked to a specific dataset can belong to only one project group in IDA.
You can also make files and folders downloadable from IDA with temporary share links; see the guide: https://www.fairdata.fi/en/ida/user-guide/#temporary-share-links.
Additional storage quota can be applied for by the IDA Project Manager by sending a request to servicedesk (a) csc.fi, and we will forward the request to the responsible organisation.
This picture gives an example of how file size affects the speed of the transfer.
When transferring many small files to IDA, the transfer may be faster (less overhead) if you zip the files into a single file. Note that the zip file will not be extracted in IDA.
IDA quota usage metrics are not yet available in the CSC Customer Portal (SUI). We aim to show IDA projects their total quota and quota usage in the project details of the Customer Portal in the future. If you belong to just one project, the used quota is shown in the browser UI in the root view, under the frozen and staging folders.
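A minimal sketch of the bundling step, assuming a POSIX shell with `tar` available: it packs a whole directory into one archive file that can then be uploaded through any IDA interface. It uses `tar` instead of `zip` because `tar` is present on most Linux and macOS systems; `zip -r` serves the same purpose. The function name `bundle_dir` is hypothetical.

```sh
# Bundle a directory of many small files into a single archive before
# uploading it to IDA (add -z to the tar options to also compress it).
# IDA stores the archive as one file and does not unpack it.
bundle_dir() {
    src_dir=$1      # directory containing the small files
    archive=$2      # archive file to create, e.g. measurements.tar
    # -C changes into the parent directory so the archive holds
    # "measurements/..." rather than the full local path.
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

For example, `bundle_dir ./measurements measurements.tar` produces a single file, `measurements.tar`. Anyone who later downloads it from IDA has to unpack it themselves, so mention the archive format in the dataset description.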
The IDA CLI tools are available to you on Taito, and can be used for uploading, modifying, and downloading content in the staging area as well as downloading content in the frozen area of your project.
Example: To download a particular file from the staging area of project 2001234 with the relative pathname ‘/somefolder/somefile.txt’ to the local filename ‘file_on_taito.txt’ in the current directory on Taito, you would use the command:
ida download -p 2001234 /somefolder/somefile.txt file_on_taito.txt
The IDA CLI tool will prompt you for your IDA credentials (if you have not already defined them e.g. in your .netrc file).
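One way to avoid the prompt is a netrc entry. The sketch below appends one to a given file and restricts its permissions. The `machine` name `ida.fairdata.fi` is an assumption, so check the online IDA CLI guide for the host name the tools actually look up; the helper name `add_ida_netrc` is hypothetical.

```sh
# Hypothetical helper: append IDA credentials to a netrc file so that the
# IDA CLI does not need to prompt for them.
# ASSUMPTION: the CLI looks up the machine name "ida.fairdata.fi".
add_ida_netrc() {
    netrc_file=$1   # normally "$HOME/.netrc"
    ida_user=$2
    ida_password=$3
    printf 'machine ida.fairdata.fi login %s password %s\n' \
        "$ida_user" "$ida_password" >> "$netrc_file"
    # The file holds a password, so make it readable by its owner only.
    chmod 600 "$netrc_file"
}
```

Passing the password as an argument leaves it in your shell history, so adding the same line to `~/.netrc` with a text editor works just as well.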
Example: To download a particular file from the frozen area of project 2001234 with the relative pathname ‘/somefolder/somefile.txt’ to the local filename ‘file_on_taito.txt’ in the current directory on Taito, you would use the command, including the parameter ‘-f’ to indicate that the relative pathname corresponds to the frozen area of the project:
ida download -p 2001234 -f /somefolder/somefile.txt file_on_taito.txt
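If you fetch files from both areas regularly, the two commands above can be wrapped in a small helper that adds `-f` only for the frozen area. The function name `ida_fetch` is hypothetical; it uses only the options shown in the examples (`-p` for the project and `-f` for the frozen area).

```sh
# Hypothetical wrapper around "ida download" that selects the project area.
# Usage: ida_fetch staging|frozen PROJECT REMOTE_PATH LOCAL_FILE
ida_fetch() {
    area=$1
    project=$2
    remote_path=$3
    local_file=$4
    case "$area" in
        staging)
            # The staging area is the default, so no extra option is needed.
            ida download -p "$project" "$remote_path" "$local_file" ;;
        frozen)
            # -f: the relative pathname refers to the frozen area.
            ida download -p "$project" -f "$remote_path" "$local_file" ;;
        *)
            echo "ida_fetch: area must be 'staging' or 'frozen'" >&2
            return 1 ;;
    esac
}
```

For example, `ida_fetch frozen 2001234 /somefolder/somefile.txt file_on_taito.txt` runs the same command as the frozen-area example.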
See the online IDA CLI guide for more details about what you can do with the CLI tools and for additional examples of the most common operations.
IDA is a specialized service for the secure storage of research data, and not just a simple cloud storage solution. The purpose of the dedicated CLI tools is to simplify the transfer of data to and from the service for the user, while still ensuring that essential validation and other housekeeping tasks are properly performed in relation to upload and management of data.
Various checks and other background operations occur when uploading and managing data in the service, whether using the CLI tools or the web UI, which would not happen if the service behaved merely as a cloud storage solution mounted and interacted with as a basic filesystem.
3. Etsin – Research Data Finder
Etsin is primarily developed for researchers but the service is open for everyone at etsin.fairdata.fi. Anyone can search for datasets and the published metadata on the dataset is open for everyone to see. The data owner decides how the underlying research data can be accessed and by whom.
Etsin shows information about the datasets and metadata in the Finnish national Fairdata services. Datasets can be created via Qvain Service or with Metax End User API. In order to use those services you need to be registered as a CSC customer (registration can be done in CSC Customer Portal (SUI)).
Etsin also harvests information from different sources, and new sources are added as part of continuous service development.
All data that Etsin uses is stored in Metax, the Metadata Warehouse Service.
Etsin harvests information from different sources. Harvesting means that the metadata (dataset) is originally stored somewhere else (an external repository) and the master landing page is not in Etsin but in an external service. Following pre-agreed mappings and a set of harvesting rules, the information about datasets is fetched into the metadata repository that Etsin uses (Metax, Metadata Warehouse Service), which makes the datasets findable via Etsin. A link in Etsin then takes the user to the master landing page outside Etsin.
To have the organisation’s datasets harvested the following issues should be discussed and agreed upon (to be done as a project together with the Customer):
- Customer’s right to deliver the metadata (license, personal data) and responsibility regarding the data quality
- Harvesting protocol (preferably OAI-PMH) and APIs (source organisation should have an API that communicates with CSC’s API)
- Harvested datasets should have Persistent Identifiers in the source repository
- Mapping and refinement of metadata
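To check that a source repository speaks OAI-PMH before discussing harvesting, you can issue a ListRecords request yourself. The sketch below only builds the request URL as defined by the OAI-PMH 2.0 specification (`verb`, `metadataPrefix` and the optional `from` argument). The endpoint in the usage note is a placeholder, the function name is hypothetical, and the metadata format CSC actually harvests is agreed during the mapping step, so it is not necessarily `oai_dc`.

```sh
# Build an OAI-PMH 2.0 ListRecords request URL for a source repository.
# metadataPrefix=oai_dc requests simple Dublin Core, which every OAI-PMH
# repository must support; "from" (YYYY-MM-DD) limits the response to
# records changed since that date, enabling incremental harvesting.
oai_list_records_url() {
    base_url=$1     # the repository's OAI-PMH endpoint
    from_date=$2    # optional
    url="$base_url?verb=ListRecords&metadataPrefix=oai_dc"
    if [ -n "$from_date" ]; then
        url="$url&from=$from_date"
    fi
    printf '%s\n' "$url"
}
```

For example, `curl -s "$(oai_list_records_url https://repository.example.org/oai 2019-01-01)"` fetches the first page of records. Large result sets are returned in pages, and each response carries a `resumptionToken` for requesting the next one.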
Please contact the CSC Service Desk (servicedesk (a) csc.fi ) if you are interested or would like to have more information on having metadata harvested into Etsin.
4. Qvain – Research Dataset Metadata Tool
4.1. General questions about Qvain service
Unfortunately there have been delays in the development of Qvain. A pilot group started using Qvain in mid-May 2019, and after the pilot Qvain will be opened on 1st July 2019.
Qvain and Qvain Light will both be available from 1st July 2019. Both can be used to describe and publish datasets but Qvain Light uses a limited format and is easier to use.
- Qvain: Complex multi-tab form which offers the user ALL fields that the Research Dataset data model itself offers. Needs a bit more effort than Qvain Light but offers more ways to describe a dataset.
- Qvain Light: Simple one-tab form which offers all the basic and most important fields to describe a dataset. Although it has far fewer fields than Qvain, the datasets created with Qvain Light are just as official and fulfil the requirements for a quality dataset.
No matter which version of Qvain was used, all datasets are visible in Etsin, and Etsin users cannot see which tool was used for the description. In addition, you can always use either version of Qvain to edit your dataset, regardless of which version you used to create it.
Tip! Try Qvain Light and if you find something missing continue with Qvain. (Remember! Adding or removing files to/from a dataset creates a new version of it. So be careful with that. Other metadata can be freely edited.)
Qvain provides an easy way to describe your research data and publish the data stored in IDA. By creating quality metadata for your dataset and publishing it, you make it findable in Etsin and enable reuse of the data. Quality metadata keeps the data findable and citable and ensures that the Fairdata services are interoperable. The Qvain metadata tool helps you add metadata that fulfils the minimum requirements for digital preservation.
The files in the Fairdata IDA can be linked to the dataset description, but it is also possible to refer to data sources outside of Fairdata. The user must be a member of an IDA project to link IDA files to the dataset description.
In order to use Qvain you need to be registered as a CSC customer. After registration logging in to Qvain is possible with either Haka ID or CSC account.
If your home organisation is a member of the Haka federation, you can register as a customer in CSC Customer Portal (SUI) with your own Haka ID. If you don’t have a Haka ID provided by your home organisation, you can apply for a CSC account (e.g. non-university and international customers).
4.2. Questions about using Qvain and the metadata saved in Qvain
CSC will perform a migration operation to transfer all published datasets from old Etsin (etsin.avointiede.fi) into new Etsin (etsin.fairdata.fi). These already existing datasets will then be updatable also in Qvain. Migration will take place just before Qvain is opened.
Note! If you have unpublished datasets in old Etsin they will NOT be transferred into new Etsin. If you want a dataset to be transferred into new Etsin (and to be updatable in Qvain), you must publish the dataset before the migration. CSC will contact dataset owners in old Etsin when the time of the migration has been set (well before the actual migration).
Updating the datasets in old Etsin will not be possible once the migration starts.
The Fairdata services offer the possibility to publish data so that they are findable, accessible, interoperable and reusable. However, this requires some input from the curator and the researchers. Before publishing, it is good to consider the following:
- When publishing data in the Fairdata services, the data needs to be licensed. Licensing is always recommended. The license is not the same thing as access, instead it states how the data can be used. Check your funder’s and research organization’s policies and other relevant documents. Ideally, all this should be documented in the data management plan and in contracts. Generally, the recommended license for research data is CC-BY 4.0. All metadata is open data, i.e. CC0. If you are uncertain, discuss with your data support or library.
- Find out, or agree on, which agents are documented as creator, publisher and curator. There can be only one publisher, and it should preferably be an institution. The curator is the party responsible for taking care of the metadata and data, who can give more information about the data and answer questions about access and licensing. This information should be kept up to date at all times!
- If you do not have an ORCID iD, register one. Always use it to make sure you receive credit.
- Other mandatory information and things to consider are access and the key elements that support findability and reuse:
- the title of the dataset should be unique and descriptive
- the description should be extensive enough to enable reuse
- the keywords are important for findability. Use relevant domain specific terminology.
- several fields of science can be added
- Arrange your files and folders carefully. Use unique file and folder names. If the data is in IDA, freeze it. Double check you don’t include data that cannot be published. All personal information should be excluded from file names etc.
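Part of the file-naming check can be automated before you freeze the files. A minimal sketch, assuming the dataset is in a local directory: it reports file names that occur in more than one folder, which are legal but make a dataset harder to navigate and cite. The function name is hypothetical, and it does not detect personal information in names, which still needs a manual review.

```sh
# List file names that occur more than once anywhere under a directory,
# ignoring which folder they are in.
duplicate_names() {
    # find lists every file; sed strips the folder part of each path;
    # uniq -d prints only names that appear more than once.
    find "$1" -type f | sed 's#.*/##' | sort | uniq -d
}
```

For example, `duplicate_names ./dataset` prints nothing when every file name is unique.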
Using IDA is not mandatory: Qvain can be used independently of the file storage service. Depending on the user’s needs, there are three options:
- Using Qvain, it’s possible to create and publish only the metadata about a dataset without any linked files. If the first version of the published dataset has no files linked to it, then the user can also add files later to the published dataset, without changing the dataset’s persistent identifier.
- If the user is also using the IDA service, selected files stored and frozen in IDA can be easily described as a dataset with Qvain, set as openly downloadable and published in Etsin. Etsin shows the files in their folder structure, which is useful when the dataset consists of multiple files.
- The third option is to use Qvain to create and publish a dataset that links to a remote web resource. The user can also define a download URL for the remote resource in Qvain.
An error like that is most likely caused by missing mandatory data in your dataset. Make sure that you have filled in all mandatory fields, marked with a red asterisk. By clicking the ‘Show details’ link in the error message pop-up, you might get an idea of what is missing (the details are very technical, but if you look carefully there may be a hint of a field name). If the problem persists, please don’t hesitate to contact the CSC Service Desk (servicedesk (a) csc.fi) and, if possible, provide the details from the link to help us find the cause of the error.
Qvain shows the datasets you have created yourself. At the moment it’s not possible to grant edit permissions to other users. If you have created a dataset in the old Etsin or via the Metax API, it should be visible in Qvain. In some cases, syncing these external datasets has not succeeded. In such a case you can ask Qvain to do the sync: visit the page https://qvain-stable.csc.fi/api/datasets/?fetchall (you need to be logged in to Qvain!). The page shows the datasets it has synced, and they should now be visible in Qvain.
Note! Be careful when using the ‘fetchall’ link. If you have published datasets with unpublished edits, the edits will be overwritten by the published version of the dataset!
When you save a dataset in Qvain it does not affect the published dataset. Saving the dataset only saves it as a draft. Only when you Publish the dataset are the changes visible in Etsin.
Be careful when changing the already published dataset! After the metadata has been published all changes in the data should be clearly documented in the metadata or a new version of the dataset must be created and stored. The owner of the dataset is responsible for not compromising the repeatability or reproducibility of the research. Note! If you add/remove files to/from the published dataset a new version will automatically be created (Etsin will show the newest version by default and you will see other versions under that). In Qvain’s My datasets view all versions are shown as individual datasets.
If you add/remove files to/from the published dataset, a new version will automatically be created (Etsin will show the newest version by default, and you will see other versions under it). In Qvain’s My datasets view all versions are shown as individual datasets. Note! After the metadata has been published, all changes in the data should be clearly documented in the metadata, or a new version of the dataset must be created and stored. The owner of the dataset is responsible for not compromising the repeatability or reproducibility of the research.
This error most likely means that you haven’t been logged in properly: either your session has ended or something went wrong when you logged in. Click ‘Login’ in the top right corner or, if you cannot see the Login link, try logging out by choosing ‘Sign out’ under the User drop-down in the top right corner and then ‘Login now!’.
5. Metax – Metadata Warehouse
Metax is mostly invisible to end users. It is the metadata storage for the Fairdata Services. Its application profile is based on DCAT and is compatible with DataCite.
Metax does not have a graphical UI. Most of the APIs provided by Metax are accessible only to other Fairdata services, and the main way to interact with Metax should be through those services. For advanced end users, the Metax API can be accessed directly by using special tokens for authentication. See End User Access for more information.
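For advanced users, a direct API call with a token typically looks like the sketch below. Everything specific here is an assumption: that the token is sent as an HTTP Bearer token, and that the base URL and resource path are placeholders to be replaced with the values given in the End User Access documentation. The function name `metax_get` is hypothetical.

```sh
# Hypothetical sketch: call a Metax REST endpoint with an end-user token.
# ASSUMPTIONS: the token is sent as an HTTP Bearer token, and base_url and
# resource_path are placeholders; the real values are given in the
# Metax "End User Access" documentation.
metax_get() {
    base_url=$1
    resource_path=$2
    token=$3
    # -sS: no progress bar, but still report errors.
    curl -sS -H "Authorization: Bearer $token" "$base_url$resource_path"
}
```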
6. Digital Preservation Service for Research Data
Digital preservation refers to the reliable preservation of digital information for several decades or even centuries. Hardware, software, and file formats will become outdated, while the information must be preserved. Reliable digital preservation requires active monitoring of information integrity and anticipation of various risks. Metadata, which describes for example the information content, provenance information and how the content can be used, has a key role in this.
Research datasets (data, publications, code, learning materials etc.) are eligible for digital preservation if they are considered fundamental to national or institutional research activities. In both cases, this is based on the organisation’s own evaluation.
The Common Digital Preservation Services, including Digital Preservation Service for Research Data, are offered and owned by the Ministry of Education and Culture, and are managed and further developed by CSC – IT Center for Science Ltd.
Many trusted digital repositories provide long-term storage of the data exactly as it has been deposited and guarantee data integrity at the bit level. However, such an archiving method will not ensure that the data will be usable and readable in the long term, due to software and file format obsolescence. The aim of digital preservation is to guarantee that especially valuable datasets remain usable for future generations. Digital preservation is costly: it requires active, ongoing curation and measures to extend the data lifetime so that it stays usable for decades or even centuries. Such measures include, for example, converting obsolete file formats and various means of ensuring data integrity and quality, keeping data readable and usable, and protecting it from decay and damage in the long term.
Partner organization refers to an organization, department, or other entity using the Digital Preservation Service for preservation of digital content. For more information, see https://www.fairdata.fi/hyodyntaminen/kayttoonotto/.
Datasets transferred to digital preservation are those that have long-term value for the organisation or at the national level. Organisations evaluate their own needs for digital preservation. For example, the following characteristics could be considered:
- The dataset’s potential for reuse
- Unique data whose generation or collection required a lot of resources; irreplaceable research that would be extremely difficult, costly, or impossible to reproduce
- Data resources that are considered fundamental to national or institutional research activities.
Digital preservation services define the policies and technical specifications necessary for data preservation in collaboration with organizations. The information includes:
- Sufficient metadata that make the data comprehensible
- File formats that are open and readable by more than one application and facilitate reuse of data
- Clarification of ethical and legal issues such as IPR or personal (sensitive) data issues
Partner organization proposes the dataset to be preserved to the Ministry of Education and Culture, which grants a certain quota for the dataset.
There are several options for doing this. For more information, see http://digitalpreservation.fi/ingest-options.
For more information, see http://digitalpreservation.fi/.
If you have any questions about particular features, or any other issue relating to the services, please contact CSC customer support at servicedesk (a) csc.fi