The UKCCSRC Data and Information Archive has compiled the following guidance to answer frequently asked questions on research data management. Please also check with your home institution to ensure that you are managing your research data in line with their requirements.
What is the definition of data?
For UKCCSRC, data is defined as data that is acquired, collected, observed or created for the purpose of analysis and interpretation, to produce original research output or validate research findings. Data includes digital research information and materials as well as data generated through modelling and other assimilations.
UKCCSRC data has either been created or maintained by public funding and as such any output from the projects can be considered to be part of the Public Record. This means that supporting information such as reports, papers, images, recordings, posters and educational materials may be considered as part of the research data.
What data should you archive?
You should include data which:
- Underpins a publication (access to the data should be described in the publication)
- Is necessary to validate research findings
- Is worth keeping
- Has potential for re-use (including currently unforeseen uses)
- Is unique and cannot be re-generated (e.g. environmental observations)
What type of outputs should you archive?
Examples of CCS project outputs which can be archived include the following:
- Data files
- Educational materials
i.e. any useful information, not solely data files.
What is research data?
Research data comes in a wide variety of formats and may include any of the following:
- Documents (text, MS Word), and spreadsheets
- Laboratory notebooks, field notebooks, diaries
- Online questionnaires, transcripts, surveys
- Audio, video, photos
- Physical specimens, samples
- Data files
- Database contents
- Models, algorithms, scripts
- Contents of an application (input, output, log files for analysis software, simulation software, schemas)
- Methodologies, workflows
- Standard operating procedures, protocols
For more information see An Introduction to Managing Research Data.
Should all data be included?
In general, aim to archive data from your project as fully as possible, and consider these important points:
- The focus of data archiving is on the data that was collected or produced and used to come to your results, to make that data available for reuse and to enable validation of results.
- For validation, sufficient data needs to be provided so that other researchers can examine the data you have used and see that the conclusions you have drawn appear correct.
- For experimental work, measured output data would often be provided along with the input data and the parameters of the experimental set-up, so that the experiment and the results can be understood and re-used by others.
- Model data/software should also be made available, if necessary to validate your research findings.
- If your source code is necessary to validate your research findings, then you are expected to share it.
- Data re-use may be in applications unforeseen by the initial investigators (including uses in other disciplines), and data might be used in new ways, and in combination with other datasets. As future uses can’t always easily be predicted, re-use is best facilitated by archiving the data arising from projects as fully as possible.
- Re-use of the data will result in credit to the researchers who generated the data.
- Tabulated values must be provided rather than (or in addition to) graphed results, to enable re-use. Columns should be appropriately labelled.
- If your data has been included in a published paper, we can include a link to this in the UKCCSRC Data and Information Archive. Please also consider whether other data is available from your project and should be archived, and whether archiving the data separately from the paper will increase its discoverability and potential for re-use.
For more information on choosing what data to include, see the Digital Curation Centre guide, Five steps to decide what data to keep.
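As an illustration of the point above about providing tabulated values with labelled columns, the following is a minimal Python sketch that writes the numbers behind a graph to CSV text. The column names, units and values are hypothetical placeholders rather than anything prescribed by the archive.

```python
import csv
import io

# Hypothetical measurements behind a plotted curve; the names, units and
# values are placeholders for illustration, not real project data.
COLUMNS = ["time_s", "pressure_bar", "co2_flow_kg_per_h"]
ROWS = [
    (0, 1.0, 0.0),
    (60, 1.5, 12.5),
    (120, 2.0, 25.0),
]


def write_table(handle, columns, rows):
    """Write a header row of labelled columns, then one line per data point."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


# Write to an in-memory buffer here; a real file opened with
# open(path, "w", newline="") could be passed instead.
buffer = io.StringIO()
write_table(buffer, COLUMNS, ROWS)
table_text = buffer.getvalue()
print(table_text)
```

Putting the unit in each column label keeps the table understandable even if it becomes separated from the accompanying paper or figure.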
How should data be organised?
- Data submitted should be quality checked by the provider and of a high standard.
- Data should be well presented and clearly labelled.
- Helpful, consistent file naming conventions should be used.
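One possible way to keep file names consistent is to generate them from the same set of parts every time. The sketch below assumes a hypothetical pattern of project, short description, ISO date and two-digit version number; this pattern is an illustrative assumption, not a convention required by the archive.

```python
import re
from datetime import date


def make_file_name(project, description, collected, version, extension):
    """Build a name like 'project_description_YYYY-MM-DD_v01.ext'.

    The pattern is a hypothetical example of a consistent convention.
    """
    def slug(text):
        # Lower-case the text and replace runs of anything other than
        # letters and digits with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected.isoformat()}_v{version:02d}."
        f"{extension.lstrip('.').lower()}"
    )


example_name = make_file_name(
    "Demo Project", "Core flood pressure", date(2015, 3, 9), 2, ".CSV"
)
print(example_name)  # demo-project_core-flood-pressure_2015-03-09_v02.csv
```

Using ISO dates (year first) and zero-padded version numbers means the files sort in a sensible order in a directory listing.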
What is metadata?
Metadata is information about the data. Sufficient metadata must be provided to enable discovery and understanding of the data and how it was generated. The data must be intelligible, assessable and usable by others. For data to meet these requirements it must be supported by explanatory metadata.
What metadata should be provided?
- Discovery metadata provides information that enables a user to find out if a resource exists, its location, ownership and if it meets their requirements.
- One Metadata Form should be completed to describe each dataset (files can be grouped together in datasets as appropriate).
- Extra documentation should be provided with the data if necessary to explain the contents in more detail, such as a report, readme file or an additional information sheet in a spreadsheet.
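A readme file of the kind mentioned above could be generated alongside the data. The following is a minimal sketch only: the field names (title, creators, description, access restrictions, file list) are assumptions chosen for illustration and do not reproduce the fields of the archive's Metadata Form.

```python
def build_readme(title, creators, description, files, restrictions="None"):
    """Return plain-text readme content describing a dataset and its files.

    All field names here are hypothetical; adapt them to whatever the
    dataset actually needs explained.
    """
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        f"Description: {description}",
        f"Access restrictions: {restrictions}",
        "",
        "Files:",
    ]
    # One indented line per file, saying what the file contains.
    for name, contents in files.items():
        lines.append(f"  {name}: {contents}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
readme_text = build_readme(
    title="Example dataset",
    creators=["A. Researcher", "B. Researcher"],
    description="Measured outputs and experimental parameters for run 01.",
    files={"run01.csv": "time series with one labelled column per variable"},
)
print(readme_text)
```

The resulting text can be saved as a readme file in the same folder as the data files it describes.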
Is an embargo period allowed?
Yes, data in the UKCCSRC archive can be held in embargo for a specified period (usually between 6 and 24 months) before it is released publicly, e.g. until publication of a peer-reviewed paper. During this time the data will not be released without your agreement.
What about restricted/confidential data?
The default position is that data should be open without restriction in a way that does not damage the research process, but there may be cases where this is not possible.
- If there are any restrictions these must be valid and reasonable.
- Preparation of a second paper is not a valid reason to withhold data.
- Metadata should still be provided in all cases, including a statement explaining any restrictions.
- Examples of data that could be restricted:
  - Personal information
  - Confidential information
  - Industrial project partners
  - Commercial interest
  - Ethical considerations
- Consider any confidential or third-party IPR issues and agree data release procedures from the outset of the project. Consider a collaboration agreement with external partners.
- The aim in these situations is to manage access rather than block it entirely; this can be done in various ways, including confidentiality agreements.
What is a Digital Object Identifier (DOI)?
- A DOI is a persistent identifier for your data.
- A DOI enables datasets to be cited in the same manner as a scientific journal article.
- A DOI recognises the value of the data and the effort that has gone into its creation.
- This enables you to get credit as the creator of the dataset.
- The DOI should be included in the related publication.
Do all datasets get issued with a DOI?
- No, not all submissions are suitable for a DOI (e.g. presentations).
- The data must be high quality, stable (not going to be modified), complete (not going to be updated) and permanent.
- Let us know if you require a DOI for your data.