When journals ask to make data shareable:
Options for storing or sharing data during the active phase of a project differ from repositories designed for the long-term preservation and accessibility of research data.
Active storage and sharing platforms are where you store and share data with collaborators or partners during the active phase of your project. They serve as temporary solutions and are not intended for permanent or long-term storage and sharing. Examples include McGill-licensed Microsoft SharePoint and Microsoft Teams.
Data repositories are used for the long-term archiving and access of research data. Once data is deposited/published, it will remain accessible even after the project concludes. Examples of research data repositories include the McGill Dataverse and the Federated Research Data Repository.
The McGill Libraries offer an institutional data repository, the McGill University Dataverse, for research data publishing and archiving. McGill faculty, students, and staff are welcome to deposit datasets in the McGill Dataverse repository. All data are stored securely on servers located in Canada. Data can be publicly accessible, available to specific individuals, or private/restricted.
The McGill Dataverse can be found by following this URL: https://borealisdata.ca/dataverse/mcgill
Here are the instructions for creating a draft and submitting the dataset for publication (datasets will be reviewed by the McGill Libraries RDM Specialist before publication):
A few notes:
Any data format can be deposited in the McGill Dataverse collection, but file size is limited: each file must be 5 GB or smaller.
Once a dataset is published, it cannot be unpublished - this action is irreversible.
The default license is CC-0 public domain, meaning you would give up all copyright. If you want a different license, for example CC-BY, make sure to change this when uploading the dataset draft.
Sensitive data cannot be published in the McGill Dataverse. If your dataset contains information collected from anonymized human participants, contact the RDM specialist (rdm.library@mcgill.ca) with the consent form. The RDM specialist reviews all consent forms prior to granting permissions for dataset deposit.
When you upload the dataset, it will be a draft. When you want to publish it, the dataset should be submitted for review and it will be published if it's not missing any information.
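The per-file size limit mentioned in the notes above can be checked before you start uploading. The following is a minimal sketch in Python, not an official McGill or Borealis tool. It assumes that "5 GB" means 5 × 1024³ bytes (this guide does not say whether the limit is counted in binary or decimal units), and the function name find_oversized_files is hypothetical.

```python
from pathlib import Path

# Assumption: "5 GB" is read as 5 * 1024**3 bytes; the repository may count it differently.
MAX_FILE_BYTES = 5 * 1024**3


def find_oversized_files(folder, limit=MAX_FILE_BYTES):
    """Return the sorted relative paths of files under `folder` that exceed `limit` bytes."""
    root = Path(folder)
    oversized = []
    for path in root.rglob("*"):
        # Only regular files count; directories are skipped.
        if path.is_file() and path.stat().st_size > limit:
            oversized.append(path.relative_to(root).as_posix())
    return sorted(oversized)
```

Calling find_oversized_files on your dataset folder returns an empty list when every file fits under the assumed limit. Any file it lists would need to be split or compressed before deposit.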
How to deposit:
Create a Dataverse account by logging in: go to the log-in page and select McGill University from the drop-down under "Your Institution". You will either be logged in automatically via single sign-on or be prompted to log in via McGill single sign-on: https://borealisdata.ca/dataverse/mcgill
When you’re logged in, go to the main McGill Dataverse page and you should see an Add Data button (https://borealisdata.ca/dataverse/mcgill). You can create a draft by clicking on that button and filling out the information/uploading files.
Provide a descriptive title for the dataset and enough information in the description for other users to understand where the information comes from, how it was collected, etc.
For training on using Dataverse, please see this series of self-paced online modules: Dataverse 101: A Portage Training Module Series
A wide variety of additional data repositories and databases are available that archive research data from many subject areas. Coverage varies by discipline.
McGill researchers looking for a domain-specific data repository are encouraged to start with Re3data.org, which provides a comprehensive listing of disciplinary and institutional repositories for hosting and sharing research data.
Other places to find lists of data repositories include:
The following list names a few reputable general data repositories:
Open data/research is the practice of research in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods. Open research data is data that can be freely accessed, reused, remixed and redistributed, for academic research and teaching purposes and beyond. Ideally, open data have no restrictions on reuse or redistribution, and are appropriately licensed as such. Openly sharing data exposes it to inspection, forming the basis for research verification and reproducibility, and opens up a pathway to wider collaboration.
However, there are also special considerations: not all data can or should be open. For example, restrictions on access may be implemented to maintain Indigenous Knowledge sovereignty and Indigenous Data sovereignty (see the CARE principles below on this page) or to protect the identity of human subjects.
Read more about Open research: https://book.fosteropenscience.eu/ (CC-0)
Since the publication in 2016 of "FAIR Guiding Principles for scientific data management and stewardship" in Scientific Data, the best practice for managing data is to adhere to the FAIR principles. The FAIR principles are a framework for ensuring that data collected by researchers across all disciplines and fields meet specific standards that promote open science, support the reproducibility of research, and maximize the benefits of research to academia and society.
The following description of the FAIR principles is taken directly from https://www.go-fair.org/fair-principles/
Findability:
The first step in (re)using data is to find them. Metadata (the description of the data) and data should be easy to find for both humans and computers. This means assigning a persistent identifier (PID) to the data/dataset (usually in the form of a digital object identifier, or DOI). Identifiers consist of an internet link (e.g., a URL that resolves to a web page where the data are located). Identifiers will help others to properly cite your work when reusing your data.
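To illustrate how a DOI works as an internet link, here is a small Python sketch that turns a DOI into a resolvable URL using the https://doi.org/ resolver. The function name doi_to_url and the DOI "10.1234/EXAMPLE" are made up for illustration, and the pattern is a simplified assumption about what a DOI looks like rather than a full validation.

```python
import re

# Assumption: a DOI looks like "10.", a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn a bare DOI (optionally prefixed with 'doi:') into an https://doi.org/ link."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


# "10.1234/EXAMPLE" is a made-up DOI used only for illustration.
print(doi_to_url("doi:10.1234/EXAMPLE"))  # prints https://doi.org/10.1234/EXAMPLE
```

Citing the full link rather than the bare identifier lets readers reach the dataset landing page directly.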
Accessibility:
Once the user finds the required data, they need to know how the data can be accessed, possibly including authentication and authorisation. This does not necessarily mean that data should be open. There are many reasons to restrict access to data (e.g., the data contain personally identifiable information (PII), are proprietary/licensed as intellectual property (IP), or contain other sensitive information). Accessibility essentially means that it should be clear under what conditions access is allowed. The rule with accessibility can be distilled to: "As Open as Possible, as Closed as Necessary"
Interoperability:
Interoperability refers to the ease by which data can be integrated with other/new data. In practice, storing data in open formats makes it easier to later integrate new data. On the other hand, storing data in proprietary formats hinders this effort. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. This means that when possible, it's best practice to use standardized vocabularies/variable labels/terms.
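As one possible illustration of an open format combined with standardized variable labels, the Python sketch below renames ad hoc column labels and writes the records as CSV text. The mapping, the column names, and the function name to_open_csv are all hypothetical; in practice the variable names would come from a vocabulary agreed on in your discipline.

```python
import csv
import io

# Hypothetical mapping from ad hoc column labels to agreed, documented variable names.
STANDARD_NAMES = {"Temp (C)": "temperature_celsius", "Site": "site_id"}


def to_open_csv(rows, name_map=STANDARD_NAMES):
    """Rename columns using `name_map` and return the rows as CSV text."""
    if not rows:
        return ""
    renamed = [{name_map.get(key, key): value for key, value in row.items()} for row in rows]
    buffer = io.StringIO()
    # Column order follows the first record; all records are assumed to share the same columns.
    writer = csv.DictWriter(buffer, fieldnames=list(renamed[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(renamed)
    return buffer.getvalue()


# Made-up example record
print(to_open_csv([{"Site": "A1", "Temp (C)": 21.5}]))
```

The result is plain text that any spreadsheet or statistics package can read, which is the practical benefit of an open format.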
Reusability:
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. In practice, this involves creating a README file with details on how to clean, transform, or manage the data, if applicable. This also involves applying a license to let others know if the data are public domain or if copyright is retained to some degree or completely.
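A README can be as simple as a plain-text file that records the title, a description, the license, and the processing steps. The Python sketch below assembles such a file; the field choices, the function name build_readme, and the example values are assumptions for illustration, not a required template.

```python
def build_readme(title, description, steps, license_name):
    """Return plain-text README content describing a dataset."""
    lines = [
        f"Title: {title}",
        f"Description: {description}",
        f"License: {license_name}",
        "",
        "Processing steps:",
    ]
    # Number the cleaning/transformation steps so others can repeat them in order.
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


# All values below are made up for illustration.
readme_text = build_readme(
    title="Example survey dataset",
    description="Responses collected for a hypothetical study.",
    steps=["Removed incomplete responses", "Recoded age into ranges"],
    license_name="CC-BY 4.0",
)
print(readme_text)
```

Saving the returned text as README.txt alongside the data files keeps the documentation with the dataset when it is deposited.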
Persistent identifiers are:
Persistent identifiers allow for:
For a general overview on copyright issues related to data, please see this Guide to Licensing Open Data from the Open Knowledge Foundation.
The following are typical Creative Commons license templates that are applied to data:
Not sure what license to select? Creative Commons has a neat tool to help.
Data journals publish data articles, which are mini-publications about a dataset or database. As with the write-up of a journal article or study, the data are peer reviewed (for an example of peer review guidelines for data articles, see the Earth System Science Data journal guide). Data articles can be about data that underlie existing publications or they can be independent publications. Publishing data as their own research product allows you to cite the data easily in subsequent publications, link the data to publications, and potentially receive credit for the data themselves in addition to any related studies.
A (slightly outdated but still accurate) list of data journals
Additional information on data journals: