Sharing data: Protein Data

Make your COVID-19 data accessible to the scientific community by releasing them in public databases together with the respective metadata.

Metadata

Metadata are “data about data”, i.e. the information that must accompany a dataset so that it can be correctly interpreted, managed and stored over time. Metadata generally include information on the methodology used to collect the data, instrumental procedures, definitions of variables, units of measurement, file formats, the software used to collect and/or process the data, and more. Metadata can be collected in simple text files and archived together with the dataset.
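As an illustration of keeping metadata in a simple text file next to the dataset, the following is a minimal sketch in Python. The function name `write_metadata_sidecar`, the `.metadata.json` suffix, the field names and all example values are assumptions made for this illustration. They are not part of MIAPE or of any repository's submission format.

```python
import json
from pathlib import Path


def write_metadata_sidecar(dataset_path, metadata):
    """Write metadata to a plain-text JSON file stored next to the dataset.

    The sidecar file is named after the dataset, for example
    run01.mzML -> run01.mzML.metadata.json (a naming choice assumed here).
    """
    dataset_path = Path(dataset_path)
    sidecar_path = dataset_path.with_name(dataset_path.name + ".metadata.json")
    sidecar_path.write_text(
        json.dumps(metadata, indent=2, sort_keys=True, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar_path


# Hypothetical example values; the field names are illustrative only and
# mirror the kinds of information listed in the paragraph above.
example_metadata = {
    "methodology": "LC-MS/MS of infected cell lysate",
    "instrument": "example mass spectrometer",
    "variables": {"intensity": "peptide precursor intensity"},
    "units": {"intensity": "arbitrary units"},
    "file_format": "mzML",
    "software": "example processing software, version not specified",
}
```

Calling `write_metadata_sidecar("run01.mzML", example_metadata)` would create `run01.mzML.metadata.json` in the same folder, so the description travels with the data when the folder is archived. When a community standard exists for your data type, its field names and controlled vocabulary terms should replace the ad hoc ones used here.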

Researchers are strongly encouraged to use standard metadata, where they exist. It is strongly recommended to start defining and collecting metadata from the very beginning of the research project.

In proteomics, it is suggested to use the Minimum Information About a Proteomics Experiment (MIAPE) standard together with the controlled vocabularies defined by the Proteomics Standards Initiative (PSI CVs).

More information on standards and formats for metadata is collected in the FAIRsharing.org resource.

Repositories

To identify the most suitable resource for sharing your data, we suggest searching tools such as FAIRsharing with the ‘proteomics’ keyword.

Protein-protein interaction data

For protein-protein interactions, at binary level or at network level, it is highly recommended to use the MINT database. Guidelines for data submission can be found on the dedicated page.

Mass spectrometry

For mass spectrometry experiments it is strongly advised to use the PRIDE repository, a member of the ProteomeXchange Consortium. Data can be submitted using the PX Submission Tool.

Protein structures

This class includes structural biology data on proteins and structural data on other biological macromolecules.

For X-ray crystallography and NMR data it is suggested to use the Protein Data Bank in Europe (PDBe) database.

For electron microscopy and tomography data the Electron Microscopy Public Image Archive (EMPIAR) database is recommended.

Sensitive molecular biology data

For molecular biology data generated from human samples that can potentially be used to identify a specific subject (and must therefore be protected by controlled access) it is recommended to use the European Genome-phenome Archive (EGA).

Every institution is likely to provide local repositories for this type of data. You are kindly invited to contact your Data Management or IT service for support.

For SARS-CoV-2 molecular biology data generated in combination with host data (e.g. combined sequencing studies of host transcriptome and viral genotype), storage in a local repository is recommended, together with registration of the datasets in the BioSamples database.

This section is manually curated and may not be complete. Contact us to report errors, inaccuracies or missing resources.

Local Repositories

Some local repositories are available at the department, university or institute level.

We suggest contacting your Data Stewardship or IT service for more information and support.

Suggest a local repository to us by filling in this form. Increase your visibility, create synergies with other laboratories and let the impact of your research grow.