
Sharing data: Genomics & Transcriptomics

Sharing data on COVID-19

Genome sequencing is a critical tool for monitoring the evolution of a pathogen and tracing the routes of an outbreak. Comparing genomic sequences shows whether and when “new variants” of the pathogen have evolved, so that their characteristics can be studied and their spread traced. This process, known as “genomic surveillance”, can also provide valuable insights for the development of vaccines and drugs. For this reason, it is essential that new SARS-CoV-2 genome sequences produced in different countries be made available to the scientific community as quickly and efficiently as possible.

Metadata

Metadata are “data about data”, i.e. the information that must accompany a dataset so that it can be correctly interpreted, managed and stored over time. Metadata generally include the methodology used to collect the data, instrumental procedures, definitions of variables, units of measurement, file formats, the software used to collect and/or process the data, and more. Metadata can be collected in simple text files and archived together with the dataset.
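As an illustration of keeping metadata in a simple text file alongside the data, the sketch below stores one key/value pair per line in a “sidecar” file named after the dataset. The layout (tab-separated key and value), the `.metadata.txt` suffix, the function names and the example field names are all hypothetical choices for this sketch, not a standard; for submission, the fields should come from the checklist of the target database.

```python
from pathlib import Path


def format_metadata(metadata: dict) -> str:
    """Serialise metadata as one 'key<TAB>value' line per entry (hypothetical layout)."""
    lines = []
    for key, value in metadata.items():
        key, value = str(key), str(value)
        # Tabs or newlines inside an entry would break the one-entry-per-line layout.
        if any(ch in key + value for ch in "\t\n"):
            raise ValueError(f"tab or newline in metadata entry {key!r}")
        lines.append(f"{key}\t{value}")
    return "\n".join(lines) + "\n"


def parse_metadata(text: str) -> dict:
    """Inverse of format_metadata; all values come back as strings."""
    result = {}
    for line in text.split("\n"):
        if line.strip():  # skip blank lines
            key, value = line.split("\t", 1)
            result[key] = value
    return result


def write_metadata_sidecar(dataset_path: str, metadata: dict) -> Path:
    """Save the metadata next to the dataset as '<dataset file name>.metadata.txt'."""
    dataset = Path(dataset_path)
    sidecar = dataset.with_name(dataset.name + ".metadata.txt")
    sidecar.write_text(format_metadata(metadata), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Hypothetical entries; real field names should come from the database checklist.
    example = {
        "sequencing_instrument": "Illumina MiSeq",
        "assembly_software": "example-assembler 1.0",
        "coverage_unit": "mean read depth (x)",
    }
    print(format_metadata(example), end="")
```

Calling `write_metadata_sidecar("sample01_R1.fastq.gz", example)` would create `sample01_R1.fastq.gz.metadata.txt` in the same directory, so the description travels with the file when both are archived together.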

Researchers are strongly encouraged to use standard metadata formats where they exist, and to structure and collect their metadata according to the guidelines of the database in which the data will be deposited (e.g. ENA). For SARS-CoV-2, please refer to the ERC000033 checklist (standard metadata for pathogenic viruses).

The metadata standard for transcriptomic experiments is MINSEQE (Minimum Information about a high-throughput nucleotide SEQuencing Experiment). Adopting MINSEQE makes it easier to integrate results from experiments conducted in different ways, maximizing the chances of reusing the data and reproducing the results.

Please refer to FAIRsharing.org for more information on standards and formats for metadata.

Data Deposition

Viral data

Raw viral genome sequencing data, as well as assembled and annotated genomes, should be submitted to ENA. Detailed documentation on submitting SARS-CoV-2 sequencing data to ENA is available on the “SARS-CoV-2 submission” page. Because human genetic data are considered highly sensitive personal data under the GDPR, dedicated procedures should be applied to check for human “contaminating” sequences and remove them before submitting your data to ENA. A suitable workflow for this analysis is available at: https://workflowhub.eu/projects/25. The workflow is based on tools integrated into COVID-Galaxy, where it can easily be imported and executed.
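The sketch below illustrates only the final step of such a decontamination procedure: dropping reads that have already been flagged as human from a FASTQ file. It is not the WorkflowHub/COVID-Galaxy workflow. It assumes that a separate step (for example, mapping the reads against the human reference genome) has produced the set of human read IDs, and that the FASTQ file has exactly four lines per record. All function names are hypothetical.

```python
import io
from typing import Iterable, Iterator, Set, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality), without line endings.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not header.startswith("@") or not plus.startswith("+") or not qual:
            raise ValueError("malformed (or multi-line) FASTQ record")
        yield (
            header.rstrip("\r\n"),
            seq.rstrip("\r\n"),
            plus.rstrip("\r\n"),
            qual.rstrip("\r\n"),
        )


def read_id(header: str) -> str:
    """Read ID = first whitespace-separated token of the header, without the '@'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def drop_flagged_reads(records: Iterable[Record], flagged_ids: Set[str]) -> Iterator[Record]:
    """Keep only records whose read ID is NOT in flagged_ids (e.g. reads flagged as human)."""
    for record in records:
        if read_id(record[0]) not in flagged_ids:
            yield record


def write_fastq(records: Iterable[Record], handle: TextIO) -> int:
    """Write records in 4-line FASTQ format and return how many were written."""
    count = 0
    for record in records:
        handle.write("\n".join(record) + "\n")
        count += 1
    return count


def filter_fastq_text(fastq_text: str, flagged_ids: Set[str]) -> str:
    """In-memory convenience wrapper around the three steps above."""
    out = io.StringIO()
    write_fastq(drop_flagged_reads(read_fastq(io.StringIO(fastq_text)), flagged_ids), out)
    return out.getvalue()
```

For compressed files, `gzip.open(path, "rt")` and `gzip.open(path, "wt")` provide text handles that can be passed to `read_fastq` and `write_fastq`. For paired-end data, applying the same set of IDs to both files keeps the mates in sync, assuming both mates share the same ID token.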

Human data

Human genetic data should be submitted to the EGA. Detailed instructions on how to submit your data to this resource are available at: https://www.ebi.ac.uk/ega/submission. For this type of data, your institution may also offer archiving in a local repository; contact your Data Management or IT service for support.

Host/pathogen interaction

The guidelines described in the previous sections also apply to studies in which sequencing data were produced for both the pathogen and its host (e.g. combined sequencing of the host’s transcriptome and the viral genotype). In this case, it is recommended to use the BioSamples database to keep track of the biological samples from which the data were produced.

Transcriptomics

For raw sequencing data, please refer to the instructions in the previous sections. Processed data (gene expression profiles) should instead be submitted to Expression Atlas.


Local Repositories

Some local repositories are available at the department, university or institute level.

We suggest contacting your Data Stewardship or IT service for more information and support.

Suggest a local repository to us by filling in this form. Increase your visibility, create synergies with other laboratories and help the impact of your research grow.