See also: Sharing data on COVID-19 | Services for research on COVID-19
Available data on COVID-19
The most important international databases and resources for genomic and transcriptomic data are listed below. Many of these are maintained by the European Bioinformatics Institute (EMBL-EBI) and have been recognized as Core Data Resources by the European bioinformatics infrastructure ELIXIR.
ENA: The “European Nucleotide Archive (ENA)” is one of the most comprehensive nucleotide sequence databases worldwide and is part of the International Nucleotide Sequence Database Collaboration (INSDC), together with GenBank and DDBJ. The portal collects both primary and secondary sequencing data, such as genomic sequences and related annotations. ENA also operates an instance of the Sequence Read Archive (SRA), an archival repository of sequence reads and analyses intended for public release, under the guidance of INSDC. To facilitate the submission of SARS-CoV-2 genomic sequences, ENA has set up a dedicated helpdesk. GenBank (USA) and DDBJ (Japan) are the counterparts of ENA for America and Asia respectively. In general, data submitted to any of the three INSDC databases also become available in the other two within a few days.
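As an illustration of how the SARS-CoV-2 sequence records held in ENA might be queried programmatically, here is a minimal sketch. It assumes the ENA Portal API search endpoint and its `result`, `query`, `fields`, `format` and `limit` parameters; these should be checked against the current ENA documentation before use. The function names `build_ena_search_url` and `fetch_records` are hypothetical helpers introduced only for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the ENA Portal API; verify against the current ENA docs.
ENA_SEARCH_ENDPOINT = "https://www.ebi.ac.uk/ena/portal/api/search"

# NCBI Taxonomy identifier for SARS-CoV-2.
SARS_COV_2_TAXON_ID = 2697049


def build_ena_search_url(taxon_id, limit=10, fields=("accession", "description")):
    """Build a search URL for sequence records under the given taxon (hypothetical helper)."""
    params = {
        "result": "sequence",              # search assembled/annotated sequences
        "query": f"tax_tree({taxon_id})",  # the taxon and all of its descendants
        "fields": ",".join(fields),        # columns to return
        "format": "tsv",
        "limit": limit,
    }
    return f"{ENA_SEARCH_ENDPOINT}?{urlencode(params)}"


def fetch_records(url, timeout=30):
    """Download a TSV response and split it into rows (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines() if line]
```

Under these assumptions, `fetch_records(build_ena_search_url(SARS_COV_2_TAXON_ID, limit=5))` would request up to five records as tab-separated rows. Building the URL separately from fetching it makes the query easy to inspect or paste into a browser first.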
EGA: The “European Genome-phenome Archive (EGA)” is the European reference database for human genetic data. Human genome sequences are recognized as a particularly sensitive form of personal data under the current European privacy regulation (GDPR) and therefore require special forms of protection. EGA implements a series of measures and precautions to safeguard the data and to ensure that access is granted only for valid scientific reasons. The dbGaP database is EGA’s equivalent in the USA. Because privacy regulations differ substantially between the EU and the US, the procedures for depositing and accessing data are quite different in the two databases. Hence, data deposited at EGA are not automatically transferred to dbGaP (and vice versa).
Ensembl COVID-19: Ensembl, the genome browser developed and maintained by EMBL-EBI, has released a dedicated version to facilitate the study of the SARS-CoV-2 genome. Equivalent tools for navigating the SARS-CoV-2 genome are available within the popular UCSC and NCBI genome browsers.
COVID-19 Cell Atlas: The “Single Cell Expression Atlas” is the reference portal that collects gene expression profiles for the different cell types in the human body. These data can be particularly useful for understanding how different cells respond to viral infection. For this reason, the Sanger Institute has developed a COVID-19-specific version of the single cell atlas. The portal contains expression profiles of both infected and uninfected cells.
Expression Atlas: the database that makes gene expression data (at both the mRNA and the protein level) available to the scientific community. It allows the study of gene expression profiles across different experimental conditions, tissues, cell types and disease states. The tool supports rapid aggregation of the data, which facilitates comparative analyses. GEO (Gene Expression Omnibus) is the corresponding US database.
Covid-Galaxy: The analysis of next-generation sequencing data requires the application of various bioinformatics tools. To facilitate this type of analysis, ELIXIR and the Galaxy Project have developed a dedicated public instance of the popular Galaxy workflow manager, complete with many tools and workflows for analyzing SARS-CoV-2 sequencing data and for performing cheminformatics analyses to identify molecules that may be useful against the virus.
Resources at the European COVID-19 Data Portal
Resources developed in Italy
- ViruSurf: a dedicated database developed by the research group of Prof. Stefano Ceri at the Politecnico di Milano. The tool aggregates currently available genomic sequences and provides information on the main genetic variants and their possible functional effects. Alignments of complete and/or partial genomic sequences, along with files containing the related annotations, can be easily obtained and downloaded.
Datasets produced by research groups in Italy
Complete genomes and raw sequencing data for SARS-CoV-2 and other CoVs