EDGE COVID-19 is a tailored bioinformatics platform based on the more flexible and fully open-source EDGE Bioinformatics software (Li et al. 2017). This mini-version consists of a user-friendly GUI that drives standardized workflows for reference-based genome 'assembly' and preliminary analysis of Illumina or Nanopore data from SARS-CoV-2 genome sequencing projects. The result is a final SARS-CoV-2 genome ready for submission to GISAID and/or GenBank.
The EDGE COVID-19 platform accepts Illumina or Oxford Nanopore Technologies (ONT) data, including ONT data generated with the ARTIC network SARS-CoV-2 sequencing protocols. Users can upload Illumina or Nanopore FASTQ files and/or download them from NCBI SRA. For Illumina data, the default analyses include only read QC, read mapping to the reference, and SNP/variant analysis. ONT data must be demultiplexed before upload, and the samples are then processed individually; SNP/variant calling is not on by default for ONT. Other functions (e.g. de novo assembly for whole genome data) are available for both sequencing platforms.
While command line execution is possible (see here and here), the GUI provides an easy platform for submitting data and viewing results, with graphical and tabular views of variant/SNP data and a genome browser that shows read coverage, the location of SNPs or variants, and the reference annotations.
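To make the read QC step mentioned above more concrete, the following is a minimal Python sketch of a length and mean-quality filter over FASTQ reads. It is an illustration only and is not the QC implementation used by EDGE COVID-19; the function names (`read_fastq`, `mean_phred`, `passes_qc`) and the default thresholds are hypothetical, and Phred+33 quality encoding is assumed.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(record: FastqRecord,
              min_mean_quality: float = 20.0,  # hypothetical threshold
              min_length: int = 50) -> bool:   # hypothetical threshold
    """Keep reads that are long enough and have sufficient mean quality."""
    return (len(record.sequence) >= min_length
            and mean_phred(record.quality) >= min_mean_quality)


# Usage: two toy reads, one high quality ('I' = Q40) and one low ('!' = Q0).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
kept = [r.name for r in read_fastq(example) if passes_qc(r, min_length=4)]
print(kept)  # ['read1']
```

Production QC tools typically also trim low-quality ends and remove adapters; this sketch only shows the basic idea of discarding reads that fall below length and quality thresholds.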
This lightweight version is packaged as a Docker container that can run on any local hardware infrastructure or in the cloud. We have tested the container on laptops and in the cloud, using several Illumina (e.g. SRR11177792) and ONT (e.g. SRR11300652) datasets.
Note: For EDGE Bioinformatics users who would also like to use the phylogeny tools or the read- and assembly-based taxonomy classification tools to identify all organisms that may be present in complex samples, we recommend the original EDGE Bioinformatics platform, which includes several tools and the associated (large) databases that enable such a search. In initial tests of taxonomy classification of SARS-CoV-2 samples (with no SARS-CoV-2 genomes in any of the databases), we recovered SARS coronavirus and Bat Coronavirus as the nearest neighbors.
EDGE COVID-19 Docker image: location and instructions.
Source code: LANL-Bioinformatics GitHub site.
Chien-Chi Lo, Migun Shakya, Ryan Connor, Karen Davenport, Mark Flynn, Adán Myers y Gutiérrez, Bin Hu, Po-E Li, Elais Player Jackson, Yan Xu, Patrick S G Chain. EDGE COVID-19: A Web Platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts. Bioinformatics, 2022, btac176. https://doi.org/10.1093/bioinformatics/btac176
This research was supported by LANL (20200732ER), by DTRA (CB10152 and CB10623) and by the DOE Office of Science (KP160101), through the National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to COVID-19, with funding provided by the Coronavirus CARES Act.