Controls
GenBank
After completing the tutorial, a good next step is to run the controls build. This build analyzes publicly available sequences in data/controls, which include recombinant (“positive”) and non-recombinant (“negative”) sequences. Instructions for how to include the controls in your custom build are in the Configuration section.
Run the workflow.
snakemake --profile profiles/controls
GISAID
For GISAID users, a comprehensive strain list is provided that includes all designated recombinants to date (XA-XBE). This dataset includes 600+ sequences and can be used for in-depth validation and testing. It is recommended to use the “Input for the Augur pipeline” option to download a tar compressed archive of metadata and sequences to data/controls-gisaid/.
Prep the input metadata and sequences.
cd data/controls-gisaid
tar -xvf gisaid_auspice_input_hcov-19_*.tar
mv *sequences.fasta sequences.fasta
# Retain minimal metadata columns, to avoid non-ascii characters
csvtk cut -t -l -f 'strain,date,country,gisaid_epi_isl,pangolin_lineage' *.metadata.tsv > metadata.tsv
cd ../..
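Before running the workflow, it can help to confirm that the prepared files agree with each other. The following is a minimal sketch, not part of the workflow itself: `check_controls_input` is a hypothetical helper name, and it assumes `sequences.fasta` has one `>` header line per record and `metadata.tsv` has a single header row. It only compares counts; it does not check that strain names match.

```shell
# Hypothetical helper (not part of the workflow): compare the number of
# FASTA records with the number of metadata rows in a prepared directory.
check_controls_input() {
  dir="${1:-data/controls-gisaid}"
  # Count FASTA records by their '>' header lines.
  n_seq=$(grep -c '^>' "$dir/sequences.fasta")
  # Count metadata rows, excluding the header line.
  n_meta=$(tail -n +2 "$dir/metadata.tsv" | wc -l | tr -d ' ')
  echo "sequences: $n_seq, metadata rows: $n_meta"
  # Succeed only when the two counts match.
  [ "$n_seq" -eq "$n_meta" ]
}
```

Run from the repository root, `check_controls_input` with no argument inspects data/controls-gisaid and exits non-zero when the counts differ, which may indicate a truncated download or a metadata file that was not filtered as expected.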
Run the workflow.
# Option 1: Local testing
snakemake --profile profiles/controls-gisaid
# Option 2: High Performance Computing with SLURM
scripts/slurm.sh --profile profiles/controls-gisaid