Configuration

  1. Create a new directory for your data.

    mkdir -p data/custom
    
  2. Copy your unaligned sequences (sequences.fasta) and metadata (metadata.tsv) into data/custom.

    • Note: GISAID sequences and metadata can be downloaded using the “Input for the Augur pipeline” option on https://gisaid.org/.

    • metadata.tsv MUST contain at least the columns strain, date, and country.
      If a collection date or country is unknown, leave the field empty or fill it with “NA”.

    • The first column MUST be strain.

  3. Create a profile for your custom build.

    scripts/create_profile.sh --data data/custom
    
    2022-06-17 09:15:06     Searching for metadata (data/custom/metadata.tsv)
    2022-06-17 09:15:06     SUCCESS: metadata found
    2022-06-17 09:15:06     Checking for 3 required metadata columns (strain date country)
    2022-06-17 09:15:06     SUCCESS: 3 columns found.
    2022-06-17 09:15:06     Searching for sequences (data/custom/sequences.fasta)
    2022-06-17 09:15:06     SUCCESS: Sequences found
    2022-06-17 09:15:06     Checking that the metadata strains match the sequence names
    2022-06-17 09:15:06     SUCCESS: Strain column matches sequence names
    2022-06-17 09:15:06     Creating new profile directory (my_profiles/custom)
    2022-06-17 09:15:06     Creating build file (my_profiles/custom/builds.yaml)
    2022-06-17 09:15:06     Adding default input data (defaults/inputs.yaml)
    2022-06-17 09:15:06     Adding custom input data (data/custom)
    2022-06-17 09:15:06     Adding `custom` as a build
    2022-06-17 09:15:06     Creating system configuration (my_profiles/custom/config.yaml)
    2022-06-17 09:15:06     Adding default system resources
    2022-06-17 09:15:06     Done! The custom profile is ready to be run with:
    
                            snakemake --profile my_profiles/custom
    
    • Note: Pass the --controls parameter to scripts/create_profile.sh to add the controls build, which runs in parallel with your custom build.

  4. Edit my_profiles/custom/config.yaml so that jobs and default-resources match your system.

    Note: For HPC environments, see the High Performance Computing section.

    #------------------------------------------------------------------------------#
    # System config
    #------------------------------------------------------------------------------#
    
    # Maximum number of jobs to run simultaneously
    jobs : 1
    
    # Default resources for a SINGLE JOB
    default-resources:
    - cpus=1
    - mem_mb=4000
    - time_min=60
    
  5. Do a “dry run” to confirm setup.

    snakemake --profile my_profiles/custom --dry-run
    
  6. Run your custom profile.

    snakemake --profile my_profiles/custom
    
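The metadata requirements from step 2 can be checked by hand before creating a profile. The following is a minimal sketch, assuming a POSIX shell with awk available; check_metadata_header is a hypothetical helper name and is not part of ncov-recombinant. scripts/create_profile.sh performs its own column check, as the log in step 3 shows, so this is only a convenience for catching header problems early.

```shell
# Hypothetical helper (not part of ncov-recombinant).
# Returns 0 if the header of a tab-separated metadata file has strain as
# its first column and also contains date and country; non-zero otherwise.
check_metadata_header() {
  awk -F '\t' '
    NR == 1 {
      sub(/\r$/, "")                      # tolerate Windows line endings
      if ($1 != "strain") {
        print "ERROR: first column must be strain, found: " $1
        bad = 1
      }
      for (i = 1; i <= NF; i++) seen[$i] = 1
      n = split("strain date country", required, " ")
      for (j = 1; j <= n; j++) {
        if (!(required[j] in seen)) {
          print "ERROR: missing required column: " required[j]
          bad = 1
        }
      }
      exit                                # only the header is inspected
    }
    END {
      if (NR == 0) { print "ERROR: file is empty"; exit 1 }
      exit bad + 0
    }
  ' "$1"
}
```

A possible invocation, using the path from step 2:

    check_metadata_header data/custom/metadata.tsv && echo "header OK"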

Important: If you are doing routine production analyses, it is recommended to delete all previous output before running your profile. This forces ncov-recombinant to download fresh copies of the pango-designation issues (resources/issues.tsv) and the lineage phylogeny (resources/tree.nwk).

    snakemake --profile my_profiles/custom --delete-all-output
    snakemake --profile my_profiles/custom
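
For routine runs, the two commands above can be wrapped in a small shell function so that the rerun only starts if the cleanup succeeded. This is a sketch under assumptions: run_fresh is a hypothetical name, not something ncov-recombinant provides, and it defaults to the my_profiles/custom profile used throughout this section.

```shell
# Hypothetical wrapper (not part of ncov-recombinant).
# Deletes all previous output for a profile, then reruns it.
# The second snakemake call is skipped if the first one fails.
run_fresh() {
  profile="${1:-my_profiles/custom}"
  snakemake --profile "$profile" --delete-all-output &&
    snakemake --profile "$profile"
}
```

A possible invocation:

    run_fresh my_profiles/custom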