Illumina-based assembly pipeline steps


  1. Merge re-sequenced FastQ files (cat)
  2. Read QC (FastQC)
  3. Adapter trimming (fastp)
  4. Removal of host reads (Kraken 2; optional)
  5. Variant calling
    1. Read alignment (Bowtie 2)
    2. Sort and index alignments (SAMtools)
    3. Primer sequence removal (iVar; amplicon data only)
    4. Duplicate read marking (Picard; optional)
    5. Alignment-level QC (Picard, SAMtools)
    6. Genome-wide and amplicon coverage QC plots (mosdepth)
    7. Variant calling and consensus sequence generation via one of two routes (iVar variants and consensus, the default for amplicon data; or BCFtools and BEDTools, the default for metagenomics data)
      • Variant annotation (SnpEff, SnpSift)
      • Consensus assessment report (QUAST)
      • Lineage analysis (Pangolin)
      • Clade assignment, mutation calling and sequence quality checks (Nextclade)
      • Individual variant screenshots with annotation tracks (ASCIIGenome)
    8. Intersect variants across callers (BCFtools)
  6. De novo assembly
    1. Primer trimming (Cutadapt; amplicon data only)
    2. Assembly with one of several tools (SPAdes, Unicycler or minia)
  7. Present QC and visualisation for raw read, alignment, assembly and variant calling results (MultiQC)
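
As a rough illustration of step 1, here is a minimal sketch of merging re-sequenced FastQ files. It assumes the runs for one sample arrive as separate gzip-compressed files. Concatenated gzip members form a valid gzip stream, so merging is a byte-level concatenation, which is what `cat` does. The function name `merge_fastq` and the file names in the usage note are hypothetical and not part of the pipeline.

```python
import shutil
from pathlib import Path


def merge_fastq(parts, merged):
    """Concatenate FastQ files byte-for-byte, in the order given (like `cat`).

    Works for plain-text and gzip-compressed inputs alike, provided all
    inputs use the same format; the data is copied, never decompressed.
    """
    merged = Path(merged)
    with merged.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                # Stream in chunks so large files are never held in memory.
                shutil.copyfileobj(src, out)
    return merged
```

For example, `merge_fastq(["run1_R1.fastq.gz", "run2_R1.fastq.gz"], "sample_R1.fastq.gz")` would produce one file holding the reads of both runs. For paired-end data, the R1 and R2 files should be merged in the same run order so that read pairs stay in sync.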