Bash script to align short reads against a reference genome

By Neel 1794 days ago
bwa mem -t 40 -R '@RG\tID:K12\tSM:K12' \
    E.coli_K12_MG1655.fa SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz \
    | samtools view -b - > SRR1770413.raw.bam
sambamba sort SRR1770413.raw.bam
sambamba markdup SRR1770413.raw.sorted.bam SRR1770413.bam

## Breaking it down by line:

# Alignment with bwa:
#   bwa mem -t $threads -R '@RG\tID:K12\tSM:K12'
# This says "align using this many threads" and "give the reads the read group ID K12 and the sample name K12".

# Reference and FASTQs:
#   E.coli_K12_MG1655.fa SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz
# This specifies the base reference file name (bwa finds its index files using this name) and the input FASTQ files.
# The first file should contain the first mate of each pair, the second file the second mate.

# Conversion to BAM:
#   samtools view -b -
# This reads SAM from stdin (the - in place of a file name indicates this) and converts it to BAM.

# Sorting the BAM file:
#   sambamba sort SRR1770413.raw.bam
# This sorts the BAM file and writes the result to SRR1770413.raw.sorted.bam (the input name with .bam replaced by .sorted.bam).

# Marking PCR duplicates:
#   sambamba markdup SRR1770413.raw.sorted.bam SRR1770413.bam
# This marks reads that appear to be redundant PCR duplicates based on their mapping position.
# It uses the same criteria for marking duplicates as Picard.

# The same pipeline with minimap2. The only major change from bwa mem is that we tell
# minimap2 we are working with short-read data using -ax sr:
minimap2 -ax sr -t 40 -R '@RG\tID:O104_H4\tSM:O104_H4' \
    E.coli_K12_MG1655.fa SRR341549_1.fastq.gz SRR341549_2.fastq.gz \
    | samtools view -b - > SRR341549.raw.minimap2.bam
sambamba sort SRR341549.raw.minimap2.bam
sambamba markdup SRR341549.raw.minimap2.sorted.bam SRR341549.minimap2.bam
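
The file names in the bwa pipeline above follow a fixed pattern, so the steps can be generated from a few parameters. The following is a minimal sketch, not part of the original script: the function names `sorted_name` and `print_pipeline` are hypothetical, and the sketch assumes `sambamba sort` is run without `-o`, so that its output is the input name with `.bam` replaced by `.sorted.bam`. It only prints the commands (a dry run) and does not call bwa, samtools, or sambamba.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the bwa mem pipeline for one sample.
# sorted_name and print_pipeline are hypothetical helper names.

# Output name of "sambamba sort <file>.bam" when no -o option is given
# (assumed default: <file>.sorted.bam).
sorted_name() {
    printf '%s\n' "${1%.bam}.sorted.bam"
}

# Arguments: sample/run ID, read group and sample name, reference FASTA, threads.
print_pipeline() {
    local sample=$1 rg=$2 ref=$3 threads=$4
    local raw="${sample}.raw.bam"
    local sorted
    sorted=$(sorted_name "$raw")

    # printf '%s' keeps the \t in the read-group string literal, as bwa expects.
    printf '%s\n' "bwa mem -t ${threads} -R '@RG\\tID:${rg}\\tSM:${rg}' ${ref} ${sample}_1.fastq.gz ${sample}_2.fastq.gz | samtools view -b - > ${raw}"
    printf '%s\n' "sambamba sort ${raw}"
    printf '%s\n' "sambamba markdup ${sorted} ${sample}.bam"
}

# Example: reproduce the three commands for SRR1770413.
print_pipeline SRR1770413 K12 E.coli_K12_MG1655.fa 40
```

Printing the commands first makes it easy to check the intermediate file names before committing 40 threads to a real run; once they look right, the printed lines can be pasted into a shell or a script.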