BOL: Bash script for aligning short reads against a reference genome

By Neel 1906 days ago

bwa mem -t 40 -R '@RG\tID:K12\tSM:K12' \
    E.coli_K12_MG1655.fa SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz \
    | samtools view -b - >SRR1770413.raw.bam
sambamba sort SRR1770413.raw.bam
sambamba markdup SRR1770413.raw.sorted.bam SRR1770413.bam


##Breaking it down by line:

#alignment with bwa: bwa mem -t 40 -R '@RG\tID:K12\tSM:K12' --- this says "align using 40 threads" and also "tag the reads with the read group ID K12 and the sample name K12"
#reference and FASTQs: E.coli_K12_MG1655.fa SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz --- this specifies the reference FASTA (bwa locates its index files from this name, so the reference must already have been indexed with bwa index) and the input read files. The first file should contain the first mate of each pair, the second file the second mate.
#conversion to BAM: samtools view -b - --- this reads SAM from stdin (the - specifier in place of the file name indicates this) and converts to BAM.
#sorting the BAM file: sambamba sort SRR1770413.raw.bam --- sorts the BAM file by coordinate, writing the result to SRR1770413.raw.sorted.bam.
#marking PCR duplicates: sambamba markdup SRR1770413.raw.sorted.bam SRR1770413.bam --- this marks reads that appear to be PCR duplicates based on their mapping position. It uses the same criteria for marking duplicates as Picard.
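
The steps above can be wrapped in a small function so the same pipeline is reusable for other samples. This is only a sketch: the function name align_bwa, its argument order, the default of 4 threads, and the use of the sample name as the output prefix are assumptions made here for illustration, not part of the original script. It also runs bwa index when no .bwt index file is found next to the reference.

```bash
# Hypothetical helper: run the bwa mem pipeline for one paired-end sample.
# Usage: align_bwa <reference.fa> <sample> <reads_1.fastq.gz> <reads_2.fastq.gz> [threads]
align_bwa() {
    local ref="$1" sample="$2" r1="$3" r2="$4" threads="${5:-4}"

    # bwa mem looks for index files named after the reference; build them if missing.
    if [ ! -f "$ref.bwt" ]; then
        bwa index "$ref"
    fi

    # Align, tag reads with a read group, and convert the SAM stream to BAM.
    bwa mem -t "$threads" -R "@RG\tID:$sample\tSM:$sample" "$ref" "$r1" "$r2" \
        | samtools view -b - > "$sample.raw.bam"

    # sambamba sort writes <sample>.raw.sorted.bam by default.
    sambamba sort "$sample.raw.bam"

    # Mark duplicates in the sorted file and write the final BAM.
    sambamba markdup "$sample.raw.sorted.bam" "$sample.bam"
}
```

For example, align_bwa E.coli_K12_MG1655.fa K12 SRR1770413_1.fastq.gz SRR1770413_2.fastq.gz 40 would run the same three steps as above, but name the outputs K12.raw.bam, K12.raw.sorted.bam and K12.bam.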

minimap2 -ax sr -t 40 -R '@RG\tID:O104_H4\tSM:O104_H4' \
    E.coli_K12_MG1655.fa SRR341549_1.fastq.gz  SRR341549_2.fastq.gz \
    | samtools view -b - >SRR341549.raw.minimap2.bam
sambamba sort SRR341549.raw.minimap2.bam
sambamba markdup SRR341549.raw.minimap2.sorted.bam SRR341549.minimap2.bam

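#The commands above repeat the pipeline with minimap2 in place of bwa mem. The only major change is that we tell minimap2 we're working with short-read data using -ax sr (-a writes SAM output, -x sr selects the short-read preset).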
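
Because sambamba sort derives its default output name from the input name, it is easy to pass the wrong file to markdup. A minimal sketch of the minimap2 variant that names the sorted file explicitly with sambamba sort -o is shown below. The function name align_minimap2, its arguments, the default of 4 threads, and the output file names are assumptions chosen for illustration.

```bash
# Hypothetical helper: the minimap2 pipeline with explicit intermediate file names.
# Usage: align_minimap2 <reference.fa> <sample> <reads_1.fastq.gz> <reads_2.fastq.gz> [threads]
align_minimap2() {
    local ref="$1" sample="$2" r1="$3" r2="$4" threads="${5:-4}"
    local raw="$sample.raw.minimap2.bam"
    local sorted="$sample.sorted.minimap2.bam"

    # -a emits SAM, -x sr selects the short-read preset; minimap2 can index the FASTA on the fly.
    minimap2 -ax sr -t "$threads" -R "@RG\tID:$sample\tSM:$sample" "$ref" "$r1" "$r2" \
        | samtools view -b - > "$raw"

    # -o sets the sorted file name directly instead of relying on the default.
    sambamba sort -o "$sorted" "$raw"

    # Mark duplicates and write the final BAM.
    sambamba markdup "$sorted" "$sample.minimap2.bam"
}
```

Naming the sorted file in one variable and reusing it for markdup keeps the two steps in agreement even if the naming scheme changes later.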
