Our Sponsors



Download BioinformaticsOnline(BOL) Apps in your chrome browser.




Question: Question: How to filter out host reads from paired-end fastq files?

Neelam Jha
1462 days ago

Question: How to filter out host reads from paired-end fastq files?

How to filter out host reads from paired-end fastq files?

Answers
0

a) bowtie2 mapping against host sequence

Host example: human genome hg19 (download bowtie2 hg19 index)

# 1) create bowtie2 index database (host_DB) from host reference genome
bowtie2-build host_genome.fna host_DB

# 2) bowtie2 mapping against host sequence database, keep both mapped and unmapped reads (paired-end reads)
bowtie2 -x host_DB -1 SAMPLE_r1.fastq -2 SAMPLE_r2.fastq -S SAMPLE_mapped_and_unmapped.sam

# 3) convert file .sam to .bam
samtools view -bS SAMPLE_mapped_and_unmapped.sam > SAMPLE_mapped_and_unmapped.bam

b) filter required unmapped reads # SAMtools SAM-flag filter: get unmapped pairs (both ends unmapped)
samtools view -b -f 12 -F 256 SAMPLE_mapped_and_unmapped.bam > SAMPLE_bothEndsUnmapped.bam
-f 12     Extract only (-f) alignments with both reads unmapped: <read unmapped><mate unmapped>
-F 256   Do not(-F) extract alignments which are: <not primary alignment>
see meaning of SAM-flags
 
c)  split paired-end reads into separated fastq files .._r1 .._r2
# sort bam file by read name (-n) to have paired reads next to each other as required by bedtools
samtools sort -n SAMPLE_bothEndsUnmapped.bam SAMPLE_bothEndsUnmapped_sorted


bedtools bamtofastq
 -i SAMPLE_bothEndsUnmapped_sorted.bam -fq SAMPLE_host_removed_r1.fastq -fq2 SAMPLE_host_removed_r2.fastq