X BOL wishing you a very and Happy New year

Alternative content

Our Sponsors



Download BioinformaticsOnline(BOL) Apps in your chrome browser.




BBSplit: Read Binning Tool for Metagenomes and Contaminated Libraries: Revision

BBSplit internally uses BBMap to map reads to multiple genomes at once, and determine which genome they match best. This is different than with ordinary mapping. If a genome (say, human) contains an exact repeat somewhere, reads mapping to it will be mapped ambiguously. But if you want to determine whether reads are mouse or human, it does not matter whether they map ambiguously within human, only whether they are ambiguous between human and mouse. BBSplit tracks this additional ambiguity information and decides how to use it based on the “ambig2” flag. The normal use of BBSplit is like Seal, either quantifying how many reads go to each reference, or splitting the reads into multiple output files, one per reference. BBSplit can only be run using references indexed with BBSplit, as they contain additional information regarding which sequences came from which reference file.

BBSplit is a tool that bins reads by mapping to multiple references simultaneously, using BBMap. The reads go to the bin of the reference they map to best. There are also disambiguation options, such that reads that map to multiple references can be binned with all of them, none of them, one of them, or put in a special "ambiguous" file for each of them. Paired reads will always be kept together.

For example, if you had a library of something that was contaminated with e.coli and salmonella, you could do this:

bbsplit.sh in=reads.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu=clean.fq int=t

This will produce 3 output files:
out_ecoli.fq (ecoli reads)
out_salmonella.fq (salmonella reads)
clean.fq (unmapped reads)

In this case, "int=t" means that the input file is paired and interleaved. For single-end reads you would leave that out. For paired reads in 2 files, you would do this:
bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq

BBSplit is available here:
https://sourceforge.net/projects/bbmap/

The sensitivity can be raised to be equivalent to BBMap with these flags: "minratio=0.56 minhits=1 maxindel=16000"