<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: All site pages]]></title>
	<link>https://bioinformaticsonline.com/pages/all?offset=20</link>
	<atom:link href="https://bioinformaticsonline.com/pages/all?offset=20" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/40226/bioinformatics-training-courses-at-rasa-lsi</guid>
	<pubDate>Wed, 06 Nov 2019 00:30:51 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/40226/bioinformatics-training-courses-at-rasa-lsi</link>
	<title><![CDATA[Bioinformatics Training Courses At RASA LSI]]></title>
	<description><![CDATA[<p>RASA conducts comprehensive life science skill development training courses in Pune, India, for working professionals, researchers, students and job seekers. The training is organized into modules such as a Bioinformatics course, an In silico Drug Discovery course, a Next Generation Sequence data analysis course, and a Molecular Biology &amp; Life&nbsp;science software development course, in which you learn from industry leaders&nbsp;how to apply these skills in life science and gain command of the software development process using various methodologies. We conduct in-class training and instructor-led live online classes worldwide, along with corporate and skill development training.</p><p>Workshops are conducted at regular intervals on Drug Designing, Protein Modeling and Simulation, Chemoinformatics, Bioinformatics, etc. The workshops are highly beneficial for working professionals, students and researchers who want to enhance their skills in a short time.</p>]]></description>
	<dc:creator>RASA Life Sciences</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37677/installing-blat-on-linux</guid>
	<pubDate>Tue, 11 Sep 2018 08:17:35 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37677/installing-blat-on-linux</link>
	<title><![CDATA[Installing BLAT on Linux !]]></title>
	<description><![CDATA[<p><span>It's been a while since I last installed BLAT, and when I went to the download directory at UCSC:&nbsp;</span><a href="http://users.soe.ucsc.edu/~kent/src/">http://users.soe.ucsc.edu/~kent/src/</a><span>&nbsp;I found that the latest BLAT is now version 35 and that the code to download was:&nbsp;</span><a href="http://users.soe.ucsc.edu/~kent/src/blatSrc35.zip">blatSrc35.zip</a><span>. However, you can also get pre-compiled binaries at:&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/admin/exe/">http://hgdownload.cse.ucsc.edu/admin/exe/</a><span>&nbsp;and there was a Linux x86_64 executable for my architecture available at:&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/">http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/</a><span>. Though YMMV, BLAT can be a little bit of a tricky beast to get going, so I decided to download the source code and compile that.</span><br /><br /><span>I will be compiling this code as 'root' as a system tool in&nbsp;</span><code>/usr/local/src</code><span>, so do not scream at me for that.</span><br /><br /><span>First I created a /usr/local/src/blat directory and copied the blatSrc35.zip file into it.</span><br /><br /><span>Next I used</span></p><pre><code>unzip blatSrc35.zip</code></pre><p><span>to unpack the archive. 
This gives a directory blatSrc; now move into that directory.</span></p><pre><code>cd blatSrc</code></pre><p><span>Before you begin, read the README file that comes with the source code.</span><br /><br /><span>One thing about building BLAT is that you need to set the MACHTYPE variable so that the BLAT sources know what type of machine you are compiling the software on.</span><br /><br /><span>On most *nix machines, typing</span></p><pre><code>echo $MACHTYPE</code></pre><p><span>will return the machine architecture type.</span><br /><br /><span>On my CentOS 6 based system this gave:</span></p><pre><code>x86_64-redhat-linux-gnu</code></pre><p><span>However, what BLAT requires is the 'short value' (i.e. the first part of the MACHTYPE). To correct this, in the bash shell type (change this to the correct MACHTYPE for your system)</span></p><pre><code>MACHTYPE=x86_64
export MACHTYPE</code></pre><p><span>Now running the command:</span></p><pre><code>echo $MACHTYPE</code></pre><p><span>should give the correct short form of the MACHTYPE:</span></p><pre><code>x86_64</code></pre><p><span>Now create the directory lib/$MACHTYPE in the source tree, i.e.:</span></p><pre><code>mkdir lib/$MACHTYPE</code></pre><p><span>For my machine, lib/x86_64 already existed, so I did not have to do this, but this is not the case for all architectures.</span><br /><br /><span>The BLAT code assumes that you are compiling BLAT as a non-privileged (i.e. non-root) user. As a result, you must create the directory for the executables to go into:</span><br /><br /><span>mkdir -p ~/bin/$MACHTYPE</span><br /><br /><span>If you are installing as a normal user, edit your .bashrc to add the following (change the x86_64 to be your MACHTYPE):</span><br /><br /><span>export PATH=~/bin/x86_64:$PATH</span><br /><br /><span>For me, though, this was not good enough. I wanted the executables in /usr/local/bin where all my other code goes. 
As a result I did some hackery...</span><br /><br /><span>There is a master make template in the&nbsp;</span><code>inc</code><span>&nbsp;directory called&nbsp;</span><code>common.mk</code><span>&nbsp;and I edited this file with the command:</span><br /><br /><span>vi inc/common.mk</span><br /><br /><span>I replaced the line</span></p><pre><code>    BINDIR=${HOME}/bin/${MACHTYPE}</code></pre><p><span>with</span></p><pre><code>    BINDIR=/usr/local/bin</code></pre><p><span>then saved and quit (as this is in my path, I do not need to do anything else).</span><br /><br /><span>All the preparation is now done and you can create the BLAT executables by going into the top level of the BLAT source tree (for me it was&nbsp;</span><code>/usr/local/src/blat/blatSrc</code><span>, but change to wherever you unpacked BLAT into).</span><br /><br /><span>Now simply run the command:</span></p><pre><code>make</code></pre><p><span>to compile the code.</span><br /><br /><span>BLAT installed cleanly and the executables were all neatly placed in /usr/local/bin, just like I wanted.</span><br /><br /><span>Now simply running the command:</span></p><pre><code>blat</code></pre><p><span>on the command line gives me information on BLAT and sample usage.</span><br /><br /><span>BLAT is installed, and it's installed properly in my system code tree!!!</span></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37592/benchmarking-perl-module</guid>
	<pubDate>Sat, 25 Aug 2018 11:40:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37592/benchmarking-perl-module</link>
	<title><![CDATA[Benchmarking Perl Module !]]></title>
	<description><![CDATA[<p>The Benchmark module is a great tool for measuring how long code takes to run. The output is usually reported in terms of CPU time, which gives us a basis for optimizing our code. With the advent of petascale computing and multicore processors, it is becoming a necessity to know how much CPU time our Perl programs take.</p><p>This is the simplest way to use the module:</p><blockquote><p>Example 1:</p><p>use Benchmark;</p><p>$first_time = Benchmark-&gt;new;</p><p># your code here ...</p><p>$second_time = Benchmark-&gt;new;</p><p>$final_difference = timediff($second_time, $first_time);</p><p>print "the code took: ", timestr($final_difference), "\n";</p></blockquote><p>That was a very simple way to get the time difference; we can use it to measure the time taken by any part of a program.</p><blockquote><p>A more sophisticated way:</p><p>use Benchmark;<br />sub first_sub {</p><p>my (@arguments) = @_;</p><p>}</p><p>timethese(100, { first =&gt; sub { first_sub(@arguments) } });</p><p>The first argument to timethese is 100 (evaluate the code 100 times).</p></blockquote><p>Hope this very small tutorial on Benchmark will help people get started.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. Knowing how threads and forks work is very important before implementing them, so it is worth reading up on them before going through this tutorial.</p><p>Many times in bioinformatics we need to deal with huge datasets which&nbsp; are more than 100GB in size. The traditional way to analyze a file is with a while loop</p><p>while (my $line = &lt;FILE&gt;) {</p><p>Do something;</p><p>}</p><p>This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a very simple and efficient technique with Perl which I have been using. I am&nbsp; more inclined towards using Perl fork than Perl threads.</p><p>One of the oldest ways to fork is</p><blockquote><p>my @childs;<br />my $fork = fork();<br />if($fork){&nbsp;&nbsp;&nbsp;# parent: remember the child's pid<br />push (@childs,$fork);&nbsp;<br />}<br />elsif(defined $fork){&nbsp;# child: fork returned 0<br /><strong>your code here;</strong><br />exit(0);<br />}<br />else{die "Couldn't fork: $!";}</p><p>## wait for the child processes to finish<br />foreach(@childs){<br />my $tmp=waitpid($_,0);<br />}</p></blockquote><p>What fork does is create a child process with its own copy of the program's variables and code. The child then runs separately from the parent process, usually on a separate processor. That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you how to do it easily, but it still has its own drawbacks.</p><blockquote><p>Okie, now if you really do not want to use fork in your code, that&rsquo;s okie too. There are many useful modules which do it for you very efficiently. 
One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors=8;<br />my $fork= Parallel::ForkManager-&gt;new($max_processors);<br />foreach (@dna) {<br />$fork-&gt;start and next; # do the fork<br /><strong>your code here;</strong><br />$fork-&gt;finish; # do the exit in the child process<br />}<br />$fork-&gt;wait_all_children;</p></blockquote><p>So at most 8 child processes run at a time, each doing the same thing for one element of the array. When one child finishes, Parallel::ForkManager starts a new one, and thus you will be using all your processors to analyze the data. Now suppose you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because otherwise the buffered output of the different processes can get interleaved. You can lock the file using the flock command.</p><blockquote><p>use Fcntl qw(:flock);<br />open (my $QUAL, "&gt;&gt;", "myfile.txt") or die "can't open file $!";<br />flock $QUAL, LOCK_EX or die "can't lock file $!";<br />print $QUAL "$output";<br />flock $QUAL, LOCK_UN or die "$!";<br />close $QUAL;</p></blockquote><p>I would not suggest using flock when dealing with multiple processes, because it will decrease the processing efficiency (each child process must wait for the lock to be released by the other child processes). Instead, I would suggest having each fork write to a separate file and concatenating the files after the processing.</p><p><strong>Putting it all together, if you have 100GB of data you can do this</strong></p><blockquote><p><strong>step 1</strong>&nbsp;: split the dataset equally according to the number of processors you have. 
This may take a few hours (about 2-3 hrs for a 100GB file).<br />You can use the unix "split" command for this,<br />for example:<br />my $number_split=int($number_of_entries_in_your_dataset/$max_processors);<br />my $split_Files=`split -l $number_split "your_file.fasta" "file_name"`;</p><p><strong>step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory<br />my $super_fork= Parallel::ForkManager-&gt;new($max_processors);<br />while (my $file = readdir(DIRECTORY)) { ### read the directory<br />if($file=~/^\./){next;}<br />print $file,"\n";<br />########## Start fork ##########<br />my $pid= $super_fork-&gt;start and next;<br /><strong>Whatever you want to do with the split file ;</strong><br /><strong>analyze my piece of $file;</strong><br />######### end fork ###############<br />$super_fork-&gt;finish;<br />}<br />$super_fork-&gt;wait_all_children;<br />closedir(DIRECTORY);</p></blockquote><p>So basically each processor will be active with its piece of data (split file), and thus you have up to 8 processes running at one time without interfering with each other. Again, I do not suggest writing output from each child process to one file (for the reasons above). Write output from each fork to a separate file and finally concatenate them. That's it, you have just sped up your program by up to about 8 times!! Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenation of the output each child generates, since it does take some time (remember 100GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (should take about 3 hrs for a 100GB dataset) and then export the whole table into one file. 
This should be faster than just concatenating them using the "cat" command (correct me if I am wrong).</p><p>Or a much simpler way is to use pipes:</p><p>cat output_dir/* | my_pipe or my_pipe &lt;(file1) final_file;</p><p>That's it guys!! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37514/list-of-non-commercial-ngs-genotype-calling-software</guid>
	<pubDate>Thu, 09 Aug 2018 04:21:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37514/list-of-non-commercial-ngs-genotype-calling-software</link>
	<title><![CDATA[List of non-commercial NGS genotype-calling software]]></title>
	<description><![CDATA[<p><span>Meaningful analysis of next-generation sequencing (NGS) data, which are produced extensively by genetics and genomics studies, relies crucially on the accurate calling of SNPs and genotypes. Recently developed statistical methods both improve and quantify the considerable uncertainty associated with genotype calling, and will especially benefit the growing number of studies using low- to medium-coverage data.&nbsp;</span></p><p><span>A list of programs for genotype and SNP calling:</span></p><p>SOAP2&nbsp;http://soap.genomics.org.cn/index.html</p><p>Calling method: Single-sample<br />Prerequisites: High-quality variant database (for example, dbSNP)<br />Comments: Package for NGS data analysis, which includes a single individual genotype caller (SOAPsnp)</p><p>realSFS&nbsp;http://128.32.118.212/thorfinn/realSFS/</p><p>Calling method: Single-sample<br />Prerequisites: Aligned reads<br />Comments: Software for SNP and genotype calling using single individuals and allele frequencies. Site frequency spectrum (SFS) estimation</p><p>Samtools http://samtools.sourceforge.net/</p><p>Calling method: Multi-sample<br />Prerequisites: Aligned reads<br />Comments: Package for manipulation of NGS alignments, which includes a computation of genotype likelihoods (samtools) and SNP and genotype calling (bcftools)</p><p>GATK http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit</p><p>Calling method: Multi-sample<br />Prerequisites: Aligned reads<br />Comments: Package for aligned NGS data analysis, which includes a SNP and genotype caller (Unified Genotyper), SNP filtering (Variant Filtration) and SNP quality recalibration (Variant Recalibrator)</p><p>Beagle http://faculty.washington.edu/browning/beagle/beagle.html</p><p>Calling method: Multi-sample LD<br />Prerequisites: Candidate SNPs, genotype likelihoods<br />Comments: Software for imputation, phasing and association that includes a mode for genotype calling</p><p>IMPUTE2 http://mathgen.stats.ox.ac.uk/impute/impute_v2.html</p><p>Calling method: Multi-sample LD<br />Prerequisites: Candidate SNPs, genotype likelihoods<br />Comments: Software for imputation and phasing, including a mode for genotype calling. Requires fine-scale linkage map</p><p>QCall ftp://ftp.sanger.ac.uk/pub/rd/QCALL</p><p>Calling method: Multi-sample LD<br />Prerequisites: &lsquo;Feasible&rsquo; genealogies at a dense set of loci, genotype likelihoods<br />Comments: Software for SNP and genotype calling, including a method for generating candidate SNPs without LD information (NLDA) and a method for incorporating LD information (LDA). The &lsquo;feasible&rsquo; genealogies can be generated using Margarita (http://www.sanger.ac.uk/resources/software/margarita)</p><p>MaCH http://genome.sph.umich.edu/wiki/Thunder</p><p>Calling method: Multi-sample LD<br />Prerequisites: Genotype likelihoods<br />Comments: Software for SNP and genotype calling, including a method (GPT_Freq) for generating candidate SNPs without LD information and a method (thunder_glf_freq) for incorporating LD information</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37411/my-commonly-used-commands-in-bioinformatics</guid>
	<pubDate>Thu, 26 Jul 2018 04:58:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37411/my-commonly-used-commands-in-bioinformatics</link>
	<title><![CDATA[My commonly used commands in Bioinformatics]]></title>
	<description><![CDATA[<p>FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:</p><pre><code>minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta
</code></pre><p>This reports Racon's changes in a table. You can exclude indels with the&nbsp;<code>-I</code>&nbsp;option in&nbsp;<code>show-snps</code>.&nbsp;</p><p>This process (Racon -&gt; MUMmer -&gt; SNP table) solves the problem I originally raised in this issue. So as far as I'm concerned, you can close this issue (or keep it open if you still want to implement some kind of variant table).</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37317/interview-puzzles-for-bioinformatician</guid>
	<pubDate>Tue, 17 Jul 2018 05:26:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37317/interview-puzzles-for-bioinformatician</link>
	<title><![CDATA[Interview Puzzles for Bioinformatician !]]></title>
	<description><![CDATA[<p>These are some of the most famous interview puzzles asked at top tech companies.<br /><br />Here is a list of the top 25 puzzles that have been asked in top tech interviews.</p><ol>
<li><span><a href="http://puzzlefry.com/puzzles/2-eggs-and-100-floor-google-classic-question/" target="_blank">2 Eggs and 100 Floor Classic Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/gold-coins-puzzle/" target="_blank">Five pirates and gold coin Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/gold-puzzle/" target="_blank">Six pirates and Gold Coin puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/probability-of-having-boy/" target="_blank">Probability of having boy</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/random-airplane-seats/" target="_blank">Random Airplane Seats</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/inverted-cards-puzzle/" target="_blank">Inverted playing card puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/flipping-coins/" target="_blank">Flipping Coins Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/three-hat-colors/" target="_blank">Three hat colors Microsoft Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/25-horses-5-tracks-puzzle/" target="_blank">25 horses 5 tracks Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/gold-bar-puzzle-2/" target="_blank">Gold Bar Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/crossing-the-bridge-puzzle/" target="_blank">Crossing the Bridge Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/interview-questions/" target="_blank">Will you accept the bet?</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/the-line-of-persons-with-hats/" target="_blank">The Puzzle of 100 Hats</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/how-many-days/" target="_blank">Man fell in Well Puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/minimum-number-of-weigths/" target="_blank">Minimum Number of Weigths</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/one-bulb-with-3-switches/" target="_blank">One Bulb with 3 Switches</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/find-the-minimum-number-of-aircraft/" target="_blank">Find the minimum number of aircraft</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/burning-ropes-to-measure-time/" target="_blank">Burning ropes to measure time</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/connect-3-houses-with-3-wells/" target="_blank">Connect 3 houses with 3 wells</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/measure-9-minutes-from-2-hourglasses-puzzle/" target="_blank">Measure 9 minutes from 2 hourglasses puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/ant-and-triangle-problem/" target="_blank">Ant and Triangle Problem</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/the-man-in-the-elevator/" target="_blank">The Man in the Elevator</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/find-the-survivor/" target="_blank">Find the survivor</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/free-the-prisoners-puzzle/" target="_blank">Free the prisoners puzzle</a></span></li>
<li><span><a href="http://puzzlefry.com/puzzles/great-strategy-can-only-save-life/" target="_blank">GREAT STRATEGY CAN ONLY SAVE LIFE</a></span></li>
</ol><p><br /><span>For Microsoft interview puzzles in particular, you may refer to:</span><br /><span><a href="http://puzzlefry.com/2015/08/top-15-famous-microsoft-interview-puzzles/" target="_blank">Top 15 Microsoft Interview Puzzles</a></span><br /><span><a href="http://puzzlefry.com/qa-tag/microsoft-interview-puzzles/" target="_blank">Microsoft Interview Puzzles</a></span><br /><br /><span>Other very common interview puzzles:</span><br /><span><a href="http://puzzlefry.com/2015/08/top-25-tech-interview-puzzles-with-answers/" target="_blank">Top 25 Tech Interview&nbsp;</a></span><span><a href="http://puzzlefry.com/2015/08/top-25-tech-interview-puzzles-with-answers/" target="_blank">Logical Puzzles</a></span><br /><br /><span>Each of these puzzles has come up repeatedly in interviews,&nbsp;</span><span>even at top tech companies.&nbsp;</span></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37198/understanding-blastn-output-format-6</guid>
	<pubDate>Wed, 27 Jun 2018 18:38:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37198/understanding-blastn-output-format-6</link>
	<title><![CDATA[Understanding BLASTn output format 6 !]]></title>
	<description><![CDATA[<h3 id="sites-page-title-header" style="text-align: left;"><span>BLASTn output format 6</span></h3><div id="sites-canvas-main"><div id="sites-canvas-main-content"><div dir="ltr"><div><div><em>BLASTn</em> maps DNA against DNA, for example gene sequences against a reference genome<br /><br /><code><strong>blastn</strong>  -query <span>genes.ffn</span>  -subject <span>genome.fna</span>  -outfmt <strong>6</strong></code></div><h2>BLASTn tabular output format 6</h2>
<p><strong>Column headers:</strong><br /><code>qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore</code><br /></p>
<table border="1" cellspacing="0">
<tbody>
<tr>
<td> 1.</td>
<td> qseqid</td>
<td> query (e.g., gene) sequence id</td>
</tr>
<tr>
<td> 2.</td>
<td> sseqid</td>
<td> subject (e.g., reference genome) sequence id</td>
</tr>
<tr>
<td> 3.</td>
<td> pident</td>
<td> percentage of identical matches</td>
</tr>
<tr>
<td> 4.</td>
<td> length</td>
<td> alignment length</td>
</tr>
<tr>
<td> 5.</td>
<td> mismatch</td>
<td> number of mismatches</td>
</tr>
<tr>
<td> 6.</td>
<td> gapopen</td>
<td> number of gap openings</td>
</tr>
<tr>
<td> 7.</td>
<td> qstart</td>
<td> start of alignment in query</td>
</tr>
<tr>
<td> 8.</td>
<td> qend</td>
<td> end of alignment in query</td>
</tr>
<tr>
<td> 9.</td>
<td> sstart</td>
<td> start of alignment in subject</td>
</tr>
<tr>
<td> 10.</td>
<td> send</td>
<td> end of alignment in subject</td>
</tr>
<tr>
<td> 11.</td>
<td> evalue</td>
<td> <a href="http://www.metagenomics.wiki/tools/blast/evalue">expect value</a></td>
</tr>
<tr>
<td> 12.</td>
<td> bitscore</td>
<td> <a href="http://www.metagenomics.wiki/tools/blast/evalue"><strong>bit score</strong></a></td>
</tr>
</tbody>
</table>
<p><strong><br /></strong></p>
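<p>As a rough illustration of the table above, here is a minimal Python sketch that turns one line of format 6 output into a dictionary keyed by the column names. The function name <code>parse_hit</code> and the example line are made up for this sketch; they are not part of BLAST.</p>

```python
# Column names of BLAST tabular output (-outfmt 6), in the default order.
DEFAULT_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_hit(line, columns=DEFAULT_COLUMNS):
    """Turn one tab-separated BLAST line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        raise ValueError("expected %d fields, got %d" % (len(columns), len(values)))
    hit = {}
    for name, value in zip(columns, values):
        if name in INT_COLUMNS:
            hit[name] = int(value)
        elif name in FLOAT_COLUMNS:
            hit[name] = float(value)
        else:
            hit[name] = value  # sequence ids stay as strings
    return hit


# Made-up example line, for illustration only (not real BLAST output).
example = "gene1\tcontig7\t98.50\t200\t3\t0\t1\t200\t1500\t1699\t1e-95\t350"
hit = parse_hit(example)
print(hit["sseqid"], hit["pident"], hit["evalue"])
```

<p>In practice you would call <code>parse_hit</code> on every line of the file written by <code>blastn</code>; the numeric conversion makes it easy to filter on identity or e-value afterwards.</p>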
</div><h2><a name="TOC-Define-your-own-output-format" id="TOC-Define-your-own-output-format"></a>Define your own output format</h2><div><em>by adding the option -outfmt, as for example: </em><strong><br /></strong></div>
<p><code><strong>-outfmt</strong> <strong>"6</strong> <span>qseqid sseqid pident qlen length mismatch gapopen evalue bitscore</span><strong>"</strong></code><br /><br /><em><strong>supported format specifiers are:</strong></em><br /><code>qseqid    </code>Query Seq-id<br /><code>qgi       </code>Query GI<br /><code>qacc      </code>Query accession<br /><code>qaccver   </code>Query accession.version<br /><code>qlen      </code>Query sequence length<br /><code>sseqid    </code>Subject Seq-id<br /><code>sallseqid </code>All subject Seq-id(s), separated by a ';'<br /><code>sgi       </code>Subject GI<br /><code>sallgi    </code>All subject GIs<br /><code>sacc      </code>Subject accession<br /><code>saccver   </code>Subject accession.version<br /><code>sallacc   </code>All subject accessions<br /><code>slen      </code>Subject sequence length<br /><code>qstart    </code>Start of alignment in query<br /><code>qend      </code>End of alignment in query<br /><code>sstart    </code>Start of alignment in subject<br /><code>send      </code>End of alignment in subject<br /><code>qseq      </code>Aligned part of query sequence<br /><code>sseq      </code>Aligned part of subject sequence<br /><code>evalue    </code>Expect value<br /><code>bitscore  </code>Bit score<br /><code>score     </code>Raw score<br /><code>length    </code>Alignment length<br /><code>pident    </code>Percentage of identical matches<br /><code>nident    </code>Number of identical matches<br /><code>mismatch  </code>Number of mismatches<br /><code>positive  </code>Number of positive-scoring matches<br /><code>gapopen   </code>Number of gap openings<br /><code>gaps      </code>Total number of gaps<br /><code>ppos      </code>Percentage of positive-scoring matches<br /><code>frames    </code>Query and subject frames separated by a '/'<br /><code>qframe    </code>Query frame<br /><code>sframe    </code>Subject frame<br /><code>btop      </code>Blast traceback operations (BTOP)<br /><code>staxids   
</code>Subject Taxonomy ID(s), separated by a ';'<br /><code>sscinames </code>Subject Scientific Name(s), separated by a ';'<br /><code>scomnames </code>Subject Common Name(s), separated by a ';'<br /><code>sblastnames </code>Subject Blast Name(s), separated by a ';'   (in alphabetical order)<br /><code>sskingdoms  </code>Subject Super Kingdom(s), separated by a ';'     (in alphabetical order) <br /><code>stitle      </code>Subject Title<br /><code>salltitles  </code>All Subject Title(s), separated by a '&lt;&gt;'<br /><code>sstrand   </code>Subject Strand<br /><code>qcovs     </code>Query Coverage Per Subject<br /><code>qcovhsp   </code>Query Coverage Per HSP<br /><strong><br /><em>default values are:</em></strong><br /><code>-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"</code></p>
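<p>When you define your own output format, the columns in the file follow the order of the specifiers in the <code>-outfmt</code> string. The Python sketch below, which is only an illustration, derives the column names from such a string and uses <code>qlen</code> and <code>length</code> to estimate how much of the query an alignment spans. The helper name <code>columns_from_outfmt</code> and the example line are assumptions made for this sketch, and because the alignment length includes gaps the ratio is only an approximation (the <code>qcovs</code> and <code>qcovhsp</code> specifiers report coverage directly).</p>

```python
# Default -outfmt 6 columns, used when no specifiers follow the "6".
DEFAULT_COLUMNS = (
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore"
).split()


def columns_from_outfmt(outfmt):
    """Return the column names implied by an -outfmt string for format 6."""
    parts = outfmt.split()
    if not parts or parts[0] != "6":
        raise ValueError("this sketch only handles tabular format 6")
    # No specifiers after the "6" means the default twelve columns.
    return parts[1:] or list(DEFAULT_COLUMNS)


outfmt = "6 qseqid sseqid pident qlen length mismatch gapopen evalue bitscore"
columns = columns_from_outfmt(outfmt)

# Made-up example line matching the custom format (not real BLAST output).
line = "gene1\tcontig7\t98.50\t200\t180\t2\t0\t1e-80\t310"
row = dict(zip(columns, line.split("\t")))

# Approximate fraction of the query covered by this alignment.
coverage = int(row["length"]) / int(row["qlen"])
print(row["qseqid"], row["sseqid"], round(coverage, 2))
```
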
</div></div></div>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36960/links-scaffolder-bloomfilter-setting</guid>
	<pubDate>Fri, 15 Jun 2018 10:39:54 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36960/links-scaffolder-bloomfilter-setting</link>
	<title><![CDATA[LINKS scaffolder bloomfilter setting !]]></title>
	<description><![CDATA[
<p>➜  bin git:(master) ✗ ls -l<br />total 68<br />drwxrwxr-x 3 urbe urbe  4096 Jun 15 12:15 lib<br />-rwxrwxrwx 1 urbe urbe 65141 Jun 15 17:13 LINKS<br />➜  bin git:(master) ✗ pwd<br />/home/urbe/Tools/LINKS_1.8.6/bin</p>

<p>➜  bloomfilter git:(master) ✗ swig -Wall -c++ -perl5 BloomFilter.i<br />➜  bloomfilter git:(master) ✗ g++ -c BloomFilter_wrap.cxx -I/home/urbe/anaconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi/CORE/ -fPIC -Dbool=char -O3<br />BloomFilter_wrap.cxx:1892:30: fatal error: ../BloomFilter.hpp: No such file or directory<br />compilation terminated.<br />➜  bloomfilter git:(master) ✗ cd swig <br />➜  swig git:(master) ✗ g++ -c BloomFilter_wrap.cxx -I/home/urbe/anaconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi/CORE/ -fPIC -Dbool=char -O3<br />In file included from BloomFilter_wrap.cxx:1877:0:<br />../BloomFilter.hpp: In member function ‘void BloomFilter::loadHeader(FILE*)’:<br />../BloomFilter.hpp:141:59: warning: ignoring return value of ‘size_t fread(void*, size_t, size_t, FILE*)’, declared with attribute warn_unused_result [-Wunused-result]<br />         fread(&amp;header, sizeof(struct FileHeader), 1, file);<br />                                                           ^<br />➜  swig git:(master) ✗ g++ -Wall -shared BloomFilter_wrap.o -o BloomFilter.so -O3<br />➜  swig git:(master) ✗ cd ..<br />➜  bloomfilter git:(master) ✗ cd ..<br />➜  lib git:(master) ✗ cd ..<br />➜  bin git:(master) ✗ ./LINKS  <br />Usage: ./LINKS [v1.8.6]<br />-f  sequences to scaffold (Multi-FASTA format, required)<br />-s  file-of-filenames, full path to long sequence reads or MPET pairs [see below] (Multi-FASTA/fastq format, required)<br />-m  MPET reads (default -m 1 = yes, default = no, optional)<br />	! DO NOT SET IF NOT USING MPET. WHEN SET, LINKS WILL EXPECT A SPECIAL FORMAT UNDER -s<br />	! Paired MPET reads in their original outward orientation &lt;- -&gt; must be separated by ":"<br />	  &gt;template_name<br />	  ACGACACTATGCATAAGCAGACGAGCAGCGACGCAGCACG:ATATATAGCGCACGACGCAGCACAGCAGCAGACGAC<br />-d  distance between k-mer pairs (ie. target distances to re-scaffold on. default -d 4000, optional)<br />	Multiple distances are separated by comma. eg. 
-d 500,1000,2000,3000<br />-k  k-mer value (default -k 15, optional)<br />-t  step of sliding window when extracting k-mer pairs from long reads (default -t 2, optional)<br />	Multiple steps are separated by comma. eg. -t 10,5<br />-o  offset position for extracting k-mer pairs (default -o 0, optional)<br />-e  error (%) allowed on -d distance   e.g. -e 0.1  == distance +/- 10% (default -e 0.1, optional)<br />-l  minimum number of links (k-mer pairs) to compute scaffold (default -l 5, optional)<br />-a  maximum link ratio between two best contig pairs (default -a 0.3, optional)<br />	 *higher values lead to least accurate scaffolding*<br />-z  minimum contig length to consider for scaffolding (default -z 500, optional)<br />-b  base name for your output files (optional)<br />-r  Bloom filter input file for sequences supplied in -s (optional, if none provided will output to .bloom)<br />	 NOTE: BLOOM FILTER MUST BE DERIVED FROM THE SAME FILE SUPPLIED IN -f WITH SAME -k VALUE<br />	 IF YOU DO NOT SUPPLY A BLOOM FILTER, ONE WILL BE CREATED (.bloom)<br />-p  Bloom filter false positive rate (default -p 0.001, optional; increase to prevent memory allocation errors)<br />-x  Turn off Bloom filter functionality (-x 1 = yes, default = no, optional)<br />-v  Runs in verbose mode (-v 1 = yes, default = no, optional)</p>

<p>Error: Missing mandatory options -f and -s.</p>

<p>Error fixed:</p>

<p>perl: symbol lookup error: /home/urbe/Tools/LINKS_new/bin/./lib/bloomfilter/swig/BloomFilter.so: undefined symbol: Perl_Gthr_key_ptr</p>
]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36842/gap-filling-or-contigs-extensions-tools</guid>
	<pubDate>Fri, 01 Jun 2018 08:07:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36842/gap-filling-or-contigs-extensions-tools</link>
	<title><![CDATA[Gap filling or contig extension tools!]]></title>
	<description><![CDATA[
<p>Many tools perform gap filling using Illumina short reads; see, for example, "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also tools such as GAPresolution that can help perform local re-assemblies using 454 reads. We used GAPresolution, but it did not work very well for us; it is useful only in some specific situations.</p>

<p>Take a look at the PRICE software from the DeRisi lab. It's meant to do something very similar: http://derisilab.ucsf.edu/index.php?page=software</p>

<p>You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).</p>

<p>See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/ </p>

<p>In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41 </p>

<p>SOAPdenovo also has a function for scaffolding. Not sure about ABySS.</p>

<p>Here is a useful explanation of several tools:</p>

<p>https://bioinformaticsonline.com/search?q=scaffolding&amp;entity_type=object&amp;entity_subtype=bookmarks&amp;offset=0&amp;search_type=entities</p>

<p>I could be wrong, but the answers above to your hypothetical scenario appear to miss the point that you don't want to assemble the full genome, only the 100 kb region you're interested in. I suggest the following algorithm:</p>

<p>1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.</p>

<p>2. Repeat:<br />a. Identify paired-end reads (not in C) for which one or both ends align within, or extending, contigs in C.<br />b. Identify unpaired reads that align extending these new paired-end reads.<br />c. Construct a new assembly C' from C and the new reads identified in (a) and (b).<br />d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.<br />e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, set S = S'.</p>

<p>3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.</p>

<p>4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.</p>
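
<p>The read-recruitment loop of step 2 can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not a real assembler: the names <code>kmers</code>, <code>shares_kmer</code> and <code>recruit_reads</code> are hypothetical, "aligns within, or extending" is replaced by a crude shared k-mer test (k = 4, forward strand only), and the re-assembly (c) and 100 kb trimming (d) steps are left out. It shows only how the read set grows until no new reads are found.</p>

```python
from typing import Iterable, List, Set, Tuple

K = 4  # toy k-mer length (assumption; real alignment needs far more care)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(read: str, reads: Iterable[str], k: int = K) -> bool:
    """Crude stand-in for alignment: True if read shares a k-mer with any of reads."""
    read_kmers = kmers(read, k)
    return any(read_kmers & kmers(other, k) for other in reads)


def recruit_reads(seed_reads: Iterable[str],
                  pairs: List[Tuple[str, str]],
                  singles: List[str],
                  max_rounds: int = 100) -> Set[str]:
    """Grow the read set S from the seed reads until a round adds nothing new."""
    current = set(seed_reads)
    for _ in range(max_rounds):
        # (a) pairs not fully recruited yet, with one or both ends touching S
        new_pairs = [p for p in pairs
                     if not set(p).issubset(current)
                     and any(shares_kmer(end, current) for end in p)]
        new_reads = {end for p in new_pairs for end in p} - current
        # (b) unpaired reads that touch the newly recruited pair ends
        new_reads |= {r for r in singles
                      if r not in current and shares_kmer(r, new_reads)}
        # (e) stop once a round recruits no new reads
        if not new_reads:
            break
        current |= new_reads
    return current


# Toy data: the second pair is only reachable through the unpaired read.
seed = ["AAAACCCC"]
pairs = [("CCCCGGGG", "TTTTAAAT"), ("GGGATCTC", "CACACACA")]
singles = ["AAATGGGA", "GTGTGTGT"]
recruited = recruit_reads(seed, pairs, singles)
print(sorted(recruited))
```

<p>In a real workflow, <code>shares_kmer</code> would be replaced by a proper aligner, and each round would re-assemble the recruited reads and trim the result to the 100 kb window before the next round.</p>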
]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>