<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: All]]></title>
	<link>https://bioinformaticsonline.com/snippets?offset=340</link>
	<atom:link href="https://bioinformaticsonline.com/snippets?offset=340" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35994/perl-script-to-find-missing-and-move-to-desire-folder</guid>
	<pubDate>Mon, 19 Mar 2018 12:35:54 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35994/perl-script-to-find-missing-and-move-to-desire-folder</link>
	<title><![CDATA[Perl script to find missing and move to desire folder]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl -w
use strict;
use warnings;
open(my $ids,  &quot;&lt;&quot;,  &quot;$ARGV[0]&quot;)  or die &quot;Can&#039;t open input.txt: $!&quot;;
while (&lt;$ids&gt;) {
chomp;
next if $_ =~ /^\s*$/;
my $id = $_;
	open(my $val,  &quot;&lt;&quot;,  &quot;$ARGV[1]&quot;)  or die &quot;Can&#039;t open input.txt: $!&quot;;

	while (&lt;$val&gt;)  {
	chomp;
	if (/$id/) {
	  print &quot;found string $id\n&quot;;
	system (&quot;cp $_ 2move&quot;); 
	}

}


}</code>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35993/perl-script-to-extract-the-uniq-ids</guid>
	<pubDate>Mon, 19 Mar 2018 12:34:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35993/perl-script-to-extract-the-uniq-ids</link>
	<title><![CDATA[Perl script to extract the uniq Ids]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl -w
use strict;
use warnings;
use List::Uniq &#039;:all&#039;;

open(my $val,  &quot;&lt;&quot;,  &quot;$ARGV[0]&quot;)  or die &quot;Can&#039;t open input.txt: $!&quot;;
my @allMissed;
while (&lt;$val&gt;)  {
chomp;
my $flag=0;
next if $_ =~ /^\s*$/;
my $string = &quot;$_&quot;;

open(my $file,  &quot;&lt;&quot;,  &quot;$ARGV[1]&quot;)  or die &quot;Can&#039;t open input.txt: $!&quot;;
while (&lt;$file&gt;)  {
	chomp;
	if (/$string/) {
	  #print &quot;$string\n&quot;; 
	$flag=1;
	}
}
close $file;
	if ($flag == 0) {
	push @allMissed, $string;
	}
}

close $val;

my @allRemaining = uniq (@allMissed);
foreach (@allRemaining) { print &quot;$_\n&quot;;}

__END__</code>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35910/estimate-genome-size-with-jellyfish-and-r</guid>
	<pubDate>Mon, 12 Mar 2018 10:11:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35910/estimate-genome-size-with-jellyfish-and-r</link>
	<title><![CDATA[Estimate Genome Size with Jellyfish and R]]></title>
	<description><![CDATA[<code>jellyfish count -t 8 -C -m 19 -s 5G -o 19mer_out --min-qual-char=? /common/Tutorial/Genome_estimation/sample_read_1.fastq /common/Tutorial/Genome_estimation/sample_read_2.fastq

#-t    -treads=unit32       Number of treads to be used in the run. eg: 1,2,3,..etc.
#-C    -both-strands        Count both strands
#-m    -mer-len=unit32      Length of the k-mer    
#-s    -size=unit32         Hash size / memory allocation  
#-o    -output=string       Output file name
#--min-quality-char         Base quality value. Version 2.2.3 of Jellyfish uses the “Phred” score, where &quot;?&quot; = 30

jellyfish histo -o 19mer_out.histo 19mer_out

#Plot
dataframe19 &lt;- read.table(&quot;19mer_out.histo&quot;) #load the data into dataframe19
plot(dataframe19[1:200,], type=&quot;l&quot;) #plots the data points 1 through 200 in the dataframe19 using a line

plot(dataframe19[2:200,], type=&quot;l&quot;)

plot(dataframe19[2:100,], type=&quot;l&quot;) #plot line graph 
points(dataframe19[2:100,]) #plot the data points from 2 through 100

sum(as.numeric(dataframe19[2:9325,1]*dataframe19[2:9325,2]))

data[10:20,]

sum(as.numeric(dataframe19[2:9325,1]*dataframe19[2:9325,2]))/12

#Return around ~ 305 Mb</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35904/plot-custom-gene-density-with-r</guid>
	<pubDate>Thu, 08 Mar 2018 16:30:34 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35904/plot-custom-gene-density-with-r</link>
	<title><![CDATA[Plot custom gene density with R]]></title>
	<description><![CDATA[<code>library(karyoploteR)

pp &lt;- getDefaultPlotParams(plot.type=2)
pp$data1outmargin &lt;- 100
pp$data2outmargin &lt;- 100
pp$topmargin &lt;- 450

gff.file &lt;- &quot;http://plasmodb.org/common/downloads/Current_Release/PvivaxP01/gff/data/PlasmoDB-35_PvivaxP01.gff&quot;
data.points.colors &lt;- c(&quot;#FFBD07&quot;, &quot;#00A6ED&quot;,  &quot;#FF3366&quot;, &quot;#8EA604&quot;, &quot;#C200FB&quot;)

header.lines &lt;- readLines(gff.file, n = 30)
## Error in file(con, &quot;r&quot;): cannot open the connection to &#039;http://plasmodb.org/common/downloads/Current_Release/PvivaxP01/gff/data/PlasmoDB-33_PvivaxP01.gff&#039;
#The lines with the standard chromosomes start with &quot;##sequence-region PvP01&quot;.
#Select them.
ll &lt;- header.lines[grepl(header.lines, pattern = &quot;##sequence-region PvP01&quot;)]
## Error in eval(expr, envir, enclos): object &#039;header.lines&#039; not found
#split them by space, and create a data.frame
gg &lt;- data.frame(do.call(rbind, strsplit(ll, split = &quot; &quot;)))
## Error in strsplit(ll, split = &quot; &quot;): object &#039;ll&#039; not found
gg[,3] &lt;- as.numeric(as.character(gg[,3]))
## Error in eval(expr, envir, enclos): object &#039;gg&#039; not found
gg[,4] &lt;- as.numeric(as.character(gg[,4]))
## Error in eval(expr, envir, enclos): object &#039;gg&#039; not found
#and create a GRanges with the information
PvP01.genome &lt;- toGRanges(gg[,c(2,3,4)])
## Error in is(A, &quot;GRanges&quot;): object &#039;gg&#039; not found
PvP01.genome
## Error in eval(expr, envir, enclos): object &#039;PvP01.genome&#039; not found

#kp &lt;- plotKaryotype(genome=PvP01.genome)
#kp &lt;- plotKaryotype(genome=PvP01.genome, ideogram.plotter = NULL, plot.type=2, plot.params = pp)
kp &lt;- plotKaryotype(genome=PvP01.genome, plot.type=2, plot.params = pp)

#kp &lt;- plotKaryotype(genome=PvP01.genome, ideogram.plotter = NULL)

#kpAddCytobandsAsLine(kp)

features &lt;- import(gff.file)

table(features$type)

genes &lt;- features[features$type==&quot;gene&quot;]

#kp &lt;- plotKaryotype(genome=PvP01.genome, ideogram.plotter = NULL)

#kpAddCytobandsAsLine(kp)

#kpPlotRegions(kp, data=genes)

#kp &lt;- plotKaryotype(genome=PvP01.genome, ideogram.plotter = NULL, plot.type=2)
kpAddMainTitle(kp, &quot;Plasmodium Vivax - PvP01 with genes&quot;, cex=2)


kpPlotRegions(kp, data=genes[strand(genes)==&quot;+&quot;], avoid.overlapping = FALSE, col=&quot;deepskyblue&quot;)

kpPlotRegions(kp, data=genes[strand(genes)==&quot;-&quot;], avoid.overlapping = FALSE, col=&quot;gold&quot;, data.panel=2)

kpAddLabels(kp, &quot;strand +&quot;, cex=0.8, col=&quot;#888888&quot;)

kpAddLabels(kp, &quot;strand -&quot;, data.panel=2, cex=0.8, col=&quot;#888888&quot;)



#plot all
pp &lt;- getDefaultPlotParams(plot.type = 4)
pp$data1inmargin &lt;- 0
pp$bottommargin &lt;- 20

kp &lt;- plotKaryotype(genome=PvP01.genome, plot.type=4, ideogram.plotter = NULL,
                    labels.plotter = NULL, plot.params = pp,
                    main=&quot;Gene Density&quot;)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)
kpPlotDensity(kp, genes, window.size = 10e2, col=&quot;#ddaacc&quot;)



#plot all
pp &lt;- getDefaultPlotParams(plot.type = 2)
pp$data1inmargin &lt;- 0
pp$bottommargin &lt;- 20

kp &lt;- plotKaryotype(genome=PvP01.genome, plot.params = pp)
#kp &lt;- kpPlotDensity(kp, genes)

kpAddChromosomeNames(kp, srt=45)
kpPlotDensity(kp, genes, col=&quot;#ddaacc&quot;, data.panel = 1)

kpAbline(kp, h=0.4, data.panel = 2, r0=0.2, r1=0, col=data.points.colors[3])</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35890/perl-script-to-convert-fastq-to-fasta-file</guid>
	<pubDate>Wed, 07 Mar 2018 04:23:26 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35890/perl-script-to-convert-fastq-to-fasta-file</link>
	<title><![CDATA[Perl script to convert fastq to fasta file]]></title>
	<description><![CDATA[<code>#!/usr/bin/env perl

use strict;
use warnings;
use Bio::Factory::EMBOSS;

my $usage   = &quot;perl $0 in.fq out.fa&quot;;
my $infile  = shift or die $usage;
my $outfile = shift or die $usage;

my $factory = Bio::Factory::EMBOSS-&gt;new;
my $seqret  = $factory-&gt;program(&#039;seqret&#039;); # $seqret is a Bio::Tools::Run::EMBOSSApplication object

$seqret-&gt;run({-sequence =&gt; $infile,
              -sformat1 =&gt; &#039;fastq&#039;,
              -outseq   =&gt; $outfile,
              -osformat =&gt; &#039;fasta&#039;});</code>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35889/perl-script-to-remove-fasta-sequences-in-multifasta-file-with-certain-length-threshold</guid>
	<pubDate>Wed, 07 Mar 2018 04:16:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35889/perl-script-to-remove-fasta-sequences-in-multifasta-file-with-certain-length-threshold</link>
	<title><![CDATA[Perl script to remove fasta sequences in multifasta file with certain length threshold]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl
use strict;
use warnings;
 
my $minlen = shift or die &quot;Error: `minlen` parameter not provided\n&quot;;
{
    local $/=&quot;&gt;&quot;;
    while(&lt;&gt;) {
        chomp;
        next unless /\w/;
        s/&gt;$//gs;
        my @chunk = split /\n/;
        my $header = shift @chunk;
        my $seqlen = length join &quot;&quot;, @chunk;
        print &quot;&gt;$_&quot; if($seqlen &gt;= $minlen);
    }
    local $/=&quot;\n&quot;;
}</code>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35872/plot-dotplot-with-last</guid>
	<pubDate>Tue, 06 Mar 2018 09:21:18 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35872/plot-dotplot-with-last</link>
	<title><![CDATA[Plot dotplot with last !]]></title>
	<description><![CDATA[<code># generate dotplot
lastdb test/ref.fa
lastal -f TAB test/ref.fa test/contigs.reduced.pacbio.fa | last-dotplot - test/contigs.reduced.pacbio.fa.ref.png
lastal -f TAB test/ref.fa test/contigs.reduced.nanopore.fa | last-dotplot - test/contigs.reduced.nanopore.fa.ref.png</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35758/download-genomes-in-batch-from-ncbi</guid>
	<pubDate>Fri, 23 Feb 2018 08:52:03 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35758/download-genomes-in-batch-from-ncbi</link>
	<title><![CDATA[Download genomes in batch from NCBI]]></title>
	<description><![CDATA[<code>curl &#039;ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt&#039; | awk &#039;{FS=&quot;\t&quot;} !/^#/ {print $20}&#039; | sed -r &#039;s|(ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)(GCA/)([0-9]{3}/)([0-9]{3}/)([0-9]{3}/)(GCA_.+)|\1\2\3\4\5\6/\6_genomic.fna.gz|&#039; &gt; genomic_file</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35642/estimate-genome-size</guid>
	<pubDate>Thu, 22 Feb 2018 03:28:26 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35642/estimate-genome-size</link>
	<title><![CDATA[Estimate Genome Size]]></title>
	<description><![CDATA[<code># Count k-mer occurrence using Jellyfish 2.2.6
jellyfish count -t 8 -C -m 19 -s 5G -o 19mer_out --min-qual-char=? sread_1.fastq sread_2.fastq

# points for a histogram
jellyfish histo -o 19mer_out.histo 19mer_out

#Plot results using R
##load the data into dataframe19
dataframe19 &lt;- read.table(&quot;19mer_out.histo&quot;) 
##plots the data points 1 through 200 in the dataframe19 using a line
plot(dataframe19[1:200,], type=&quot;l&quot;)
##plot the data points from 2 through 100
points(dataframe19[2:100,])

#calculate the total k-mers in the distribution
#Assuming the total number of data points is 9325
sum(as.numeric(dataframe19[2:9325,1]*dataframe19[2:9325,2]))

#peak position and genome size
#plotted graph we can get an idea where the peak position lies
#see actual point 
data[10:20,] #If peak more likely to be between 10-20

#If 12 is the peak
sum(as.numeric(dataframe19[2:9325,1]*dataframe19[2:9325,2]))/12

#Compare the peak shape with Poisson distribution in R
singleC &lt;- sum(as.numeric(dataframe19[2:28,1]*dataframe19[2:28,2]))/12
poisdtb &lt;- dpois(1:100,12)*singleC
plot(poisdtb, type=&#039;l&#039;, lty=2, col=&quot;green&quot;)
lines(dataframe19[1:100,12] * singleC, type = &quot;l&quot;, col=3)#, Ity=2)
lines(dataframe19[1:100,],type= &quot;l&quot;)


#ALTERNATE  WAY
#https://github.com/dib-lab/khmer/blob/master/scripts/normalize-by-median.py
python normalize-by-median.py -x 1e8 -k 20 -C 20 -R report.txt reads.fa
#https://github.com/dib-lab/khmer-recipes/blob/master/003-estimate-genome-size/estimate-genome-size.py
python estimate-genome-size.py -C 20 -k 20 reads.fa.keep report.txt</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/35417/plot-the-density-of-genes-in-r</guid>
	<pubDate>Fri, 02 Feb 2018 03:19:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/35417/plot-the-density-of-genes-in-r</link>
	<title><![CDATA[Plot the density of genes in R]]></title>
	<description><![CDATA[<code>#column1 = chromosome name and column2 = start position of the gene

# check if ggplot2 is installed, if so, load it, 
# if not, install and load it
if(&quot;ggplot2&quot; %in% rownames(installed.packages())){
    library(ggplot2)
} else {
    install.packages(&quot;ggplot2&quot;)
    library(ggplot2)
}

# import a text file with gene positions
# columns should be: chr, position (no end or gene name required)
genes &lt;- read.table(&quot;genes.txt&quot;,sep=&quot;\t&quot;,header=T)

# make sure the chromosomes are ordered in the way you want
# them to appear in the plot
genes$chr &lt;- with(genes, factor(chr, levels=paste(&quot;chr&quot;,c(1:22,&quot;X&quot;,&quot;Y&quot;),sep=&quot;&quot;), ordered=TRUE))

# make a density plot of genes over the provided chromosomes (or scaffolds ...)
plottedGenes &lt;- ggplot(genes) + geom_histogram(aes(x=pos),binwidth=1000000) + facet_wrap(~chr,ncol=2) + ggtitle(&quot;RefSeq genes density over human genome 19&quot;) + xlab(&quot;Genomic position (bins 1 Mb)&quot;) + ylab(&quot;Number of genes&quot;)

# save it to an image
png(&quot;genes.png&quot;,width=1000,height=1500)
print(plottedGenes)
dev.off()</code>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>

</channel>
</rss>