<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Owner]]></title>
	<link>https://bioinformaticsonline.com/snippets/owner/shruti?</link>
	<atom:link href="https://bioinformaticsonline.com/snippets/owner/shruti?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/43887/rename-the-screen-name-in-linux</guid>
	<pubDate>Mon, 30 May 2022 20:44:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/43887/rename-the-screen-name-in-linux</link>
	<title><![CDATA[Rename the screen name in Linux !]]></title>
	<description><![CDATA[<code>screen -S 22798.pts-5.hn1 -X sessionname node33_COMPUTE</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/43054/r-script-to-visualize-fastani-core-genome-comparison</guid>
	<pubDate>Fri, 30 Apr 2021 02:29:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/43054/r-script-to-visualize-fastani-core-genome-comparison</link>
	<title><![CDATA[R script to visualize fastANI core-genome comparison]]></title>
	<description><![CDATA[<code>#######
# Purpose: Visualize fastANI core-genome comparison
# Usage: Rscript &lt;this_script&gt;  &lt;query sequence in fasta format&gt; &lt;subject sequence&gt; &lt;fastANI visualization output&gt;
# Output: &lt;fastANI visualization output filename&gt;.pdf
# Uses genoPlotR package: http://genoplotr.r-forge.r-project.org

#Parse command line arguments
query_fasta=commandArgs(TRUE)[1]
subject_fasta=commandArgs(TRUE)[2]
fastANI_visual_file=commandArgs(TRUE)[3]

library(genoPlotR)

#Read fastANI output
comparison &lt;- try(read_comparison_from_blast(fastANI_visual_file))

#Read sequences into genoPlotR objects
Query &lt;- try(read_dna_seg_from_file(query_fasta))
Ref &lt;- try(read_dna_seg_from_file(subject_fasta))

plotTitle = paste(query_fasta, subject_fasta, sep=&quot; v/s &quot;)

pdf( paste(fastANI_visual_file,&quot;.pdf&quot;,sep=&quot;&quot;) )

plot_gene_map(dna_segs=list(Query, Ref), comparisons=list(comparison), main=plotTitle, scale=FALSE, scale_cex=1, n_scale_ticks=4)

dev.off()</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/41601/install-kraken-on-linux</guid>
	<pubDate>Mon, 04 May 2020 04:16:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/41601/install-kraken-on-linux</link>
	<title><![CDATA[Install kraken on linux]]></title>
	<description><![CDATA[<code>$ conda install -c bioconda kraken
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/anaconda3

  added / updated specs:
    - kraken


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    jellyfish-1.1.12           |       h6bb024c_1         3.1 MB  bioconda
    kraken-1.1                 |       h470a237_2         114 KB  bioconda
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be INSTALLED:

  jellyfish          bioconda/linux-64::jellyfish-1.1.12-h6bb024c_1
  kraken             bioconda/linux-64::kraken-1.1-h470a237_2
  perl               pkgs/main/linux-64::perl-5.26.2-h14c3975_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
kraken-1.1           | 114 KB    | ################################################## | 100% 
jellyfish-1.1.12     | 3.1 MB    | ################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

$conda install -c bioconda kraken-all
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/anaconda3

  added / updated specs:
    - kraken-all


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    kraken-0.10.6_eaf8fb68     |                2          98 KB  bioconda
    kraken-all-0.10.6_eaf8fb68 |                0          857 B  bioconda
    perl-threaded-5.26.0       |                0           4 KB  bioconda
    ------------------------------------------------------------
                                           Total:         103 KB

The following NEW packages will be INSTALLED:

  kraken-all         bioconda/linux-64::kraken-all-0.10.6_eaf8fb68-0
  perl-threaded      bioconda/linux-64::perl-threaded-5.26.0-0

The following packages will be SUPERSEDED by a higher-priority channel:

  kraken             bioconda/label/cf201901::kraken-1.1-h~ --&gt; bioconda::kraken-0.10.6_eaf8fb68-2


Proceed ([y]/n)? y


Downloading and Extracting Packages
kraken-0.10.6_eaf8fb | 98 KB     | ################################################## | 100% 
kraken-all-0.10.6_ea | 857 B     | ################################################## | 100% 
perl-threaded-5.26.0 | 4 KB      | ################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/40391/samtools-commands-for-bioinformatician</guid>
	<pubDate>Sun, 15 Dec 2019 10:31:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/40391/samtools-commands-for-bioinformatician</link>
	<title><![CDATA[Samtools commands for bioinformatician !]]></title>
	<description><![CDATA[<code>## count mapped reads
samtools view -c -F 260 mapping_file.bam


### converting sam file into fasta
samtools fasta reads_mapped.sam &gt; reads.fasta

### converting sam file into bam
# -b : output is bam
# -S : input is sam
# -o : output file name
samtools view -b -S -o sal_sej.bam sal_sej.sam

### viewing bam files (view command)
samtool view sal_sej.bam | less


### sort reads by flag specified and show them
# -f INT show flag matches
samtools -f 4 sal_sej.bam | less

# -F INT show reads excepting flag matches
samtools -F 4 sal_sej.bam | less


### count reads (-c) by flag specified
samtool view -c -f 4 sal_sej.bam


### count reads by quality value specified (-q) (&gt;=)
# -q : minimal quality value
samtools view -q 42 -c sal_sej.bam



### sorting bam file by genome position
samtools sort sal_sej.bam &gt; sal_sej_sorted.bam

### indexing sorted file
# ouput is file.bai
# always index sorted files
samtools index sal_sej_sorted.bam.bam

### identifying genome variants (mpileup command)
# -g : output is bcf (binary call format) file
# -f : use reference genome given
samtools mpileup -g -f sal_ref_sej.fasta sal_sej_sorted.bam.bam &gt; sal_vars.bcf


### calling snp and indels
# -c : find snp
# -v : output only potential variants
bcftools view -c -v sal_vars.bcf &gt; sal_vars.vcf
### note: new command is &quot;call&quot;. type &quot;bcftools&quot; to see current version and commands

### calling snp and indels with no frequency threshold
# omit -v parameter
bcftools call -c sal_vars.bcf &gt; sal_vars.vcf

### normalize (realign) indels
# -f : reference fasta, needed to left align and normalize
bcftools norm -f ./input_data/ref.fasta \
vars.vcf \
-o vars_indels_realigned.vcf

### show alignment
# first arg is sorted bam file
# second arg is reference 
samtools tview sal_sej_sorted.bam.bam sal_ref_sej.fasta

### results into txt file
samtools depth /path/to/sorted_bam.bam &gt; /path/to/coverage_results.txt
### note: returns count of depth at each position. input should be sorted

### count depth at each position and put it into a txt file
# -a : at all positions
samtools depth -a sorted_dupremoved.bam &gt; depth.txt

### one liner to count mean depth
samtools depth -a sorted_dupremoved.bam | awk &#039;{c++;s+=$3}END{print s/c}&#039;

### one liner to count coverage breadth
samtools depth -a sorted_dupremoved.bam | awk &#039;{c++; if($3&gt;0) total+=1}END{print (total/c)*100}&#039;

### flagstat
# output file has two columns: QC-passed reads and QC-failed reads
# and rows: total reads, duplicates, mapped.
samtools flagstat sorted_dupremoved.bam &gt; flagstat.txt

####################################
# try these recipies: 
### convert BAM to FASTA:
samtools view filename.bam | awk &#039;{OFS=&quot;\t&quot;; print &quot;&gt;&quot;$1&quot;\n&quot;$10}&#039; - &gt; filename.fasta

## converting BAM to SAM:
samtools view -h -o out.sam in.bam
## &quot;samtools view -h &lt;in.bam&gt; &gt; &lt;out.sam&gt;&quot;</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/40386/qv-calculation-in-bash</guid>
	<pubDate>Fri, 13 Dec 2019 21:14:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/40386/qv-calculation-in-bash</link>
	<title><![CDATA[QV calculation in Bash !]]></title>
	<description><![CDATA[<code># $1 = vcf file
# $2 = input bam file
# $3 = output QV file

module load samtools

NUM_BP=`samtools depth $2 | perl -e &#039;$c = 0; while(&lt;&gt;){chomp; @s = split(/\t/); if(scalar(@s) &gt;= 3){$c++;}} print &quot;$c\n&quot;;&#039;`
echo &quot;num bp: &quot;$NUM_BP

NUM_SNP=`cat $1 |grep -v &quot;#&quot; | awk -F &quot;\t&quot; &#039;{if (!match($NF, &quot;0/1&quot;)) print $1&quot;\t&quot;$2&quot;\t&quot;$3&quot;\t&quot;$4&quot;\t&quot;$5&quot;\t&quot;$8}&#039; | tr &#039;;&#039; &#039; &#039; | sed s/AB=//g | awk -v WEIGHT=0 &#039;{if ($6 &gt;= WEIGHT) print $0}&#039; | awk -v SUM=0 &#039;{if (length($4) == length($5)) { SUM+=length($4); } else if (length($4) &lt; length($5)) { SUM+=length($5)-length($4); } else { SUM+=length($4)-length($5)}} END { print SUM}&#039;`
echo &quot;num snp: &quot;$NUM_SNP

perl -e &#039;chomp(@ARGV); $ns = $ARGV[0]; $nb = $ARGV[1]; print (-10 * log($ns/$nb)/log(10)); print &quot;\n&quot;;&#039; $NUM_SNP $NUM_BP &gt; $3
cat $3</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36903/perl-script-to-find-coding-regions-in-dna-sequences</guid>
	<pubDate>Mon, 11 Jun 2018 08:03:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36903/perl-script-to-find-coding-regions-in-dna-sequences</link>
	<title><![CDATA[Perl script to find coding regions in DNA sequences]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl -w

use strict;


# if the number of input arguments is lower than 2
# return a message showing the error

if (scalar(@ARGV) &lt; 2) {
  print &quot;dnaloglkh.pl codontable DNAsequence\n&quot;;
  exit(1);
}

# create two vars contaning the filenames

my $filecodontable = $ARGV[0];
my $filesequence = $ARGV[1];


# open the first file: table of codon usage frequencies

if (!open(PROPCODONS,&quot;&lt; $filecodontable&quot;)) {
  print &quot;dnaloglkh.pl: the file $filecodontable can not be opened\n&quot;;
  exit(1);
}


# load the frequencies of codon usage into the hash table %pcodons
# this hash is indexed by a triplet of nucleotides or codon

my %pcodons;
my ($codon,$freq);
my $line;

# for each line in the file (tableofcodons) do

while ($line = ) {

	# extract the two values or columns with a regular expression
	# A group of letters and a decimal number
	
	($codon,$freq) = ($line =~ m/(\w+) (\d+\.\d+)/);
  
	# load the hash table
	$pcodons{$codon}=$freq;
}

close(PROPCODONS);

##########################################################
# open the DNA sequence (second file)

if (!open(SEQUENCE,&quot;&lt; $filesequence&quot;)) {
	print &quot;dnaloglkh.pl: the file $filesequence can not be opened\n&quot;;
	exit(1);
}


# FASTA format: 
# &gt;header containing a description of the sequence
# AGACTGACTTAC
# AGACTGACTTAC
# AGACTGAC

my $iden = ;  # reading the header of the sequence
my @seql = ;  # load the complete sequence into an array
my $seq  = &quot;&quot;;          # use a var to save the whole sequence inside

close(SEQUENCE);

# concat every segment (line) in the array into the var $seq

my $i=0;

while ($i &lt; scalar(@seql)) {
	chomp($seql[$i]);
	$seq = $seq . $seql[$i];
	$i = $i + 1;
}

$seq = &quot;\U$seq&quot;;

# we are going to use the array @codons to store all of the codons in
# the input DNA sequence. To split the sequence in segments of length
# three, a regular expression will be used searching for all of the
# possible groups of three symbols (option g)

my @codons = ($seq =~ m/.../g);

# the likelihood ratio is for a current triplet, the ratio between 
# the probability computed from the table of codon frequencies divided
# by the random probability (uniform), that is, 1/64. After this,
# the log must be applied to obtain the log-likelihood ratio.
# It is recommended the use of logs to manage calculations involving
# small numbers between 0 and 1. Log of the product is equal to the
# the sum of logs and the log of the division is the substraction of logs

my $r = 0.0;

$i=0;
while ($i &lt; scalar(@codons)) {
  $r = $r + log($pcodons{$codons[$i]}) - log(1/64);
  $i = $i + 1;
}

printf &quot;%.2f\n&quot;,$r;


__END__

To test the program, save this two files in your current working directory: exon.fa and intron.fa. Both files contain the sequence of a real exon and intron, respectively. Extremely different values of coding potential should be obtained with the two sequences: high values (positive) for coding protein sequences and low values (zero or even negative) for intronic regions. 

 exon.fa 

&gt;HUMHBB.2ex
GCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACT
CCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCT
TTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCT
GCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGG


 intron.fa.

&gt;HUMHBB.2pin
ATAACAATTGTTTTCTTTTGTTTAATTCTTGCTTTCTTTTTTTTTCTTCTCCGCAATTTTT
ACTATTATACTTAATGCCTTAACATTGTGTATAACAAAAGGAAATATCTCTGAGATACATT
AAGTAACTTAAAAAAAAACTTTACACAGTCTGCCTAGTACATTACTATTTGGAATATATGT
GTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTC</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/31216/extracting-fasta-sequences-based-on-position-with-perl-script</guid>
	<pubDate>Wed, 01 Mar 2017 17:10:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/31216/extracting-fasta-sequences-based-on-position-with-perl-script</link>
	<title><![CDATA[Extracting FASTA sequences based on position with perl script !!]]></title>
	<description><![CDATA[<code>#!/usr/bin/env perl
 
#Uses: perl sub-seq.pl input.txt range
 
use strict;
use warnings;
 
my $end   = pop;
my $start = pop;
local $/ = &#039;&gt;&#039;;
 
while (&lt;&gt;) {
    chomp;
    next unless /(.+)/;
    my ($header) = &quot;$/$1_$start-$end\n&quot;;
    my $seq = ${^POSTMATCH};
    $seq =~ s/\s//g;
    print $header;
    print +( substr $seq, $start - 1, $end ) . &quot;\n&quot;;
}</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/30650/perl-script-to-insert-the-dna-string-in-genome</guid>
	<pubDate>Mon, 23 Jan 2017 10:04:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/30650/perl-script-to-insert-the-dna-string-in-genome</link>
	<title><![CDATA[Perl script to insert the DNA string in genome]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl

use warnings;
use strict;
use Bio::SeqIO;
use Bio::Seq;

my $file = $ARGV[0]; # input fasta file (genome file)
my $out = $ARGV[1]; # output fasta file

my $chr=&quot;test&quot;; #insertion chromosome
my $pos=10; # position of the insertion
my $seqI = &quot;AAAA&quot;; #sequence of the insertion

my $seq_in  = Bio::SeqIO-&gt;new( -format =&gt; &#039;fasta&#039;,-file =&gt; $file);
my $seq_out = Bio::SeqIO-&gt;new( -format =&gt; &#039;fasta&#039;,-file =&gt; &quot;&gt;&quot;.$out);
while( my $seq = $seq_in-&gt;next_seq() ) {    
    if($seq-&gt;primary_id eq $chr){
        my $length = length($seq-&gt;seq);    
        my $upstream=substr($seq-&gt;seq, 0, $pos);
        my $downstream=substr($seq-&gt;seq, $pos,$length);        
        my $seq_obj = Bio::Seq-&gt;new(-seq =&gt; $upstream.$seqI.$downstream,-display_id =&gt; $seq-&gt;primary_id,-alphabet =&gt; &quot;dna&quot; );
            $seq_out-&gt;write_seq($seq_obj);
    }
    else{
        $seq_out-&gt;write_seq($seq);
    }
}</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/28429/blast-script-to-index-and-extract-sequence</guid>
	<pubDate>Thu, 14 Jul 2016 09:52:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/28429/blast-script-to-index-and-extract-sequence</link>
	<title><![CDATA[Blast script to index and extract sequence !!]]></title>
	<description><![CDATA[<code># look at the file
$ head EC4115.fa 
&gt;NC_011353.1 Escherichia coli O157:H7 str. EC4115 chromosome, complete genome.
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTC
TGACAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC

# generate the blast database
$ makeblastdb -dbtype nucl -out EC -in EC4115.fa -parse_seqids

# retreive an entry by id
$ blastdbcmd -db EC -entry &#039;NC_011353.1&#039; | head
&gt;lcl|NC_011353.1 Escherichia coli O157:H7 str. EC4115 chromosome, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGCTTCTGAACTG
GTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGAC
....

# query the blast database by id and coordinates
$ blastdbcmd -db EC -range 100-105 -entry &#039;NC_011353.1&#039;
&gt;lcl|NC_011353.1:100-105 Escherichia coli O157:H7 str. EC4115 chromosome, complete genome
TTAAAA</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>

</channel>
</rss>