<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: All]]></title>
	<link>https://bioinformaticsonline.com/snippets?offset=330</link>
	<atom:link href="https://bioinformaticsonline.com/snippets?offset=330" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36933/mapping-with-bwa-mem-or-bwa-sampe-in-one-go-with-python-script</guid>
	<pubDate>Thu, 14 Jun 2018 06:37:37 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36933/mapping-with-bwa-mem-or-bwa-sampe-in-one-go-with-python-script</link>
	<title><![CDATA[Mapping with BWA-mem or BWA-sampe in one go with python script !]]></title>
	<description><![CDATA[<code>BAM files and mapping
BESST requires sorted and indexed BAM files as input. Any read aligner + samtools can be used to obtain such files. Read pairs needs to be aligned in paired read mode. BESST provides a script (https://github.com/ksahlin/BESST/blob/master/scripts/reads_to_ctg_map.py) for obtaining sorted and indexed BAM files with BWA-mem or BWA-sampe in one go. An example call for mapping with this script is

python reads_to_ctg_map.py /path/to/lib1_A.fq /path/to/lib1_B.fq /path/to/contigs.fasta --threads N
where N is an integer specifying how many threads BWA-mem should use. --nomem can be specified to the above call to use BWA-sampe as the paired read alignment pipeline.

https://github.com/ksahlin/BESST/blob/master/scripts/reads_to_ctg_map.py</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36916/perl-script-to-check-fastq-reads-qualities</guid>
	<pubDate>Tue, 12 Jun 2018 04:42:20 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36916/perl-script-to-check-fastq-reads-qualities</link>
	<title><![CDATA[Perl script to check fastq reads qualities !]]></title>
	<description><![CDATA[<code>#!/usr/bin/env perl

use strict;
use warnings;

sub readfq {
	my ($fh, $aux) = @_;
	@$aux = [undef, 0] if (!defined(@$aux));
	return if ($aux-&gt;[1]);
	if (!defined($aux-&gt;[0])) {
		while (&lt;$fh&gt;) {
			chomp;
			if (substr($_, 0, 1) eq &#039;&gt;&#039; || substr($_, 0, 1) eq &#039;@&#039;) {
				$aux-&gt;[0] = $_;
				last;
			}
		}
		if (!defined($aux-&gt;[0])) {
			$aux-&gt;[1] = 1;
			return;
		}
	}
	my $name = /^.(\S+)/? $1 : &#039;&#039;;
	my $seq = &#039;&#039;;
	my $c;
	$aux-&gt;[0] = undef;
	while (&lt;$fh&gt;) {
		chomp;
		$c = substr($_, 0, 1);
		last if ($c eq &#039;&gt;&#039; || $c eq &#039;@&#039; || $c eq &#039;+&#039;);
		$seq .= $_;
	}
	$aux-&gt;[0] = $_;
	$aux-&gt;[1] = 1 if (!defined($aux-&gt;[0]));
	return ($name, $seq) if ($c ne &#039;+&#039;);
	my $qual = &#039;&#039;;
	while (&lt;$fh&gt;) {
		chomp;
		$qual .= $_;
		if (length($qual) &gt;= length($seq)) {
			$aux-&gt;[0] = undef;
			return ($name, $seq, $qual);
		}
	}
	$aux-&gt;[1] = 1;
	return ($name, $seq);
}

my @aux = undef;
my ($name, $seq, $qual);
my ($n, $slen, $qlen) = (0, 0, 0);
while (($name, $seq, $qual) = readfq(\*STDIN, \@aux)) {
	++$n;
	$slen += length($seq);
	$qlen += length($qual) if ($qual);
print join(&quot;\t&quot;, $n, $slen, $qlen), &quot;\n&quot;;
}
#print join(&quot;\t&quot;, $n, $slen, $qlen), &quot;\n&quot;;

__END__
 my @aux = undef; # this is for keeping intermediate data
  while (my ($name, $seq, $qual) = readfq(\*STDIN, \@aux)) { 
     if( (length($seq) &gt;= 21) &amp;&amp; (length($seq) &lt;= 25) ) { 
         print &quot;@$name\n&quot;;
         print &quot;$seq\n&quot;; 
         print &quot;+\n&quot;;
         print &quot;$qual\n&quot;;
     }
  }</code>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36915/perl-script-to-find-palindromic-regions-in-dna-sequences</guid>
	<pubDate>Tue, 12 Jun 2018 04:34:12 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36915/perl-script-to-find-palindromic-regions-in-dna-sequences</link>
	<title><![CDATA[Perl script to find palindromic regions in DNA sequences]]></title>
	<description><![CDATA[<code>use strict;
use warnings;

my $pp = qr/(?: (\w) (?1) \g{-1} | \w? )/ix;
my $filename = $ARGV[0];
open(my $fh, &#039;&lt;:encoding(UTF-8)&#039;, $filename) or die &quot;Could not open file &#039;$filename&#039; $!&quot;;

local $/ = &#039;&#039;;

while (&lt;$fh&gt;) {
    chomp;
    my ($header, @lines) = split &quot;\n&quot;;
    my $data = join &#039;&#039;, @lines;

    print &quot;$header\n$data\n&quot;;

    while ($data =~ /(?=($pp))/g) {
	my $end=($-[0]+length($1));
	my $n=(length($1)/2);
	my $len=length($1);
	my $midPoint = ($n == int $n) ? $n : int($n + 1);
	$midPoint=$midPoint+$-[0];
        print &quot;$-[0]\t$midPoint\t$end\t$1\t$len\n&quot; if length($1) &gt; 100;
    }
}

__DATA__
&gt;TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAE
LFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWF
YNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNI
NFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH

&gt;seq1
TGAATTACTAGAAGTACTTAAAATGATGGTTGGAGGAAATATTCTTGATGATCAAATTGC
CGTTAAACTAGGATTTCTTATAAAGGAGGTTGGTAGTAAAATTCATGAAGATCATTAAGT

&gt;TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY</code>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36903/perl-script-to-find-coding-regions-in-dna-sequences</guid>
	<pubDate>Mon, 11 Jun 2018 08:03:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36903/perl-script-to-find-coding-regions-in-dna-sequences</link>
	<title><![CDATA[Perl script to find coding regions in DNA sequences]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl -w

use strict;


# if the number of input arguments is lower than 2
# return a message showing the error

if (scalar(@ARGV) &lt; 2) {
  print &quot;dnaloglkh.pl codontable DNAsequence\n&quot;;
  exit(1);
}

# create two vars contaning the filenames

my $filecodontable = $ARGV[0];
my $filesequence = $ARGV[1];


# open the first file: table of codon usage frequencies

if (!open(PROPCODONS,&quot;&lt; $filecodontable&quot;)) {
  print &quot;dnaloglkh.pl: the file $filecodontable can not be opened\n&quot;;
  exit(1);
}


# load the frequencies of codon usage into the hash table %pcodons
# this hash is indexed by a triplet of nucleotides or codon

my %pcodons;
my ($codon,$freq);
my $line;

# for each line in the file (tableofcodons) do

while ($line = ) {

	# extract the two values or columns with a regular expression
	# A group of letters and a decimal number
	
	($codon,$freq) = ($line =~ m/(\w+) (\d+\.\d+)/);
  
	# load the hash table
	$pcodons{$codon}=$freq;
}

close(PROPCODONS);

##########################################################
# open the DNA sequence (second file)

if (!open(SEQUENCE,&quot;&lt; $filesequence&quot;)) {
	print &quot;dnaloglkh.pl: the file $filesequence can not be opened\n&quot;;
	exit(1);
}


# FASTA format: 
# &gt;header containing a description of the sequence
# AGACTGACTTAC
# AGACTGACTTAC
# AGACTGAC

my $iden = ;  # reading the header of the sequence
my @seql = ;  # load the complete sequence into an array
my $seq  = &quot;&quot;;          # use a var to save the whole sequence inside

close(SEQUENCE);

# concat every segment (line) in the array into the var $seq

my $i=0;

while ($i &lt; scalar(@seql)) {
	chomp($seql[$i]);
	$seq = $seq . $seql[$i];
	$i = $i + 1;
}

$seq = &quot;\U$seq&quot;;

# we are going to use the array @codons to store all of the codons in
# the input DNA sequence. To split the sequence in segments of length
# three, a regular expression will be used searching for all of the
# possible groups of three symbols (option g)

my @codons = ($seq =~ m/.../g);

# the likelihood ratio is for a current triplet, the ratio between 
# the probability computed from the table of codon frequencies divided
# by the random probability (uniform), that is, 1/64. After this,
# the log must be applied to obtain the log-likelihood ratio.
# It is recommended the use of logs to manage calculations involving
# small numbers between 0 and 1. Log of the product is equal to the
# the sum of logs and the log of the division is the substraction of logs

my $r = 0.0;

$i=0;
while ($i &lt; scalar(@codons)) {
  $r = $r + log($pcodons{$codons[$i]}) - log(1/64);
  $i = $i + 1;
}

printf &quot;%.2f\n&quot;,$r;


__END__

To test the program, save this two files in your current working directory: exon.fa and intron.fa. Both files contain the sequence of a real exon and intron, respectively. Extremely different values of coding potential should be obtained with the two sequences: high values (positive) for coding protein sequences and low values (zero or even negative) for intronic regions. 

 exon.fa 

&gt;HUMHBB.2ex
GCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACT
CCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCT
TTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCT
GCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGG


 intron.fa.

&gt;HUMHBB.2pin
ATAACAATTGTTTTCTTTTGTTTAATTCTTGCTTTCTTTTTTTTTCTTCTCCGCAATTTTT
ACTATTATACTTAATGCCTTAACATTGTGTATAACAAAAGGAAATATCTCTGAGATACATT
AAGTAACTTAAAAAAAAACTTTACACAGTCTGCCTAGTACATTACTATTTGGAATATATGT
GTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTC</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36671/biological-sequence-handling-with-perl</guid>
	<pubDate>Wed, 16 May 2018 08:18:12 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36671/biological-sequence-handling-with-perl</link>
	<title><![CDATA[Biological Sequence handling with Perl !]]></title>
	<description><![CDATA[<code>package Sequence::Generic;
# File: Sequence/Generic.pm

use strict;
use Carp;
use overload 
  &#039;&quot;&quot;&#039;        =&gt; &#039;asString&#039;,
  &#039;neg&#039;       =&gt; &#039;reverse&#039;,
  &#039;.&#039;         =&gt; &#039;concatenate&#039;,
  &#039;fallback&#039;  =&gt; &#039;TRUE&#039;;

# These methods should be overriden by child classes
# class constructor
sub new {
    my $class = shift;
    croak &quot;$class must override the new() method&quot;;
}
# Return the sequence as a string
sub seq {
    my $self = shift;
    croak ref($self),&quot; must override the seq() method&quot;;
}
# Return the type of the sequence as a human readable string
sub type {
    return &#039;Generic Sequence&#039;;
}
# These methods probably don&#039;t have to be overridden
# The length of the sequence
sub length {
    my $self = shift;
    return length($self-&gt;seq);
}
# The reverse of the sequence
sub reverse {
    my $self = shift;
    my $reversed = reverse $self-&gt;seq;
    return $reversed;
}
# A human-readable description of the object
sub asString {
  my $self = shift;
  return $self-&gt;type . &#039;(&#039; . $self-&gt;length . &#039; residues)&#039;;
}
# Concatenate two sequences together and return the result

sub concatenate {
  my $self = shift;
  my ($new_seq,$prepend) = @_;
  my ($to_append);
  if (ref($new_seq)) {
      croak &quot;argument to concatenate must be a string or a Sequence object&quot;
      unless $new_seq-&gt;isa(__PACKAGE__);
      $to_append = $new_seq-&gt;seq ;
  } else {
      $to_append = $new_seq;
  }
  return $self-&gt;new($prepend ? $to_append . $self-&gt;seq 
                     : $self-&gt;seq . $to_append);
}
1;

Back to Article

Listing Two
 package Sequence::Nucleotide;
# file: Sequence/Nucleotide.pm

use Sequence::Generic;
use Sequence::Nucleotide::Subsequence;
use Sequence::Alignment;
use Carp;

use strict;
use vars &#039;@ISA&#039;;
:Generic&#039;;

my %CODON_TABLE = (
           UCA =&gt; &#039;S&#039;,UCG =&gt; &#039;S&#039;,UCC =&gt; &#039;S&#039;,UCU =&gt; &#039;S&#039;,
           UUU =&gt; &#039;F&#039;,UUC =&gt; &#039;F&#039;,UUA =&gt; &#039;L&#039;,UUG =&gt; &#039;L&#039;,
           UAU =&gt; &#039;Y&#039;,UAC =&gt; &#039;Y&#039;,UAA =&gt; &#039;*&#039;,UAG =&gt; &#039;*&#039;,
           UGU =&gt; &#039;C&#039;,UGC =&gt; &#039;C&#039;,UGA =&gt; &#039;*&#039;,UGG =&gt; &#039;W&#039;,
           CUA =&gt; &#039;L&#039;,CUG =&gt; &#039;L&#039;,CUC =&gt; &#039;L&#039;,CUU =&gt; &#039;L&#039;,
           CCA =&gt; &#039;P&#039;,CCG =&gt; &#039;P&#039;,CCC =&gt; &#039;P&#039;,CCU =&gt; &#039;P&#039;,
           CAU =&gt; &#039;H&#039;,CAC =&gt; &#039;H&#039;,CAA =&gt; &#039;Q&#039;,CAG =&gt; &#039;Q&#039;,
           CGA =&gt; &#039;R&#039;,CGG =&gt; &#039;R&#039;,CGC =&gt; &#039;R&#039;,CGU =&gt; &#039;R&#039;,
           AUU =&gt; &#039;I&#039;,AUC =&gt; &#039;I&#039;,AUA =&gt; &#039;I&#039;,AUG =&gt; &#039;M&#039;,
           ACA =&gt; &#039;T&#039;,ACG =&gt; &#039;T&#039;,ACC =&gt; &#039;T&#039;,ACU =&gt; &#039;T&#039;,
           AAU =&gt; &#039;N&#039;,AAC =&gt; &#039;N&#039;,AAA =&gt; &#039;K&#039;,AAG =&gt; &#039;K&#039;,
           AGU =&gt; &#039;S&#039;,AGC =&gt; &#039;S&#039;,AGA =&gt; &#039;R&#039;,AGG =&gt; &#039;R&#039;,
           GUA =&gt; &#039;V&#039;,GUG =&gt; &#039;V&#039;,GUC =&gt; &#039;V&#039;,GUU =&gt; &#039;V&#039;,
           GCA =&gt; &#039;A&#039;,GCG =&gt; &#039;A&#039;,GCC =&gt; &#039;A&#039;,GCU =&gt; &#039;A&#039;,
           GAU =&gt; &#039;D&#039;,GAC =&gt; &#039;D&#039;,GAA =&gt; &#039;E&#039;,GAG =&gt; &#039;E&#039;,
           GGA =&gt; &#039;G&#039;,GGG =&gt; &#039;G&#039;,GGC =&gt; &#039;G&#039;,GGU =&gt; &#039;G&#039;,
          );
*complement = *reversec = \&amp;reverse;

sub new {
  my $class = shift;
  $class = ref($class) if ref($class);
  my ($sequence,$type) = @_;

  my $self = bless {},$class;
  if (ref($sequence)) {
    croak &quot;Can&#039;t initialize sequence from non-Sequence object.\n&quot;
      unless $sequence-&gt;can(&#039;seq&#039;);
    %{$self} = %{$sequence};  # clone operation
  } else {
    croak &quot;Doesn&#039;t look like sequence data&quot; 
      unless $sequence=~/^[gactnu\s]+$/i;
    $self-&gt;{&#039;data&#039;} = $self-&gt;_canonicalize($sequence);
    $self-&gt;{&#039;type&#039;} = $type || ($sequence=~/u/i ? &#039;RNA&#039; : &#039;DNA&#039;);
  }
  return $self;
}
sub seq {
    my $self = shift;
    $self-&gt;{&#039;data&#039;} = $self-&gt;_canonicalize($_[0])  if defined($_[0]);
    my $seq = $self-&gt;{&#039;data&#039;};
    return $seq unless $self-&gt;is_RNA;
    $seq=~tr/T/U/;
    return $seq;
}
sub type {
    my $self = shift;
    return defined($_[0]) ? $self-&gt;{&#039;type&#039;} = $_[0] : $self-&gt;{&#039;type&#039;};
}
sub is_DNA {
    my $self = shift;
    return $self-&gt;type eq &#039;DNA&#039;;
}
sub is_RNA {
  my $self = shift;
  return $self-&gt;type eq &#039;RNA&#039;;
}
sub subseq {
  my $self = shift;
  my ($start,$end) = @_;
  return (__PACKAGE__ . &#039;::Subsequence&#039;)-&gt;new($self,$start,$end);
}
sub reverse {
  my $self = shift;
  return (__PACKAGE__ . &#039;::Subsequence&#039;)-&gt;new($self,$self-&gt;length,1);
}
sub translate {
  my $self = shift;
  my $frame = shift() || 1;
  my $l = $self-&gt;length;
  my $seq = $frame &gt; 0 ? $self-&gt;subseq($frame,$l-($l-$frame+1)%3)
              : $self-&gt;reverse-&gt;subseq(abs($frame),$l-($l-abs($frame)+1)%3);
  my $s = $seq-&gt;seq;
  $s=~tr/T/U/;  # put it in RNA mode
  $s =~ s/(\S{3})/$CODON_TABLE{$1} || &#039;X&#039;/eg;
  return $s;
}
sub longest_orf {
    my $self = shift;

    my ($max,$pos,$frame);
    foreach (-3..-1,1..3) {
    my $translation = $self-&gt;translate($_);
    while ($translation=~/([^*]+)/g) {
        if (length($1) &gt; length($max)) {
        $max = $1;
        $frame = $_;
        $pos = pos($translation) - length($max); 
        }
    }
    }
    $pos *= 3;
    $pos += abs($frame);
    return ($pos,$pos+3*length($max)-1) if $frame &gt; 0;
    return ($self-&gt;length-$pos,$self-&gt;length-$pos-3*length($max));
}
sub align {
    my $self = shift;
    my $seq = shift;
    $seq = $seq-&gt;seq if ref($seq);
    return new Sequence::Alignment(src=&gt;$seq,target=&gt;$self-&gt;seq);
}
sub _canonicalize {
  my $self = shift;
  my $seq = shift;
  $seq =~ tr/uU/tT/;
  $seq =~ s/[^gatcn]//ig;
  return uc($seq);
}
1;</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36600/bash-script-to-convert-sam-to-bam-visualization-ready</guid>
	<pubDate>Mon, 14 May 2018 07:31:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36600/bash-script-to-convert-sam-to-bam-visualization-ready</link>
	<title><![CDATA[Bash script to convert SAM to BAM visualization ready]]></title>
	<description><![CDATA[<code>samtools view -bS file.sam | samtools sort - file_sorted

samtools index test_sorted.bam test_sorted.bai</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36256/coverage-depth-of-reads</guid>
	<pubDate>Tue, 17 Apr 2018 14:18:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36256/coverage-depth-of-reads</link>
	<title><![CDATA[Coverage / Depth of reads !]]></title>
	<description><![CDATA[<code># get total number of bases covered at MIN_COVERAGE_DEPTH or higher
samtools mpileup mapping_result_sorted.bam | awk -v X=&quot;${MIN_COVERAGE_DEPTH}&quot; &#039;$4&gt;=X&#039; | wc -l
32876

# get length of reference genome
bowtie2-inspect -s refgenome | awk &#039;{ FS = &quot;\t&quot; } ; BEGIN{L=0}; {L=L+$3}; END{print L}&#039;
45678</code>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36254/genome-covered</guid>
	<pubDate>Tue, 17 Apr 2018 14:13:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36254/genome-covered</link>
	<title><![CDATA[Genome Covered !]]></title>
	<description><![CDATA[<code>zero=$(bedtools genomecov -ibam BAM -g hg38.fasta -bga | awk &#039;$4==0 {bpCountZero+=($3-$2)} {print bpCountZero}&#039; | tail -1)

nonzero=$(bedtools genomecov -ibam BAM -g hg38.fasta -bga | awk &#039;$4&gt;0 {bpCountNonZero+=($3-$2)} {print bpCountNonZero}&#039; | tail -1)

percent=$(bc &lt;&lt;&lt; &quot;scale=6; ($nonzero / ($zero + $nonzero))*100&quot;)

echo $percent</code>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36107/perl-script-to-read-multi-fasta-sequence-one-by-one</guid>
	<pubDate>Fri, 06 Apr 2018 09:01:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36107/perl-script-to-read-multi-fasta-sequence-one-by-one</link>
	<title><![CDATA[Perl script to read multi fasta sequence one by one]]></title>
	<description><![CDATA[<code>#!/usr/bin/env perl

use strict;
use warnings;

#USAGE
#perl rohanRun.pl seq.fa
my $outfile=&#039;tmp.fa&#039;;
my $fastaSeq_ref = readfasta (&quot;$ARGV[0]&quot;);
my %fastaSeq = %$fastaSeq_ref;
foreach my $key ( keys %fastaSeq) {
open (OUT, &quot;&gt;$outfile&quot;) or die &quot;couldn&#039;t open the file $outfile $!&quot;;
print OUT &quot;$key\n$fastaSeq{$key}\n&quot;;

}


sub readfasta 
{
  (my $file)=@_;
	my %sequence;
	my $header;
	my $temp_seq;
	
	#suppose fasta files contains multiple sequences;
	 
	open (IN, &quot;&lt;$file&quot;) or die &quot;couldn&#039;t open the file $file $!&quot;;
	
	while (&lt;IN&gt;)
	{	
		chop;
		next if /^\s*$/; #skip empty line 
		if ($_ =~ /^&gt;/)  #when see head line
		{	
			$header= $_;
			if ($sequence{$header}){print colored(&quot;#CAUTION: SAME FASTA HAS BEEN READ MULTIPLE TIMES.\n#CAUTION: PLEASE CHECK FASTA SEQUENCE:$header\n&quot;,&quot;red&quot;)};
			if ($temp_seq) {$temp_seq=&quot;&quot;} # If there is alreay sequence in temp_seq, empty the sequence file
			
		}
		else # when see the sequence line 
		{
		   s/\s+//g;
		   $temp_seq .= $_;
		   $sequence{$header}=$temp_seq; #update the contents
		}
	
	}
	
	return \%sequence;
}</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/36006/perl-script-to-count-the-number-of-files-in-a-directory-with-regex</guid>
	<pubDate>Wed, 21 Mar 2018 05:13:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/36006/perl-script-to-count-the-number-of-files-in-a-directory-with-regex</link>
	<title><![CDATA[Perl script to count the number of files in a directory with regex]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl
use strict;
use warnings;
my @allNames=(&quot;_D14_&quot;,&quot;_B14_&quot;,&quot;_B15_&quot;,&quot;_C1T1_&quot;,&quot;_C3T3_&quot;,&quot;_D12_&quot;,&quot;_D13_&quot;,&quot;_E1B1_&quot;,&quot;_E1B3_&quot;,&quot;_H001_&quot;,&quot;_H3-03_&quot;,&quot;_H3-04_&quot;,&quot;_HB0_&quot;,&quot;_C28_&quot;,&quot;_C33_&quot;,&quot;_C31_&quot;,&quot;_C1T2_&quot;,&quot;_D11_&quot;,&quot;_E11_&quot;,&quot;_E31_&quot;,&quot;_B11_&quot;,&quot;_B33_&quot;,&quot;_B3B1_&quot;,&quot;_C210_&quot;,&quot;_C211_&quot;,&quot;_C21_&quot;,&quot;_C24_&quot;,&quot;_C27_&quot;,&quot;_E2B2_&quot;,&quot;_H004_&quot;,&quot;_H4-28_&quot;,&quot;_Hprim34_&quot;,&quot;_Hprim53_&quot;,&quot;_HPRIM1_&quot;,&quot;_HPRIM21_&quot;,&quot;_HPRIM22_&quot;,&quot;_HPRIM36_&quot;,&quot;_A110_&quot;,&quot;_A12_&quot;,&quot;_A11_&quot;,&quot;_A111_&quot;,&quot;_A112_&quot;,&quot;_A13_&quot;,&quot;_A14_&quot;,&quot;_A15_&quot;,&quot;_A18_&quot;,&quot;_A19_&quot;,&quot;_C29_&quot;,&quot;_Hprim18_&quot;,&quot;_D22_&quot;,&quot;_D23_&quot;,&quot;_D21_&quot;,&quot;_A16_&quot;,&quot;_A17_&quot;,&quot;_B24_&quot;,&quot;_A3B1_&quot;,&quot;_C2B4_&quot;,&quot;_B22_&quot;,&quot;_H3-14_&quot;,&quot;_Hprim54_&quot;,&quot;_Hprim12_&quot;,&quot;_H4-02_&quot;,&quot;_H4-04_&quot;,&quot;_HPRIM15_&quot;,&quot;_HPRIM19_&quot;,&quot;_B39_&quot;,&quot;_C3B1_&quot;,&quot;_C3T1_&quot;,&quot;_HPRIM16_&quot;,&quot;_A21_&quot;,&quot;_A22_&quot;,&quot;_A32_&quot;,&quot;_A33_&quot;,&quot;_A24_&quot;,&quot;_A25_&quot;,&quot;_A23_&quot;,&quot;_B34_&quot;,&quot;_B38_&quot;,&quot;_E3B2_&quot;,&quot;_E3T1_&quot;,&quot;_HPRIM14_&quot;,&quot;_H158_&quot;);

my @files = glob(&quot;*.scf *.SCF&quot;);
foreach my $name (@allNames) {
my $nName=lc ($name);
my $cnt=0;
foreach my $file (@files) {
#print &quot;$file =~ /$nName/\n&quot;;
	my $nFile =lc $file;
	$cnt++ if ($nFile =~ /$nName/);
        #print &quot;$file\n&quot;;
    }
print &quot;$cnt\n&quot;;
if ($cnt &gt; 2 or $cnt &lt; 2) { print &quot;$nName\n&quot;;}
}</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>