<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: All]]></title>
	<link>https://bioinformaticsonline.com/snippets?offset=270</link>
	<atom:link href="https://bioinformaticsonline.com/snippets?offset=270" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/40571/perl-subroutine-to-creating-kmer</guid>
	<pubDate>Mon, 20 Jan 2020 04:32:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/40571/perl-subroutine-to-creating-kmer</link>
	<title><![CDATA[Perl subroutine to creating kmer !]]></title>
	<description><![CDATA[<code>sub k_mers {
	my ($sequence, $k) = @_;
	my $len = length($sequence);
	my @result = ();
	for (my $i = 0; $i &lt;= $len-$k; $i++) {
		push(@result, substr($sequence, $i, $k));
	}
	return @result;
}</code>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/40391/samtools-commands-for-bioinformatician</guid>
	<pubDate>Sun, 15 Dec 2019 10:31:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/40391/samtools-commands-for-bioinformatician</link>
	<title><![CDATA[Samtools commands for bioinformatician !]]></title>
	<description><![CDATA[<code>## count mapped reads
samtools view -c -F 260 mapping_file.bam


### converting sam file into fasta
samtools fasta reads_mapped.sam &gt; reads.fasta

### converting sam file into bam
# -b : output is bam
# -S : input is sam
# -o : output file name
samtools view -b -S -o sal_sej.bam sal_sej.sam

### viewing bam files (view command)
samtool view sal_sej.bam | less


### sort reads by flag specified and show them
# -f INT show flag matches
samtools -f 4 sal_sej.bam | less

# -F INT show reads excepting flag matches
samtools -F 4 sal_sej.bam | less


### count reads (-c) by flag specified
samtool view -c -f 4 sal_sej.bam


### count reads by quality value specified (-q) (&gt;=)
# -q : minimal quality value
samtools view -q 42 -c sal_sej.bam



### sorting bam file by genome position
samtools sort sal_sej.bam &gt; sal_sej_sorted.bam

### indexing sorted file
# ouput is file.bai
# always index sorted files
samtools index sal_sej_sorted.bam.bam

### identifying genome variants (mpileup command)
# -g : output is bcf (binary call format) file
# -f : use reference genome given
samtools mpileup -g -f sal_ref_sej.fasta sal_sej_sorted.bam.bam &gt; sal_vars.bcf


### calling snp and indels
# -c : find snp
# -v : output only potential variants
bcftools view -c -v sal_vars.bcf &gt; sal_vars.vcf
### note: new command is &quot;call&quot;. type &quot;bcftools&quot; to see current version and commands

### calling snp and indels with no frequency threshold
# omit -v parameter
bcftools call -c sal_vars.bcf &gt; sal_vars.vcf

### normalize (realign) indels
# -f : reference fasta, needed to left align and normalize
bcftools norm -f ./input_data/ref.fasta \
vars.vcf \
-o vars_indels_realigned.vcf

### show alignment
# first arg is sorted bam file
# second arg is reference 
samtools tview sal_sej_sorted.bam.bam sal_ref_sej.fasta

### results into txt file
samtools depth /path/to/sorted_bam.bam &gt; /path/to/coverage_results.txt
### note: returns count of depth at each position. input should be sorted

### count depth at each position and put it into a txt file
# -a : at all positions
samtools depth -a sorted_dupremoved.bam &gt; depth.txt

### one liner to count mean depth
samtools depth -a sorted_dupremoved.bam | awk &#039;{c++;s+=$3}END{print s/c}&#039;

### one liner to count coverage breadth
samtools depth -a sorted_dupremoved.bam | awk &#039;{c++; if($3&gt;0) total+=1}END{print (total/c)*100}&#039;

### flagstat
# output file has two columns: QC-passed reads and QC-failed reads
# and rows: total reads, duplicates, mapped.
samtools flagstat sorted_dupremoved.bam &gt; flagstat.txt

####################################
# try these recipies: 
### convert BAM to FASTA:
samtools view filename.bam | awk &#039;{OFS=&quot;\t&quot;; print &quot;&gt;&quot;$1&quot;\n&quot;$10}&#039; - &gt; filename.fasta

## converting BAM to SAM:
samtools view -h -o out.sam in.bam
## &quot;samtools view -h &lt;in.bam&gt; &gt; &lt;out.sam&gt;&quot;</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/40386/qv-calculation-in-bash</guid>
	<pubDate>Fri, 13 Dec 2019 21:14:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/40386/qv-calculation-in-bash</link>
	<title><![CDATA[QV calculation in Bash !]]></title>
	<description><![CDATA[<code># $1 = vcf file
# $2 = input bam file
# $3 = output QV file

module load samtools

NUM_BP=`samtools depth $2 | perl -e &#039;$c = 0; while(&lt;&gt;){chomp; @s = split(/\t/); if(scalar(@s) &gt;= 3){$c++;}} print &quot;$c\n&quot;;&#039;`
echo &quot;num bp: &quot;$NUM_BP

NUM_SNP=`cat $1 |grep -v &quot;#&quot; | awk -F &quot;\t&quot; &#039;{if (!match($NF, &quot;0/1&quot;)) print $1&quot;\t&quot;$2&quot;\t&quot;$3&quot;\t&quot;$4&quot;\t&quot;$5&quot;\t&quot;$8}&#039; | tr &#039;;&#039; &#039; &#039; | sed s/AB=//g | awk -v WEIGHT=0 &#039;{if ($6 &gt;= WEIGHT) print $0}&#039; | awk -v SUM=0 &#039;{if (length($4) == length($5)) { SUM+=length($4); } else if (length($4) &lt; length($5)) { SUM+=length($5)-length($4); } else { SUM+=length($4)-length($5)}} END { print SUM}&#039;`
echo &quot;num snp: &quot;$NUM_SNP

perl -e &#039;chomp(@ARGV); $ns = $ARGV[0]; $nb = $ARGV[1]; print (-10 * log($ns/$nb)/log(10)); print &quot;\n&quot;;&#039; $NUM_SNP $NUM_BP &gt; $3
cat $3</code>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/40357/find-and-replace-in-multifasta-or-fasta-header-with-perl-onliner</guid>
	<pubDate>Mon, 02 Dec 2019 20:45:42 -0600</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/40357/find-and-replace-in-multifasta-or-fasta-header-with-perl-onliner</link>
	<title><![CDATA[Find and replace in multifasta or fasta header with perl onliner]]></title>
	<description><![CDATA[<code>You have a fasta file and you want to replace: 
&quot;|&quot;

You are told to replace that by   
&quot;_&quot;

perl -i -p -e &quot;s/\|/_/g&quot;  genome.fasta

 
-i = inplace editing
-p = loop over lines and print each line (after processing)
-e = command line script</code>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/39962/perl-script-to-run-in-parellel</guid>
	<pubDate>Sun, 22 Sep 2019 22:08:20 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/39962/perl-script-to-run-in-parellel</link>
	<title><![CDATA[Perl script to run in parellel !]]></title>
	<description><![CDATA[<code>#!/usr/bin/perl

use strict;
use warnings;
use Parallel::ForkManager;
use Bio::SeqIO;

my ($sequence_data_ref) = parse_genome_files($ARGV[0]);
my %genome=%{$sequence_data_ref};

my $n_processes = 4;
my $pm = Parallel::ForkManager-&gt;new( $n_processes );
for my $i ( 1 .. $n_processes ) {
    $pm-&gt;start and next;

    my $count = 0;
    foreach my $chr_set (keys %genome) {         
        $count++;
        if ( ( $count % $i ) == 0 ) {
            if ( !output_exists($genome{$chr_set}{name}) ) {
                start_new_XFOIL_instance($genome{$chr_set}{name}, $genome{$chr_set}{nuc_seq});
            }
        }
    }

    $pm-&gt;finish;
}
$pm-&gt;wait_all_children;

sub output_exists {
    my $chr_set = shift;
    return ( -f &quot;$chr_set.out&quot; );
}

sub start_new_XFOIL_instance {
    my ($chr_set, $chr_seq) = @_;
    print &quot;starting XFOIL instance with parameters $chr_set!\n&quot;;
    touch( &quot;$chr_set.out&quot;, $chr_seq );
    print &quot;finished run with parameters $chr_set!\n&quot;;
}

sub touch {
    my ($fn, $seq) = @_;
    open FILE, &quot;&gt;$fn&quot; or die $!;
    system (&quot;augustus --species=caenorhabditis --outfile=$fn $seq --AUGUSTUS_CONFIG_PATH=/home/urbe/Tools/Alienomics_v1.1/augustus.2.5.5/config&quot;);
    close FILE or die $!;
}

sub parse_genome_files {
    my $file=shift;
    my (%sequence_data);
    my $file_content = new Bio::SeqIO(-format =&gt; &#039;fasta&#039;,-file =&gt; &quot;$file&quot;);
    my $out_content = Bio::SeqIO-&gt;newFh(-format =&gt; &#039;fasta&#039;, ,-file =&gt; &quot;&gt;genomeRES.fa&quot;);
    while (my $gene_info = $file_content-&gt;next_seq()) {
      my $sequence = $gene_info-&gt;seq();
      my $accession_number = $gene_info-&gt;display_id; 
      my $len = $gene_info-&gt;length;
      my $GCcount = $sequence =~ tr/GC|gc//;
      my $GCcontent = ($GCcount / $len) * 100;
      $sequence_data{$accession_number}{status} = &quot;OK&quot;; #everybody starts fine
      $sequence_data{$accession_number}{problem_desc} = &quot;-&quot;; #everybody starts fine
      if ($sequence_data{$accession_number}{status} eq &quot;OK&quot;) { # Add check points here &lt;&lt;&lt;&lt;&lt;&lt;
        $sequence_data{$accession_number}{nuc_seq} = $sequence;
	$sequence_data{$accession_number}{len} = $len;
	$sequence_data{$accession_number}{gc} = $GCcontent;
	$sequence_data{$accession_number}{name} = $accession_number;
	print $out_content $gene_info;
      }
    }
  return (\%sequence_data);
}</code>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/39853/run-sspace</guid>
	<pubDate>Sat, 17 Aug 2019 15:06:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/39853/run-sspace</link>
	<title><![CDATA[Run sspace !]]></title>
	<description><![CDATA[<code>#!/bin/bash

cd `pwd`

perl ~/apps/SSPACE-1.2_linux-x86_64/SSPACE_v1-2.pl \
-l libraries.txt \
-s Contigs_over200_nocp.fasta \
-k 5 \
-a 0.7 \
-x 1 \
-m 30 \
-o 20 \
-b Rayk31_scaffolds_extension</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/39849/map-the-long-reads</guid>
	<pubDate>Thu, 15 Aug 2019 01:05:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/39849/map-the-long-reads</link>
	<title><![CDATA[Map the long reads]]></title>
	<description><![CDATA[<code>Map them agaist reference avaga genome using following codes 
git clone https://github.com/lh3/bwa.git
cd bwa; make
bwa index ref.fa
bwa mem -x pacbio ref.fa pacbio.fq &gt; aln.sam
bwa mem -x ont2d ref.fa ont-2D.fq &gt; aln.sam</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/39847/convert-fastq-to-fastq</guid>
	<pubDate>Thu, 15 Aug 2019 00:33:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/39847/convert-fastq-to-fastq</link>
	<title><![CDATA[Convert FASTQ to FASTQ]]></title>
	<description><![CDATA[<code># Convert FASTQ to FASTA
seqtk seq -a IN.fastq &gt; OUT.fasta

# Convert FASTQ to FASTA and set bases of quality lower than 20 to N
seqtk seq -aQ64 -q20 -n N IN.fastq &gt; OUT.fasta


# Download Seqtk
https://github.com/lh3/seqtk</code>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/39763/resume-the-mira-assembler-run</guid>
	<pubDate>Mon, 05 Aug 2019 22:55:27 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/39763/resume-the-mira-assembler-run</link>
	<title><![CDATA[Resume the MIRA assembler run !]]></title>
	<description><![CDATA[<code>mira -r manifest_file

Usage:
mira [options] manifest_file [manifest_file ...]

Options:
  -c / --cwd=           directory       Change working directory
  -r / --resume                         Resume an interupted assembly
  -h / --help                           Print short help and exit
  -v / --version                        Print version and exit</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/snippets/view/39755/palindrome-simulation-commands</guid>
	<pubDate>Sun, 04 Aug 2019 19:24:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/snippets/view/39755/palindrome-simulation-commands</link>
	<title><![CDATA[Palindrome Simulation commands !]]></title>
	<description><![CDATA[<code>(base) ➜  palindromeAssemblySim more allCommands 
 3315  mutate.sh in= mutant15CH101.fasta out=mutant153CH101.fasta id=97
 3316  mutate.sh in=mutant15CH101.fasta out=mutant153CH101.fasta id=97
 3317  history &gt; allCommands
 3318  cat ch101_read33155_template_pass_FAH31515.faa mutant3CH101.fasta mutant15CH101.fasta mutant153CH101.fasta &gt; allPalindromeSimulated.fa
 3319  ~/Tools/art_bin_MountRainier/art_illumina -ss MSv3 -sam -i allPalindromeSimulated.fa -p -l 251 -f 100 -m 300 -s 10 -o paired_dat
 3320  ~/Tools/art_bin_MountRainier/art_illumina -ss MSv3 -sam -i allPalindromeSimulated.fa -p -l 250 -f 100 -m 300 -s 10 -o paired_dat
 3321  ~/Tools/seqtk/seqtk mergepe paired_dat1.fq paired_dat2.fq &gt; interleavedPE.fq
 3322  ~/Tools/seqtk/seqtk seq -a interleavedPE.fq &gt; interleavedPE.fasta
 3323  ~/Tools/BWISE/Bwise.py -c 0 -p 4 -K 250 -t 10 -o simPalBwise -x interleavedPE.fasta
 3326  cd masurca; ~/Tools/MaSuRCA-3.2.3/bin/masurca sim.conf
 3327  ./assemble.sh
 3328  cd ..
 3329  ~/Tools/SPAdes-3.10.1-Linux/bin/spades.py --careful -1 paired_dat1.fq -2 paired_dat2.fq -o SpadesAssembly_OUT
 3330  ~/Tools/platanus/platanus
 3331  ~/Tools/platanus/platanus assemble 
 3332  ~/Tools/platanus/platanus assemble -o seePlatanus -f paired_dat[12].fq -t 10</code>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>