<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36603?offset=60</link>
	<atom:link href="https://bioinformaticsonline.com/related/36603?offset=60" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22388/perl-one-liner-basics</guid>
	<pubDate>Sun, 24 May 2015 09:28:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22388/perl-one-liner-basics</link>
	<title><![CDATA[Perl One liner basics !!]]></title>
	<description><![CDATA[<p>Perl has a ton of command-line switches (see perldoc perlrun), but I'm just going to cover the ones you'll commonly need to debug code. The most important switch is -e, for execute (or maybe "engage" :) ). The -e switch takes a quoted string of Perl code and executes it. For example:<br /><br />$ perl -e 'print "Hello, World!\n"'<br />Hello, World!<br /><br />It's important that you use single quotes to quote the code for -e, so that your shell passes the code to Perl unchanged instead of expanding $-variables and other special characters inside it. This usually means you can't use single quotes within the one-liner code. If you're using Windows cmd.exe or PowerShell, you must use double quotes instead.<br /><br />I'm always forgetting what Perl's predefined special variables do, and I often test them at the command line with a one-liner to see what they contain. For instance, do you remember what $^O is?<br /><br />$ perl -e 'print "$^O\n"'<br />linux<br /><br />It's the operating system name. With that cleared up, let's see what else we can do. If you're using a relatively new Perl (5.10.0 or higher), you can use the -E switch instead of -e. It works like -e but also turns on Perl's optional newer features, like say, which prints its arguments and appends a newline. This saves typing and makes the code cleaner:<br /><br />$ perl -E 'say "$^O"'<br />linux<br /><br />Pretty handy! say is a nifty feature that you'll use again and again.</p>]]></description>
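In a script rather than a one-liner, the say feature that -E switches on can be enabled explicitly with use feature 'say'. A minimal sketch follows. The script itself is only an illustration and is not from the original post:

```perl
use strict;
use warnings;
use feature 'say';    # enables say, one of the features that -E turns on

# $^O holds the name of the operating system, e.g. "linux" on Linux.
say $^O;
```

Running it prints the same thing as perl -E 'say "$^O"'.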
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22570/frequent-words-problem-solution-by-perl</guid>
	<pubDate>Tue, 09 Jun 2015 23:38:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22570/frequent-words-problem-solution-by-perl</link>
	<title><![CDATA[Frequent words problem solution by Perl]]></title>
	<description><![CDATA[<div><p>Solved with Perl: <a href="http://rosalind.info/problems/1a/">http://rosalind.info/problems/1a/</a></p><p># Find the most frequent k-mers in a string.<br /># Given: A DNA string Text and an integer k.<br /># Return: All most frequent k-mers in Text (in any order).<br /><br />use strict;<br />use warnings;<br /><br />my $string = "ACGTTGCATGTCGCATGATGCATGAGAGCT";<br />my $kmer = 4;<br />my %myHash;<br />my $max = 0;<br /><br /># Slide a window of length $kmer over the string and count each k-mer<br />for (my $aa = 0; $aa &lt;= length($string) - $kmer; $aa++) {<br />    my $myStr = substr($string, $aa, $kmer);<br />    my $km = kmerMatch($string, $myStr, $kmer);<br />    $max = $km if $km &gt; $max;<br />    $myHash{$myStr} = $km;<br />}<br /><br /># Print every k-mer whose count equals the maximum<br />foreach my $name (keys %myHash) {<br />    print "$name " if $myHash{$name} == $max;<br />}<br />print "\n";<br /><br />sub kmerMatch { # Count exact occurrences of $myStr with a sliding window<br />    my ($string, $myStr, $kmer) = @_;<br />    my $count = 0;<br />    for (my $aa = 0; $aa &lt;= length($string) - $kmer; $aa++) {<br />        my $myWin = substr($string, $aa, $kmer);<br />        $count++ if $myWin eq $myStr;<br />    }<br />    return $count;<br />}</p></div>]]></description>
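The solution above calls kmerMatch once for every window, and each call rescans the whole string. Its running time therefore grows roughly with the square of the string length. A common alternative is to count every k-mer in a single pass with a hash and then keep those with the highest count. The sketch below shows this approach. The helper name most_frequent_kmers is hypothetical and is not from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): count all k-mers in one
# pass with a hash, then return those with the highest count.
sub most_frequent_kmers {
    my ($text, $k) = @_;
    my %count;
    # Slide a window of width $k over $text (0-based start positions).
    for my $i (0 .. length($text) - $k) {
        $count{ substr($text, $i, $k) }++;
    }
    # Find the highest count seen.
    my $max = 0;
    for my $c (values %count) {
        $max = $c if $c > $max;
    }
    # Keep every k-mer with that count; sort only for reproducible output.
    my @best = grep { $count{$_} == $max } keys %count;
    return sort @best;
}

my @best = most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4);
print join(' ', @best), "\n";
```

For the sample string and k = 4, it prints CATG GCAT. The problem accepts any order, so the sort is only there to make the output stable.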
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22572/clump-finding-problem-solved-with-perl</guid>
	<pubDate>Wed, 10 Jun 2015 00:17:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22572/clump-finding-problem-solved-with-perl</link>
	<title><![CDATA[Clump Finding Problem Solved with Perl]]></title>
	<description><![CDATA[<p>The problem is described at http://rosalind.info/problems/1d/</p><p>The script has been moved to http://bioinformaticsonline.com/snippets/view/34633/clump-finding-problem-solved-with-perl</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30336/finding-patterns-in-biological-sequences</guid>
	<pubDate>Thu, 22 Dec 2016 10:30:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30336/finding-patterns-in-biological-sequences</link>
	<title><![CDATA[Finding Patterns in Biological Sequences]]></title>
	<description><![CDATA[<p>In this report we provide an overview of known techniques for the discovery of patterns in biological sequences (DNA and proteins). We also provide the biological motivation and methods for the biological verification of such patterns. Finally, we list publicly available tools and databases for pattern discovery. An online supplement is available at http://genetics.uwaterloo.ca/~tvinar/cs798g/motif.</p><p>Address of the bookmark: <a href="http://engr.case.edu/li_jing/papers/00798gpattern.pdf" rel="nofollow">http://engr.case.edu/li_jing/papers/00798gpattern.pdf</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/19838</guid>
	<pubDate>Sat, 27 Dec 2014 13:30:15 -0600</pubDate>
	<link>https://bioinformaticsonline.com/view/19838</link>
	<title><![CDATA[Interview with a bioinformatician series ...]]></title>
	<description><![CDATA[<p>The aim of this series is to interview some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully their answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.<br /><br />New interviews in this series will be published on BOL every fortnight.</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/871/postdoctoral-position-in-bioinformatics-sweden</guid>
  <pubDate>Sun, 14 Jul 2013 13:49:57 -0500</pubDate>
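  <link>https://bioinformaticsonline.com/opportunity/view/871/postdoctoral-position-in-bioinformatics-sweden</link>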
  <title><![CDATA[Postdoctoral position in bioinformatics @ Sweden]]></title>
  <description><![CDATA[
<p>Information about the department<br />The Department of Mathematical Sciences at Chalmers University of Technology and the University of Gothenburg has about 170 faculty and staff and is the largest department of mathematical sciences in the Nordic countries. The department belongs to both Chalmers University of Technology and the University of Gothenburg (for more information see http://www.chalmers.se/math/).</p>

<p>Job description<br />We are looking for a motivated, self-driven postdoctoral researcher to work on large-scale sequence data analysis. The position is for 24 months and is located at Mathematical Statistics, Department of Mathematical Sciences, in Erik Kristiansson’s research group. We focus on method development for, and analysis of, next-generation DNA sequencing data, in particular comparative metagenomics and gene expression analysis (RNA-seq). We have a strong interdisciplinary profile and actively collaborate with several experimental groups, especially within the environmental sciences, ecology, infectious diseases and cancer genomics. More information is available at http://bioinformatics.math.chalmers.se.</p>

<p>The postdoctoral position is an appointment that offers an opportunity to qualify for further research positions within academia or industry. The majority of your working time is devoted to your own research, normally as a member of a research group. Your work also includes taking part in the supervision of Ph.D. students and M.Sc. thesis students. Teaching of undergraduate students may also be included to a small extent.</p>

<p>The employment is limited to a maximum of 2 years (1+1).</p>

<p>Qualifications<br />The applicant should have a Ph.D. degree, preferably in bioinformatics, mathematics, statistics, computer science or equivalent, by the start of the appointment. Experience in the analysis of large-scale data, in particular from next-generation DNA sequencing, is highly valued. The applicant should also be proficient in programming (e.g. Python/Java/C) and comfortable with Unix/Linux systems. Interaction with experimental biologists is central, so good collaborative skills are important. Fluency in written and spoken English is a strong requirement. As a postdoctoral researcher you are expected to work independently and to be able to supervise/co-supervise PhD and Master’s students.</p>

<p>Application procedure<br />The application should be marked with Ref 20130126 and written in English. It should be sent electronically via the Chalmers webpage.</p>

<p>Application deadline: September 8, 2013.</p>

<p>For questions, please contact: <br />Ass prof. Erik Kristiansson, Matematiska Vetenskaper, erik.kristiansson@chalmers.se, +46 31-772 3521, +46 70-5259751.</p>

<p>Chalmers continuously strives to be an attractive employer. Equality and diversity are essential foundations of all activities at Chalmers.</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/2021</guid>
	<pubDate>Mon, 12 Aug 2013 09:27:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/view/2021</link>
	<title><![CDATA[What is the difference between BioRuby and BioGem?]]></title>
	<description><![CDATA[<p>I came across two different but similar terms, BioRuby and BioGem. What is the difference between them? If both use the same Ruby language for development, why were two different biological packages developed?</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads or forks. It is important to understand how threads and forks work before implementing them, so it is worth reading up on them before following this tutorial.</p><p>In bioinformatics we often need to deal with huge datasets of more than 100 GB. The traditional way to analyze a file is with a while loop:</p><p>while (my $line = &lt;FILE&gt;) {<br />    # do something with $line<br />}</p><p>This is very slow (since we are using only one processor), and if the dataset has 500 million lines it can take more than a day to iterate through all of it. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a simple and efficient technique in Perl that I have been using. I prefer Perl's fork over Perl threads.</p><p>One of the oldest ways to fork is:</p><blockquote><p>my @childs;<br />my $fork = fork();<br />if (!defined $fork) {<br />    die "Couldn't fork: $!";<br />}<br />elsif ($fork == 0) {<br />    # child process<br />    <strong>your code here;</strong><br />    exit(0);<br />}<br />else {<br />    # parent process: remember the child's PID<br />    push(@childs, $fork);<br />}</p><p>## wait for the child processes to finish<br />foreach (@childs) {<br />    my $tmp = waitpid($_, 0);<br />}</p></blockquote><p>A fork creates a child process that gets its own copy of the parent's variables and code. The child runs separately from the parent, so the operating system can schedule it on a different processor. That's it! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you an easy way around this, although it still has its own drawbacks.</p><blockquote><p>If you would rather not call fork yourself, that's fine too: there are many useful modules that do it for you efficiently. One really useful module is Parallel::ForkManager, which lets you limit the number of forks you generate (the number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors = 8;<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />foreach (@dna) {<br />    $pm-&gt;start and next; # do the fork<br />    <strong>your code here;</strong><br />    $pm-&gt;finish; # do the exit in the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>This runs at most 8 child processes at a time, each handling one element of the array. When one child finishes, Parallel::ForkManager starts a new one, so all your processors stay busy analyzing the data. If your 8 child processes all write to one file, you need to lock the file, because otherwise their buffered writes can interleave. You can lock the file with the flock function:</p><blockquote><p>use Fcntl qw(:flock); # provides LOCK_EX and LOCK_UN<br />open(my $QUAL, '&gt;&gt;', "myfile.txt") or die "can't open file: $!";<br />flock($QUAL, LOCK_EX) or die "can't lock file: $!";<br />print $QUAL $output;<br />flock($QUAL, LOCK_UN) or die "can't unlock file: $!";<br />close($QUAL);</p></blockquote><p>I would not suggest using flock with many processes, because it reduces efficiency (each child process must wait for the others to release the lock). Instead, I suggest having each fork write to a separate file and concatenating the files after the processing.</p><p><strong>Putting it all together, if you have 100 GB of data you can do this:</strong></p><blockquote><p><strong>Step 1</strong>: split the dataset into equal parts, one per processor. This may take a few hours (about 2-3 hours for a 100 GB file). You can use the Unix split command for this. Note that split -l counts lines, so for multi-line records such as FASTA entries, make sure that no record is cut across two files.<br />For example:<br />my $number_split = int(($number_of_lines_in_your_dataset + $max_processors - 1) / $max_processors); # round up: at most $max_processors files<br />system("split", "-l", $number_split, "your_file.fasta", "file_name") == 0 or die "split failed: $?";</p><p><strong>Step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(my $dh, $split_files_directory) or die $!; ### open the directory<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />while (defined(my $file = readdir($dh))) { ### read the directory<br />    next if $file =~ /^\./; # skip . and .. and hidden files<br />    print "$file\n";<br />    ########## Start fork ##########<br />    $pm-&gt;start and next;<br />    <strong>Whatever you want to do with "$split_files_directory/$file";</strong><br />    ######### end fork ###############<br />    $pm-&gt;finish;<br />}<br />closedir($dh);<br />$pm-&gt;wait_all_children;</p></blockquote><p>Each processor now works on its own piece of the data (a split file), so you have 8 processes running at once without interfering with each other. Again, I do not suggest writing the output of every child process to one file (for the reasons above). Instead, write the output of each fork to a separate file and concatenate them at the end. That's it: with 8 processors your program can run up to about 8 times faster, as long as the work is limited by the CPU rather than by disk access. Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the output each child generates, since that also takes time (remember, 100 GB). One option is MySQL's LOAD DATA LOCAL INFILE command, which loads all the files into a single table (about 3 hours for a 100 GB dataset), after which you export the whole table into one file. I expected this to be faster than concatenating the files with the cat command, but cat reads and writes each file only once, so it is worth timing both on your data (correct me if I am wrong).</p><p>A much simpler way is to use cat and pipes, either writing one combined file, piping the combined output into the next program, or (in bash) using process substitution. Here my_pipe stands for that next program:</p><p>cat output_dir/* &gt; final_file<br />cat output_dir/* | my_pipe<br />my_pipe &lt;(cat output_dir/*) &gt; final_file</p><p>That's it, guys! Enjoy programming, and please do comment. I am not a computer scientist, so forgive me for any mistakes and please report them. Thank you.</p>]]></description>
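The tutorial recommends letting each fork write its own output file and concatenating the files afterwards, but it does not show that step in code. Below is a minimal sketch using only core Perl (fork, waitpid, IO::File and File::Temp). It assumes three hypothetical chunk names in place of the real split files, and it writes a placeholder line in place of the real analysis:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::File;

# Hypothetical work items standing in for the split files from step 1.
my @chunks = qw(chunk_a chunk_b chunk_c);
my $outdir = tempdir(CLEANUP => 1);    # scratch directory, removed when the parent exits
my @pids;

for my $chunk (@chunks) {
    my $pid = fork();
    die "Could not fork: $!" unless defined $pid;    # undef means fork failed
    if ($pid == 0) {
        # Child: write results to its own file, so no flock is needed.
        my $out = IO::File->new("$outdir/$chunk.out", 'w')
            or die "Cannot write $chunk.out: $!";
        print {$out} "processed $chunk\n";    # placeholder for the real analysis
        $out->close or die "Cannot close $chunk.out: $!";
        exit(0);
    }
    push @pids, $pid;    # Parent: remember the child and start the next one.
}

# Wait for every child and stop if any of them failed.
for my $pid (@pids) {
    waitpid($pid, 0);
    die "Child $pid exited with status $?" if $? != 0;
}

# Concatenate the per-child files, in a fixed order, into one result file.
my $merged_path = "$outdir/merged.txt";
my $merged = IO::File->new($merged_path, 'w') or die "Cannot write merged.txt: $!";
for my $chunk (@chunks) {
    my $in = IO::File->new("$outdir/$chunk.out", 'r')
        or die "Cannot read $chunk.out: $!";
    print {$merged} $in->getlines;
    $in->close;
}
$merged->close or die "Cannot close merged.txt: $!";

# Show the combined result.
my $check = IO::File->new($merged_path, 'r') or die "Cannot read merged.txt: $!";
my @lines = $check->getlines;
print @lines;
```

Checking defined $pid before $pid == 0 matters. fork returns undef on failure, and undef compares numerically equal to 0, so testing $pid == 0 first would send a failed fork into the child branch. Waiting with waitpid and checking $? lets the parent notice if any child died before the merge step.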
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42672/introduction-to-bioinformatics-and-computational-biology</guid>
	<pubDate>Mon, 25 Jan 2021 01:32:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42672/introduction-to-bioinformatics-and-computational-biology</link>
	<title><![CDATA[Introduction to Bioinformatics and Computational Biology]]></title>
	<description><![CDATA[<p><span>This is the course material for STAT115/215 BIO/BST282 at Harvard University.</span></p>
<p>Xiaole Shirley Liu (lead instructor)<br>Joshua Starmer<br>Martin Hemberg<br>Ting Wang<br>Feng Yue</p>
<p>Ming Tang<br>Yang Liu<br>Jack Kang<br>Scarlett Ge<br>Jiazhen Rong<br>Phillip Nicol<br>Maartin De Vries</p>
<p>We thank many colleagues in the community who helped Dr.&nbsp;Liu prepare the STAT115/215 BIO/BST282 course over the years.</p><p>Address of the bookmark: <a href="https://liulab-dfci.github.io/bioinfo-combio/" rel="nofollow">https://liulab-dfci.github.io/bioinfo-combio/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/42815/bioinformatics-in-africa-part7-tunisia</guid>
	<pubDate>Sat, 06 Feb 2021 21:25:09 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/42815/bioinformatics-in-africa-part7-tunisia</link>
	<title><![CDATA[Bioinformatics in Africa: Part7 - Tunisia]]></title>
	<description><![CDATA[<p>Institut Pasteur de Tunis (IPT):<br />The IPT is a research institution founded in 1893. IPT is under the supervision of the Ministry of Health and is part of the Universit&eacute; El Manar of Tunis (Ministry of Higher Education). The missions of the institute are Public Health Laboratory activities (PHL), research on infectious diseases, and R&amp;D on vaccines. Research programs are mainly oriented towards local health problems such as leishmaniasis, viral hepatitis, and scorpion venoms. The Bioinformatics and Modelling group of the IPT is hosted by the Laboratoire d&rsquo;Immunopathologie Vaccinologie et G&eacute;n&eacute;tique Mol&eacute;culaire (LIVGM) and has existed since the beginning of 2005. Its present research activities include genome annotation, EST clustering and modelling of the host/parasite response to Leishmania infection. It consists of two senior scientists, two PhD students and one MSc student.</p><p>Centre de Biotechnologie de Sfax (CBS):<br />Bioinformatics activity started at CBS in 2001 with the setting up of a research and service unit of bioinformatics. This unit currently includes one senior researcher, one engineer and four PhD students. Activities include sequence annotation (service) and three research programs: ab initio prediction of short eukaryote genes, statistical modelling of signal transduction pathways using a Bayesian network approach, and statistical analysis of human sequence variation data (haplotype reconstruction and linkage disequilibrium). The activities of the Bioinformatics unit are described on the website http://www.cbs.rnrt.tn/, and the research activity report is available on request from Bioinformatics@cbs.rnrt.tn. Although the computing facilities are good, there is still a need for trained human resources to strengthen bioinformatics capacities at CBS, particularly in structural bioinformatics.</p><p>Web site and links: http://www.cbs.rnrt.tn</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

</channel>
</rss>