<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44002?offset=1120</link>
	<atom:link href="https://bioinformaticsonline.com/related/44002?offset=1120" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	
<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/33794/senior-bioinformatics-software-developer-hyderabad-telangana</guid>
  <pubDate>Mon, 03 Jul 2017 10:10:31 -0500</pubDate>
  <link></link>
  <title><![CDATA[Senior Bioinformatics Software Developer, Hyderabad, Telangana]]></title>
  <description><![CDATA[
<p>DuPont Pioneer is the world leader in plant biotechnology, including the discovery, development and delivery of elite crop genetics. DuPont Pioneer is aggressively building Big Data and Predictive Analytics capabilities in order to deliver improved services to our customers. We are currently seeking a Senior Bioinformatics Software Developer at the DuPont Knowledge Center in Hyderabad, India, for our global Data Science and Informatics group. At DuPont Pioneer, you’ll become part of a work environment that nurtures your interests, ignites your passion, creates opportunities to serve and helps you attain success, both personally and professionally. The hiring level will be commensurate with experience. This is a critical position with the potential to make an immediate, significant impact on our business.<br />The successful candidate will have an extensive background in computer science and bioinformatics, through courses or academic degrees, and proven experience in bioinformatics software development. We are looking for creative, smart, model-driven, agile individuals who enjoy giving their all to tackle diverse software needs.<br />Duties / Responsibilities</p>

<p>Job Qualifications<br />Education and Experience<br />•	Master’s degree in Bioinformatics, Computational Biology, Scientific Computing or a related field<br />•	3-5 years of post-Master’s experience in bioinformatics software development<br />•	Proven experience developing high-throughput bioinformatics applications<br />Required Competencies<br />•	Strong, proven experience with Python in a Linux environment<br />•	Proven high-performance computing experience (LSF/SGE/OGE)<br />•	Exposure to version control and repository management (Git/SVN)<br />•	Proven experience in bioinformatics algorithm development<br />•	Deep understanding of bioinformatics tools and data types<br />Desired Competencies<br />•	Familiarity working in a scientific computing environment (NumPy, SciPy, pandas, etc.)<br />•	Familiarity working with cloud technologies (AWS, Azure)<br />•	Ability to demonstrate solid analytical skills and exceptional attention to detail<br />•	Experience with relational databases and data structures<br />•	Proven experience working in teams using agile software development methodologies and processes<br />•	Familiarity with Service-Oriented Architecture (SOA)<br />•	Familiarity with build tools (Jenkins, make, Ant, Maven)<br />•	Exposure to project management tools (JIRA, Confluence, Redmine, etc.)</p>

<p>More at http://careers.dupont.com/jobsearch/job-details/senior-bioinformatics-software-developer/012939W-01/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/1886/interpretomics</guid>
	<pubDate>Sun, 11 Aug 2013 10:24:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/1886/interpretomics</link>
	<title><![CDATA[InterpretOmics]]></title>
	<description><![CDATA[<p>InterpretOmics, a big data analytics startup that focuses on life sciences, has received angel funding of around Rs 10 crore from a group of investors including Singapore's information technology and shipping company, Amarante.</p><p>http://www.interpretomics.co/</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/34375/the-10th-north-east-bioinformatics-network-nebinet-annual-coordinators-meet</guid>
	<pubDate>Sat, 18 Nov 2017 15:02:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/34375/the-10th-north-east-bioinformatics-network-nebinet-annual-coordinators-meet</link>
	<title><![CDATA[The 10th North East Bioinformatics Network (NEBINet) Annual Coordinators' Meet]]></title>
	<description><![CDATA[<p>The 10th North East Bioinformatics Network (NEBINet) Annual Coordinators' Meet, organised by the Bioinformatics Centre, St Edmund's College, Shillong, and sponsored by the Department of Biotechnology, Government of India, was held at the St Edmund's College Auditorium on Thursday. Meghalaya Governor Ganga Prasad graced the inaugural programme as chief guest. <br />In his inaugural address, the Governor said the scientific scenario has changed greatly over the years and the thrust areas have undergone a metamorphosis, but the conceptual underpinning of the basic sciences still continues. <br />"Of late, the activity of basic research has been intricately intertwined with technology. And we are determined to carry forward this change, for it is through technology that science can actually reach the masses in our country and afar, and the changing times have also inculcated a culture of cross-departmental and interdisciplinary research. Science and technology has always played a pivotal role in taking a nation towards greater heights by ways of innovations and inventions," he added. <br />Prasad also hoped that the discussions, suggestions and sharing of innovative ideas during the two-day meet would open up new avenues for substantial advances in the biological sciences and provide a platform for a proper and effective delivery mechanism for the common man. <br />During the inaugural function, Dr T Madhan Mohan, Advisor, Department of Biotechnology, gave an overview of NEBINet and the Bioinformatics programme. <br />Dr Debayan Ghosh, President of Epygen Biotech FZ LLC, Dubai, UAE, delivered the keynote address. <br />St Edmund's College governing body secretary Brother Simon Coelho and Principal Dr Sylvanus Lamare also spoke during the function.</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/2054/postdoc-positions-mammalian-transcriptome-evolution-at-sib</guid>
  <pubDate>Mon, 12 Aug 2013 19:58:33 -0500</pubDate>
  <link></link>
  <title><![CDATA[Postdoc Positions - Mammalian Transcriptome Evolution at SIB]]></title>
  <description><![CDATA[
<p>BIOINFORMATICS POSTDOC IN FUNCTIONAL EVOLUTIONARY GENOMICS</p>

<p>Center for Integrative Genomics, University of Lausanne, Switzerland</p>

<p>Two postdoctoral positions (2 years with possible extensions up to 5 years) are available immediately in the evolutionary genomics group of Henrik Kaessmann.</p>

<p>We are seeking highly qualified and enthusiastic applicants with strong skills in computational biology/bioinformatics, preferably also with experience in data mining and comparative or evolutionary genome analysis.</p>

<p>We have been interested in a range of topics related to the functional evolution of genomes from primates (e.g., the emergence of new genes and their functions) and other mammals (e.g., the origin and evolution of mammalian sex chromosomes). In the framework of a recently launched series of projects, large amounts of transcriptome and genome (e.g., epigenome) data are being produced by the wet lab unit of the group using next-generation sequencing technologies for a unique collection of tissues from representative mammals and outgroup species (e.g., birds). Topics of current projects based on these data include the origins and/or evolution of protein-coding genes, alternative splicing, microRNAs, long noncoding RNAs, and dosage compensation.</p>

<p>The postdoctoral fellow will perform integrated evolutionary/bioinformatics analyses based on data produced in the lab and available genomic data. The specific project will be developed together with the candidate.</p>

<p>The language of the institute is English, and its members form an international group that is rapidly expanding. The institute is located in Lausanne, a beautiful city at Lake Geneva.</p>

<p>For more information on the group and our institute more generally, please refer to our website: http://www.unil.ch/cig/page7858_en.html</p>

<p>Please submit a CV, statement of research interest, and names of three references to: Henrik Kaessmann (Henrik.Kaessmann@unil.ch).</p>

<p>Webpage: http://www.unil.ch/cig/page7858.html</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/2534/bioinformatician-needs-ten-heads</guid>
	<pubDate>Sat, 17 Aug 2013 10:30:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/2534/bioinformatician-needs-ten-heads</link>
	<title><![CDATA[Bioinformatician needs ten heads !!!]]></title>
	<description><![CDATA[<p>Bioinformatics demands more and ... lots more knowledge. By that measure, only Ravan, the ten-headed king from the Ramayan, could be a real bioinformatician. :) :P</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/2534" length="90547" type="image/jpeg" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/2253/best-practices-in-bioinformatics-training-for-life-scientists</guid>
	<pubDate>Tue, 13 Aug 2013 15:47:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/2253/best-practices-in-bioinformatics-training-for-life-scientists</link>
	<title><![CDATA[Best practices in bioinformatics training for life scientists]]></title>
	<description><![CDATA[<p>Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts.</p>
<p>Read the full paper at http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.full</p><p>Address of the bookmark: <a href="http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.full" rel="nofollow">http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.full</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner</guid>
	<pubDate>Tue, 10 Apr 2018 04:13:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner</link>
	<title><![CDATA[Bioinformatics OneLiner]]></title>
	<description><![CDATA[<p>To remove all line ends (\n) from a Unix text file:</p><pre>sed ':a;N;$!ba;s/\n//g' filename.txt &gt; newfilename_oneline.txt</pre>
<p>To get the average of a column of numbers (here the second column, $2):</p><pre>awk '{ sum += $2; n++ } END { if (n &gt; 0) print sum / n; }'</pre>
<p>To get the length of every sequence in a FASTA file:</p><pre>awk '/^&gt;/ { if (seqlen) print seqlen; print; seqlen = 0; next } { seqlen += length($0) } END { print seqlen }' filename.fasta</pre>
<p>To copy (move, rename, etc.) files listed in a text file:</p><pre>while IFS= read -r line; do cp "$line" complete_dataset/"$line"; done &lt; file_list.txt</pre>
<p>To split a BAM file into mapped and unmapped reads:</p><pre>samtools view -F4 sample.bam &gt; sample.mapped.sam
samtools view -f4 sample.bam &gt; sample.unmapped.sam</pre>
<p>To gzip all your FASTQ files using GNU parallel and gzip:</p><pre>parallel gzip ::: *.fastq</pre>
<p>To gzip all your FASTQ files using pigz:</p><pre>pigz *.fastq</pre>
<p>To count all sequences in a FASTA file:</p><pre>grep -c "^&gt;" yourfile.fasta</pre>
<p>To count all sequences in every FASTA file in your current directory:</p><pre>for a in *.fasta; do echo "$a"; grep -c "^&gt;" "$a"; done</pre>
<p>To keep only one copy of duplicated lines (preserving the original order):</p><pre>awk '!seen[$0]++'</pre>
<p>To sum the assembly size from a SPAdes contigs.fasta or scaffolds.fasta file:</p><pre>grep "^&gt;" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ | bc</pre>
<p>To remove everything after the first space on each line, e.g. to simplify FASTA headers:</p><pre>cut -d' ' -f1 &lt; your_file</pre>
<p>To count reads in all .fastq.gz files in your current folder (fast, using GNU parallel):</p><pre>parallel "echo {} &amp;&amp; gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz</pre>
<p>To count reads in all .fastq.gz files in your current folder:</p><pre>zcat *.gz | echo $((`wc -l`/4))</pre>
<p>To count reads in all .fastq files in your current folder:</p><pre>cat *.fastq | echo $((`wc -l`/4))</pre>
<p>To count base pairs in all .fastq.gz files in your current folder:</p><pre>zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c</pre>
<p>To split a multi-FASTA file into many FASTA files (one per sequence, named after its header):</p><pre>awk '/^&gt;/ {OUT=substr($0,2) ".fa"}; {print &gt;&gt; OUT; close(OUT)}' Input_File</pre>
<p>To convert Illumina 1.3 FASTQ quality scores (Phred+64) to Illumina 1.8 (Phred+33):</p><pre>sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&amp;'\''()*+,-.\/0123456789:;&lt;=&gt;?@ABCDEFGHIJ/' f.fastq</pre>
<p>To convert FASTQ to FASTA:</p><pre>sed -n '1~4s/^@/&gt;/p;2~4p'</pre>
<p>To get the read length distribution of a FASTQ file:</p><pre>awk 'NR%4==2 {print length($1)}' reads.fastq | sort -n | uniq -c</pre>
<p>To deinterleave an interleaved FASTQ file:</p><pre>cat myf.fq | paste - - - - - - - - | tee &gt;(cut -f 1-4 | tr "\t" "\n" &gt; myf_1.fq) | cut -f 5-8 | \
tr "\t" "\n" &gt; myf_2.fq</pre>
<p>To filter and sort contig identifiers from a SPAdes assembly (here length &gt;= 4000 and coverage &gt;= 100):</p><pre>grep "^&gt;" scaffolds.fasta | sed 's/_/ /g' | awk '{ if ($4 &gt;= 4000 &amp;&amp; $6 &gt;= 100) print $0 }' | sort -k 4 -n | \
sed 's/ /_/g'</pre>
<p>To append something to all headers of your FASTA files:</p><pre>sed 's/^&gt;.*/&amp;YOURSTRING/' filename.fasta &gt; new_filename.fasta</pre>
<p>To replace/squeeze multiple adjacent spaces with a single space:</p><pre>tr -s " " &lt; file</pre>
<p>To filter a FASTQ file by read length (here at least 21 and at most 25):</p><pre>cat your.fastq | paste - - - - | awk -F'\t' 'length($2) &gt;= 21 &amp;&amp; length($2) &lt;= 25' | tr '\t' '\n' &gt; filtered.fastq</pre>
<p>To print the difference between the last and first row in the 5th column:</p><pre>awk 'NR == 1 { first = $5 } { last = $5 } END { print last - first }' myfile.txt</pre>
<p>To keep only the first 200 bases of every sequence in a multi-FASTA file (e.g. an assembly scaffolds.fasta file here):</p><pre>awk '/^&gt;/{ seqlen=0; print; next; } seqlen &lt; 200 { if (seqlen + length($0) &gt; 200) $0 = substr($0, 1, 200-seqlen); seqlen += length($0); print }' scaffolds.fasta &gt; 200bp_scaffolds.fasta</pre>
<p>To pipe a compressed FASTA file directly into makeblastdb (use -dbtype prot for protein sequences):</p><pre>gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb</pre>
<p>To remove sequences with duplicate FASTA headers from a FASTA file (keeping the first occurrence):</p><pre>awk '/^&gt;/{f=!d[$1];d[$1]=1}f' in.fasta &gt; out.fasta</pre>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/2337/clinical-genomics-informatics-europe-at-lisbon-portugal</guid>
  <pubDate>Wed, 14 Aug 2013 09:58:34 -0500</pubDate>
  <link></link>
  <title><![CDATA[Clinical Genomics &amp; Informatics Europe at Lisbon, Portugal]]></title>
  <description><![CDATA[
<p>Bio-IT World and Cambridge Healthtech Institute's fifth international Clinical Genomics &amp; Informatics Europe conference will feature four main tracks on Clinical Exome Sequencing, High Scale Computing, Genome Informatics, and RNA-Seq and Transcriptome Analysis, as well as two pre-conference symposia on Clinical Epigenetics and Quantitative Digital Detection Technologies. The conference will tackle the huge amounts of sequencing data produced by new technologies that have introduced significant challenges for bioinformatics, both in terms of the analysis and interpretation of data and clinical implementation of novel variants. Members of the international community will come together to look at the science and informatics required to utilize next generation sequencing for the molecular diagnosis of complex diseases.</p>

<p>Dates: 04 Dec 2013 - 06 Dec 2013</p>

<p>More at: http://www.clinicalgenomicsinformatics.com/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/2423/cancers-origins-revealed</guid>
	<pubDate>Thu, 15 Aug 2013 13:06:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/2423/cancers-origins-revealed</link>
	<title><![CDATA[Cancer's origins revealed]]></title>
	<description><![CDATA[<p>Researchers have provided the first comprehensive compendium of mutational processes that drive tumour development. Together, these mutational processes explain most mutations found in 30 of the most common cancer types. This new understanding of cancer development could help to treat and prevent a wide range of cancers.<br /><br />More at &gt;&gt; http://www.sanger.ac.uk/about/press/2013/130814.html</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a short tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads or forks, so it helps to understand how threads and forks work before reading on.</p><p>In bioinformatics we often deal with huge datasets larger than 100 GB. The traditional way to analyse a file is a while loop:</p><p>while (my $line = &lt;FILE&gt;) {<br />&nbsp;&nbsp;&nbsp;&nbsp;# do something with $line<br />}</p><p>This is slow because it uses only one processor; with 500 million lines in the dataset, a single pass can take more than a day. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a simple and efficient technique in Perl that I have been using. I prefer Perl fork over Perl threads.</p><p>One of the oldest ways to fork is:</p><blockquote><p>my @childs;<br />my $fork = fork();<br />if (!defined $fork) {<br />&nbsp;&nbsp;&nbsp;&nbsp;die "Couldn't fork: $!";<br />}<br />elsif ($fork == 0) {<br />&nbsp;&nbsp;&nbsp;&nbsp;# child process<br />&nbsp;&nbsp;&nbsp;&nbsp;<strong>your code here;</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;exit(0);<br />}<br />else {<br />&nbsp;&nbsp;&nbsp;&nbsp;# parent process: remember the child's PID<br />&nbsp;&nbsp;&nbsp;&nbsp;push(@childs, $fork);<br />}</p><p>## wait for the child processes to finish<br />foreach (@childs) {<br />&nbsp;&nbsp;&nbsp;&nbsp;waitpid($_, 0);<br />}</p></blockquote><p>fork() creates a child process that starts as a copy of the parent, with the same variables and code, and then runs independently of it; the operating system can schedule the child on a different processor. That's it! One big disadvantage of forking is that it is very difficult to share variables among the processes, because each child works on its own copy. I will show you an easy way around this, although it still has its own drawbacks.</p><blockquote><p>If you really do not want to call fork yourself, that&rsquo;s fine too: there are modules that handle it for you very efficiently. 
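</p></blockquote><p>Before turning to those modules, here is the same fork-and-wait idea sketched in plain shell, in case it helps to see the pattern outside Perl. This is a minimal sketch under stated assumptions: in bash, <code>&amp;</code> forks a background child process, <code>$!</code> holds its PID and <code>wait PID</code> plays the role of waitpid. The four chunk_N.txt files and the line-counting "work" are hypothetical stand-ins for your own split files and analysis. Each child writes to its own output file, so no locking is needed, and the parent concatenates the results only after every child has finished.</p>

```shell
#!/usr/bin/env bash
# Sketch only: the shell counterpart of fork() + waitpid().
# Each background subshell started with & is a forked child process.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical input: four small chunk files standing in for a split dataset.
for i in 1 2 3 4; do
  seq 1 $((i * 10)) > "$workdir/chunk_$i.txt"
done

pids=()
for chunk in "$workdir"/chunk_*.txt; do
  (
    # Child process: the placeholder "work" is counting the chunk's lines.
    # Each child writes to its own output file, so no file locking is needed.
    wc -l < "$chunk" > "$chunk.out"
  ) &
  pids+=("$!")  # parent keeps the child's PID, like push(@childs, $fork)
done

# Parent blocks until every child has exited, like waitpid($_, 0);
# with set -e, a child that failed makes the script stop here.
for pid in "${pids[@]}"; do
  wait "$pid"
done

# Concatenate the per-child results only after all children are done.
result=$(cat "$workdir"/chunk_*.txt.out)
printf '%s\n' "$result"
```

<blockquote><p>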
One really useful module is Parallel::ForkManager, which lets you cap the number of forks running at once (i.e. the number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors = 8;<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />foreach (@dna) {<br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;start and next; # do the fork; the parent moves on to the next element<br />&nbsp;&nbsp;&nbsp;&nbsp;<strong>your code here;</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;finish; # exit the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>This runs up to 8 forks at a time, each doing the same work on one element of the array. When a child finishes, Parallel::ForkManager starts a new one, so all your processors stay busy. Now suppose your 8 child processes all need to write to one file. You need to lock the file first, otherwise their buffered writes can interleave. You can lock the file with flock:</p><blockquote><p>use Fcntl qw(:flock);<br />open(my $QUAL, '&gt;&gt;', 'myfile.txt') or die "can't open file: $!";<br />flock($QUAL, LOCK_EX) or die "can't lock file: $!";<br />print $QUAL $output;<br />flock($QUAL, LOCK_UN) or die "can't unlock file: $!";<br />close $QUAL;</p></blockquote><p>I would not suggest using flock with many processes, because it reduces throughput: each child process must wait for the others to release the lock. Instead, I suggest that each fork write to a separate file, and that you concatenate the files after processing.</p><p><strong>Putting it all together, if you have 100 GB of data you can do this:</strong></p><blockquote><p><strong>step 1</strong>: split the dataset into equal parts, one per processor. 
This may take a few hours (about 2-3 hours for a 100 GB file). You can use the Unix split command for this, for example:<br />my $lines_per_file = int(($number_of_lines_in_your_dataset + $max_processors - 1) / $max_processors); # round up, so you get at most $max_processors files<br /># split -l cuts by lines, so make sure chunk boundaries fall between records (e.g. an even number for single-line FASTA, a multiple of 4 for FASTQ)<br />system('split', '-l', $lines_per_file, 'your_file.fasta', 'file_name') == 0 or die "split failed: $?";</p><p><strong>step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(my $dh, $split_files_directory) or die $!; ### open the directory<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />while (my $file = readdir($dh)) { ### read the directory<br />&nbsp;&nbsp;&nbsp;&nbsp;next if $file =~ /^\./; # skip . and .. and hidden files<br />&nbsp;&nbsp;&nbsp;&nbsp;print $file, "\n";<br />&nbsp;&nbsp;&nbsp;&nbsp;########## start fork ##########<br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;start and next;<br />&nbsp;&nbsp;&nbsp;&nbsp;<strong>whatever you want to do with the split file;</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;<strong>analyze my piece of $file;</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;########## end fork ############<br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;finish;<br />}<br />closedir($dh);<br />$pm-&gt;wait_all_children;</p></blockquote><p>Each processor is now busy with its own piece of data (one split file), and the 8 processes run at the same time without interfering with each other. Again, I do not suggest writing output from every child process to one file, for the reasons above: write each fork's output to a separate file and concatenate them at the end. That's it: for CPU-bound work you can get close to an 8-fold speed-up, as long as disk I/O keeps up. Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the output each child generates, since with 100 GB that takes time too. One idea is to load all the files into a single MySQL table with LOAD DATA LOCAL INFILE (about 3 hours for a 100 GB dataset) and then export the whole table into one file. 
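This might seem faster than concatenating the files with cat, but cat is usually quicker: it streams the data once, whereas loading into a database and exporting again parses and writes everything at least twice.</p><p>A much simpler way is to concatenate with cat, or to stream the pieces straight into the next program (my_pipe here) with a pipe or process substitution, so the combined file never has to be written to disk:</p><p>cat output_dir/* &gt; final_file<br />cat output_dir/* | my_pipe<br />my_pipe &lt;(cat output_dir/*)</p><p>That's it, guys! Enjoy programming and please do comment. I am not a computer scientist, so forgive any mistakes, and please report them if you find any. Thank you.</p>]]></description>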
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>