<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/2334?offset=900</link>
	<atom:link href="https://bioinformaticsonline.com/related/2334?offset=900" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33741/diya-a-bacterial-annotation-pipeline-for-any-genomics-lab</guid>
	<pubDate>Fri, 30 Jun 2017 08:48:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33741/diya-a-bacterial-annotation-pipeline-for-any-genomics-lab</link>
	<title><![CDATA[DIYA: a bacterial annotation pipeline for any genomics lab]]></title>
	<description><![CDATA[<p><span>DIY Genomics is an open-source bioinformatics consortium intended to bring a collection of tools and libraries into the hands of small-scale genomics labs for the process of sequence assembly and annotation. Projects include DIYA, MGAP, CRISPR, and DIYGV.</span></p>
<p><span>http://gmod.org/wiki/Diya</span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/diyg/" rel="nofollow">https://sourceforge.net/projects/diyg/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/1161/genomics-for-bioinformatician</guid>
	<pubDate>Sat, 20 Jul 2013 07:03:00 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/1161/genomics-for-bioinformatician</link>
	<title><![CDATA[Genomics for Bioinformatician]]></title>
	<description><![CDATA[<p>Genomics is the study of the genomes of organisms. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.<br /><br />Genomics was established by Fred Sanger when he first sequenced the complete genomes of a virus and a mitochondrion. His group established techniques of sequencing, genome mapping, data storage, and bioinformatic analyses in the 1970-1980s. A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression during various conditions. The most important tools here are microarrays and bioinformatics. Study of the full set of proteins in a cell type or tissue, and the changes during various conditions, is called proteomics. A related concept is materiomics, which is defined as the study of the material properties of biological materials (e.g. hierarchical protein structures and materials, mineralized biological tissues, etc.) and their effect on the macroscopic function and failure in their biological context, linking processes, structure and properties at multiple scales through a materials science approach. The actual term 'genomics' is thought to have been coined by Dr. 
Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, ME) over beer at a meeting held in Maryland on the mapping of the human genome in 1986.<br /><br />The NHGRI's vision for the future of genomics, the outcome of almost two years of intense discussions with hundreds of scientists and members of the public, has three major areas of focus: Genomics to Biology, Genomics to Health, and Genomics to Society.<br /><br /><strong><em>Genomics to Biology:</em></strong>&nbsp;<br />The human genome sequence provides foundational information that now will allow development of a comprehensive catalog of all of the genome's components, determination of the function of all human genes, and deciphering of how genes and proteins work together in pathways and networks.<br /><br /><strong><em>Genomics to Health:<br /></em></strong>Completion of the human genome sequence offers a unique opportunity to understand the role of genetic factors in health and disease, and to apply that understanding rapidly to prevention, diagnosis, and treatment. This opportunity will be realized through such genomics-based approaches as identification of genes and pathways and determining how they interact with environmental factors in health and disease, more precise prediction of disease susceptibility and drug response, early detection of illness, and development of entirely new therapeutic approaches.<br /><br /><strong><em>Genomics to Society:</em>&nbsp;<br /></strong>Just as the HGP has spawned new areas of research in basic biology and in health, it has created new opportunities in exploring the ethical, legal, and social implications (ELSI) of such work. These include defining policy options regarding the use of genomic information in both medical and non-medical settings and analysis of the impact of genomics on such concepts as race, ethnicity, kinship, individual and group identity, health, disease, and "normality" for traits and behaviors.<br /><br />This vision for the future of genomics is not just about the NHGRI. 
It encompasses the whole field of genomics, including the work of all the other Institutes and Centers at the NIH and of a number of other federal agencies. All of the NIH Institutes are already taking full advantage of the sequence and will apply its data to the better understanding of both rare and common diseases, almost all of which have a genetic component. A recent example of the way that the HGP and the knowledge and new technologies it has spawned are already facilitating science is the extremely rapid sequencing by groups in Canada and at the Centers for Disease Control and Prevention (CDC) in Atlanta of the genome of the virus that causes Severe Acute Respiratory Syndrome (SARS). The sequencing of the SARS virus genome provides insight into this new and deadly disease at a speed never before possible in science. In turn, this should lead to the rapid development of diagnostic tests and, in time, vaccines and effective treatments.<br /><br /><strong>Links to additional material available online</strong></p><p><a href="http://pevsnerlab.kennedykrieger.org/bioinformatics/bioinf10_genomes.htm">Genomes and genomics:</a></p><p><a href="http://www.123genomics.com/learning.html">Bioinformatics and Genomics:</a></p><p><a href="http://www.ebi.ac.uk/pdbe/docs/roadshow_tutorial/strgenomics/tutorial.html">Structural genomics tutorial:</a></p><p><a href="http://www.hgu.mrc.ac.uk/Users/Philippe.Gautier/tutorial/index.html">Comparative Genomics Tutorial:</a></p><p><a href="http://www.scfbio-iitd.res.in/tutorial/genomics.html">GENOME TUTORIAL:</a></p><p><a href="http://genomebiology.com/content/pdf/gb-2001-3-1-reviews2001.pdf">Tools and resources for identifying protein families, domains and motifs</a></p><p><a href="http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/tools.shtml">Bioinformatics Tools</a><a href="http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/tools.shtml">&nbsp;<br />Tips, Tutorials, and Terminology for Using 
Selected Resources in Genome Database Guide:</a></p><p><a href="http://www.doe-mbi.ucla.edu/Reprints/R31%20Strong%20A%20Web-based%20Comparative%20Genomics%20tutorial%20Microbiology%20Eduction%202004.pdf">A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes:</a></p><p><a href="http://www.genome.gov/27530225">Free Online Tutorials Teach Anyone How to Use Genome Databases:</a></p><p><a href="http://mkweb.bcgsc.ca/circos/?tutorials">Circos to create concise, explanatory, unique and print-ready visualizations of your data:</a></p><p><a href="http://www.igd.cornell.edu/Comparative%20Genomics/Comparative%20Genomics%20Proj.html">Genomics and Comparative Genomics</a><a href="http://www.igd.cornell.edu/Comparative%20Genomics/Comparative%20Genomics%20Proj.html">&nbsp;Learning Module:</a></p><p><a href="http://psb.stanford.edu/psb10/conference-materials/tutorials/compgen-notes.pdf">Computational Challenges in Comparative Genomics</a></p><p><a href="http://psb.stanford.edu/psb10/conference-materials/tutorials/compgen-notes.pdf">A Tutorial:</a></p><p><a href="http://gramene.agrinome.org/tutorials/modules_tutorial.pdf">A Comparative Genomics Resource for Grains</a>:</p><p><a href="http://www.plantcell.org/cgi/content/full/21/12/3718">PLAZA: A Comparative Genomics Resource to Study Gene and Genome Evolution in Plants:</a></p><p><a href="http://en.wikipedia.org/wiki/VISTA_(comparative_genomics)">VISTA</a><a href="http://en.wikipedia.org/wiki/VISTA_(comparative_genomics)">:</a></p><p>Software for Genomics</p><ol>
<li><strong>Artemis</strong>&nbsp;Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation.</li>
<li><strong>Chromas&nbsp;</strong>Chromas displays and prints chromatogram files from ABI automated DNA sequencers, and Staden SCF files, which the analysis programs for ALF, Li-Cor and Visible Genetics OpenGene sequencers can create.</li>
<li><strong>Glimmer</strong>&nbsp;A system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.</li>
<li><strong>GlimmerHMM</strong>&nbsp;A fast and accurate gene finder based on a GHMM architecture, developed specifically for eukaryotes. It incorporates splice site models adapted from the GeneSplicer program and uses interpolated Markov models for evaluating the coding regions.</li>
<li><strong>GlimmerM</strong>&nbsp;A gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The d</li>
<li><strong>MUMmer</strong>&nbsp;MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form.</li>
<li><strong>pDRAW</strong>&nbsp;pDRAW32 is being developed as a free time hobby project. It is far from finished, but as it has reached a point where it could be helpful for many labs, it is now available to the scientific community.</li>
<li><strong>Sequin</strong>&nbsp;Sequin is a stand-alone software tool developed by the NCBI for submitting and updating entries to the GenBank, EMBL, or DDBJ sequence databases. It is capable of handling simple submissions that contain a single short mRNA sequence, and complex submissio</li>
<li><strong>Staden&nbsp;</strong>The Staden Package consists of a series of tools for DNA sequence preparation (pregap4), assembly (gap4), editing (gap4) and DNA/protein sequence analysis (spin).</li>
</ol><p>For more software, see&nbsp;<a href="http://bioinformaticsonline.com/bookmarks/view/926/list-of-popular-bioinformatics-softwaretools">http://bioinformaticsonline.com/bookmarks/view/926/list-of-popular-bioinformatics-softwaretools</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/34368/srbioinformatics-analyst-ngs-at-ocimum</guid>
  <pubDate>Fri, 17 Nov 2017 07:50:44 -0600</pubDate>
  <link>https://bioinformaticsonline.com/opportunity/view/34368/srbioinformatics-analyst-ngs-at-ocimum</link>
  <title><![CDATA[Sr.Bioinformatics Analyst (NGS) at Ocimum]]></title>
  <description><![CDATA[
<p>JOB FUNCTION: Bio Tech/R&amp;D/Scientist<br />INDUSTRY: Biotechnology/Pharmaceutical/Medicine<br />SPECIALIZATION: Basic Research, Bio-Statistician, Clinical Research<br />QUALIFICATION:<br />Any Post Graduate<br />BA (Arts), B.Com. (Commerce), BE/ B.Tech (Engineering), B.Pharm. (Pharmacy), B.Sc. (Science), BL/LLB, BDS (Dental Surgery), B.Ed. (Education), BHM (Hotel Management), BBA/ BBM/ BBS, B.Arch. (Architecture), BCA (Computer Application), Diploma-Other Diploma, B.Plan. (Planning), BGL, B.V.Sc. (Veterinary Science), Other School/ Graduation, BHMS (Homeopathy), BAMS (Ayurveda)<br />Job Description</p>

<p>1. Must have a basic understanding of molecular biology and genomics.<br />2. Experience in application development, or expertise in programming in Perl or Python.<br />3. Experience in statistical programming using R/Bioconductor/Matlab.<br />4. Strong grasp of statistical and mathematical modelling.<br />5. Experience in designing and developing bioinformatics pipelines.<br />6. Must have at least 2 years of hands-on experience in NGS data analysis such as RNA-Seq, Exome-Seq, and ChIP-Seq, and in downstream analysis.<br />7. Knowledge of WGS, WES, targeted re-sequencing, GWAS, and population genomics is preferred.<br />8. Must have experience working with open-source software/frameworks and commercial software for NGS data analysis and reporting.<br />9. Should be comfortable handling big data and guiding team members on multiple projects simultaneously.<br />10. Should have experience coordinating with different groups of clinical research scientists on various project requirements.<br />11. Ability to work in a team as well as independently with minimal support.</p>

<p>More at http://www3.ocimumbio.com/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/1219/research-with-help-of-bioinformatics-helpful</guid>
	<pubDate>Fri, 02 Aug 2013 11:20:24 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/1219/research-with-help-of-bioinformatics-helpful</link>
	<title><![CDATA[Research with help of bioinformatics helpful]]></title>
	<description><![CDATA[<p>Endocrinologist G.R. Sridhar says</p><blockquote><p>Research with the help of bioinformatics with a trans-disciplinary approach is yielding good results.</p><p>http://www.thehindu.com/features/education/research/research-with-help-of-bioinformatics-helpful/article2295629.ece</p></blockquote>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34814/bioinformatics-web-application-development-with-perl</guid>
	<pubDate>Tue, 26 Dec 2017 18:14:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34814/bioinformatics-web-application-development-with-perl</link>
	<title><![CDATA[Bioinformatics Web Application Development with Perl]]></title>
	<description><![CDATA[<div><p>Perl's second wave of adoption came from the growth of the world wide web. Dynamic web pages&mdash;the precursor to modern web applications&mdash;were easy to create with Perl and CGI. Thanks to Perl's ubiquity as a language for system administrators and its power to manipulate text, it was the default choice for web programming. Its presence everywhere made it popular and, in some ways, the duct tape of the Internet.</p><h4>Web Application Development</h4><p>The old days of CGI programs and the simple development style that represented seem clunky. Web pages have become web applications. Development has moved from generating static HTML to both client and server side programming, with rich client interfaces and powerful backends.</p><p>Perl is still well suited for developing modern web apps. The language grows more powerful and easier to use every year, the available libraries are wonderful and keep getting better, and the inventions and discoveries available in modern Perl are unsurpassed.</p><p>In particular, a modern Perl developer can do amazing things with modern Perl tools. If you still think of Perl web development as a&nbsp;<em>cgi-bin</em>&nbsp;directory full of messy scripts that spew warnings to STDERR, you're a decade out of date. Better yet, you can replace that mess piecemeal, thanks to the new tools and techniques of modern Perl. See, for example, the ever-growing list of technologies&nbsp;<a href="http://www.builtinperl.com/">Built in Perl</a>.</p><h4>Modern Perl Web Frameworks</h4><p>While the old wave of web development may have made the CGI.pm module central, modern Perl web programming follows a stricter separation of business logic, URL and request routing, and output. The days of slinging a string here, an array there, a Perl hash yonder, declaring every variable at the top of the program, and maybe making a subroutine are gone. 
The Perl world has seen the value of abstraction and ways to mechanize away boilerplate. Perl has dozens of frameworks and toolkits designed to make web development and deployment simpler.</p><p>Any of a dozen of these frameworks will help you do great things, but three in particular stand out. You can build web sites and web applications of tremendous value with all three. These are neither the only good possibilities (think of POE or Jifty or Continuity or...) nor the only mechanisms for web programming with Perl (see Mechanize or LWP or Mojo::UserAgent for more). Yet if you want three good options to choose between, start here.</p><h4>Catalyst</h4><p>The&nbsp;<a href="http://catalystframework.org/">Catalyst</a>&nbsp;framework is a flexible and powerful system for building small to large web apps. It uses the&nbsp;<a href="http://moose.perl.org/">Moose</a>&nbsp;object system to provide great APIs for extension and further development. It's the most mature of the modern top Perl web frameworks, yet it retains its flexibility and vibrancy. In particular, its plugin and extension ecosystem allows it to evolve to provide new and essential features.</p><p>Catalyst has embraced the Plack/PSGI standard for Perl web deployment and recent versions are exploring high-scalability, event-based request handling models.</p><h4>Dancer</h4><p>The&nbsp;<a href="http://perldancer.org/">Dancer</a>&nbsp;framework is deliberately minimal in syntax and scope, but it also has a vibrant plugin ecosystem. Dancer particularly excels for smaller sites and applications, though good programmers can build larger things with it.</p><p>The first version of Dancer was easy to use. Dancer 2 continues that ease while improving the internals and robustness of applications.</p><h4>Mojolicious</h4><p>The&nbsp;<a href="http://mojolicio.us/">Mojolicious</a>&nbsp;(Mojo) framework has a real-time design based on high performance event handling. 
Its focus is solving new and interesting problems in simple and effective ways, and the project has produced a lot of new code that does old things in better ways.</p><p>In particular, Mojolicious goes to great lengths to support new web standards, such as CSS 3, WebSockets, and HTTP 2.</p><p>Where Catalyst embraces the CPAN fully, Mojolicious by design provides most of what an average app might need in a single download. It's still fully compatible with the CPAN, but the intention is to provide good working defaults in a package that's easy to start with. Mojo's fans are quick to praise it as fun to develop with.</p><p>A modern Perl web developer should be familiar with at least one of these frameworks.</p><h4>Modern Perl Storage Mechanisms</h4><p>Perl's venerable&nbsp;<a href="http://search.cpan.org/perldoc?DBI">DBI</a>&nbsp;module has been the focal point of database access since its invention. Its design allows it to provide the same interface to huge relational databases and flat files alike through its DBD extension mechanism. Yet the DBI by itself isn't the be-all, end-all of data storage and access in Perl.</p><h4>DBIx::Class</h4><p><a href="http://search.cpan.org/perldoc?DBIx::Class">DBIx::Class</a>&nbsp;sits on top of DBI to provide an API to your database based on the concept of queries and results. This is often sufficient to remove all but the most complicated of SQL from your code, leaving you to manipulate your business models instead of the small details of how a relational database works. The power and maintainability you receive are well worth the small cost of the learning curve.</p><p>Even better, DBIC can manage (and even generate) your database schema for you.</p><p>Recent versions of DBIC have demonstrated that a well-written ORM can perform much better than even clever hand-written code. 
Because it builds on the Perl DBI, it scales everywhere from SQLite to PostgreSQL, MySQL, Oracle, and more.</p><h4>Rose::DB</h4><p>The lesser-known but no less powerful&nbsp;<a href="http://search.cpan.org/perldoc?Rose::DB::Object">Rose::DB::Object</a>&nbsp;builds on&nbsp;<a href="http://search.cpan.org/perldoc?Rose::DB">Rose::DB</a>&nbsp;to provide an object-relational mapper for Perl. While its high-level features most directly compare to those of DBIx::Class, it's often measurably faster.</p><h4>NoSQL on the CPAN</h4><p>Of course the&nbsp;<a href="http://search.cpan.org/">CPAN</a>&nbsp;has modules for almost any NoSQL database or job queue or persistence mechanism you could name, and several you have never heard of. Everything you need is a quick cpan or cpanm away!</p><h4>Modern Perl Deployment Strategies</h4><p>In the early days of the web, deploying a Perl web application meant putting one or more&nbsp;<em>.cgi</em>&nbsp;or&nbsp;<em>.pl</em>&nbsp;files in a special directory and hoping that your system administrator had everything configured correctly. The execution model was often slow and cumbersome, and accessing shared resources such as databases was often tricky.</p><p>Modern Perl has better choices. While deployment strategies are the source of many arguments, the return on your investment from learning the modern way is impressive.</p><h4>Plack/PSGI</h4><p>The PSGI specification (as exemplified by&nbsp;<a href="http://plackperl.org/">Plack</a>) describes a strategy for building Perl web apps independent of server and with the possibility to share custom processing behaviors.</p><p>In other words, it's a standard for writing Perl apps to take advantage of the huge ecosystem of Perl development available on the CPAN without tying yourself to a server like Apache, Apache 2, nginx, or anything else.</p><p>Any good modern Perl web framework (including those listed here) supports PSGI. 
Several deployment mechanisms that support PSGI exist to meet various business needs. In particular, you can deploy the same application to a local testing server on your own machine and to your production server or servers without changing your application at all.</p><h4>mod_perl</h4><p>The older but still viable mod_perl Apache httpd module embeds Perl into the web server. This was the first widespread persistence mechanism for Perl web applications themselves and it's still popular to this day, though PSGI compliance is often the choice for new development. (PSGI handlers to use mod_perl as the backend are available.)</p><p>Modern Perl developers should familiarize themselves with PSGI and the wealth of available Plack middleware.</p><h4>Perl Web Development</h4><p>Of course no discussion of Perl web development would be complete without mentioning the strength of the CPAN. Almost any project will benefit from the wealth of freely available libraries built to solve real problems. These distributions run the gamut from full-blown web frameworks and content management systems to APIs for web services, development tools, testing systems, and interfaces to document formats and external resources.</p><p>For example, if you need to write a web service which accepts JSON data and produces Excel spreadsheets, you can glue together a few CPAN distributions and get the job done early. If you need to consume XML from a remote service and emit a PDF, you're in luck.</p><p>Perl's prowess as a general purpose programming language as well as its flexibility and power in managing text and gluing systems together make it a wonderful fit for web development. The community's adoption of modern Perl standards such as PSGI and Plack only enhances your power.</p><p>Web application development in Perl is still viable, and modern Perl tools and techniques and libraries make it more powerful and pleasant than ever.</p></div>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35525/linux-commands-cheat-sheet-for-bioinformatics-and-computational-biology-professionals</guid>
	<pubDate>Mon, 05 Feb 2018 18:50:41 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35525/linux-commands-cheat-sheet-for-bioinformatics-and-computational-biology-professionals</link>
	<title><![CDATA[Linux Commands Cheat Sheet for Bioinformatics and Computational Biology Professionals]]></title>
	<description><![CDATA[<p><span>The purpose of this cheat sheet is to introduce biologists and bioinformaticians to the frequently used tools for NGS analysis and to give them experience in writing one-liners.</span></p><ul>
<li><span></span><span><strong>File System</strong></span><span><strong><br /> </strong></span><span>ls</span><span>&nbsp;&mdash; list items in current directory</span><span><br /> </span><span>ls -l</span><span>&nbsp;&mdash; list items in current directory and show in long format to see permissions, size, and modification date</span><span><br /> </span><span>ls -a</span><span>&nbsp;&mdash; list all items in current directory, including hidden files</span><span><br /> </span><span>ls -F</span><span>&nbsp;&mdash; list all items in current directory and show directories with a slash and executables with a star</span><span><br /> </span><span>ls dir</span><span>&nbsp;&mdash; list all items in directory dir</span><span><br /> </span><span>cd dir</span><span>&nbsp;&mdash; change directory to dir</span><span><br /> </span><span>cd ..</span><span>&nbsp;&mdash; go up one directory</span><span><br /> </span><span>cd /</span><span>&nbsp;&mdash; go to the root directory</span><span><br /> </span><span>cd ~</span><span>&nbsp;&mdash; go to your home directory</span><span><br /> </span><span>cd -</span><span>&nbsp;&mdash; go to the last directory you were just in</span><span><br /> </span><span>pwd</span><span>&nbsp;&mdash; show present working directory</span><span><br /> </span><span>mkdir dir</span><span>&nbsp;&mdash; make directory dir</span><span><br /> </span><span>rm file</span><span>&nbsp;&mdash; remove file</span><span><br /> </span><span>rm -r dir</span><span>&nbsp;&mdash; remove directory dir recursively</span><span><br /> </span><span>cp file1 file2</span><span>&nbsp;&mdash; copy file1 to file2</span><span><br /> </span><span>cp -r dir1 dir2</span><span>&nbsp;&mdash; copy directory dir1 to dir2 recursively</span><span><br /> </span><span>mv file1 file2</span><span>&nbsp;&mdash; move (rename) file1 to file2</span><span><br /> </span><span>ln -s file link</span><span>&nbsp;&mdash; create symbolic link to file</span><span><br /> </span><span>touch 
file</span><span>&nbsp;&mdash; create or update file</span><span><br /> </span><span>cat file</span><span>&nbsp;&mdash; output the contents of file</span><span><br /> </span><span>less file</span><span>&nbsp;&mdash; view file with page navigation</span><span><br /> </span><span>head file</span><span>&nbsp;&mdash; output the first 10 lines of file</span><span><br /> </span><span>tail file</span><span>&nbsp;&mdash; output the last 10 lines of file</span><span><br /> </span><span>tail -f file</span><span>&nbsp;&mdash; output the contents of file as it grows, starting with the last 10 lines</span><span><br /> </span><span>vim file</span><span>&nbsp;&mdash; edit file</span><span><br /> </span><span>alias name 'command'</span><span>&nbsp;&mdash; create an alias for a command</span><span><br /> </span></li>
<li><span></span><span><strong>System</strong></span><span><strong><br /> </strong></span><span>shutdown</span><span>&nbsp;&mdash; shut down machine</span><span><br /> </span><span>reboot</span><span>&nbsp;&mdash; restart machine</span><span><br /> </span><span>date</span><span>&nbsp;&mdash; show the current date and time</span><span><br /> </span><span>whoami</span><span>&nbsp;&mdash; who you are logged in as</span><span><br /> </span><span>finger user</span><span>&nbsp;&mdash; display information about user</span><span><br /> </span><span>man command</span><span>&nbsp;&mdash; show the manual for command</span><span><br /> </span><span>df</span><span>&nbsp;&mdash; show disk usage</span><span><br /> </span><span>du</span><span>&nbsp;&mdash; show directory space usage</span><span><br /> </span><span>free</span><span>&nbsp;&mdash; show memory and swap usage</span><span><br /> </span><span>whereis app</span><span>&nbsp;&mdash; show possible locations of app</span><span><br /> </span><span>which app</span><span>&nbsp;&mdash; show which app will be run by default</span><span><br /> </span></li>
<li><span></span><span><strong>Process Management</strong></span><span><strong><br /> </strong></span><span>ps</span><span>&nbsp;&mdash; display your currently active processes</span><span><br /> </span><span>top</span><span>&nbsp;&mdash; display all running processes</span><span><br /> </span><span>kill pid</span><span>&nbsp;&mdash; kill process id pid</span><span><br /> </span><span>kill -9 pid</span><span>&nbsp;&mdash; force kill process id pid</span><span><br /> </span></li>
<li><span></span><span><strong>Permissions</strong></span><span><strong><br /> </strong></span><span>ls -l</span><span>&nbsp;&mdash; list items in current directory and show permissions</span><span><br /> </span><span>chmod ugo file</span><span>&nbsp;&mdash; change permissions of file to ugo - u is the user's permissions, g is the group's permissions, and o is everyone else's permissions. The values of u, g, and o can be any number between 0 and 7.</span><span><br /> </span><span>7</span><span>&nbsp;&mdash; full permissions</span><span><br /> </span><span>6</span><span>&nbsp;&mdash; read and write only</span><span><br /> </span><span>5</span><span>&nbsp;&mdash; read and execute only</span><span><br /> </span><span>4</span><span>&nbsp;&mdash; read only</span><span><br /> </span><span>3</span><span>&nbsp;&mdash; write and execute only</span><span><br /> </span><span>2</span><span>&nbsp;&mdash; write only</span><span><br /> </span><span>1</span><span>&nbsp;&mdash; execute only</span><span><br /> </span><span>0</span><span>&nbsp;&mdash; no permissions</span><span><br /> </span><span>chmod 600 file</span><span>&nbsp;&mdash; you can read and write - good for files</span><span><br /> </span><span>chmod 700 file</span><span>&nbsp;&mdash; you can read, write, and execute - good for scripts</span><span><br /> </span><span>chmod 644 file</span><span>&nbsp;&mdash; you can read and write, and everyone else can only read - good for web pages</span><span><br /> </span><span>chmod 755 file</span><span>&nbsp;&mdash; you can read, write, and execute, and everyone else can read and execute - good for programs that you want to share</span><span><br /> </span></li>
<li><span></span><span><strong>Networking</strong></span><span><strong><br /> </strong></span><span>wget file</span><span>&nbsp;&mdash; download a file</span><span><br /> </span><span>curl file</span><span>&nbsp;&mdash; download a file</span><span><br /> </span><span>scp user@host:file dir</span><span>&nbsp;&mdash; secure copy a file from remote server to the dir directory on your machine</span><span><br /> </span><span>scp file user@host:dir</span><span>&nbsp;&mdash; secure copy a file from your machine to the dir directory on a remote server</span><span><br /> </span><span>scp -r user@host:dir dir</span><span>&nbsp;&mdash; secure copy the directory dir from remote server to the directory dir on your machine</span><span><br /> </span><span>ssh user@host</span><span>&nbsp;&mdash; connect to host as user</span><span><br /> </span><span>ssh -p port user@host</span><span>&nbsp;&mdash; connect to host on port as user</span><span><br /> </span><span>ssh-copy-id user@host</span><span>&nbsp;&mdash; add your key to host for user to enable a keyed or passwordless login</span><span><br /> </span><span>ping host</span><span>&nbsp;&mdash; ping host and output results</span><span><br /> </span><span>whois domain</span><span>&nbsp;&mdash; get information for domain</span><span><br /> </span><span>dig domain</span><span>&nbsp;&mdash; get DNS information for domain</span><span><br /> </span><span>dig -x host</span><span>&nbsp;&mdash; reverse lookup host</span><span><br /> </span><span>lsof -i tcp:1337</span><span>&nbsp;&mdash; list all processes running on port 1337</span><span><br /> </span></li>
<li><span></span><span><strong>Searching</strong></span><span><strong><br /> </strong></span><span>grep pattern files</span><span>&nbsp;&mdash; search for pattern in files</span><span><br /> </span><span>grep -r pattern dir</span><span>&nbsp;&mdash; search recursively for pattern in dir</span><span><br /> </span><span>grep -rn pattern dir</span><span>&nbsp;&mdash; search recursively for pattern in dir and show the line number of each match</span><span><br /> </span><span>grep -r pattern dir --include='*.ext'</span><span>&nbsp;&mdash; search recursively for pattern in dir and only search in files with .ext extension</span><span><br /> </span><span>command | grep pattern</span><span>&nbsp;&mdash; search for pattern in the output of command</span><span><br /> </span><span>find dir -name file</span><span>&nbsp;&mdash; find all instances of file under dir by walking the real file system</span><span><br /> </span><span>locate file</span><span>&nbsp;&mdash; find all instances of file using the indexed database built by the updatedb command. Much faster than find, but only as current as the last updatedb run</span><span><br /> </span><span>sed -i 's/day/night/g' file</span><span>&nbsp;&mdash; find all occurrences of day in a file and replace them with night - s means substitute and g means global - sed also supports regular expressions</span><span><br /> </span></li>
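<li><span></span><span><strong>Example: grep and sed on a scratch tree</strong></span><span><strong><br /> </strong></span><span>A small sketch, under assumed file names, of the recursive grep with an --include filter and the sed substitution described above. Everything is created in a temporary directory; notes, a.txt, b.txt and c.log are hypothetical names chosen only for illustration.</span>
<pre>
```shell
# Build a tiny scratch tree (all file names here are made up for the example).
workdir=$(mktemp -d)
mkdir -p "$workdir/notes/sub"
printf 'one fine day\nnothing here\n' > "$workdir/notes/a.txt"
printf 'day after day\n' > "$workdir/notes/sub/b.txt"
printf 'day in a log\n' > "$workdir/notes/c.log"

# Recursive search with line numbers, limited to .txt files; c.log is skipped.
matches=$(grep -rn --include='*.txt' 'day' "$workdir/notes" | sort)
echo "$matches"
match_count=$(echo "$matches" | wc -l | tr -d ' ')

# Replace every "day" with "night" and print the result.
# Without -i the file itself is left unchanged, which is a safe way to preview.
replaced=$(sed 's/day/night/g' "$workdir/notes/sub/b.txt")
echo "$replaced"
```
</pre>
<span>Running sed without -i first is a cheap way to check a substitution before letting it rewrite the file in place.</span></li>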
<li><span></span><span><strong>Compression</strong></span><span><strong><br /> </strong></span><span>tar cf file.tar files</span><span>&nbsp;&mdash; create a tar named file.tar containing files</span><span><br /> </span><span>tar xf file.tar</span><span>&nbsp;&mdash; extract the files from file.tar</span><span><br /> </span><span>tar czf file.tar.gz files</span><span>&nbsp;&mdash; create a tar with Gzip compression</span><span><br /> </span><span>tar xzf file.tar.gz</span><span>&nbsp;&mdash; extract a tar using Gzip</span><span><br /> </span><span>gzip file</span><span>&nbsp;&mdash; compresses file and renames it to file.gz</span><span><br /> </span><span>gzip -d file.gz</span><span>&nbsp;&mdash; decompresses file.gz back to file</span><span><br /> </span></li>
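<li><span></span><span><strong>Example: tar round trip</strong></span><span><strong><br /> </strong></span><span>A hedged sketch of creating, listing and extracting a gzip-compressed tar archive. The directory reads and its sample files are invented for the example, and the t (list) and -C (extract into a directory) options are additions that the list above does not cover.</span>
<pre>
```shell
# Scratch data in a temporary directory (reads/ and its files are made-up names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir reads
printf 'ACGT\n' > reads/sample1.txt
printf 'TTGA\n' > reads/sample2.txt

# c = create, z = gzip, f = the archive file name follows.
tar czf reads.tar.gz reads

# t = list the contents without extracting anything.
listing=$(tar tzf reads.tar.gz | sort)
echo "$listing"

# x = extract; -C sends the files into a separate directory.
mkdir restored
tar xzf reads.tar.gz -C restored
restored_text=$(cat restored/reads/sample1.txt)
echo "$restored_text"
```
</pre>
<span>Listing an archive with t before extracting shows whether it will unpack into its own directory or scatter files into the current one.</span></li>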
<li><span></span><span><strong>Shortcuts</strong></span><span><strong><br /> </strong></span><span>ctrl+a</span><span>&nbsp;&mdash; move cursor to beginning of line</span><span><br /> </span><span>ctrl+e</span><span>&nbsp;&mdash; move cursor to end of line</span><span><br /> </span><span>alt+f</span><span>&nbsp;&mdash; move cursor forward 1 word</span><span><br /> </span><span>alt+b</span><span>&nbsp;&mdash; move cursor backward 1 word</span><span><br /> </span></li>
</ul>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35983/some-useful-bioinformatics-links</guid>
	<pubDate>Fri, 16 Mar 2018 20:50:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35983/some-useful-bioinformatics-links</link>
	<title><![CDATA[Some useful Bioinformatics links]]></title>
	<description><![CDATA[<p><br /> Reference-free prediction of rearrangement breakpoint reads | Bioinformatics | Oxford Academic</p><p>https://academic.oup.com/bioinformatics/article/30/18/2559/2475628<br /> Reference-free SNP detection: dealing with the data deluge</p><p>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083407/<br /> GATB/DiscoSnp: DiscoSnp is designed for discovering all kinds of SNPs (not only isolated ones), as well as insertions and deletions, from raw set(s) of reads.</p><p>https://github.com/GATB/DiscoSnp<br /> De novo assembly | Oxford Nanopore Technologies</p><p>https://nanoporetech.com/taxonomy/term/131<br /> De novo long-read assembly of a complex animal genome | bioRxiv</p><p>https://www.biorxiv.org/content/early/2017/09/10/187054<br /> Rapid de novo assembly of the European eel genome from nanopore sequencing reads | Scientific Reports</p><p>https://www.nature.com/articles/s41598-017-07650-6.epdf?author_access_token=dktG7e98wyRJnaEEMTcPqtRgN0jAjWel9jnR3ZoTv0P7E7t-wVGo30iojNO7dICajNY_7PE5xVPv6OoLe7hn9TeUjcZ5umREOzNoPMWkfYH58RS6uxm3vm4e4BG2AA_WKW84i6egKK271NwMq-NfzA%3D%3D<br /> nanoporetech/ont-assembly-polish: ONT assembly and Illumina polishing pipeline</p><p>https://github.com/nanoporetech/ont-assembly-polish<br /> Generade-nl/TULIP: TULIP - The Uncorrected Long read Itegration Pipeline</p><p>https://github.com/Generade-nl/TULIP<br /> www.nature.com</p><p>https://www.nature.com/articles/s41598-017-03996-z<br /> Example gallery of NanoPlot &ndash; Gigabase or gigabyte</p><p>https://gigabaseorgigabyte.wordpress.com/2017/06/01/example-gallery-of-nanoplot/<br /> Tool documentation</p><p>https://broadinstitute.github.io/picard/command-line-overview.html<br /> Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. 
- PubMed - NCBI</p><p>https://www.ncbi.nlm.nih.gov/pubmed/24185095<br /> MAFFT ver.7 - a multiple sequence alignment program</p><p>https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html<br /> Measuring the distance between multiple sequence alignments | Bioinformatics | Oxford Academic</p><p>https://academic.oup.com/bioinformatics/article/28/4/495/212883<br /> The MUMmer 3 examples</p><p>http://mummer.sourceforge.net/examples/<br /> MAFFT ver.7 - a multiple sequence alignment program</p><p>https://mafft.cbrc.jp/alignment/software/tips.html<br /> Omega | Overlap-graph de novo Assembler for Metagenomics</p><p>https://omega.omicsbio.org/<br /> abiswas-odu/Disco: Multi-threaded Distributed Memory Overlap-Layout-Consensus (OLC) Metagenome Assembler</p><p>https://github.com/abiswas-odu/Disco<br /> SAGE: String-overlap Assembly of GEnomes | BMC Bioinformatics | Full Text</p><p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-302</p><p>Fast and sensitive mapping of nanopore sequencing reads with GraphMap | Nature Communications</p><p>https://www.nature.com/articles/ncomms11307<br /> lumpy-sv/extractSplitReads_BwaMem at master &middot; arq5x/lumpy-sv</p><p>https://github.com/arq5x/lumpy-sv/blob/master/scripts/extractSplitReads_BwaMem<br /> jts/nanocorrect: Experimental pipeline for correcting nanopore reads</p><p>https://github.com/jts/nanocorrect</p><p>video - how to install flash plugin on ubuntu 14.04 LTS 64-bit version - Ask Ubuntu</p><p>https://askubuntu.com/questions/469553/how-to-install-flash-plugin-on-ubuntu-14-04-lts-64-bit-version<br /> lh3/fermi: A WGS de novo assembler based on the FMD-index for large genomes</p><p>https://github.com/lh3/fermi<br /> Multi-metagenome</p><p>http://madsalbertsen.github.io/multi-metagenome/docs/step9.html<br /> Bandage by rrwick</p><p>https://rrwick.github.io/Bandage/<br /> Codon Optimization OnLine (COOL): a web-based multi-objective optimization platform for synthetic gene design | 
Bioinformatics | Oxford Academic</p><p>https://academic.oup.com/bioinformatics/article/30/15/2210/2391162<br /> Genome Architecture and Evolution of a Unichromosomal Asexual Nematode - ScienceDirect</p><p>https://www.sciencedirect.com/science/article/pii/S096098221731076X?via%3Dihub#fig4<br /> How to determine chimeras in my de novo assembly? - SEQanswers</p><p>http://seqanswers.com/forums/showthread.php?t=26721<br /> samtools(1) manual page</p><p>http://www.htslib.org/doc/samtools.html<br /> How To Filter Mapped Reads With Samtools</p><p>https://www.biostars.org/p/56246/<br /> The MUMmer 3 manual</p><p>http://mummer.sourceforge.net/manual/#nucmer<br /> assembly_olc.pdf</p><p>http://www.cs.jhu.edu/~langmea/resources/lecture_notes/assembly_olc.pdf<br /> SAM and BAM filtering oneliners</p><p>https://gist.github.com/davfre/8596159<br /> Inroduction to dot-plots</p><p>http://www.code10.info/index.php%3Foption%3Dcom_content%26view%3Darticle%26id%3D64:inroduction-to-dot-plots%26catid%3D52:cat_coding_algorithms_dot-plots%26Itemid%3D76<br /> RepeatFinder Home Page</p><p>http://www.cbcb.umd.edu/software/RepeatFinder/<br /> RepeatFinderReprint.pdf</p><p>http://www.cbcb.umd.edu/software/RepeatFinder/RepeatFinderReprint.pdf<br /> https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CreateIdeogram/CreateIdeogram.html</p><p>https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CreateIdeogram/CreateIdeogram.html<br /> Circular Visualization in R</p><p>http://zuguang.de/circlize_book/book/introduction.html#a-qiuck-glance<br /> Creating a coverage plot using BEDTools and R</p><p>https://davetang.org/muse/2015/08/05/creating-a-coverage-plot-using-bedtools-and-r/<br /> Eval: A software package for analysis of genome annotations | BMC Bioinformatics | Full Text</p><p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-4-50<br /> eval-documentation.pdf</p><p>http://mblab.wustl.edu/media/software/eval-documentation.pdf<br /> OmicCircos: A Simple-to-Use R 
Package for the Circular Visualization of Multidimensional Omics Data</p><p>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3921174/<br /> sequence - download.tardigrades.org &gt; v1 &gt; sequence</p><p>http://download.tardigrades.org/v1/sequence/<br /> ksahlin/BESST: BESST - scaffolder for genomic assemblies</p><p>https://github.com/ksahlin/BESST<br /> reubwn/scripts: Useful scripts for various things</p><p>https://github.com/reubwn/scripts<br /> ICEberg</p><p>http://db-mml.sjtu.edu.cn/ICEberg/index.php<br /> Satsuma - Evolution and Genomics</p><p>http://evomics.org/learning/genomics/satsuma/<br /> A complete bacterial genome assembled de novo using only nanopore sequencing data | Nature Methods</p><p>https://www.nature.com/articles/nmeth.3444<br /> vezzi/FRC_align: Computes FRC from SAM/BAM file and not from afg files</p><p>https://github.com/vezzi/FRC_align<br /> Read GTF file into R - Dave Tang's blog</p><p>https://davetang.org/muse/2017/08/04/read-gtf-file-r/</p><p>https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CustomGenomes/CustomGenomes.html</p><p>https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CustomGenomes/CustomGenomes.html<br /> Dot: Interactive dot plot for genome-genome alignments</p><p>https://dnanexus.github.io/dot/<br /> lh3/minimap2: A versatile pairwise aligner for genomic and spliced nucleotide sequences</p><p>https://github.com/lh3/minimap2<br /> SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information | BMC Bioinformatics | Full Text</p><p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-211<br /> Palindromic gene amplification &mdash; an evolutionarily conserved role for DNA inverted repeats in the genome | Nature Reviews Cancer</p><p>https://www.nature.com/articles/nrc2591<br /> bioinformatics - BLAST DNA Sequences 
Reversed - Biology Stack Exchange</p><p>https://biology.stackexchange.com/questions/8160/blast-dna-sequences-reversed<br /> LASTZ</p><p>http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html<br /> Tetra-Nucleotide Analysis (TNA) | BIOiPLUG Help center</p><p>http://help.bioiplug.com/tetra-nucleotide-analysis-tna/</p><p>Clustering metagenomic contigs on tetranucleotide frequency &mdash; CGAT documentation</p><p>http://cgat.readthedocs.io/en/latest/recipes/metagenome_contigs_kmers.html</p><p>&nbsp;</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36384/binding-site-prediction-in-protein</guid>
	<pubDate>Wed, 25 Apr 2018 04:35:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36384/binding-site-prediction-in-protein</link>
	<title><![CDATA[Binding Site Prediction in Protein !]]></title>
	<description><![CDATA[<p><span>The interaction between proteins and other molecules is fundamental to all biological functions. This section lists tools that help predict interaction sites on the protein surface and tools that predict the structure of the intermolecular complex formed between two or more molecules (docking).</span></p><h4>Pocket Identification</h4><p><a href="http://sts.bioengr.uic.edu/castp/" target="_blank">CASTp</a></p><div style="text-align: justify;">Automatic identification of pockets and cavities in protein structures, and quantitation of their volumes using Delaunay triangulation. Also available as a PyMOL plugin.</div><p><a href="http://www.bioinformatics.leeds.ac.uk/pocketfinder/" target="_blank">Pocket-Finder</a></p><div style="text-align: justify;">Automatic identification of pockets and cavities in protein structures, and quantitation of their volumes.</div><p><a href="http://gecco.org.chemie.uni-frankfurt.de/pocketpicker/index.html" target="_blank">PocketPicker</a></p><div style="text-align: justify;">Grid-based technique for the analysis of protein pockets. PocketPicker is available as a plugin for&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/pymol.htm">PyMOL</a>.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><h4>Binding Site Prediction</h4>
<p><a href="http://consurf.tau.ac.il/" target="_blank">ConSurf</a></p>
</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Identification of functional regions in proteins by surface-mapping of phylogenetic information</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://www-cryst.bioc.cam.ac.uk/~crescendo/crescendo.php" target="_blank">CRESCENDO</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Identification of protein interaction sites. It uses sequence conservation patterns in homologous proteins to distinguish residues that are conserved due to structural restraints from those conserved due to functional restraints.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><strong>Ligand Binding Sites</strong></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://www.sbg.bio.ic.ac.uk/~3dligandsite/" target="_blank">3DLigandSite</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">The server utilizes protein-structure prediction to provide structural models of the binding site. Ligands bound to structures are superimposed onto the model and used to predict the binding site.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://cssb.biology.gatech.edu/skolnick/files/FINDSITE/" target="_blank">FINDSITE</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">A threading-based method for ligand-binding site prediction and functional annotation based on binding-site similarity across superimposed groups of threading templates.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">
<p><a href="http://scoppi.biotec.tu-dresden.de/pocket/" target="_blank">LIGSITE<sup>csc</sup></a></p>
<div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Prediction of binding site by pocket identification using the Connolly surface and degree of conservation</div>
</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://metapocket.eml.org/" target="_blank">metaPocket</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">A meta server for ligand-binding site prediction. metaPocket uses&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#ligsite">LIGSITE<sup>csc</sup></a>,&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#pass">PASS</a>,&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#qsite">Q-SiteFinder</a>&nbsp;and&nbsp;<a href="http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html" target="_blank">SURFNET</a>.</div>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/1720/postdoctoral-associate-bioinformatics-at-duke-university-medical-center</guid>
  <pubDate>Sat, 10 Aug 2013 18:38:38 -0500</pubDate>
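  <link>https://bioinformaticsonline.com/opportunity/view/1720/postdoctoral-associate-bioinformatics-at-duke-university-medical-center</link>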
  <title><![CDATA[Postdoctoral Associate - Bioinformatics  at Duke University Medical Center]]></title>
  <description><![CDATA[
<p>The Department of Biostatistics and Bioinformatics at Duke University Medical Center is seeking a Postdoctoral Associate for a one year appointment to work on several high-dimensional research projects. The specific goals of the project are to identify genes or molecular markers that are predictive of clinical outcomes in renal and prostate cancer.</p>

<p>Candidates must have a PhD degree in statistics, biostatistics or bioinformatics, and extensive experience in analyzing high-dimensional data (microarray, SNP, CNVs) and in validation approaches. In addition, candidates should have experience in penalized regression methods and database manipulation, and strong programming skills (R) for conducting Monte Carlo studies and applications. Candidates must also have excellent communication skills (verbal, written and presentation) and strong proficiency with Linux systems.</p>

<p>This position is available immediately and will be filled as soon as possible. Appointment could be extended beyond the first year based on additional funding.</p>

<p>For more information about the Department of Biostatistics and Bioinformatics, please visit our website: http://www.biostat.duke.edu.</p>

<p>For more info: http://biostat.duke.edu/sites/biostat.duke.edu/files/Halabi%20-%20Postdoc%20Job%20Posting%202013%20updated.pdf</p>

<p>Duke University is an Equal Opportunity/Affirmative Action Employer.</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so some background reading will be really useful before going through this tutorial.</p><p>Many times in bioinformatics we need to deal with huge datasets that are more than 100GB in size. The traditional way to analyze a file is with a while loop:</p><p>while (&lt;FILE&gt;) {</p><p>Do something;</p><p>}</p><p>This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it can take more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.</p><p>One of the oldest ways to fork is</p><blockquote><p>my $fork = fork();<br />if (!defined $fork) { die "Couldn't fork: $!"; } # fork failed<br />elsif ($fork == 0) { # child process<br /><strong>your code here;</strong><br />exit(0);<br />}<br />else { # parent process: remember the child's pid<br />push(@childs, $fork);<br />}</p><p>## wait for the child processes to finish<br />foreach (@childs) {<br />my $tmp = waitpid($_, 0);<br />}</p></blockquote><p>What a fork does is create a child process that takes a copy of the variables and code with it and runs separately from the parent process (usually on a separate processor). That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you how to do it easily, but it still has its own drawbacks.</p><blockquote><p>Okay, now if you really do not want to use fork directly in your code, that&rsquo;s okay too. There are many useful modules which do it for you very efficiently. 
One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to cap the number of forks running at once (the number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors=8;<br />my $fork = Parallel::ForkManager-&gt;new($max_processors);<br />foreach (@dna) {<br />$fork-&gt;start and next; # do the fork; the parent moves on to the next element<br /><strong>your code here;</strong><br />$fork-&gt;finish; # do the exit in the child process<br />}<br />$fork-&gt;wait_all_children;</p></blockquote><p>This runs up to 8 child processes at a time, each handling one element of the array. When one child finishes, Parallel::ForkManager starts a new one, so all your processors stay busy analyzing the data. Now suppose you have generated 8 child processes and want them all to write to one file. You need to lock the file to do this, because otherwise buffered output from different children can get interleaved. You can lock the file using the flock command.</p><blockquote><p>use Fcntl qw(:flock); # provides LOCK_EX and LOCK_UN<br />open(my $QUAL, '&gt;&gt;', 'myfile.txt') or die "can't open file: $!"; # open for appending<br />flock($QUAL, LOCK_EX) or die "can't lock file: $!";<br />print $QUAL $output;<br />flock($QUAL, LOCK_UN) or die "can't unlock file: $!";<br />close $QUAL;</p></blockquote><p>I would not suggest using flock when dealing with multiple processes because it decreases the processing efficiency (each child process must wait for the lock to be released by another child process). Instead, I would suggest that each fork write to a separate file and that you concatenate the files after the processing.</p><p><strong>Putting it all together, if you have 100GB of data you can do this</strong></p><blockquote><p><strong>step 1</strong>&nbsp;: split the dataset into as many equal pieces as you have processors. 
This may take a few hours (about 2-3 hrs for a 100GB file).<br />You can use the unix "split" command for this (-l counts lines, so choose a value that does not cut a multi-line record in half).<br />For example:<br />my $number_split=int($number_of_entries_in_your_dataset/$max_processors);<br />my $split_Files=`split -l $number_split your_file.fasta file_name`;</p><p><strong>step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory<br />my $fork = Parallel::ForkManager-&gt;new($max_processors);<br />while (my $file = readdir(DIRECTORY)) { ### read the directory<br />if($file=~/^\./){next;} ### skip . and .. and hidden files<br />print $file,"\n";<br />########## Start fork ##########<br />my $pid = $fork-&gt;start and next;<br /><strong>Whatever you want to do with the split file ;</strong><br /><strong>analyze my piece of $file;</strong><br />######### end fork ###############<br />$fork-&gt;finish;<br />}<br />$fork-&gt;wait_all_children;<br />closedir(DIRECTORY);</p></blockquote><p>So basically each processor is kept busy with its own piece of data (split file), and you have 8 processes running at one time without interfering with each other. Again, I do not suggest writing the output from every child process to one file (for the reasons above). Write the output from each fork to a separate file and concatenate them at the end. That's it, you have just increased your program speed by up to 8 times!! Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the output each child generates, since it does take some time (remember, 100GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (should take about 3 hrs for a 100GB dataset) and then export the whole table into one file. 
This should be faster than just concatenating them with the "cat" command (correct me if I am wrong).</p><p>Or a much simpler way is to use pipes:</p><p>cat output_dir/* | my_pipe or my_pipe &lt;(file1) final_file;</p><p>That's it, guys!! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>