<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/35798?offset=150</link>
	<atom:link href="https://bioinformaticsonline.com/related/35798?offset=150" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34814/bioinformatics-web-application-development-with-perl</guid>
	<pubDate>Tue, 26 Dec 2017 18:14:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34814/bioinformatics-web-application-development-with-perl</link>
	<title><![CDATA[Bioinformatics Web Application Development with Perl]]></title>
	<description><![CDATA[<div><p>Perl's second wave of adoption came from the growth of the world wide web. Dynamic web pages&mdash;the precursor to modern web applications&mdash;were easy to create with Perl and CGI. Thanks to Perl's ubiquity as a language for system administrators and its power to manipulate text, it was the default choice for web programming. Its presence everywhere made it popular and, in some ways, the duct tape of the Internet.</p><h4>Web Application Development</h4><p>The old days of CGI programs and the simple development style that represented seem clunky. Web pages have become web applications. Development has moved from generating static HTML to both client and server side programming, with rich client interfaces and powerful backends.</p><p>Perl is still well suited for developing modern web apps. The language grows more powerful and easier to use every year, the available libraries are wonderful and keep getting better, and the inventions and discoveries available in modern Perl are unsurpassed.</p><p>In particular, a modern Perl developer can do amazing things with modern Perl tools. If you still think of Perl web development as a&nbsp;<em>cgi-bin</em>&nbsp;directory full of messy scripts that spew warnings to STDERR, you're a decade out of date. Better yet, you can replace that mess piecemeal, thanks to the new tools and techniques of modern Perl. See, for example, the ever-growing list of technologies&nbsp;<a href="http://www.builtinperl.com/">Built in Perl</a>.</p><h4>Modern Perl Web Frameworks</h4><p>While the old wave of web development may have made the CGI.pm module central, modern Perl web programming follows a stricter separation of business logic, URL and request routing, and output. The days of slinging a string here, an array there, a Perl hash yonder, declaring every variable at the top of the program, and maybe making a subroutine are gone. The Perl world has seen the value of abstraction and ways to mechanize away boilerplate. Perl has dozens of frameworks and toolkits designed to make web development and deployment simpler.</p><p>Any of a dozen of these frameworks will help you do great things, but three in particular stand out. You can build web sites and web applications of tremendous value with all three. These are neither the only good possibilities (think of POE or Jifty or Continuity or...) nor the only mechanisms for web programming with Perl (see Mechanize or LWP or Mojo::UserAgent for more). Yet if you want three good options to choose between, start here.</p><h4>Catalyst</h4><p>The&nbsp;<a href="http://catalystframework.org/">Catalyst</a>&nbsp;framework is a flexible and powerful system for building small to large web apps. It uses the&nbsp;<a href="http://moose.perl.org/">Moose</a>&nbsp;object system to provide great APIs for extension and further development. It's the most mature of the modern top Perl web frameworks, yet it retains its flexibility and vibrancy. In particular, its plugin and extension ecosystem allows it to evolve to provide new and essential features.</p><p>Catalyst has embraced the Plack/PSGI standard for Perl web deployment and recent versions are exploring high-scalability, event-based request handling models.</p><h4>Dancer</h4><p>The&nbsp;<a href="http://perldancer.org/">Dancer</a>&nbsp;framework is deliberately minimal in syntax and scope, but it also has a vibrant plugin ecosystem. Dancer particularly excels for smaller sites and applications, though good programmers can build larger things with it.</p><p>The first version of Dancer was easy to use. Dancer 2 continues that ease while improving the internals and robustness of applications.</p><h4>Mojolicious</h4><p>The&nbsp;<a href="http://mojolicio.us/">Mojolicious</a>&nbsp;(Mojo) framework has a real-time design based on high performance event handling. Its focus is solving new and interesting problems in simple and effective ways, and the project has produced a lot of new code that does old things in better ways.</p><p>In particular, Mojolicious goes to great lengths to support new web standards, such as CSS 3, web sockets, and HTTP 2.</p><p>Where Catalyst embraces the CPAN fully, Mojolicious by design provides most of what an average app might need in a single download. It's still fully compatible with the CPAN, but the intention is to provide good working defaults in a package that's easy to start with. Mojo's fans are quick to praise it as fun to develop.</p><p>A modern Perl web developer should be familiar with at least one of these frameworks.</p><h4>Modern Perl Storage Mechanisms</h4><p>Perl's venerable&nbsp;<a href="http://search.cpan.org/perldoc?DBI">DBI</a>&nbsp;module has been the focal point of database access since its invention. Its design allows it to provide the same interface to huge relational databases and flat files alike through its DBD extension mechanism. Yet the DBI by itself isn't the be-all, end-all of data storage and access in Perl.</p><h4>DBIx::Class</h4><p><a href="http://search.cpan.org/perldoc?DBIx::Class">DBIx::Class</a>&nbsp;sits on top of DBI to provide an API to your database based on the concept of queries and results. This is often sufficient to remove all but the most complicated of SQL from your code, leaving you to manipulate your business models instead of the small details of how a relational database works. The power and maintainability you receive is well the small cost of the learning curve.</p><p>Even better, DBIC can manage (and even generate) your database schema for you.</p><p>Recent versions of DBIC have demonstrated that a well-written ORM can perform much better than even clever hand-written code. Because it builds on the Perl DBI, it scales everywhere from SQLite to PostgreSQL, MySQL, Oracle, and more.</p><h3>Rose::DB</h3><p>The lesser-known but no less powerful&nbsp;<a href="http://search.cpan.org/perldoc?Rose::DB::Object">Rose::DB::Object</a>&nbsp;builds on&nbsp;<a href="http://search.cpan.org/perldoc?Rose::DB">Rose::DB</a>&nbsp;to provide an object-relational mapper for Perl. While its high level features most directly compare to those of DBIx::Class, it's often measurably faster.</p><h4>NoSQL on the CPAN</h4><p>Of course the&nbsp;<a href="http://search.cpan.org/">CPAN</a>&nbsp;has modules for almost any NoSQL database or job queue or persistence mechanism you could name, and several you have never heard of. Everything you need is a quick CPAN or cpanm away!</p><h4>Modern Perl Deployment Strategies</h4><p>In the early days of the web, deploying a Perl web application meant putting one or more&nbsp;<em>.cgi</em>&nbsp;or&nbsp;<em>.pl</em>&nbsp;files in a special directory and hoping that your system administrator had everything configured correctly. The execution model was often slow and cumbersome, and accessing shared resources such as databases was often tricky.</p><p>Modern Perl has better choices. While deployment strategies are the source of many arguments, the return on your investment from learning the modern way is impressive.</p><h4>Plack/PSGI</h4><p>The PSGI specification (as exemplified by&nbsp;<a href="http://plackperl.org/">Plack</a>) describes a strategy for building Perl web apps independent of server and with the possibility to share custom processing behaviors.</p><p>In other words, it's a standard for writing Perl apps to take advantage of the huge ecosystem of Perl development available on the CPAN without tying yourself to a server like Apache, Apache 2, nginx, or anything else.</p><p>Any good modern Perl web framework (including those listed here) supports PSGI. Several deployment mechanisms exist to meet various business needs which also support PSGI. In particular, you can deploy the same application with a local testing server on your own machine as you can to your production server or servers without changing your application at all.</p><h4>mod_perl</h4><p>The older but still viable mod_perl Apache httpd module embeds Perl into the web server. This was the first widespread persistence mechanism for Perl web applications themselves and it's still popular to this day, though PSGI compliance is often the choice for new development. (PSGI handlers to use mod_perl as the backend are available.)</p><p>Modern Perl developers should familiarize themselves with PSGI and the wealth of available Plack middleware.</p><h4>Perl Web Development</h4><p>Of course no discussion of Perl web development would be complete without mentioning the strength of the CPAN. Almost any project will benefit from the wealth of freely available libraries built to solve real problems. These distributions run the gamut from full-blown web frameworks and content management systems to APIs for web services, development tools, testing systems, and interfaces to document formats and external resources.</p><p>For example, if you need to write a web service which accepts JSON data and produces Excel spreadsheets, you can glue together a few CPAN distributions and get the job done early. If you need to consume XML from a remote service and emit a PDF, you're in luck.</p><p>Perl's prowess as a general purpose programming language as well as its flexibility and power in managing text and gluing systems together make it a wonderful fit for web development. The community's adoption of modern Perl standards such as PSGI and Plack only enhance your power.</p><p>Web application development in Perl is still viable, and modern Perl tools and techniques and libraries make it more powerful and pleasant than ever.</p></div>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/2023</guid>
	<pubDate>Mon, 12 Aug 2013 09:36:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/view/2023</link>
	<title><![CDATA[What is the objective of BINC examination?]]></title>
	<description><![CDATA[<p>I personally did not understand the objective behind BINC examination. Is government only try to show off that they are doing something to promote Indian bioinformatics sector?</p><p>Moreover, It looks like BINC indirectly putting&nbsp;an extra burden to bioinformatician, mentally and financially.</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35525/linux-commands-cheat-sheet-for-bioinformatics-and-computational-biology-professionals</guid>
	<pubDate>Mon, 05 Feb 2018 18:50:41 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35525/linux-commands-cheat-sheet-for-bioinformatics-and-computational-biology-professionals</link>
	<title><![CDATA[Linux Commands Cheat Sheet for Bioinformatics and Computational Biology Professionals]]></title>
	<description><![CDATA[<p><span>The purpose of this cheat sheet is to introduce biologist and bioinformatician to the frequently used tools for NGS analysis as well as giving experience in writing one-liners.</span></p><ul>
<li><span></span><span><strong>File System</strong></span><span><strong><br /> </strong></span><span>ls</span><span>&nbsp;&mdash; list items in current directory</span><span><br /> </span><span>ls -l</span><span>&nbsp;&mdash; list items in current directory and show in long format to see perimissions, size, and modification date</span><span><br /> </span><span>ls -a</span><span>&nbsp;&mdash; list all items in current directory, including hidden files</span><span><br /> </span><span>ls -F</span><span>&nbsp;&mdash; list all items in current directory and show directories with a slash and executables with a star</span><span><br /> </span><span>ls dir</span><span>&nbsp;&mdash; list all items in directory dir</span><span><br /> </span><span>cd dir</span><span>&nbsp;&mdash; change directory to dir</span><span><br /> </span><span>cd ..</span><span>&nbsp;&mdash; go up one directory</span><span><br /> </span><span>cd /</span><span>&nbsp;&mdash; go to the root directory</span><span><br /> </span><span>cd ~</span><span>&nbsp;&mdash; go to to your home directory</span><span><br /> </span><span>cd -</span><span>&nbsp;&mdash; go to the last directory you were just in</span><span><br /> </span><span>pwd</span><span>&nbsp;&mdash; show present working directory</span><span><br /> </span><span>mkdir dir</span><span>&nbsp;&mdash; make directory dir</span><span><br /> </span><span>rm file</span><span>&nbsp;&mdash; remove file</span><span><br /> </span><span>rm -r dir</span><span>&nbsp;&mdash; remove directory dir recursively</span><span><br /> </span><span>cp file1 file2</span><span>&nbsp;&mdash; copy file1 to file2</span><span><br /> </span><span>cp -r dir1 dir2</span><span>&nbsp;&mdash; copy directory dir1 to dir2 recursively</span><span><br /> </span><span>mv file1 file2</span><span>&nbsp;&mdash; move (rename) file1 to file2</span><span><br /> </span><span>ln -s file link</span><span>&nbsp;&mdash; create symbolic link to file</span><span><br /> </span><span>touch file</span><span>&nbsp;&mdash; create or update file</span><span><br /> </span><span>cat file</span><span>&nbsp;&mdash; output the contents of file</span><span><br /> </span><span>less file</span><span>&nbsp;&mdash; view file with page navigation</span><span><br /> </span><span>head file</span><span>&nbsp;&mdash; output the first 10 lines of file</span><span><br /> </span><span>tail file</span><span>&nbsp;&mdash; output the last 10 lines of file</span><span><br /> </span><span>tail -f file</span><span>&nbsp;&mdash; output the contents of file as it grows, starting with the last 10 lines</span><span><br /> </span><span>vim file</span><span>&nbsp;&mdash; edit file</span><span><br /> </span><span>alias name 'command'</span><span>&nbsp;&mdash; create an alias for a command</span><span><br /> </span></li>
<li><span></span><span><strong>System</strong></span><span><strong><br /> </strong></span><span>shutdown</span><span>&nbsp;&mdash; shut down machine</span><span><br /> </span><span>reboot</span><span>&nbsp;&mdash; restart machine</span><span><br /> </span><span>date</span><span>&nbsp;&mdash; show the current date and time</span><span><br /> </span><span>whoami</span><span>&nbsp;&mdash; who you are logged in as</span><span><br /> </span><span>finger user</span><span>&nbsp;&mdash; display information about user</span><span><br /> </span><span>man command</span><span>&nbsp;&mdash; show the manual for command</span><span><br /> </span><span>df</span><span>&nbsp;&mdash; show disk usage</span><span><br /> </span><span>du</span><span>&nbsp;&mdash; show directory space usage</span><span><br /> </span><span>free</span><span>&nbsp;&mdash; show memory and swap usage</span><span><br /> </span><span>whereis app</span><span>&nbsp;&mdash; show possible locations of app</span><span><br /> </span><span>which app</span><span>&nbsp;&mdash; show which app will be run by default</span><span><br /> </span></li>
<li><span></span><span><strong>Process Management</strong></span><span><strong><br /> </strong></span><span>ps</span><span>&nbsp;&mdash; display your currently active processes</span><span><br /> </span><span>top</span><span>&nbsp;&mdash; display all running processes</span><span><br /> </span><span>kill pid</span><span>&nbsp;&mdash; kill process id pid</span><span><br /> </span><span>kill -9 pid</span><span>&nbsp;&mdash; force kill process id pid</span><span><br /> </span></li>
<li><span></span><span><strong>Permissions</strong></span><span><strong><br /> </strong></span><span>ls -l</span><span>&nbsp;&mdash; list items in current directory and show permissions</span><span><br /> </span><span>chmod ugo file</span><span>&nbsp;&mdash; change permissions of file to ugo - u is the user's permissions, g is the group's permissions, and o is everyone else's permissions. The values of u, g, and o can be any number between 0 and 7.</span><span><br /> </span><span>7</span><span>&nbsp;&mdash; full permissions</span><span><br /> </span><span>6</span><span>&nbsp;&mdash; read and write only</span><span><br /> </span><span>5</span><span>&nbsp;&mdash; read and execute only</span><span><br /> </span><span>4</span><span>&nbsp;&mdash; read only</span><span><br /> </span><span>3</span><span>&nbsp;&mdash; write and execute only</span><span><br /> </span><span>2</span><span>&nbsp;&mdash; write only</span><span><br /> </span><span>1</span><span>&nbsp;&mdash; execute only</span><span><br /> </span><span>0</span><span>&nbsp;&mdash; no permissions</span><span><br /> </span><span>chmod 600 file</span><span>&nbsp;&mdash; you can read and write - good for files</span><span><br /> </span><span>chmod 700 file</span><span>&nbsp;&mdash; you can read, write, and execute - good for scripts</span><span><br /> </span><span>chmod 644 file</span><span>&nbsp;&mdash; you can read and write, and everyone else can only read - good for web pages</span><span><br /> </span><span>chmod 755 file</span><span>&nbsp;&mdash; you can read, write, and execute, and everyone else can read and execute - good for programs that you want to share</span><span><br /> </span></li>
<li><span></span><span><strong>Networking</strong></span><span><strong><br /> </strong></span><span>wget file</span><span>&nbsp;&mdash; download a file</span><span><br /> </span><span>curl file</span><span>&nbsp;&mdash; download a file</span><span><br /> </span><span>scp user@host:file dir</span><span>&nbsp;&mdash; secure copy a file from remote server to the dir directory on your machine</span><span><br /> </span><span>scp file user@host:dir</span><span>&nbsp;&mdash; secure copy a file from your machine to the dir directory on a remote server</span><span><br /> </span><span>scp -r user@host:dir dir</span><span>&nbsp;&mdash; secure copy the directory dir from remote server to the directory dir on your machine</span><span><br /> </span><span>ssh user@host</span><span>&nbsp;&mdash; connect to host as user</span><span><br /> </span><span>ssh -p port user@host</span><span>&nbsp;&mdash; connect to host on port as user</span><span><br /> </span><span>ssh-copy-id user@host</span><span>&nbsp;&mdash; add your key to host for user to enable a keyed or passwordless login</span><span><br /> </span><span>ping host</span><span>&nbsp;&mdash; ping host and output results</span><span><br /> </span><span>whois domain</span><span>&nbsp;&mdash; get information for domain</span><span><br /> </span><span>dig domain</span><span>&nbsp;&mdash; get DNS information for domain</span><span><br /> </span><span>dig -x host</span><span>&nbsp;&mdash; reverse lookup host</span><span><br /> </span><span>lsof -i tcp:1337</span><span>&nbsp;&mdash; list all processes running on port 1337</span><span><br /> </span></li>
<li><span></span><span><strong>Searching</strong></span><span><strong><br /> </strong></span><span>grep pattern files</span><span>&nbsp;&mdash; search for pattern in files</span><span><br /> </span><span>grep -r pattern dir</span><span>&nbsp;&mdash; search recursively for pattern in dir</span><span><br /> </span><span>grep -rn pattern dir</span><span>&nbsp;&mdash; search recursively for pattern in dir and show the line number found</span><span><br /> </span><span>grep -r pattern dir --include='*.ext</span><span>&nbsp;&mdash; search recursively for pattern in dir and only search in files with .ext extension</span><span><br /> </span><span>command | grep pattern</span><span>&nbsp;&mdash; search for pattern in the output of command</span><span><br /> </span><span>find file</span><span>&nbsp;&mdash; find all instances of file in real system</span><span><br /> </span><span>locate file</span><span>&nbsp;&mdash; find all instances of file using indexed database built from the updatedb command. Much faster than find</span><span><br /> </span><span>sed -i 's/day/night/g' file</span><span>&nbsp;&mdash; find all occurrences of day in a file and replace them with night - s means substitude and g means global - sed also supports regular expressions</span><span><br /> </span></li>
<li><span></span><span><strong>Compression</strong></span><span><strong><br /> </strong></span><span>tar cf file.tar files</span><span>&nbsp;&mdash; create a tar named file.tar containing files</span><span><br /> </span><span>tar xf file.tar</span><span>&nbsp;&mdash; extract the files from file.tar</span><span><br /> </span><span>tar czf file.tar.gz files</span><span>&nbsp;&mdash; create a tar with Gzip compression</span><span><br /> </span><span>tar xzf file.tar.gz</span><span>&nbsp;&mdash; extract a tar using Gzip</span><span><br /> </span><span>gzip file</span><span>&nbsp;&mdash; compresses file and renames it to file.gz</span><span><br /> </span><span>gzip -d file.gz</span><span>&nbsp;&mdash; decompresses file.gz back to file</span><span><br /> </span></li>
<li><span></span><span><strong>Shortcuts</strong></span><span><strong><br /> </strong></span><span>ctrl+a</span><span>&nbsp;&mdash; move cursor to beginning of line</span><span><br /> </span><span>ctrl+f</span><span>&nbsp;&mdash; move cursor to end of line</span><span><br /> </span><span>alt+f</span><span>&nbsp;&mdash; move cursor forward 1 word</span><span><br /> </span><span>alt+b</span><span>&nbsp;&mdash; move cursor backward 1 word</span><span><br /> </span></li>
<li></li>
</ul>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/2425/phd-fellowship-computational-biologybioinformatics-cork-ireland-cork-ireland</guid>
  <pubDate>Thu, 15 Aug 2013 14:09:00 -0500</pubDate>
  <link></link>
  <title><![CDATA[Ph.D. Fellowship (Computational Biology/Bioinformatics) : Cork, Ireland : Cork, Ireland]]></title>
  <description><![CDATA[
<p>Ph.D. Fellowship (18,000 euro/pa, plus tuition fees at the EU students rate) is available for four years to work on development of Bioinformatics resources for the analysis and visualization of ribosome profiling data. Ribosome profiling (ribo-seq) is a technology that allows mapping positions of the ribosomes on the whole transcriptome level with a nucleotide precision. The technology allows obtaining high resolution digital snapshots of gene expression in cells. The position is available starting on the 1st of October, 2013.</p>

<p>Candidate:<br />The candidate is expected to have B.S. or M.S. degree in the disciplines such as Computer Science, Statistics, Applied Mathematics, Physics or Electrical Engineering. The candidates with the backgrounds in Life Science disciplines such as Bioinformatics, Computational or Quantitative Biology will also be considered.</p>

<p>Location:<br />The position is available at LAPTI (http://lapti.ucc.ie) that is located in the Western Gate Building (http://www.stwarchitects.com/project-information.php?c=1&amp;p=09993) at University College Cork. Western Gate Building Research Complex hosts several UCC departments and provides ideal environment for interdisciplinary research. Cork (sometimes referenced as “Venice of Ireland”) is the second most populous city in the Republic. It has friendly cosmopolitan atmosphere and vibrant culture. A number of American industrial giants such as Apple , EMC and Pfizer have chosen Cork as a home for their European headquarters.</p>

<p>Application process:<br />The details of the application process are given at http://lapti.ucc.ie/jobs.html. To ensure prompt processing of your application use the subject line: ‘Ph.D. computational’. All applications received prior to August the 1st are guaranteed equal consideration. However, applications at the later dates will also be considered until the position is filled.</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner</guid>
	<pubDate>Tue, 10 Apr 2018 04:13:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36197/bioinformatics-oneliner</link>
	<title><![CDATA[Bioinformatics OneLiner]]></title>
	<description><![CDATA[<p>To remove all line ends (\n) from a Unix text file:</p><pre>sed ':a;N;$!ba;s/\n//g' filename.txt &gt; newfilename_oneline.txt</pre><p>To get average for a column of numbers (here the second column $2):</p><pre>awk '{ sum += $2; n++ } END { if (n &gt; 0) print sum / n; }'</pre><p>To get sequence length for all sequences in a fasta file:</p><pre>awk '/^&gt;/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \<br />filename.fasta</pre><p>To copy (move, rename, etc) files based on their list in a text file:</p><pre>cat file_list.txt | while read line; do cp "$line" complete_dataset/"$line"; done</pre><p>To split bam files into sets with mapped and unmapped reads:</p><pre>samtools view -F4 sample.bam &gt; sample.mapped.sam<br />samtools view -f4 sample.bam &gt; sample.unmapped.sam</pre><p>To gzip all your fastq files using gnu parallel and gzip:</p><pre>parallel gzip ::: *.fastq</pre><p>To gzip all your fastq files using pigz:</p><pre>pigz *.fastq</pre><p>To count all sequences in a fasta file:</p><pre>grep "^&gt;" yourfile.fasta -c</pre><p>To count all sequences in all fasta files in your current directory:</p><pre>for a in *.fasta; do ls $a; grep "^&gt;" -c $a; done</pre><p>To keep only one copy of duplicated lines:</p><pre>awk '!seen[$0]++'</pre><p>To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:</p><pre>grep "^&gt;" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ | bc</pre><p>To remove everything after the first space at each line, e.g. to to simplify fasta headers:</p><pre>cut -d' ' -f1 &lt; your_file</pre><p>To count reads in a all .fastq.gz files in your current folder (fast, using gnu parallel):</p><pre>parallel "echo {} &amp;&amp; gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz</pre><p>To count reads in a all .fastq.gz files in your current folder:</p><pre>zcat *.gz | echo $((`wc -l`/4))</pre><p>To count reads in a all .fastq files in your current folder:</p><pre>cat *.fastq | echo $((`wc -l`/4))</pre><p>To count base pairs in a all .fastq.gz files in your current folder:</p><pre>zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c </pre><p>To split multifasta file into many fasta files:</p><pre>awk '/^&gt;/ {OUT=substr($0,2) ".fa"}; {print &gt;&gt; OUT; close(OUT)}' Input_File</pre><p>To convert Illumina FASTQ 1.3 to 1.8:</p><pre>sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&amp;'\''()*+,-.\/0123456789:;&lt;=&gt;?@ABCDEFGHIJ/' f.fastq</pre><p>To convert FASTQ to FASTA:</p><pre>sed -n '1~4s/^@/&gt;/p;2~4p' </pre><p>To get fastq read length distribution:</p><pre>cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort | uniq -c</pre><p>To deinterleave interleaved fastq file:</p><pre>cat myf.fq | paste - - - - - - - - | tee &gt;(cut -f 1-4 | tr "\t" "\n" &gt; myfile_1.fq) | cut -f 5-8 | \<br />tr "\t" "\n" &gt; myf2.fq </pre><p>To filter and sort contig identifiers from SPAdes assembly (e.g. here lenght &gt;= 4000 + coverage &gt;=100):</p><pre>grep "^&gt;" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 &gt;= 4000 &amp;&amp; $6 &gt;= 100) print $0 }' | sort -k 4 -n | \<br />sed s"/ /_/"g</pre><p>To append something to all headers of your fasta files:</p><pre>sed 's/&gt;.*/&amp;YOURSTRING/' filename.fasta &gt; new_filename.fasta</pre><p>To replace/squeeze multiple adjacent spaces by only one space:&nbsp;</p><pre>tr -s " " &lt; file</pre><p>To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25.</p><pre>cat your.fastq | paste - - - - | awk 'length($2)&nbsp; &gt;= 21 &amp;&amp; length($2) &lt;= 25' | sed 's/\t/\n/g' &gt; filtered.fastq</pre><p>To print difference between the last and first row in 5th column:</p><pre>awk '{if (!first){first=$5;}; last=$5;} END {print last-first}' myfile.txt</pre><p>To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):</p><pre>awk '/^&gt;/{ seqlen=0; print; next; } seqlen &lt; 200 { if (seqlen + length($0) &gt; 200) $0 = substr($0, 1, 200-seqlen);\<br /> seqlen += length($0); print }' scaffolds.fasta &gt; 200bp_scaffolds.fasta</pre><p>&nbsp;To pipe a compressed fasta file directly into makeblastdb.</p><pre>gunzip -c fasta.gz | makeblastdb -in -</pre><p>To remove sequences with duplicate fasta headers from a fasta file.</p><pre>awk '/^&gt;/{f=!d[$1];d[$1]=1}f' in.fasta &gt; out.fasta</pre>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/2336/3rd-annual-next-generation-sequencing-asia-congress-2013-at-singapore-singapore</guid>
  <pubDate>Wed, 14 Aug 2013 09:55:04 -0500</pubDate>
  <link></link>
  <title><![CDATA[3rd Annual Next Generation Sequencing Asia Congress 2013 at Singapore, Singapore]]></title>
  <description><![CDATA[
<p>The 3rd Annual Next Generation Sequencing Asia Congress is to be held on the 22nd and 23rd of October 2013 in Singapore. Over the 2 days, the conference will provide an overview of the current options of next-generation sequencing platforms, technologies, applications and the newest computational tools for the analysis of next-generation sequencing data and analytical genomics as well as overcoming data management problems. The event will attract over 200 senior-level decision makers working in areas such as next generation sequencing, analytical genomics, computational biology, oncology, RNA profiling, molecular genomics, biomarkers, bioinformatics &amp; data management and clinical &amp; diagnostics development.</p>

<p>Dated : 22 Nov 2013 -23 Nov 2013</p>

<p>http://www.ngsasia-congress.com/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36384/binding-site-prediction-in-protein</guid>
	<pubDate>Wed, 25 Apr 2018 04:35:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36384/binding-site-prediction-in-protein</link>
	<title><![CDATA[Binding Site Prediction in Protein !]]></title>
	<description><![CDATA[<p><span>The interaction between proteins and other molecules is fundamental to all biological functions. In this section we include tools that can assist in prediction of interaction sites on protein surface and tools for predicting the structure of the intermolecular complex formed between two or more molecules (docking).</span></p><h4>Pockets Identification</h4><p><a href="http://sts.bioengr.uic.edu/castp/" target="_blank">CASTp</a></p><div style="text-align: justify;">Automatic Identification of pockets and cavities in proteins structure, and quantitation of their volumes using Delaunay triangulation. Available also as PyMOL plugin</div><p><a href="http://www.bioinformatics.leeds.ac.uk/pocketfinder/" target="_blank">Pocket-Finder</a></p><div style="text-align: justify;">Automatic identification of pockets and cavities in proteins structure, and quantitation of their volumes.</div><p><a href="http://gecco.org.chemie.uni-frankfurt.de/pocketpicker/index.html" target="_blank">PocketPicker</a></p><div style="text-align: justify;">Grid-based technique for the analysis of protein pockets. PocketPicker available as a plugin for&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/pymol.htm">PyMOL</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><h4>Binding Site Prediction</h4>
<p><a href="http://consurf.tau.ac.il/" target="_blank">ConSurf</a></p>
</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Identification of functional regions in proteins by surface-mapping of phylogenetic information</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://www-cryst.bioc.cam.ac.uk/~crescendo/crescendo.php" target="_blank">CRESCENDO</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Identification protein interaction sites. It uses sequence conservation patterns in homologous proteins to distinguish between residues that are conserved due to structural restraints from those due to functional restraints.&nbsp;&nbsp;</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><strong>Ligand Binding Sites</strong></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://www.sbg.bio.ic.ac.uk/~3dligandsite/" target="_blank">3DLigandSite</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">The server utilizes protein-structure prediction to provide structural models of the binding site. Ligands bound to structures are superimposed onto the model and use to predict the binding site.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">F<a href="http://cssb.biology.gatech.edu/skolnick/files/FINDSITE/" target="_blank">INDSITE</a></div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">A threading-based method for ligand-binding site prediction and functional annotation based on binding-site similarity across superimposed groups of threading templates.</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">
<p><a href="http://scoppi.biotec.tu-dresden.de/pocket/" target="_blank">LIGSITE<sup>csc</sup></a></p>
<div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;">Prediction of binding site by pocket identification using the Connolly surface and degree of conservation</div>
<p><a href="http://metapocket.eml.org/" target="_blank"></a></p>
</div><div style="text-align: justify;">&nbsp;</div><div style="text-align: justify;"><a href="http://metapocket.eml.org/" target="_blank">metaPocket</a>A meta server for ligand-binding site prediction. metaPocket use&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#ligsite">LIGSITE<sup>csc</sup></a>,&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#pass">PASS</a>,&nbsp;<a href="https://bip.weizmann.ac.il/toolbox/structure/binding.htm#qsite">Q-SiteFinder</a>&nbsp;and&nbsp;<a href="http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html" target="_blank">SURFNET</a></div>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/fun/view/2383/golden-rules-of-bioinformatics</guid>
	<pubDate>Wed, 14 Aug 2013 21:11:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/fun/view/2383/golden-rules-of-bioinformatics</link>
	<title><![CDATA[Golden Rules of Bioinformatics]]></title>
	<description><![CDATA[<ol>
<li>All constant are variable.</li>
<li>Copy and paste is a genetic error.</li>
<li>First solve the problem, then write the code.</li>
<li>No matter what goes wrong, it will probably look right.</li>
<li>Any simple problem can be insoluble if enough metting are held to discuss it. :P</li>
<li>Stastics is a systematic method of comming to the wrong conclusion with confidence.</li>
<li>Bug is a undocumented feature in programming languages.</li>
<li>Good biological programmer goes on summer holiday with raincoat. [because see 1]</li>
<li>Thanks god Google know python is not a python and multiplication and division are the same thing.</li>
<li>Don' be clever, complex biology will trick you.</li>
</ol>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make best use of multiple processors for bioinformatics analysis. One best way is using perl threads and forks. Knowing how these threads and forks work is very important before implementing them. Getting to know how these work would be really useful before reading this tutorial.</p><p>Many times in bioinformatics we need to deal with huge datasets which&nbsp; are more than 100GB size. The traditional way to analysis a file is using the while loop</p><p>while (FILE){</p><p>Do something;</p><p>}</p><p>This is very slow(since we are using only one processor) and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make best use of all our processors and get the work done quickly?</p><p>Here is a very simple and efficient technique with perl which i have been using. I am&nbsp; more inclined towards using perl fork than perl threads.</p><p>One of the oldest way to fork is</p><blockquote><p>my $fork = fork();<br />if($fork){&nbsp;&nbsp;&nbsp;<br />push (@childs,$fork);&nbsp;<br />}<br />elseif($fork==0){<br /><strong>your code here;</strong><br />exit(0);<br />}<br />else{die &ldquo;Couldnt fork : $!&rdquo;;}</p><p>## wait for the child process to finish<br />foreach(@childs){<br />my $tmp=waitid($_,0);<br />}</p></blockquote><p>what a fork does is it creates a child process and takes the variables and code with it to analyze it separately (detached from the parent process) and thus a separate process is created( which usually runs on a separate processor). Thats it!! One big disadvantage of forking is its very difficult to share variables among the different processes. I will show you how to do it easily but still it has its own drawbacks.</p><blockquote><p>Okie, now if you really do not want to use fork in your code, that&rsquo;s okie too..There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors=8;<br />my $fork= new Parallel::ForkManager($max_processors);<br />foreach (@dna) {<br />$fork-&gt;start and next; # do the fork<br /><strong>you code here;</strong><br />$fork-&gt;finish; # do the exit in the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>so you will be generating 8 forks which do the same thing for your each element of array. when one child finishes, Parallel::ForkManager generates a new one and thus you will be using all your processors to analyze the data. Now, if you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because you will have problems with the buffering. You can lock the file using flock command.</p><blockquote><p>open (my $QUAL, &ldquo;myfile.txt&rdquo;);<br />flock $QUAL, LOCK_EX or die &ldquo;cant lock file $!&rdquo;;<br />print $QUAL &ldquo;$output&rdquo;;<br />flock $QUAL, LOCK_UN or die &ldquo;$!&rdquo;;<br />close $QUAL;</p></blockquote><p>I would not suggest using flock when dealing with multiple processes because it will decrease the processing efficiency( each child process must wait for the lock to be released by the other child process). Instead, I would suggest each fork writing to a separate file and after the processing just concatenating them.</p><p><strong>Putting it all together, If you have 100GB data you can do this</strong></p><blockquote><p><strong>step 1</strong>&nbsp;: split the dataset equally according to number of processors you have. this may take a few hours(about 2-3 hrs for 100GB file)<br />You can use unix &ldquo;split&rdquo; command for this<br />for example:<br />my $number_split=int($number_of_entries_in_your_dataset/$max_processors);<br />my $split_Files=`split -l $number_split &ldquo;your_file.fasta&rdquo; &ldquo;file_name&rdquo;`;</p><p><strong>step2</strong>: open you directory comtaining you split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory<br />my $fork= new Parallel::ForkManager($max_processors);<br />while (my $file = readdir(DIRECTORY)) { ### read the directory<br />if($file=~/^\./){next;}<br />print $file,&rdquo;\n&rdquo;;<br />########## Start fork ##########<br />my $pid= $super_fork-&gt;start and next;<br /><strong>Whatever you want to do with the split file ;</strong><br /><strong>analyze my piece of $file;</strong><br />######### end fork ###############<br />$super_fork-&gt;finish;<br />}<br />$super_fork-&gt;wait_all_children;</p></blockquote><p>So basically each processor will be active with its piece of data (split file) and thus you have created 8 processes at one time which run without interfering with the other process. I again will not suggest writing output from each child process to one file(for reasons above). Write output from each fork to a separate file and finally concatenate them. Thats it, you have just increased your program speed by 8 times!! Isnt it easy?</p><p><strong>Note:</strong><br />You may worry about concatenation of the output each child generates, since it does take some time(remember 100GB). I think now you can use a mysql database LOAD DATA LOCAL INFILE command to load all the files into a single table(Should take about 3hrs for 100Gb dataset) and then export the whole table into one file. This should be faster than just concatenating them using &ldquo;cat&rdquo; command.(correct me if I am wrong)</p><p>Or much simpler way is to use pipes</p><p>cat output_dir/* | my_pipe or my_pipe &lt;(file1) final_file;</p><p>Thats it guys!! Enjoy programming and please do comment. I am not a computer scientist so forgive me for any mistakes and if any please report them. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/2461/taverna-workflow-management-system</guid>
	<pubDate>Thu, 15 Aug 2013 19:34:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/2461/taverna-workflow-management-system</link>
	<title><![CDATA[Taverna Workflow Management System]]></title>
	<description><![CDATA[<p>Taverna is an open source domain independent Workflow Management System &ndash; a suite of tools used to design and execute scientific workflows. Taverna has been created by the myGrid project and is funded through a range of organisations and projects.</p>
<p>The Taverna suite is written in Java and includes the Taverna Engine(used for enacting workflows) that powers both the Taverna Workbench(the desktop client application) and the Taverna Server (which allows remote execution of workflows). Taverna is also available as a Command Line Tool for a quick execution of workflows from a terminal.</p><p>Address of the bookmark: <a href="http://www.taverna.org.uk/" rel="nofollow">http://www.taverna.org.uk/</a></p>]]></description>
	<dc:creator>Madhvan Reddy</dc:creator>
</item>

</channel>
</rss>