BOL: Related items

Bioinformatics Web Application Development with Perl

Jit — Tue, 26 Dec 2017 18:14:11 -0600

Perl's second wave of adoption came from the growth of the world wide web. Dynamic web pages—the precursor to modern web applications—were easy to create with Perl and CGI. Thanks to Perl's ubiquity as a language for system administrators and its power to manipulate text, it was the default choice for web programming. Its presence everywhere made it popular and, in some ways, the duct tape of the Internet.

Web Application Development

The old days of CGI programs and the simple development style that represented seem clunky. Web pages have become web applications. Development has moved from generating static HTML to both client and server side programming, with rich client interfaces and powerful backends.

Perl is still well suited for developing modern web apps. The language grows more powerful and easier to use every year, the available libraries are wonderful and keep getting better, and the inventions and discoveries available in modern Perl are unsurpassed.

In particular, a modern Perl developer can do amazing things with modern Perl tools. If you still think of Perl web development as a cgi-bin directory full of messy scripts that spew warnings to STDERR, you're a decade out of date. Better yet, you can replace that mess piecemeal, thanks to the new tools and techniques of modern Perl. See, for example, the ever-growing list of technologies Built in Perl.

Modern Perl Web Frameworks

While the old wave of web development may have made the CGI.pm module central, modern Perl web programming follows a stricter separation of business logic, URL and request routing, and output. The days of slinging a string here, an array there, a Perl hash yonder, declaring every variable at the top of the program, and maybe making a subroutine are gone. The Perl world has seen the value of abstraction and ways to mechanize away boilerplate. Perl has dozens of frameworks and toolkits designed to make web development and deployment simpler.

Any of a dozen of these frameworks will help you do great things, but three in particular stand out. You can build web sites and web applications of tremendous value with all three. These are neither the only good possibilities (think of POE or Jifty or Continuity or...) nor the only mechanisms for web programming with Perl (see Mechanize or LWP or Mojo::UserAgent for more). Yet if you want three good options to choose between, start here.

Catalyst

The Catalyst framework is a flexible and powerful system for building small to large web apps. It uses the Moose object system to provide great APIs for extension and further development. It's the most mature of the modern top Perl web frameworks, yet it retains its flexibility and vibrancy. In particular, its plugin and extension ecosystem allows it to evolve to provide new and essential features.

Catalyst has embraced the Plack/PSGI standard for Perl web deployment and recent versions are exploring high-scalability, event-based request handling models.

Dancer

The Dancer framework is deliberately minimal in syntax and scope, but it also has a vibrant plugin ecosystem. Dancer particularly excels for smaller sites and applications, though good programmers can build larger things with it.

The first version of Dancer was easy to use. Dancer 2 continues that ease while improving the internals and robustness of applications.

Mojolicious

The Mojolicious (Mojo) framework has a real-time design based on high performance event handling. Its focus is solving new and interesting problems in simple and effective ways, and the project has produced a lot of new code that does old things in better ways.

In particular, Mojolicious goes to great lengths to support new web standards, such as CSS 3, web sockets, and HTTP 2.

Where Catalyst embraces the CPAN fully, Mojolicious by design provides most of what an average app might need in a single download. It's still fully compatible with the CPAN, but the intention is to provide good working defaults in a package that's easy to start with. Mojo's fans are quick to praise it as fun to develop.

A modern Perl web developer should be familiar with at least one of these frameworks.

Modern Perl Storage Mechanisms

Perl's venerable DBI module has been the focal point of database access since its invention. Its design allows it to provide the same interface to huge relational databases and flat files alike through its DBD extension mechanism. Yet the DBI by itself isn't the be-all, end-all of data storage and access in Perl.

DBIx::Class

DBIx::Class sits on top of DBI to provide an API to your database based on the concept of queries and results. This is often sufficient to remove all but the most complicated of SQL from your code, leaving you to manipulate your business models instead of the small details of how a relational database works. The power and maintainability you receive is well the small cost of the learning curve.

Even better, DBIC can manage (and even generate) your database schema for you.

Recent versions of DBIC have demonstrated that a well-written ORM can perform much better than even clever hand-written code. Because it builds on the Perl DBI, it scales everywhere from SQLite to PostgreSQL, MySQL, Oracle, and more.

Rose::DB

The lesser-known but no less powerful Rose::DB::Object builds on Rose::DB to provide an object-relational mapper for Perl. While its high level features most directly compare to those of DBIx::Class, it's often measurably faster.

NoSQL on the CPAN

Of course the CPAN has modules for almost any NoSQL database or job queue or persistence mechanism you could name, and several you have never heard of. Everything you need is a quick CPAN or cpanm away!

Modern Perl Deployment Strategies

In the early days of the web, deploying a Perl web application meant putting one or more .cgi or .pl files in a special directory and hoping that your system administrator had everything configured correctly. The execution model was often slow and cumbersome, and accessing shared resources such as databases was often tricky.

Modern Perl has better choices. While deployment strategies are the source of many arguments, the return on your investment from learning the modern way is impressive.

Plack/PSGI

The PSGI specification (as exemplified by Plack) describes a strategy for building Perl web apps independent of server and with the possibility to share custom processing behaviors.

In other words, it's a standard for writing Perl apps to take advantage of the huge ecosystem of Perl development available on the CPAN without tying yourself to a server like Apache, Apache 2, nginx, or anything else.

Any good modern Perl web framework (including those listed here) supports PSGI. Several deployment mechanisms exist to meet various business needs which also support PSGI. In particular, you can deploy the same application with a local testing server on your own machine as you can to your production server or servers without changing your application at all.

mod_perl

The older but still viable mod_perl Apache httpd module embeds Perl into the web server. This was the first widespread persistence mechanism for Perl web applications themselves and it's still popular to this day, though PSGI compliance is often the choice for new development. (PSGI handlers to use mod_perl as the backend are available.)

Modern Perl developers should familiarize themselves with PSGI and the wealth of available Plack middleware.

Perl Web Development

Of course no discussion of Perl web development would be complete without mentioning the strength of the CPAN. Almost any project will benefit from the wealth of freely available libraries built to solve real problems. These distributions run the gamut from full-blown web frameworks and content management systems to APIs for web services, development tools, testing systems, and interfaces to document formats and external resources.

For example, if you need to write a web service which accepts JSON data and produces Excel spreadsheets, you can glue together a few CPAN distributions and get the job done early. If you need to consume XML from a remote service and emit a PDF, you're in luck.

Perl's prowess as a general purpose programming language as well as its flexibility and power in managing text and gluing systems together make it a wonderful fit for web development. The community's adoption of modern Perl standards such as PSGI and Plack only enhance your power.

Web application development in Perl is still viable, and modern Perl tools and techniques and libraries make it more powerful and pleasant than ever.

Amity University Bioinformatics Summer Program - Kolkata

eliabrodsky — Tue, 11 Jun 2019 21:27:10 -0500

Registrations are now open for the 2019 Summer Bioinformatics Training program at Amity University, Kolkata. The program will focus on introductory topics for life science students. We will review important history, topics and challenges bioinformatics can help address in the context of basic research, discovery and industry.

Read more: https://edu.t-bio.info/amity-university-summer-bioinformatics-program-registrations-are-open/

How to install Perl modules manually, using CPAN command, and other quick ways

Jit — Fri, 12 Jul 2013 07:20:24 -0500

As a bioinformatics programmer, and crunchy data analyser you need to install several perl modules and dependencies. Installing Perl modules manually by resolving all the dependencies is tedious and annoying process. Some of the packages like GD is the real pain.

However, Installing Perl modules using CPAN is a better solution, as it resolves all the dependencies automatically. In this article, let us review how to install Perl modules on Linux ( which is prefereced amonst bioinformatician) using both manual and CPAN method.

When a Perl module is not installed, application will display the following error message. In this example, XML::Parser Perl module is missing.

Can't locate XML/parser.pm in @INC (@INC contains:
/usr/lib/perl5/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/5.10.0
/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.10.0
/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.10.0 /usr/lib/perl5/vendor_perl
/usr/lib/perl5/site_perl/5.10.0 .)

Manual Method of Perl Module Installation

Install Perl Modules Manually

This manual method is very useful when your computer or server is not connected to the Internet.

Download Perl module:
Go to CPAN Search website and search for the module that you wish to download. In this example, let us search, download and install XML::Parser Perl module. I have downloaded the XML-Parser-2.36.tar.gz to /home/download

# cd /home/download
# gzip -d XML-Parser-2.36.tar.gz
# tar xvf XML-Parser-2.36.tar
# cd XML-Parser-2.36

Build the perl module:
Build by running Makefile.PL, remember the case sensitivity, make and make test.

# perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for XML::Parser::Expat
Writing Makefile for XML::Parser
# make
# make test

Install the perl module:
Now your package is ready to install.

# make install

As a newbie it looks pretty simple, and one go. But, luckily this is a very simple one module with no dependencies. Typically, Perl modules will be dependent on several other modules. Just imagine chasing all these dependencies one-by-one, thinking ... oh ye I got it. That will be very painful and annoying task. I recommend the CPAN method of installation as shown below.

Install Perl Modules using CPAN automatically

Logically, you should must have the CPAN perl module installed in your server or computer before you can install any other Perl modules using CPAN. I know you are laughing, "to install a perl module you need another perl module" ;)

Lets verify whether CPAN is already installed:

To install Perl modules using CPAN, make sure the cpan command is working. Following are the error message when CPAN module is not installed.

# cpan
-bash: cpan: command not found

# perl -MCPAN -e shell
Can't locate CPAN.pm in @INC (@INC contains:
/usr/lib/perl5/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/5.10.0
/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.10.0
/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.10.0
/usr/lib/perl5/vendor_perl /usr/lib/perl5/site_perl/5.10.0 .).
BEGIN failed--compilation aborted.

Install the CPAN module using yum:
If CPAN in not installed in your system, you can use "yum" for the rescue. Dont worry biological data cruncher, this is true we are now dependent all these tiny magicians :).

# yum install perl-CPAN

Output of yum install perl-CPAN command:

Loaded plugins: refresh-packagekit
updates-newkey                       | 2.3 kB     00:00
primary.sqlite.bz2                   | 2.4 MB     00:00
Setting up Install Process
Parsing package install arguments

Resolving Dependencies
Transaction Summary
=============================================================================
Install      5 Package(s)
Update       0 Package(s)
Remove       0 Package(s)

Total download size: 1.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/5): perl-ExtUtils-ParseXS-2.18-31.fc9.i386.rpm     | 30 kB     00:00
(2/5): perl-Test-Harness-2.64-31.fc9.i386.rpm         | 70 kB     00:00
(3/5): perl-CPAN-1.9205-31.fc9.i386.rpm               | 217 kB     00:00
(4/5): perl-ExtUtils-MakeMaker-6.36-31.fc9.i386.rpm   | 284 kB     00:00
(5/5): perl-devel-5.10.0-31.fc9.i386.rpm              | 408 kB     00:00

Installing     : perl-ExtUtils-ParseXS                             [1/5]
Installing     : perl-devel                                        [2/5]
Installing     : perl-Test-Harness                                 [3/5]
Installing     : perl-ExtUtils-MakeMaker                           [4/5]
Installing     : perl-CPAN                                         [5/5]

Installed: perl-CPAN.i386 0:1.9205-31.fc9
Dependency Installed:
perl-ExtUtils-MakeMaker.i386 0:6.36-31.fc9
perl-ExtUtils-ParseXS.i386 1:2.18-31.fc9
perl-Test-Harness.i386 0:2.64-31.fc9
perl-devel.i386 4:5.10.0-31.fc9
Complete!

Configure cpan the first time:
Once the CPAN is installed, you need to configure it by executing cpan, you should set some configuration parameters as shown below. I have shown only the important configuration parameters below. Accept all the default values by pressing enter.

Note: Make sure to execute “o conf commit” in the cpan prompt after the configuration to save the settings.

# cpan

Sorry, we have to rerun the configuration dialog for CPAN.pm due
to some missing parameters...

CPAN build and cache directory? [/root/.cpan]
Download target directory? [/root/.cpan/sources]
Directory where the build process takes place? [/root/.cpan/build]

Always commit changes to config variables to disk? [no]
Cache size for build directory (in MB)? [100]
Let the index expire after how many days? [1]

Perform cache scanning (atstart or never)? [atstart]
Cache metadata (yes/no)? [yes]
Policy on building prerequisites (follow, ask or ignore)? [ask]

Parameters for the 'perl Makefile.PL' command? []
Parameters for the 'perl Build.PL' command? []

Your ftp_proxy? []
Your http_proxy? []
Your no_proxy? []
Is it OK to try to connect to the Internet? [yes]

First, pick a nearby continent and country by typing in the number(s)
(1) Africa
(2) Asia
(3) Central America
(4) Europe
(5) North America
(6) Oceania
(7) South America
Select your continent (or several nearby continents) [] 5

(1) Bahamas
(2) Canada
(3) Mexico
(4) United States
Select your country (or several nearby countries) [] 4

(2) ftp://carroll.cac.psu.edu/pub/CPAN/
(3) ftp://cpan-du.viaverio.com/pub/CPAN/
(4) ftp://cpan-sj.viaverio.com/pub/CPAN/
(5) ftp://cpan.calvin.edu/pub/CPAN
(6) ftp://cpan.cs.utah.edu/pub/CPAN/
e.g. '1 4 5' or '7 1-4 8' [] 2-16

cpan[1]> o conf commit
commit: wrote '/usr/lib/perl5/5.10.0/CPAN/Config.pm'

cpan[2]> quit
No history written (no histfile specified).
Lockfile removed.

Install Perl Modules using CPAN

Hey smile please, now you are ready with CPAN and can download modules in one line command.

You can use one of the following method to install a Perl module using cpan:

# perl -MCPAN -e 'install Bundle::BioPerl'

(or)

# cpan
cpan shell -- CPAN exploration and modules installation (v1.9205)
ReadLine support available (maybe install Bundle::CPAN or Bundle::CPANxxl?)

cpan[1]> install "Bundle::BioPerl"

In the example above, CPAN will check for Bundle::BioPerl dependencies and automatically resolves and installs Bundle::BioPerl with all the dependent Perl modules.

Quick Ways

Oh, look at your face.. smily hmm :). This is what your are looking for, a quick and best way to install Perl modules, Bioperl. Following are the the steps to download BioPerl in your server/computer.

# sudo apt-cache search perl BioPerl

Output will be like as follows:

bioperl - Perl tools for computational molecular biology
bioperl-run - BioPerl wrappers: scripts
libbio-perl-perl - BioPerl core perl modules
libbio-perl-run-perl - BioPerl wrappers: modules
libbio-samtools-perl - Perl interface to SamTools library for DNA sequencing
libbiojava-java - Java API to biological data and applications (default version)
libbiojava3-java - Java API to biological data and applications (default version)
python-biopython-sql - Biopython support for the BioSQL database schema
libbtlib-perl - library for basic sequence manipulation

# sudo apt-get install bioperl

If it is installed then flash the following message:

Reading package lists... Done
Building dependency tree
Reading state information... Done
bioperl is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.

In it is found not installed in your server or system them install all with dependencies.

You can use the same approach to install all the modules, and packages if required.

Thanks for reading. Best of luck for your research.

Installing Perl GD Module

Jit — Mon, 22 Jul 2013 14:02:01 -0500

In comparative genome analysis work, we usually compare more than two genomes and looks for syntenic regions amongst them. In my research I used Evolution Highway (RH) http://eh-demo.ncsa.uiuc.edu/, which is a collaborative project designed to provide a visual means for simultaneously comparing genomes of multiple amniote species. The tool removes the burden of manually aligning these maps and allows cognitive skills to be used toward something more valuable than preparation and transformation of data. In addition to EH, attractive Circos (http://circos.ca/) is also very popular for this kind of analysis.

The EH is available online, and can be easily access and use, whereas Circos installation is not entirely straightforward. One of the most difficult parts of the installation involves installing the GD library. Since there weren't good instructions for installing this library on the internet I decided to post instructions here in case they are useful to anyone else.

Following are the steps to install GD modules in Mac OS

1. Setup

Create a folder for the files:

$ mkdir -p /SourceCache
$ cd /SourceCache

Get and unpack the required Jpeg-6b and GD libraries:
Download Jpeg-6b (http://code.google.com/p/google-desktop-for-linux-mirror/downloads/detail?name=jpeg-6b.tar.gz&can=2&q)
Download GD (http://search.cpan.org/~lds/GD-2.46/)

Place the "tar.gz" files in "/SourceCache" and double click to unpack.

2. Install libjpeg

Copy the "config.sub" and "config.guess" files to "/SourceCache". Note that your "config.sub" and ""config.guess" files may be in a slightly different location. The commands below show where they were on my machine:

$ cd /SourceCache/jpeg-6b/src
$ cp /usr/share/libtool/config/config.sub .
$ cp /usr/share/libtool/config/config.guess .

Configure libjpeg as follows. Note that this was installed on a 64 bit machine. However, this method may configure it in a 32 bit format. This may not be the best way to configure the installation but it works.

$ .configure --enable-shared
$ make

Check to see if the following directories exist on your machine. Create the missing directories in the following manner:

$ mkdir -p /usr/local/include
$ mkdir -p /usr/local/bin
$ mkdir -p /usr/local/lib
$ mkdir -p /usr/local/man/man1

Finish making and installing libjpeg:

$ make install

3. Install GD

$ cd /SourceCache/GD-2.46/GD/
$ perl Makefile.PL
$ make
$ make test (optional)
$ make html (optional)
$ make install

Other way for Mac OS
The easiest way to get a lot of these is with a program called Fink, which is similar in nature to the CPAN installer, but installs common GNU utilities. Fink is available from <http://sourceforge.net/projects/fink/>.

Follow the instructions for setting up Fink. Once it's installed, you'll want to run the following as root: fink install gd

It will prompt you for a number of dependencies, type 'y' and hit enter to install all of the dependencies. Then watch it work.

To prevent creating conflicts with the software that Apple installs by default, Fink creates its own directory tree at /sw where it installs most of the software that it installs. This means your libraries and headers for libgd will be at /sw/lib and /sw/include instead of /usr/lib and /usr/local/include. Because of these changed locations for the libraries, the Perl GD module will not install directly via CPAN, because it looks for the specific paths instead of getting them from your environment. But there's a way around that :-)

Instead of typing "install GD" at the cpan> prompt, type look GD. This should go through the motions of downloading the latest version of the GD module, then it will open a shell and drop you into the build directory. Apply below patch to the Makefile.PL file (save the patch into a file and use the command patch < patchfile.)

Then, run these commands to finish the installation of the GD module:

perl Makefile.PL
make
make test
make install
And don't forget to run exit to get back to CPAN.

Install on MS Window, using PPM

C:\Documents and Settings\Owner>ppm
PPM interactive shell (2.2.0) - type 'help' for available commands.
PPM> install GD
Install package 'GD?' (y/N): y
Installing package 'GD'...
Downloading http://ppm.ActiveState.com/PPMPackages/5.6plus/MSW. ...
Installing C:\Perl\site\lib\auto\GD\GD.bs
Installing C:\Perl\site\lib\auto\GD\GD.dll
Installing C:\Perl\site\lib\auto\GD\GD.exp
Installing C:\Perl\site\lib\auto\GD\GD.lib
Installing C:\Perl\html\site\lib\GD.html
Installing C:\Perl\site\lib\GD.pm
Installing C:\Perl\site\lib\qd.pl
Installing C:\Perl\site\lib\auto\GD\autosplit.ix
PPM>

If you can't install it from ppm. You can download it:
http://ppm.ActiveState.com/PPMPackages/5.6plus/MSW.

BTW,All Perl 5.6.1 Modules are located at:

http://ppm.ActiveState.com/PPMPackages/5.6plus/MSW.

Install the Perl GD Module on Linux

$ sudo perl -MCPAN -e shell

Since it was the first time I had run this command on this particular machine I had to answer a lot of questions but simply selected the defaults for everything as this usually works for me. Once in the CPAN shell I entered

$ install Bundle::CPAN

and selected all of the defaults again. Once the CPAN bundle had finished installing I tried to install GD::Graph by typing

$ install GD::Graph

but it failed with hundreds of errors – the first of which was

GD.xs:7:16: error: gd.h: No such file or directory

This was fixed with the following apt-get command (in the bash shell)

$ sudo apt-get install libgd2-xpm-dev

back in the CPAN shell I still couldn’t get GD::Graph to build and I guessed this was because of some left over files from the failed build. I don’t know the command to clean things up inside the CPAN shell and am too lazy to read the docs so I simply went into the .cpan/build directory in my home directory and deleted anything that started with GD – eg

$ rm -rf GD-2.35-HC_vkB

$ rm -rf GDGraph-1.44-Evfibe

and so on. Those strings at the end (VkB and so on) look random so they might be different on your machine. Then I went back into the CPAN shell and ran

$ install GD::Graph

There were a few dependencies which the script fetched and installed for me but everything worked smoothly.

Manual and other Perl Module instalation are mentioned in my previous blog @ http://bioinformaticsonline.com/blog/view/710/how-to-install-perl-modules-manually-using-cpan-command-and-other-quick-ways

Perl one-liner for bioinformatician !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies, and sequencing data most of the bioinformaticians mung and wrangle around massive amounts of genomics text. There are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times where knowing a little bit of Perl onliner is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and they do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you the shell warrior. Anything that took you minutes to solve, will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#7 space in a line.

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#If there are multiple spaces, delete all leaving one(make the file a single spaced file).

perl -00 -pe '$_.="\n"x4'
#Expand single blank lines into 4 consecutive blank lines

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Left align lines with 5 white spaces if matches a pattern (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that matches the pattern

perl -alne 'print scalar @F'
#print the total number fields(words) in each line.

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate lcd of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random numbers between 5 and 15.

perl -le 'print map { ("a".."z",”0”..”9”)[rand 36] } 1..8'
#Generates a 8 character password from a to z and number 0 – 9.

perl -le 'print map { ("a",”t”,”g”,”c”)[rand 4] } 1..20'
#Generates a 20 nucleotide long random residue.

perl -le 'print "a"x50'
#generate a string of ‘x’ 50 character long

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ascii value of the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#Convert the entire file into 13 characters offset(ROT13)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Convert only first letter of first word to uppercas

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Camel Casing

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Substitute a foo with a bar in a line with a regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

Pattern Matching Problem Solution with Perl

Jit — Tue, 09 Jun 2015 23:58:45 -0500

Problem at http://rosalind.info/problems/1c/

#Find all occurrences of a pattern in a string.
#Given: Strings Pattern and Genome.
#Return: All starting positions in Genome where Pattern appears as a substring. Use 0-based indexing.

use strict;
use warnings;

my $string="GATATATGCATATACTT";
my $subStr="ATAT";
my $kmer=length($subStr);

kmerMatch ($string, $subStr, $kmer);

sub kmerMatch { #Check the exact matching kmers with sliding window
my ($string, $myStr, $kmer)=@_;
for (my $aa=0; $aa<=(length($string)-$kmer); $aa++) {
    my $myWin=substr $string, $aa,$kmer;
    if ($myWin eq $myStr) {
        #print "$myWin eq $myStr\n";
        print $aa;
    }
}
}

BioScripts

Rahul Nayak — Sun, 28 Jun 2015 07:46:14 -0500

You are requested to please bookmark collection of bioinformatics tools, scripts, codes that can be pieced together in a very easy and flexible manner to perform both simple and complex bioinformatics tasks.

The next-generation sequencing included whole genome sequencing(WGS), transcriptome sequencing (whole cDNA sequencing, RNA-seq), digital gene expression sequencing (Tag-Seq), ChIP-Seq, and so on. And there are many sequencing platform to generate sequece, as well know Sanger/ABi(the frist generation), Solexa/illumina, SOLiD/ABi, 454/Roche. But thier sequence format is different, also they have different error type. High quality data is very important for further analysis or data mining. There are many pipeline for raw sequence quality analysis and control with few of process for reporting reads quality statistical details, trimming, filtering, and error correction. Please bookmarks them for the benefits of bioinformatics community.

https://code.google.com/p/biowiki/

https://code.google.com/p/ngs-pipeline/source/browse/#svn%2Ftrunk

NGSand Perl scripts https://code.google.com/hosting/search?q=NGS+perl&projectsearch=Search+projects

NGS and Python scripts https://code.google.com/hosting/search?q=NGS+Python&projectsearch=Search+projects

Address of the bookmark: https://code.google.com/hosting/search?q=bioinformatics&sa=Search

BioToolbox

Jit — Fri, 19 Feb 2016 09:14:44 -0600

This is a collection of libraries and high-quality end-user scripts for bioinformatic analysis, including working with gene annotation, collecting data scores from a variety of modern file formats, and conversion between file formats. The Bio::ToolBox libraries provide a unified, abstracted interface to multiple common gene annotation formats and the collection of data from multiple data files. They rely on BioPerl SeqFeature libraries and related adaptors to access binary file formats including Bam, BigWig, BigBed, and USeq. The Bio::ToolBox package includes scripts for setting up databases of annotation, collecting annotated features, collecting genomic data relative to features, manipulating and analyzing data, and data format conversion.

More at http://cpansearch.perl.org/src/TJPARNELL/

Address of the bookmark: http://cpansearch.perl.org/src/TJPARNELL/

Parallel Processing with Perl !

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make best use of multiple processors for bioinformatics analysis. One best way is using perl threads and forks. Knowing how these threads and forks work is very important before implementing them. Getting to know how these work would be really useful before reading this tutorial.

Many times in bioinformatics we need to deal with huge datasets which are more than 100GB size. The traditional way to analysis a file is using the while loop

while (FILE){

Do something;

}

This is very slow(since we are using only one processor) and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with perl which i have been using. I am more inclined towards using perl fork than perl threads.

One of the oldest way to fork is

my $fork = fork();
if($fork){
push (@childs,$fork);
}
elseif($fork==0){
your code here;
exit(0);
}
else{die “Couldnt fork : $!”;}
## wait for the child process to finish
foreach(@childs){
my $tmp=waitid($_,0);
}

what a fork does is it creates a child process and takes the variables and code with it to analyze it separately (detached from the parent process) and thus a separate process is created( which usually runs on a separate processor). Thats it!! One big disadvantage of forking is its very difficult to share variables among the different processes. I will show you how to do it easily but still it has its own drawbacks.

Okie, now if you really do not want to use fork in your code, that’s okie too..There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (number of processors you want to use).
Simple usage:
use Parallel::ForkManager;
my $max_processors=8;
my $fork= new Parallel::ForkManager($max_processors);
foreach (@dna) {
$fork->start and next; # do the fork
you code here;
$fork->finish; # do the exit in the child process
}
$pm->wait_all_children;

so you will be generating 8 forks which do the same thing for your each element of array. when one child finishes, Parallel::ForkManager generates a new one and thus you will be using all your processors to analyze the data. Now, if you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because you will have problems with the buffering. You can lock the file using flock command.

open (my $QUAL, “myfile.txt”);
flock $QUAL, LOCK_EX or die “cant lock file $!”;
print $QUAL “$output”;
flock $QUAL, LOCK_UN or die “$!”;
close $QUAL;

I would not suggest using flock when dealing with multiple processes because it will decrease the processing efficiency( each child process must wait for the lock to be released by the other child process). Instead, I would suggest each fork writing to a separate file and after the processing just concatenating them.

Putting it all together, If you have 100GB data you can do this

step 1 : split the dataset equally according to number of processors you have. this may take a few hours(about 2-3 hrs for 100GB file)
You can use unix “split” command for this
for example:
my $number_split=int($number_of_entries_in_your_dataset/$max_processors);
my $split_Files=`split -l $number_split “your_file.fasta” “file_name”`;
step2: open you directory comtaining you split files and start Parallel::ForkManager.
For example:
opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory
my $fork= new Parallel::ForkManager($max_processors);
while (my $file = readdir(DIRECTORY)) { ### read the directory
if($file=~/^\./){next;}
print $file,”\n”;
########## Start fork ##########
my $pid= $super_fork->start and next;
Whatever you want to do with the split file ;
analyze my piece of $file;
######### end fork ###############
$super_fork->finish;
}
$super_fork->wait_all_children;

So basically each processor will be active with its piece of data (split file) and thus you have created 8 processes at one time which run without interfering with the other process. I again will not suggest writing output from each child process to one file(for reasons above). Write output from each fork to a separate file and finally concatenate them. Thats it, you have just increased your program speed by 8 times!! Isnt it easy?

Note:
You may worry about concatenation of the output each child generates, since it does take some time(remember 100GB). I think now you can use a mysql database LOAD DATA LOCAL INFILE command to load all the files into a single table(Should take about 3hrs for 100Gb dataset) and then export the whole table into one file. This should be faster than just concatenating them using “cat” command.(correct me if I am wrong)

Or much simpler way is to use pipes

cat output_dir/* | my_pipe or my_pipe <(file1) final_file;

Thats it guys!! Enjoy programming and please do comment. I am not a computer scientist so forgive me for any mistakes and if any please report them. Thank you.

Bioinformatics 101 - Running BLAST

Tue, 03 Sep 2013 14:59:50 -0500

How to format the database for BLAST, run the command, view the output file, and use BioPerl and Perl to parse the output. By David Francis, Ohio State University. Delivered live at the Tomato Disease Workshop 2010. For more information, please visit http://www.extension.org/pages/32521/bioinformatics-101-video.