BOL: Related items

Ruby Language

Jitendra Narayan — Mon, 15 Jul 2013 01:34:26 -0500

Ruby was created by Yukihiro Matsumoto, who wished to create a new language that balanced functional programming with imperative programming.

Ruby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was initially developed and designed by Yukihiro "Matz" Matsumoto. It was influenced primarily by Perl, Smalltalk, Eiffel, and Lisp.

Ruby supports multiple programming paradigms, including functional, object-oriented, imperative and reflective. It also has a dynamic type system and automatic memory management; it is therefore similar in varying respects to Python, Perl, Lisp, Dylan, Pike, and CLU.

The standard 1.8.7 implementation is written in C, as a single-pass interpreted language. There is currently no specification of the Ruby language, so the original implementation is considered to be the de facto reference. As of 2010, there are a number of complete or upcoming alternative implementations of the Ruby language, including YARV, JRuby, Rubinius, IronRuby, MacRuby and HotRuby, each of which takes a different approach, with IronRuby, JRuby and MacRuby providing just-in-time compilation and MacRuby also providing ahead-of-time compilation. The official 1.9 branch uses YARV, as will 2.0 (development), and will eventually supersede the slower Ruby MRI.

Ruby Quick Reference
http://www.zenspider.com/Languages/Ruby/QuickRef.html

Ruby Annotation
http://www.w3.org/TR/ruby/

Ruby in Linux Journal
http://www.linuxjournal.com/article/5915

Ruby Documentation: Programming Ruby
http://ruby-doc.org/docs/ProgrammingRuby/

The Top 10 Reasons The Ruby Programming Language Sucks
http://www.slideshare.net/vishnu/the-top-10-reasons-the-ruby-programming-language-sucks

Ruby: A Programmer's Best Friend
http://www.ruby-lang.org/en/

For Ruby Beginners
http://www.squidoo.com/ruby-programming-beginner

Ruby Programming
http://en.wikibooks.org/wiki/Ruby_Programming

Ruby CookBook
http://en.wikibooks.org/wiki/Cookbook:Table_of_Contents

Ruby Programming Challenge for Newbies
http://rubylearning.com/blog/ruby-programming-challenge-faq/

Common "issues" faced by Ruby Newbies by Chris Strom
http://japhr.blogspot.com/2009/10/newbie-feedback.html

Books
http://www.sapphiresteel.com/The-Book-Of-Ruby

Free online Ruby programming course, alongside many other Ruby newbies
http://rubylearning.org/class/

Coding Ground

Jitendra Narayan — Tue, 17 Mar 2015 00:47:20 -0500

Online coding ground for most programming languages.

Code in almost all popular languages using Coding Ground. Edit, compile, execute and share your projects, 100% cloud.

http://www.tutorialspoint.com/codingground.htm

Address of the bookmark: http://www.tutorialspoint.com/codingground.htm

Parallel Processing with Perl!

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so getting familiar with them before reading this tutorial would be really useful.

Many times in bioinformatics we need to deal with huge datasets that are more than 100 GB in size. The traditional way to analyze a file is with a while loop:

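open(my $fh, '<', "your_file.fasta") or die "Cannot open file: $!";
while (my $line = <$fh>) {
    # do something with $line
}
close $fh;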

This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with Perl that I have been using. I am more inclined towards using Perl fork than Perl threads.

One of the oldest ways to fork is:

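my @childs;
my $fork = fork();
die "Couldn't fork: $!" unless defined $fork;   # fork returns undef on failure
if ($fork) {
    # parent: remember the child's process ID
    push(@childs, $fork);
}
else {
    # child ($fork == 0)
    # your code here
    exit(0);
}
## wait for the child processes to finish
foreach (@childs) {
    my $tmp = waitpid($_, 0);
}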

What a fork does is create a child process that gets its own copy of the parent's variables and code and runs separately from the parent process (usually on a separate processor). That's it! One big disadvantage of forking is that it is very difficult to share variables among the different processes: a change made in the child is not seen by the parent. I will show you how to do it easily, but it still has its own drawbacks.
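One common way to get a result back from a forked child is a pipe. The following is only a minimal sketch using core Perl, not necessarily the method the author had in mind; the helper name run_in_child is hypothetical and exists purely for illustration. The child writes its result as text into the pipe and the parent reads it.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in the child

# Hypothetical helper: run $work->() in a forked child and return,
# in the parent, whatever string the child produced.
sub run_in_child {
    my ($work) = @_;
    pipe(my $reader, my $writer) or die "Could not create pipe: $!";
    my $pid = fork();
    die "Could not fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: send the result through the pipe, then exit
        close $reader;
        print {$writer} $work->();
        close $writer;
        exit 0;
    }
    # parent: read everything the child wrote, then reap the child
    close $writer;
    my $result = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $result;
}

my $counter = 0;
my $from_child = run_in_child(sub { $counter = 42; return "counter=$counter" });

# The child changed only its own copy of $counter.
print "parent sees counter=$counter; child reported: $from_child\n";
# prints "parent sees counter=0; child reported: counter=42"
```

The drawback is that everything has to be turned into text (or otherwise serialized) before it crosses the pipe, so this suits small results better than large data structures.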

Okay, now if you really do not want to call fork yourself in your code, that's okay too. There are many useful modules that do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to run at once (the number of processors you want to use).
Simple usage:
use Parallel::ForkManager;
my $max_processors = 8;
my $fork = Parallel::ForkManager->new($max_processors);
foreach (@dna) {
    $fork->start and next; # do the fork
    # your code here
    $fork->finish; # do the exit in the child process
}
$fork->wait_all_children;

So you will be running up to 8 forks at a time, each doing the same thing for one element of your array. When one child finishes, Parallel::ForkManager starts a new one, and thus you will be using all your processors to analyze the data. Now, suppose you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because otherwise the buffered output of the different children can get mixed up. You can lock the file using the flock function:

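use Fcntl qw(:flock);   # provides LOCK_EX and LOCK_UN
open(my $QUAL, '>>', "myfile.txt") or die "can't open file: $!";   # append mode, so children do not overwrite each other
flock($QUAL, LOCK_EX) or die "can't lock file: $!";
print $QUAL $output;
flock($QUAL, LOCK_UN) or die "can't unlock file: $!";
close $QUAL;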

I would not suggest using flock when dealing with multiple processes, because it decreases the processing efficiency (each child process must wait for the lock to be released by the other child processes). Instead, I would suggest that each fork write to a separate file and that you simply concatenate the files after the processing.

Putting it all together, if you have 100 GB of data you can do this:
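The suggestion above (one output file per fork, concatenated afterwards) might look roughly like this with core Perl only. This is a sketch under assumptions: the helper name process_chunks, the part_NNNN.txt naming scheme, and the upper-casing step that stands in for a real analysis are all made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in children

# Hypothetical helper: fork one child per chunk, let each child write its
# own part file in $out_dir, then concatenate the parts into $final_file.
# Returns the number of children that were run.
sub process_chunks {
    my ($chunks, $out_dir, $final_file) = @_;
    my (@pids, @parts);
    for my $i (0 .. $#$chunks) {
        my $part = sprintf("%s/part_%04d.txt", $out_dir, $i);
        push @parts, $part;
        my $pid = fork();
        die "Could not fork: $!" unless defined $pid;
        if ($pid == 0) {
            # child: write only to its own file, so no locking is needed
            open(my $out, '>', $part) or die "Cannot write $part: $!";
            print {$out} uc($chunks->[$i]), "\n";   # stand-in for the real analysis
            close $out or die "Cannot close $part: $!";
            exit 0;
        }
        push @pids, $pid;
    }
    # parent: wait for every child and stop if one of them failed
    for my $pid (@pids) {
        waitpid($pid, 0);
        die "Child $pid failed\n" if $? != 0;
    }
    # concatenate the part files in their original order
    open(my $final, '>', $final_file) or die "Cannot write $final_file: $!";
    for my $part (@parts) {
        open(my $in, '<', $part) or die "Cannot read $part: $!";
        print {$final} $_ while <$in>;
        close $in;
    }
    close $final or die "Cannot close $final_file: $!";
    return scalar @pids;
}

my $dir = tempdir(CLEANUP => 1);
my $n = process_chunks(['acgt', 'ttga', 'ccat'], $dir, "$dir/final.txt");
print "merged the output of $n children into $dir/final.txt\n";
```

Because every child owns its file, the children never wait on each other; the cost is moved to the final concatenation step.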

Step 1: split the dataset equally according to the number of processors you have. This may take a few hours (about 2-3 hrs for a 100 GB file).
You can use the Unix "split" command for this.
For example:
my $number_split = int($number_of_entries_in_your_dataset / $max_processors);
my $split_Files = `split -l $number_split "your_file.fasta" "file_name"`;
Step 2: open your directory containing your split files and start Parallel::ForkManager.
For example:
opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory
my $super_fork = Parallel::ForkManager->new($max_processors);
while (defined(my $file = readdir(DIRECTORY))) { ### read the directory
    if ($file =~ /^\./) { next; } ### skip hidden entries such as . and ..
    print $file, "\n";
    ########## Start fork ##########
    my $pid = $super_fork->start and next;
    ### whatever you want to do with the split file:
    ### analyze this child's piece, $file
    ######### end fork ###############
    $super_fork->finish;
}
$super_fork->wait_all_children;
closedir(DIRECTORY);

So basically each processor will be busy with its own piece of data (a split file), and thus you have created 8 processes at one time that run without interfering with each other. Again, I do not suggest writing the output from each child process to one file (for the reasons above). Write the output from each fork to a separate file and finally concatenate them. That's it, you can speed up your program by up to about 8 times, as long as the disk can keep up! Isn't it easy?
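To make the idea of "at most 8 processes at one time" concrete, here is a small sketch of the throttling with nothing but core fork and wait. It is only an illustration of the general technique, not Parallel::ForkManager's actual implementation; the helper name run_limited is hypothetical.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in children

# Hypothetical helper: run $work->($item) in a forked child for every item,
# keeping at most $max children alive at any time.
# Returns how many children exited with status 0.
sub run_limited {
    my ($max, $work, @items) = @_;
    die "max must be at least 1\n" if $max < 1;
    my ($running, $ok) = (0, 0);
    for my $item (@items) {
        if ($running >= $max) {
            # all slots are busy: block until any one child finishes
            wait();
            $ok++ if $? == 0;
            $running--;
        }
        my $pid = fork();
        die "Could not fork: $!" unless defined $pid;
        if ($pid == 0) {
            # child: do the work for this item, then exit
            $work->($item);
            exit 0;
        }
        $running++;
    }
    # reap the children that are still running; wait() returns -1 when none are left
    while (wait() > 0) {
        $ok++ if $? == 0;
    }
    return $ok;
}

# Four items, but never more than two children at once.
my $ok = run_limited(2, sub { my ($n) = @_; my $sum = 0; $sum += $_ for 1 .. $n; }, 10, 20, 30, 40);
print "$ok of 4 children finished cleanly\n";
# prints "4 of 4 children finished cleanly"
```

A child that dies exits with a non-zero status, so it is not counted in the returned total; the parent can compare that total with the number of items to detect failures.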

Note:
You may worry about the concatenation of the output each child generates, since it does take some time (remember, 100 GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (it should take about 3 hrs for a 100 GB dataset) and then export the whole table into one file. This should be faster than just concatenating them using the "cat" command (correct me if I am wrong).

Or a much simpler way is to use pipes:

cat output_dir/* | my_pipe or my_pipe <(file1) final_file;

That's it, guys! Enjoy programming, and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.

Trelliscope: flexibly visualize large, complex data in great detail from within the R statistical programming environment.

Jit — Tue, 21 Jan 2020 04:22:49 -0600

Trelliscope provides a way to flexibly visualize large, complex data in great detail from within the R statistical programming environment. Trelliscope is a component in the DeltaRho environment.

For those familiar with Trellis Display, faceting in ggplot, or the notion of small multiples, Trelliscope provides a scalable way to break a set of data into pieces, apply a plot method to each piece, and then arrange those plots in a grid and interactively sort, filter, and query panels of the display based on metrics of interest. With Trelliscope, we are able to create multipanel displays on data with a very large number of subsets and view them in an interactive and meaningful way.

Address of the bookmark: http://deltarho.org/docs-trelliscope/#introduction