BOL: Related items

Pattern Matching Problem Solution with Perl

Jit — Tue, 09 Jun 2015 23:58:45 -0500

Problem at http://rosalind.info/problems/1c/

#Find all occurrences of a pattern in a string.
#Given: Strings Pattern and Genome.
#Return: All starting positions in Genome where Pattern appears as a substring. Use 0-based indexing.

use strict;
use warnings;

my $string="GATATATGCATATACTT";
my $subStr="ATAT";
my $kmer=length($subStr);

kmerMatch ($string, $subStr, $kmer);

sub kmerMatch { #Check the exact matching kmers with sliding window
my ($string, $myStr, $kmer)=@_;
for (my $aa=0; $aa<=(length($string)-$kmer); $aa++) {
    my $myWin=substr $string, $aa,$kmer;
    if ($myWin eq $myStr) {
        #print "$myWin eq $myStr\n";
        print "$aa "; #separate positions with a space so they do not run together
    }
}
}
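
The sliding window above can also be written with Perl's built-in index function, which searches from a given offset. The following is only a minimal sketch, not the script used for the Rosalind submission; the helper name find_positions is hypothetical, and it assumes a non-empty pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): collect every
# 0-based start position of $pattern in $genome, overlaps included.
sub find_positions {
    my ($genome, $pattern) = @_;
    return () if $pattern eq '';   # assumption: an empty pattern has no matches
    my @positions;
    my $pos = index($genome, $pattern);
    while ($pos != -1) {
        push @positions, $pos;
        # Resume one character later so overlapping matches are found
        $pos = index($genome, $pattern, $pos + 1);
    }
    return @positions;
}

print join(" ", find_positions("GATATATGCATATACTT", "ATAT")), "\n";
# prints: 1 3 9
```

Restarting the search at $pos + 1 rather than past the end of the match is what keeps overlapping hits such as positions 1 and 3.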

BioScripts

Rahul Nayak — Sun, 28 Jun 2015 07:46:14 -0500

Please bookmark this collection of bioinformatics tools, scripts, and code that can be pieced together easily and flexibly to perform both simple and complex bioinformatics tasks.

Next-generation sequencing includes whole genome sequencing (WGS), transcriptome sequencing (whole cDNA sequencing, RNA-seq), digital gene expression sequencing (Tag-Seq), ChIP-Seq, and so on. Many sequencing platforms generate sequence data; well-known ones are Sanger/ABI (the first generation), Solexa/Illumina, SOLiD/ABI, and 454/Roche. Their sequence formats differ, and each has its own error types. High-quality data is very important for further analysis or data mining. There are many pipelines for raw sequence quality analysis and control, with steps for reporting read quality statistics, trimming, filtering, and error correction. Please bookmark them for the benefit of the bioinformatics community.

https://code.google.com/p/biowiki/

https://code.google.com/p/ngs-pipeline/source/browse/#svn%2Ftrunk

NGS and Perl scripts https://code.google.com/hosting/search?q=NGS+perl&projectsearch=Search+projects

NGS and Python scripts https://code.google.com/hosting/search?q=NGS+Python&projectsearch=Search+projects

Address of the bookmark: https://code.google.com/hosting/search?q=bioinformatics&sa=Search

Five unique traits of an effective computational biologist

Jitendra Narayan — Thu, 11 Jul 2013 13:12:51 -0500

Bioinformatics research is driven by a large set of software, scripts, and tools for analysing gigantic biological datasets. Being a great biological programmer or bioinformatician involves more than writing code that works. The biological programmers who rise to the top ranks of their profession are not only good programmers but also experts in biology. To be a good and effective biological programmer, you need a combination of traits that allow your computational and biological skill, experience, and knowledge to produce working code. Some technically skilled biological programmers will never be effective because they lack the other important traits. Here are the top five traits necessary to become a great biological programmer.

1. Learn and get updated

Bad biological programmers learn new technical or non-technical things only when it’s absolutely necessary. Good biological programmers learn new technical skills proactively. Great biological programmers not only learn new technical skills on their own but also learn non-technical skills, and keep an open mind to sources of knowledge that others may shut out.

In concrete terms, the bad biological programmer learned Perl's regular expressions when they started a project on comparative genomics; the good biological programmer learned them a year earlier because they looked interesting; and the great biological programmer also read about the BioPerl packages, genomics, DNA strings, genomic theories, or a similar course of study so that they could understand the results and explain them biologically.

2. Not merely a coder!!!

I often encounter biological programmers who call themselves hard-core computer programmers and avoid biology. I can almost guarantee that if you are one of them, you are not doing research but merely writing "dry" code.

According to my supervisor, most computational biologists don't know what they are doing biologically. They even struggle to explain their own programs' output and results. It is therefore highly advisable to learn basic biology, which helps you explain your results and understand your discoveries. Always remember you are a researcher, not a coder.

3. Be social with biologists

Computational biologists spend most of their time in front of computers, writing code. They tend to think their job is to produce working code, not to produce technically sound research. They are completely wrong. Do not forget that, apart from your computational skills, you also need a biologist other than your supervisor to explain complex biological mechanisms and help you understand them.

I highly recommend that you interact with biotech researchers and learn how they explain a single graph (which they generally produce after a year of work) biologically. Remember, the origin of your research project is a complex biological phenomenon, far more complex than your limited programming rules.

4. Do not search, research for answers

Researching for answers means more than typing several keywords into a search engine or posting a question at Stack Overflow or the BioStars forums. I have entered problems into search engines that generate no results, and every question I posted on Stack Overflow or the BioStars forums never got anything resembling an answer, yet I solved the issues and moved on. I’m not a magician — I just know how to find answers or discover root causes.

Many problems are situational, and if you depend on search engines and forums, you can waste a lot of time going down a rabbit hole and possibly never getting a solution. Learn to perform root cause analysis, learn enough about the underlying system to look for other clues and solutions, and learn to take a long distance view of an issue before deep diving into it.

5. Love and defend your research

You cannot rise to the top in this research profession without loving your work. There are some very good “it’s just a job” biological programmers (I’ve been one at times), but if that is your outlook, you won’t be willing to do whatever it takes to succeed. This idea gets a lot of folks in a huff, because they feel it is a personal insult. “I’m a good programmer, but I have other priorities and can’t make work my life.” I understand completely; I have other priorities too. As much as I hate to say it, when I am passionate about my work, I am willing (though not eager) to abandon my other priorities to finish the job. It is not an insult to say that if you aren’t willing to pull out all the stops you can’t be the best, it is a fact.

You must be passionate about more than programming: you must also be excited about your research, the tools and technology you are using, and so on. I have seen very good and even great biological programmers operating at mediocre levels because something was not a good fit, such as a project they hated or a technology they disliked. So enjoy your research project and get excited about your discoveries. You have to not only make discoveries but also defend your findings in scientific terms.

Thanks to all of you for reading.

Perl one-liners for bioinformaticians!!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies and sequencing data, most bioinformaticians munge and wrangle massive amounts of genomics text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you a shell warrior. Anything that took you minutes to solve will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#add 7 blank lines after every line

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#Collapse each run of consecutive blank lines into a single blank line (make the file single spaced)

perl -00 -pe '$_.="\n"x4'
#Expand every run of blank lines: paragraph mode keeps one blank line and the 4 appended newlines add 4 more

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Print matching lines prefixed with their line number, left-aligned in a 5-character field (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that matches the pattern

perl -alne 'print scalar @F'
#print the number of fields (words) in each line

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate the LCM (least common multiple) of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random integers from 5 to 14 (rand($max-$min) never reaches $max-$min, so 15 itself is excluded).

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generates an 8-character password from a to z and the digits 0 to 9.

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generates a random DNA sequence 20 nucleotides long.

perl -le 'print "a"x50'
#generate a string of 50 'a' characters

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ASCII value of each character in the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#ROT13-encode the entire file (shift every letter by 13 positions)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Lowercase each line, then convert its first character to uppercase

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Capitalize the first letter of every word (title case rather than true camel case)

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Substitute the first foo with bar on every line that matches regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

RStudio

Jitendra Narayan — Sat, 27 Dec 2014 06:50:58 -0600

RStudio IDE is a powerful and productive user interface for R. It’s free and open source, and works great on Windows, Mac, and Linux.

The developers and expert trainers are the authors of several popular R packages, including ggplot2, plyr, lubridate, and others.

More at http://www.rstudio.com/

http://www.rstudio.com/products/RStudio/

Address of the bookmark: http://www.rstudio.com/

Bioinformatics Scripts

Jit — Thu, 22 Jan 2015 22:29:39 -0600

Some of the useful bioinformatics scripts.

For example ... contig-stats.pl is a Perl script that will automatically describe features of a sequence assembly.

http://milkweedgenome.org/?q=scripts

Address of the bookmark: http://milkweedgenome.org/?q=scripts

Perl One liner basics !!

Abhimanyu Singh — Sun, 24 May 2015 09:28:33 -0500

Perl has a ton of command line switches (see perldoc perlrun), but I'm just going to cover the ones you'll commonly need to debug code. The most important switch is -e, for execute (or maybe "engage" :) ). The -e switch takes a quoted string of Perl code and executes it. For example:

$ perl -e 'print "Hello, World!\n"'
Hello, World!

It's important that you use single-quotes to quote the code for -e. This usually means you can't use single-quotes within the one liner code. If you're using Windows cmd.exe or PowerShell, you must use double-quotes instead.

I'm always forgetting what Perl's predefined special variables do, and often test them at the command line with a one liner to see what they contain. For instance do you remember what $^O is?

$ perl -e 'print "$^O\n"'
linux

It's the operating system name. With that cleared up, let's see what else we can do. If you're using a relatively new Perl (5.10.0 or higher) you can use the -E switch instead of -e. This turns on some of Perl's newer features, like say, which prints a string and appends a newline to it. This saves typing and makes the code cleaner:

$ perl -E 'say "$^O"'
linux

Pretty handy! say is a nifty feature that you'll use again and again.

Frequent words problem solution by Perl

Jit — Tue, 09 Jun 2015 23:38:44 -0500

Solved with Perl http://rosalind.info/problems/1a/

#Find the most frequent k-mers in a string.
#Given: A DNA string Text and an integer k.
#Return: All most frequent k-mers in Text (in any order).

use strict;
use warnings;

my $string="ACGTTGCATGTCGCATGATGCATGAGAGCT";
my $kmer=4;
my %myHash;
my $max=0;

for (my $aa=0; $aa<=(length($string)-$kmer); $aa++) { #use $kmer, not a hard-coded 4
   my $myStr=substr $string, $aa,$kmer;
   #print "$myStr\n";
   my $km=kmerMatch ($string, $myStr, $kmer);
   if ($km > $max) { $max = $km;}
   #print "$km\t$myStr\n";
   $myHash{$myStr}=$km;

}

#Print all key which have matching values
foreach my $name (keys %myHash){
    print "$name " if $myHash{$name} == $max;
}

sub kmerMatch { #Check the exact matching kmers with sliding window
my ($string, $myStr, $kmer)=@_;
my $count=0;
for (my $aa=0; $aa<=(length($string)-$kmer); $aa++) { #use $kmer, not a hard-coded 4
   my $myWin=substr $string, $aa,$kmer;
   if ($myWin eq $myStr) {
       #print "$myWin eq $myStr\n";
       $count++;
   }
}
return $count;
}
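
The script above rescans the whole string for every window. A single pass that tallies each k-mer in a hash gives the same answer with less work. This is a sketch under that assumption rather than the submitted solution; the helper name most_frequent_kmers is hypothetical, and the result is sorted only to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): count every
# k-mer once while sliding over the text, then keep those whose count
# equals the maximum seen.
sub most_frequent_kmers {
    my ($text, $k) = @_;
    my %count;
    my $max = 0;
    for my $i (0 .. length($text) - $k) {
        my $kmer = substr($text, $i, $k);
        $count{$kmer}++;
        $max = $count{$kmer} if $count{$kmer} > $max;
    }
    my @best = grep { $count{$_} == $max } keys %count;
    @best = sort @best;   # hash key order is arbitrary, so sort for stable output
    return @best;
}

print join(" ", most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4)), "\n";
# prints: CATG GCAT
```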

Clump Finding Problem Solved with Perl

Jit — Wed, 10 Jun 2015 00:17:17 -0500

The question at http://rosalind.info/problems/1d/

The script has moved to http://bioinformaticsonline.com/snippets/view/34633/clump-finding-problem-solved-with-perl

Reverse Complement Problem Solved with Perl

Jit — Tue, 09 Jun 2015 23:37:23 -0500

Question at http://rosalind.info/problems/1b/

#Find the reverse complement of a DNA string.
#Given: A DNA string Pattern.
#Return: Pattern, the reverse complement of Pattern.

use strict;
use warnings;

my $string="AAAACCCGGT";
my $finalString="";
my %hash = (
   "C" => "G",
   "A" => "T",
   "T" => "A",
   "G" => "C",
);

for (my $aa=0; $aa<=(length($string)-1); $aa++) {
   my $char=substr $string, $aa, 1;
   #print $hash{$char};
   $finalString="$hash{$char}"."$finalString";
}

print $finalString;
print "\n";
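
The same result can be obtained more compactly with Perl's built-in reverse and tr operators. The following is a minimal sketch, not a replacement for the loop above; the helper name reverse_complement is hypothetical, and handling lowercase bases is an added assumption. Characters other than A, C, G, and T are left unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): reverse the
# string, then swap each base for its complement with tr.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;     # scalar context reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # assumption: also handle lowercase bases
    return $rc;
}

print reverse_complement("AAAACCCGGT"), "\n";
# prints: ACCGGGTTTT
```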