BOL: Related items

BUSCO

Jitendra Narayan — Sun, 07 Feb 2016 16:02:39 -0600

Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs

More at http://busco.ezlab.org/

Address of the bookmark: http://busco.ezlab.org/

CrossMap

Jitendra Narayan — Mon, 08 Feb 2016 15:47:00 -0600

CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as Human hg18 (NCBI36) <> hg19 (GRCh37), Mouse mm9 (MGSCv37) <> mm10 (GRCm38)).

It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF.

CrossMap is designed to lift over genome coordinates between assemblies. It is not a program for aligning sequences to a reference genome.

We do not recommend using CrossMap to convert genome coordinates between species.

More at http://crossmap.sourceforge.net/

Address of the bookmark: http://crossmap.sourceforge.net/

Summer 2016

Sun, 21 Feb 2016 06:17:55 -0600

REU at Fordham University- Summer 2016

An NSF-funded REU to study Y-chromosome diversity and sex-biased dispersal in wild brown rats (Rattus norvegicus) is available in the Munshi-South Lab at Fordham University. Our lab is currently investigating rat evolution at scales ranging from landscape genetics of individual cities to global patterns of diversity. Development of resources for investigating Y-chromosome diversity will support many of these studies. The REU student will work with the lab to bioinformatically identify Y-chromosome SNPs, design SNPtype assays, extract DNA, genotype samples, and analyze data.

We seek applicants interested in bioinformatics, evolutionary biology, and related disciplines. Applicants must have taken a college-level genetics course. This REU will require attention to detail, reliability, independence, and critical thinking.

This position is based at Fordham University's field station, the Louis Calder Center, in Armonk, NY. The Calder Center is located approximately 25 miles north of New York City in a protected woodland area. Housing will be provided at the Calder Center for the duration of the REU (May 23 to Aug 12, 2016). Additionally, the student will receive a $6,000 stipend. The selected student will participate in professional development activities through the Calder Center's REU program, including presentation of results at a research colloquium at the end of the summer.

To apply, please send a one-page personal statement about your scientific interests and how this REU will support your professional goals, unofficial transcripts including a list of Spring 2016 courses, and the names of two professional references (including title, address, phone number, and email address) as a single PDF (with your last name in the file name) to Dr. Jason Munshi-South (jmunshisouth@fordham.edu).

Applications are due March 4th, 2016.

Jason Munshi-South

BreakSeq2

Jitendra Narayan — Mon, 29 Feb 2016 17:45:38 -0600

Ultrafast and accurate nucleotide-resolution analysis of structural variants

More at http://bioinform.github.io/breakseq2/

Download BreakSeq2

Latest version: https://github.com/bioinform/breakseq2/archive/2.2.tar.gz

For other versions, see "releases". https://github.com/bioinform/breakseq2/releases

Address of the bookmark: http://bioinform.github.io/breakseq2/

PhD at INSTITUTE OF LIFE SCIENCES, Bhubaneswar

Mon, 30 May 2016 03:36:04 -0500

INSTITUTE OF LIFE SCIENCES

Bhubaneswar 751023

Advt No. 07/2016

Institute of Life Sciences (ILS), Bhubaneswar, an autonomous Institute of the Department of Biotechnology, Ministry of Science & Technology, Government of India engaged in advanced research invites applications from Indian nationals for the Ph.D. program. The main focus of the projects will be computational biology in the following areas.

S. No.   Area of Research                          Principal Investigator
1.       Computational Cancer Biology              Dr. Anshuman Dixit
2.       Immunogenomics & Systems Biology          Dr. Sunil Kumar Raghav
3.       Chromatin remodeling and hematopoiesis    Dr. Punit Prasad

Candidates are strongly encouraged to visit the ILS webpage for detailed information regarding the research activities of the above-mentioned scientists.

Essential Qualifications:

(a) Eligibility: M.Sc., M.V.Sc., M.Pharm., M.S. Pharma. (with NET/GATE/GPAT/BINC/any other equivalent national level exam) or M.Tech with minimum of 60% marks (or equivalent grade point). Those awaiting final result may also apply.

Applications received after the last date will not be accepted. The envelope should be clearly superscribed with “Application for Ph.D. program (computational biology)”. The list of short-listed candidates selected for the interview will be published on the Institute website (www.ils.res.in).

Application Fees: Applicants except SC/ST candidates are required to send a non-refundable D.D. for Rs.100/- in favour of “Director, Institute of Life Sciences, Bhubaneswar” payable at Bhubaneswar along with duly filled-in application form by the date mentioned below. Director, ILS reserves the right to withdraw the procedure without assigning any reasons thereof.

Important dates: 

Last date of receiving applications: 24th June 2016 

Date of display of short-listed candidates and instructions on the Institute website: 30th June 2016 

Date of interview: The interview will be organized on 25th July 2016

Advertisement: https://www.ils.res.in/wp-content/uploads/2016/05/advt07-16.pdf

Awesome bioinformatics pipelines !

Jitendra Prajapati — Wed, 30 Mar 2016 21:50:41 -0500

A curated list of awesome pipeline toolkits ...

https://github.com/pditommaso/awesome-pipeline

Address of the bookmark: https://github.com/pditommaso/awesome-pipeline

IBL laboratory

Mon, 12 Aug 2013 02:02:29 -0500

The IBL laboratory focuses on the multi-disciplinary analysis of the global responses of model microorganisms, cyanobacteria (mainly Synechocystis PCC6803) and yeasts (mainly Saccharomyces cerevisiae), to environmental stresses triggered by oxidative agents, heavy metals, or drastic changes in nutrient availability. The genome-wide responses studied with the "omics" techniques (transcriptomics, proteomics, metabolomics and genetics) generate a wealth of experimental data, which are processed, archived, integrated and represented as working models through bioinformatics and mathematics.

Link : http://www-dsv.cea.fr/en/instituts/institut-de-biologie-et-de-technologies-de-saclay-ibitec-s/unites-de-recherche/service-de-biologie-integrative-et-genetique-moleculaire-sbigem/laboratoire-de-biologie-integrative-lbi/presentation__1

Computational Proteomics : Lets remember the basics

Jitendra Narayan — Thu, 01 Aug 2013 17:24:20 -0500

I spent some of my valuable time in the computational drug design sector. I remember my early proteomics days, playing with interactive protein visualization software and dreaming big. Fortunately or unfortunately, I switched to genomics and now handle genomic floods measured in petabytes, which are expected to reach brontobytes in the coming years. Did I mention brontobytes??? Let me call my server personnel … it's gonna be a tsunami!!!!!

Today, refreshing my old memories, I decided to blog about basic biochemistry knowledge and computational proteomics skills. But after finding several articles on the internet that say exactly what I wanted to say, I thought I might as well just redirect BOL's blog readers there instead.

Here is a list of websites and video links that are a good resource for your basic chemistry needs:

http://tecreativ.blogspot.co.uk/2012/09/funny-shortcut-remember-periodic-table.html

This blog uses some specific Hindi words to help you remember the entire periodic table. I really like

Group 14 (C Si Ge Sn Pb) -> Sentence “Chemistry Sir Gives Sanki Problems”

Sanki is a Hindi word which means crazy :P

I found this link useful as well http://www.wikihow.com/Memorise-the-Periodic-Table

The Eagle Genomics group presents the elements of bioinformatics as a periodic table. Yes, you got it: this is not the chemical periodic table but rather bioinformatics tools laid out like one.

http://elements.eaglegenomics.com/

You can also try these video links, which give an overview of the periodic table along with some tricks:

http://www.youtube.com/watch?v=fLSfgNxoVGk

http://www.youtube.com/user/periodicvideos

http://www.allfordrugs.com/drug-design/ collects drug design educational material, software, tools, databases, viewers, file formats and much more in one place. I highly recommend that all computational drug designers bookmark this page for future study as well.

I just remembered one of my mini projects, in which I used my Flash knowledge (Flash .. oh ya, Flash) to explain amino acids in an interactive and user-friendly manner. I can’t provide it right now, but I promise to post a link in the near future. I hope that you will enjoy my flashy creative skills :).

Moreover, I found some very interesting tricks for remembering the chemical formulae of all the amino acids on YouTube at

http://www.youtube.com/watch?v=gqrWb0fmzQ&list=PL6132651E70BB5575

http://www.youtube.com/watch?v=C2GfoGXfySQ&list=PL6132651E70BB5575

Key points for computer-aided drug designers:
1. Without biochemistry skills you will get absolutely nowhere in understanding the key concepts and doing research.
2. Get comfortable with the complex mathematical formulae before merely running tools or software.
3. Dig better and deeper, guys .. design it.

IMTECH Lab

Sun, 15 Sep 2013 09:41:04 -0500

Computer Aided Protein Structure Prediction; Identification of Vaccine Candidates (T-Epitope prediction); Analysis of Nucleotide/Protein Sequences; Development of Web Server/Software; Creation of Public Domain Resources in Biology

Present Status:

Developing prediction methods for gene, beta-turn, secondary structure and MHC-binding sites.

Area of Interest:

Comparison of force field simulations. Analysis of DNA-protein interactions using molecular mechanics methods. Drug Target Identification using in silico biology.

More @ http://www.imtech.res.in/bic/index.php?option=com_content&view=article&id=65

PIs: http://www.imtech.res.in/bic/index.php?option=com_content&view=article&id=69

Parallel Processing with Perl !

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so it would be really useful to read up on them before going through this tutorial.

In bioinformatics we often need to deal with huge datasets of more than 100GB. The traditional way to analyze a file is to use a while loop:

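open(my $fh, '<', 'your_file.txt') or die "Cannot open file: $!";
while (my $line = <$fh>) {
    # do something with $line
}
close $fh;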

This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it can take more than a day to iterate through the whole thing. So how do we make the best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.

One of the oldest ways to fork is

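my @children;
my $pid = fork();
die "Could not fork: $!" unless defined $pid;   # fork returns undef on failure
if ($pid) {
    # parent process: remember the child's PID
    push @children, $pid;
}
else {
    # child process ($pid == 0): your code here
    exit(0);
}
## wait for the child processes to finish
foreach my $child (@children) {
    waitpid($child, 0);
}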

What a fork does is create a child process that takes a copy of the variables and code with it and runs separately (detached from the parent process), so a separate process is created (which usually runs on a separate processor). That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes, because each process works on its own copy. It can be done, but every approach has its own drawbacks.
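One common way to get a result back from a child, since it cannot write into the parent's variables, is a pipe. The following is only a minimal sketch using core Perl: the helper name run_in_child and the GC-count task are made up for illustration, and it only handles results that can be sent as a string.

```perl
use strict;
use warnings;

$| = 1;   # unbuffer STDOUT so forked children do not re-print pending output

# Run $work->() in a forked child and return its string result to the parent.
sub run_in_child {
    my ($work) = @_;
    pipe(my $reader, my $writer) or die "Could not create pipe: $!";
    my $pid = fork();
    die "Could not fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: compute, write the result into the pipe, and exit
        close $reader;
        print {$writer} $work->();
        close $writer;
        exit(0);
    }
    # parent: read everything the child wrote, then reap the child
    close $writer;
    local $/;                 # slurp mode: read until end of pipe
    my $result = <$reader>;
    close $reader;
    waitpid($pid, 0);
    return $result;
}

# Hypothetical task: count G and C bases in the child process.
my $seq = 'ATGCGC';
my $gc  = run_in_child(sub { return ($seq =~ tr/GC//) });
print "GC count from child: $gc\n";
```

The drawback is visible here: the parent blocks while reading, and anything more complex than a string has to be serialized first.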

Okie, now if you really do not want to call fork yourself in your code, that’s okie too. There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to run at once (the number of processors you want to use).
Simple usage:
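use Parallel::ForkManager;
my $max_processors = 8;
my $fork = Parallel::ForkManager->new($max_processors);
foreach my $seq (@dna) {
    $fork->start and next;   # do the fork; the parent moves on to the next element
    # your code here (runs in the child, working on $seq)
    $fork->finish;           # do the exit in the child process
}
$fork->wait_all_children;    # parent waits until every child has finished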

This runs up to 8 child processes at a time, each doing the same work on one element of the array. When one child finishes, Parallel::ForkManager starts a new one, so all your processors stay busy analyzing the data. Now suppose you have 8 child processes and want them all to write to one file. You need to lock the file to do this; otherwise the buffered output of the different children can get mixed up. You can lock the file using the flock command.

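use Fcntl qw(:flock);   # provides LOCK_EX and LOCK_UN
open(my $QUAL, '>>', 'myfile.txt') or die "Cannot open file: $!";   # open for appending
flock($QUAL, LOCK_EX) or die "Cannot lock file: $!";
print $QUAL $output;
flock($QUAL, LOCK_UN) or die "Cannot unlock file: $!";
close $QUAL;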

I would not suggest using flock when dealing with multiple processes, because it decreases processing efficiency (each child process must wait for the lock to be released by another child process). Instead, I would suggest having each fork write to a separate file and simply concatenating the files after the processing is done.
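The separate-file approach could look roughly like this. It is a sketch only, in core Perl: the sub names, the part_NNN.txt naming scheme and the uppercase "analysis" are placeholders, and it forks one child per chunk without limiting how many run at once.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

$| = 1;   # unbuffer STDOUT so forked children do not re-print pending output

# Fork one child per chunk; each child writes to its own part file.
# Returns the part file names in chunk order.
sub process_chunks {
    my ($out_dir, @chunks) = @_;
    my (@pids, @parts);
    for my $i (0 .. $#chunks) {
        my $part = sprintf("%s/part_%03d.txt", $out_dir, $i);
        push @parts, $part;
        my $pid = fork();
        die "Could not fork: $!" unless defined $pid;
        if ($pid == 0) {
            # child: placeholder analysis (uppercase the chunk), no locking needed
            open(my $out, '>', $part) or die "Cannot write $part: $!";
            print {$out} uc($chunks[$i]), "\n";
            close $out;
            exit(0);
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;   # wait until every part file is complete
    return @parts;
}

# Concatenate the part files, in order, into one final file.
sub concatenate {
    my ($final, @parts) = @_;
    open(my $out, '>', $final) or die "Cannot write $final: $!";
    for my $part (@parts) {
        open(my $in, '<', $part) or die "Cannot read $part: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out;
}

my $dir   = tempdir(CLEANUP => 1);
my @parts = process_chunks($dir, qw(acgt ttga ggcc));
concatenate("$dir/final.txt", @parts);
```

Because every child owns its output file, no child ever waits on another, and the final order is decided by the parent rather than by which child finishes first.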

Putting it all together, if you have 100GB of data you can do this:

Step 1: split the dataset equally according to the number of processors you have. This may take a few hours (about 2-3 hrs for a 100GB file).
You can use the unix "split" command for this.
For example:
my $number_split = int($number_of_entries_in_your_dataset / $max_processors);   # lines per split file
system('split', '-l', $number_split, 'your_file.fasta', 'file_name') == 0 or die "split failed: $?";
Step 2: open the directory containing your split files and start Parallel::ForkManager.
For example:
opendir(my $dh, $split_files_directory) or die "Cannot open directory: $!";   ### open the directory
my $fork = Parallel::ForkManager->new($max_processors);
while (defined(my $file = readdir($dh))) {   ### read the directory
    next if $file =~ /^\./;                  # skip . and .. and hidden files
    print $file, "\n";
    ########## start fork ##########
    $fork->start and next;
    # whatever you want to do with the split file,
    # e.g. analyze "$split_files_directory/$file"
    ########## end fork ##########
    $fork->finish;
}
closedir($dh);
$fork->wait_all_children;

So basically each processor stays busy with its own piece of data (split file), and you have 8 processes running at the same time without interfering with each other. Again, I do not suggest writing the output of every child process to one file (for the reasons above). Write the output from each fork to a separate file and concatenate them at the end. That's it, you have just sped up your program by up to 8 times!! Isn't it easy?
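One detail of step 1 is worth checking with small numbers: plain int() division rounds down, so when the dataset size is not an exact multiple of the processor count, split produces one extra small file. Rounding up instead keeps the number of files within the number of processors. A small sketch (the sub names are made up for illustration):

```perl
use strict;
use warnings;

# Lines per split file, rounded up, so that at most $max_processors files result.
sub lines_per_chunk {
    my ($total_lines, $max_processors) = @_;
    return int(($total_lines + $max_processors - 1) / $max_processors);
}

# Number of files that `split -l $lines` produces for $total_lines lines.
sub chunk_count {
    my ($total_lines, $lines) = @_;
    return int(($total_lines + $lines - 1) / $lines);
}

my ($total, $procs) = (10, 3);
my $floor = int($total / $procs);              # 3 lines per file
my $ceil  = lines_per_chunk($total, $procs);   # 4 lines per file
printf "floor: %d files, ceiling: %d files\n",
    chunk_count($total, $floor), chunk_count($total, $ceil);
# prints "floor: 4 files, ceiling: 3 files"
```

With Parallel::ForkManager the extra file is still processed correctly; it just runs after one of the other children has finished.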

Note:
You may worry about concatenating the output each child generates, since it does take some time (remember, 100GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (it should take about 3 hrs for a 100GB dataset) and then export the whole table into one file. This should be faster than just concatenating them with the "cat" command (correct me if I am wrong).

Or a much simpler way is to use pipes:

cat output_dir/* | my_pipe or my_pipe <(file1) final_file;

That's it, guys!! Enjoy programming, and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.