BOL: Related items

List of pharmacogenomics companies in India

Jitendra Narayan — Fri, 09 Aug 2013 13:26:56 -0500

Pharmacogenomics companies in India are making a good impact. Here is a list of a few pharmacogenomics companies. Please add more if any are missing here.

Genomics in India
www.ganitlabs.in
www.sandor.co.in
www.igib.res.in
www.genotypic.co.in
www.ocimumbio.com
www.abcgenomics.com
www.xcelrisgenomics.com
www.ayugen.com
www.geneombiotech.com

Parallel Processing with Perl!

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One of the best ways is to use Perl threads and forks. It is important to know how threads and forks work before implementing them, so getting familiar with them before reading this tutorial would be really useful.

Many times in bioinformatics we need to deal with huge datasets that are more than 100GB in size. The traditional way to analyze a file is with a while loop:

while (my $line = <FILE>) {    # FILE is an already opened filehandle
    # do something with $line
}

This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.

One of the oldest ways to fork is:

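my @childs;
my $fork = fork();
if (!defined $fork) {
    # fork returns undef on failure, so check this first
    die "Couldn't fork: $!";
}
elsif ($fork == 0) {
    # child process: your code here
    exit(0);
}
else {
    # parent process: remember the child's PID
    push(@childs, $fork);
}
## wait for the child processes to finish
foreach (@childs) {
    my $tmp = waitpid($_, 0);
}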

What a fork does is create a child process, which gets its own copy of the variables and code and runs separately from the parent process (usually on a separate processor). That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you how to do it easily, but it still has its own drawbacks.
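To make the point about variables concrete, here is a minimal sketch using core Perl only. It is my own illustration, not code from the tutorial: the helper name run_in_child and the variable $counter are made up for this example. The child changes its own copy of $counter, and the only thing the parent gets back is what the child writes into a pipe.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: run $code in a child process and
# return whatever string it produces, sent back through a pipe.
sub run_in_child {
    my ($code) = @_;
    pipe(my $reader, my $writer) or die "Couldn't create pipe: $!";
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: send the result to the parent, then exit
        close $reader;
        print {$writer} $code->();
        close $writer;
        exit(0);
    }
    # parent: read everything the child wrote, then reap the child
    close $writer;
    my $result = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $result;
}

my $counter = 1;
my $from_child = run_in_child(sub { $counter += 41; return $counter });

# The child changed only its own copy of $counter.
print "parent still sees counter = $counter\n";
print "child reported counter = $from_child\n";
```

The parent's $counter stays at 1 even though the child computed 42, which is why results have to travel back through a pipe, a file, or something similar. The drawback is that everything sent this way has to be turned into text (or otherwise serialized) first.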

OK, now if you really do not want to call fork yourself in your code, that's OK too. There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (the number of processors you want to use).
Simple usage:
use Parallel::ForkManager;
my $max_processors = 8;
my $fork = Parallel::ForkManager->new($max_processors);
foreach (@dna) {
    $fork->start and next; # do the fork
    # your code here
    $fork->finish; # do the exit in the child process
}
$fork->wait_all_children; # same object that started the forks

So you will have up to 8 forks running at once, each doing the same thing for one element of the array. When one child finishes, Parallel::ForkManager starts a new one, and thus you will be using all your processors to analyze the data. Now, suppose you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because otherwise you will have problems with buffering. You can lock the file using the flock function:

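use Fcntl qw(:flock);    # provides LOCK_EX and LOCK_UN
open(my $QUAL, '>>', "myfile.txt") or die "can't open file: $!";    # append mode
flock($QUAL, LOCK_EX) or die "can't lock file: $!";
print $QUAL "$output";
flock($QUAL, LOCK_UN) or die "can't unlock file: $!";
close $QUAL;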

I would not suggest using flock when dealing with multiple processes, because it decreases processing efficiency (each child process must wait for the lock to be released by the other child processes). Instead, I would suggest having each fork write to a separate file and concatenating the files after the processing is done.

Putting it all together, if you have 100GB of data you can do this:
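The "one output file per child, concatenate at the end" idea can be sketched as follows with core Perl only. Everything specific here is an assumption made for illustration: the three in-memory chunks stand in for split files, process_chunk is a placeholder analysis (it just reports sequence lengths), and the part_NNN.txt naming is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical input: each chunk stands in for one split file.
my @chunks  = (['ACGT', 'GGCC'], ['TTAA'], ['CCGG', 'ATAT']);
my $out_dir = tempdir(CLEANUP => 1);

# Placeholder analysis: one "sequence<TAB>length" line per sequence.
sub process_chunk {
    my ($seqs) = @_;
    return map { "$_\t" . length($_) . "\n" } @$seqs;
}

# Made-up naming scheme for the per-child output files.
sub part_path { return sprintf("%s/part_%03d.txt", $out_dir, $_[0]) }

my @pids;
for my $i (0 .. $#chunks) {
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: write to its own file, so no locking is needed
        my $part = part_path($i);
        open(my $out, '>', $part) or die "Can't write $part: $!";
        print {$out} process_chunk($chunks[$i]);
        close $out or die "Can't close $part: $!";
        exit(0);
    }
    push @pids, $pid;    # parent: remember the child
}
waitpid($_, 0) for @pids;

# parent: concatenate the parts in chunk order
my $final = "$out_dir/final.txt";
open(my $all, '>', $final) or die "Can't write $final: $!";
for my $i (0 .. $#chunks) {
    open(my $in, '<', part_path($i)) or die "Can't read part $i: $!";
    while (my $line = <$in>) { print {$all} $line }
    close $in;
}
close $all or die "Can't close $final: $!";
```

Because the parts are joined by chunk index, the final file keeps the original order no matter which child finished first. With a shared, locked file the order would depend on timing.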

Step 1: Split the dataset equally according to the number of processors you have. This may take a few hours (about 2-3 hrs for a 100GB file).
You can use the Unix "split" command for this.
For example:
my $number_split = int($number_of_entries_in_your_dataset / $max_processors);
# split -l counts lines, so this assumes one entry per line
my $split_files = `split -l $number_split "your_file.fasta" "file_name"`;
Step 2: Open the directory containing your split files and start Parallel::ForkManager.
For example:
opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory
my $super_fork = Parallel::ForkManager->new($max_processors);
while (my $file = readdir(DIRECTORY)) { ### read the directory
    if ($file =~ /^\./) { next; } ### skip ".", ".." and hidden files
    print $file, "\n";
    ########## Start fork ##########
    my $pid = $super_fork->start and next;
    ### whatever you want to do with the split file:
    ### analyze my piece of $file
    ######### end fork ###############
    $super_fork->finish;
}
$super_fork->wait_all_children;
closedir(DIRECTORY);

So basically each processor will be busy with its own piece of data (a split file), and thus you have 8 processes running at one time without interfering with each other. Again, I would not suggest writing output from each child process to one file (for the reasons above). Write the output from each fork to a separate file and concatenate the files at the end. That's it, you have just increased your program speed by up to 8 times!! Isn't it easy?
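What Parallel::ForkManager does for you (start a new child whenever one finishes, never more than a fixed number at a time) can be sketched with core Perl alone. This is only an illustration under assumptions, not how the module is actually implemented: run_throttled is a made-up helper, the file names are examples of what split produces with the prefix "file_name", and the "analysis" is a placeholder that fails for one file on purpose so that the exit status handling is visible.

```perl
use strict;
use warnings;

# Hypothetical helper: run $work->($item) in child processes, never more
# than $max at once. Returns the number of children that exited non-zero.
sub run_throttled {
    my ($max, $work, @items) = @_;
    my ($running, $failed) = (0, 0);
    for my $item (@items) {
        if ($running >= $max) {
            # all slots are busy: block until any child exits
            wait();
            $failed++ if $? != 0;
            $running--;
        }
        my $pid = fork();
        die "Couldn't fork: $!" unless defined $pid;
        if ($pid == 0) {
            # child: do the work on its own item, then exit
            $work->($item);
            exit(0);
        }
        $running++;    # parent: one more child in flight
    }
    # reap the children that are still running
    while ($running > 0) {
        wait();
        $failed++ if $? != 0;
        $running--;
    }
    return $failed;
}

# Made-up file names, as split would produce with the prefix "file_name".
my @files = map { "file_name$_" } qw(aa ab ac ad ae);

my $failed = run_throttled(2, sub {
    my ($file) = @_;
    # placeholder analysis: pretend file_nameac is corrupt
    exit(1) if $file eq 'file_nameac';
}, @files);

print "$failed of ", scalar(@files), " children failed\n";
```

Checking $? after each wait() is worth the extra line: without it a child that dies halfway through its split file goes unnoticed, and the concatenated output is silently incomplete.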

Note:
You may worry about concatenating the output each child generates, since it does take some time (remember, 100GB). I think you can use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (should take about 3 hrs for a 100GB dataset) and then export the whole table into one file. This should be faster than just concatenating them using the "cat" command (correct me if I am wrong).

Or a much simpler way is to use pipes:

cat output_dir/* | my_pipe or my_pipe <(file1) final_file;

That's it, guys!! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.

The Ontario Institute for Cancer Research (OICR) Genomics Lab, Toronto, Canada.

Mon, 12 Aug 2013 01:43:13 -0500

The Human Genome Project led to the development of a wide array of technologies to screen the genome and its products (genes, proteins, metabolites) and molecules that interact with these products (chemicals, RNAi). The existence of these tools resulted in the creation of facilities that use robotics and informatics to generate high-throughput screens of DNA, RNA, protein, tissue, chemicals and other substances.

The genomics platform uses cancer genome sequencing and other high-throughput techniques to identify genes critical to the development of cancer and anomalies in the genomic profile of the tumours.

For more info visit: http://oicr.on.ca/

Senior Bioinformatics Scientist at Elucidata

Tue, 27 Nov 2018 04:05:57 -0600

Key Responsibilities
- Process and analyse metabolomic, transcriptional, genomics, proteomics
and any other kind of biological data.
- Interpret the data in the context of relevant biological literature to generate
actionable insights.
- Communicate the findings from data and literature to biologists and use the
biological insights to derive next steps/analyses.
- Communicate work through blogs, meet-ups, research papers, posters, etc.
- Identify, troubleshoot, and implement improvements to existing pipelines
and algorithms.
- Identify and implement new tools and pipelines to use for different types of
biological data.
- Work in a multi-disciplinary team with biologists, data scientists and data
analysts.
- Help with any other requirements (from database design to generating
prototypes for the product team).

Requirements
- 3-5 years of relevant bioinformatics experience such as public data mining,
processing, analysing and visualising omics data, etc.
- Ph.D., Masters or Bachelors in Bioinformatics, Biotechnology,
Computational Biology, or related field.
- Understanding of molecular biology and biochemistry.
- Comfort and experience with biological research and data.
- Proficient in a programming language used for bioinformatics such as R or
python.
- Excellent communication skills.
- Ability to summarise and simplify complex analyses for a non-technical
audience.
- Strong analytical skills, curiosity and a knack to solve difficult problems.
- Work well in multi-disciplinary teams with people of vastly different
backgrounds.
- Demonstrated success in collaboration and independent work.

More at https://angel.co/elucidata/jobs/460104-senior-bioinformatics-scientist

Computational Biology in the 21st Century: Making Sense out of Massive Data

Thu, 29 Aug 2013 08:32:26 -0500

Air date: Wednesday, February 01, 2012, 3:00:00 PM
Category: Wednesday Afternoon Lectures
Description: The last two decades have seen an exponential increase in genomic and biomedical data, which will soon outstrip advances in computing power to perform current methods of analysis. Extracting new science from these massive datasets will require not only faster computers; it will require smarter algorithms. We show how ideas from cutting-edge algorithms, including spectral graph theory and modern data structures, can be used to attack challenges in sequencing, medical genomics and biological networks. The NIH Wednesday Afternoon Lecture Series includes weekly scientific talks by some of the top researchers in the biomedical sciences worldwide.
Author: Dr. Bonnie Berger
Runtime: 00:58:06
Permanent link: http://videocast.nih.gov/launch.asp?17563

Tenure Track position in Bioinformatics at Institute of Neurobiology, UNAM, Querétaro, México

Mon, 10 Jun 2019 00:48:54 -0500

The Institute of Neurobiology UNAM (www.inb.unam.mx) offers a tenure-track position at the level of Assistant Professor (Investigador Asociado C) to develop an original research program in Bioinformatics with applications to neuroscience and to establish multidisciplinary collaboration with other members of the Institute. Applicants are expected to have a doctorate degree, postdoctoral experience related to bioinformatics or genome biology, and a strong track record of peer-reviewed publications. No previous experience in neuroscience is required.

Interested applicants must submit CV and addresses of three references to ataulfo@unam.mx.

Tenure Track position in Genomic Sciences

Laboratorio Internacional de Investigación sobre el Genoma Humano, UNAM Juriquilla, Querétaro, México

The International Laboratory for Human Genome Research, LIIGH-UNAM (www.liigh.unam.mx) offers a tenure-track position at the level of Assistant Professor (Investigador Asociado C) to carry out research, teaching and training of students in the area of "Genomics of Mendelian Diseases".

Applicants are expected to have a doctorate degree, postdoctoral experience related to the above mentioned area and a strong track record of peer-reviewed publications. Interested applicants must submit CV, email addresses of three references, and a three-page project to Dr. Rafael Palacios, Coordinator of LIIGH-UNAM (palacios@liigh.unam.mx) before June 21, 2019.

Tenure Track position in Genomic Sciences

Laboratorio Internacional de Investigación sobre el Genoma Humano, UNAM Juriquilla, Querétaro, México

Applicants are expected to have a doctorate degree, postdoctoral experience related to the above mentioned area and a strong track record of peer-reviewed publications. Interested applicants must submit CV, email addresses of three references, and a three-page statement of research interests to Dr. Rafael Palacios, Coordinator of LIIGH-UNAM (palacios@liigh.unam.mx) before June 21, 2019

3rd Annual Next Generation Sequencing Asia Congress 2013 at Singapore, Singapore

Wed, 14 Aug 2013 09:55:04 -0500

The 3rd Annual Next Generation Sequencing Asia Congress is to be held on the 22nd and 23rd of October 2013 in Singapore. Over the 2 days, the conference will provide an overview of the current options of next-generation sequencing platforms, technologies, applications and the newest computational tools for the analysis of next-generation sequencing data and analytical genomics as well as overcoming data management problems. The event will attract over 200 senior-level decision makers working in areas such as next generation sequencing, analytical genomics, computational biology, oncology, RNA profiling, molecular genomics, biomarkers, bioinformatics & data management and clinical & diagnostics development.

Dated: 22 Nov 2013 - 23 Nov 2013

http://www.ngsasia-congress.com/

Industrial Training in Computer Aided Drug Designing (CADD)

RASA Life Sciences — Wed, 13 Nov 2019 05:00:44 -0600

Learn More about Computer Aided Drug Designing (CADD)!!!

In our industrial program you will gain knowledge of RNA-Seq, ChIP-Seq,

Batch starts from 18th November 2019

The primary goal of the industrial training program is to provide students with the necessary skills to make them employable. RASA LSI trains students in the enhanced skills required to excel in jobs in biotechnology, pharmaceuticals, BioIT and related industry sectors. At Rasa you will learn from industry leaders how to apply these skills in life science and gain command of the software development process using various methodologies. For registration visit us at: https://www.rasalsi.com/index.php/front-page/industrial-training/

Golden Rules of Bioinformatics

Jitendra Narayan — Wed, 14 Aug 2013 21:11:33 -0500

All constants are variables.
Copy and paste is a genetic error.
First solve the problem, then write the code.
No matter what goes wrong, it will probably look right.
Any simple problem can be made insoluble if enough meetings are held to discuss it. :P
Statistics is a systematic method of coming to the wrong conclusion with confidence.
A bug is an undocumented feature in programming languages.
A good biological programmer goes on summer holiday with a raincoat. [because see 1]
Thank God Google knows Python is not a python, and that multiplication and division are the same thing.
Don't be clever, complex biology will trick you.

Post Doc Computational Biology, Bioinformatics - Network Biology & Data Science, NGS (m/f/d)

Sat, 15 Feb 2020 06:13:35 -0600

https://www.jobvector.de/jobs-stellenangebote/biologie-life-sciences/forschung-entwicklung/post-doc-computational-biology-bioinformatics-network-biology-data-science-ngs-129867.html?suid=e522e9793b41817e52ac58d6963b94e2519920df

Requirements
Doctoral degree in Bioinformatics, Computational Biology, (Bio)physics/-mathematics, Biochemistry/Biology or similar with strong quantitative and numeric focus
Ability to numerically process complex and large data sets
Good programming skills (R/Bioconductor and/or Python preferred, Linux is a plus)
Experience in analyzing next-generation sequencing data sets using network biology
Scientific publication record in applied bioinformatics
Familiarity with single cell NGS analyses and other –omics techniques is a plus, but not essential