BOL: Related items

TRITEX, a computational pipeline for chromosome-scale assembly of plant genomes

LEGE — Fri, 14 Feb 2025 10:53:48 -0600

This is the documentation of TRITEX, a computational pipeline for chromosome-scale assembly of plant genomes. It was developed in the research group Domestication Genomics at the Leibniz Institute of Plant Genetics and Crop Research (IPK) Gatersleben.

Address of the bookmark: https://tritexassembly.bitbucket.io/

Parallel Processing with Perl !

Rahul Nayak — Sat, 25 Aug 2018 11:32:40 -0500

Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads and forks. It is important to understand how threads and forks work before implementing them, so getting familiar with them before reading this tutorial would be really useful.

Many times in bioinformatics we need to deal with huge datasets that are more than 100 GB in size. The traditional way to analyze a file is with a while loop:

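open(my $fh, '<', 'myfile.txt') or die "Can't open myfile.txt: $!";
while (my $line = <$fh>) {
    # do something with $line
}
close $fh;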

This is very slow (since we are using only one processor), and if we have 500 million lines in the dataset it can take more than a day to iterate through the whole dataset. So how do we make the best use of all our processors and get the work done quickly?

Here is a very simple and efficient technique with Perl which I have been using. I am more inclined towards using Perl fork than Perl threads.

One of the oldest ways to fork is:

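my @children;
my $pid = fork();
die "Couldn't fork: $!" unless defined $pid;    # fork returns undef on failure
if ($pid) {
    push @children, $pid;    # parent: remember the child's PID
}
else {
    # child: your code here
    exit(0);
}
## wait for the child processes to finish
foreach my $child (@children) {
    waitpid($child, 0);
}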

What a fork does is create a child process that gets its own copy of the parent's variables and code and works on them separately (detached from the parent process). Thus a separate process is created (which usually runs on a separate processor). That's it!! One big disadvantage of forking is that it is very difficult to share variables among the different processes, because each process only changes its own copy. I will show you how to do it easily but still it has its own drawbacks.
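To make the point about variables concrete, here is a minimal sketch using only core Perl. The variable names and the numbers are made up for illustration. The child changes its own copy of a variable, the parent's copy stays the same, and the child passes one small integer back through its exit status (which only works for values from 0 to 255).

```perl
use strict;
use warnings;

my $counter = 10;    # set before the fork, so both processes start with 10

my $pid = fork();
die "Couldn't fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: this changes only the child's private copy of $counter.
    $counter += 5;
    exit($counter);    # pass a small integer (0-255) back as the exit status
}

# Parent: wait for the child, then read its exit status from $?.
waitpid($pid, 0);
my $child_value = $? >> 8;

print "parent still sees counter = $counter\n";
print "child reported counter = $child_value\n";
```

The exit status is far too small for real results, which is one reason to have each child write its results to a file instead.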

Okay, now if you really do not want to call fork directly in your code, that’s okay too. There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to run at once (the number of processors you want to use).
Simple usage:
use Parallel::ForkManager;
my $max_processors = 8;
my $fork = Parallel::ForkManager->new($max_processors);
foreach my $seq (@dna) {
    $fork->start and next;    # do the fork; the parent moves on to the next element
    # your code here (runs in the child process)
    $fork->finish;            # do the exit in the child process
}
$fork->wait_all_children;

So at most 8 child processes run at a time, each doing the same thing for one element of the array. When one child finishes, Parallel::ForkManager starts a new one, and thus you will be using all your processors to analyze the data. Now, suppose you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because otherwise buffered output from different processes can get mixed together. You can lock the file using the flock function.

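use Fcntl qw(:flock);    # imports LOCK_EX and LOCK_UN
open(my $QUAL, '>>', 'myfile.txt') or die "Can't open file: $!";    # append, so children don't overwrite each other
flock($QUAL, LOCK_EX) or die "Can't lock file: $!";
print $QUAL $output;
flock($QUAL, LOCK_UN) or die "Can't unlock file: $!";
close $QUAL;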

I would not suggest using flock when dealing with multiple processes because it decreases the processing efficiency (each child process must wait for the lock to be released by another child process). Instead, I would suggest having each fork write to a separate file and concatenating the files after the processing.
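The separate-file approach might look like the sketch below. It uses plain fork from core Perl so that it runs without extra modules. The toy sequences, the temporary directory and the part_N.txt file names are all assumptions made up for this example, not part of any real pipeline.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $out_dir = tempdir(CLEANUP => 1);    # hypothetical output directory
my @chunks  = (['ACGT', 'GGCC'], ['TTAA'], ['CCGG', 'ATAT']);    # toy data

my @children;
for my $i (0 .. $#chunks) {
    my $pid = fork();
    die "Couldn't fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: write results for its chunk to its own file, no lock needed.
        open(my $out, '>', "$out_dir/part_$i.txt") or die "Can't write: $!";
        print {$out} length($_), "\t$_\n" for @{ $chunks[$i] };
        close $out;
        exit(0);
    }
    push @children, $pid;
}
waitpid($_, 0) for @children;

# Parent: concatenate the per-child files in chunk order.
my $final = "$out_dir/final.txt";
open(my $all, '>', $final) or die "Can't write $final: $!";
for my $i (0 .. $#chunks) {
    open(my $in, '<', "$out_dir/part_$i.txt") or die "Can't read: $!";
    print {$all} $_ while <$in>;
    close $in;
}
close $all;

# Read the merged file back to show the result.
open(my $check, '<', $final) or die "Can't read $final: $!";
my @merged = <$check>;
close $check;
print @merged;
```

Because every child owns its output file, no process ever waits on another, and the parent decides the order of the final file when it concatenates.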

Putting it all together, if you have 100 GB of data you can do this:

Step 1: split the dataset equally according to the number of processors you have. This may take a few hours (about 2-3 hrs for a 100 GB file).
You can use the Unix “split” command for this.
For example:
my $number_split = int($number_of_entries_in_your_dataset / $max_processors);
my $split_files = `split -l $number_split "your_file.fasta" "file_name"`;
Step 2: open the directory containing your split files and start Parallel::ForkManager.
For example:
opendir(my $dir, $split_files_directory) or die "Can't open $split_files_directory: $!";    ### open the directory
my $super_fork = Parallel::ForkManager->new($max_processors);
while (my $file = readdir($dir)) {    ### read the directory
    next if $file =~ /^\./;            # skip ".", ".." and hidden files
    print $file, "\n";
    ########## Start fork ##########
    my $pid = $super_fork->start and next;
    # whatever you want to do with the split file:
    # analyze my piece of $file
    # ($file is only the name; the full path is "$split_files_directory/$file")
    ######### end fork ###############
    $super_fork->finish;
}
$super_fork->wait_all_children;
closedir($dir);

So basically each processor stays busy with its piece of data (split file), and thus you have created 8 processes at one time which run without interfering with each other. Again, I would not suggest writing output from each child process to one file (for the reasons above). Write the output from each fork to a separate file and finally concatenate them. That’s it, you have just increased your program speed by up to 8 times!! Isn’t it easy?
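A small detail about step 1: int() rounds down, so when the number of lines is not an exact multiple of the number of processors, split leaves a small extra file. For example, 100 lines with 8 processors gives 12 lines per file and therefore 9 files. Rounding up instead gives at most one file per processor. The sketch below shows the arithmetic. The helper name lines_per_chunk and the line count of 100 are made up for illustration.

```perl
use strict;
use warnings;

# Lines per chunk, rounded up so that split produces at most $parts files.
sub lines_per_chunk {
    my ($total_lines, $parts) = @_;
    die "parts must be positive" if $parts < 1;
    return int(($total_lines + $parts - 1) / $parts);
}

my $total_lines    = 100;    # hypothetical line count, e.g. from `wc -l`
my $max_processors = 8;

my $rounded_down = int($total_lines / $max_processors);
my $rounded_up   = lines_per_chunk($total_lines, $max_processors);

print "int():    $rounded_down lines per file\n";
print "ceiling:  $rounded_up lines per file\n";
```

With 13 lines per file, the 100 lines fit into 8 files (seven of 13 lines and one of 9), so no processor is left with a tiny leftover file.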

Note:
You may worry about concatenation of the output each child generates, since it does take some time (remember, 100 GB). I think you can instead use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (it should take about 3 hrs for a 100 GB dataset) and then export the whole table into one file. I expect this to be faster than just concatenating them using the “cat” command, but correct me if I am wrong.

Or a much simpler way is to use pipes:

cat output_dir/* | my_pipe

or

my_pipe <(file1) final_file

That’s it guys!! Enjoy programming and please do comment. I am not a computer scientist, so forgive me for any mistakes, and if you find any please report them. Thank you.

Senior Bioinformatics Scientist at Elucidata

Tue, 27 Nov 2018 04:05:57 -0600

Key Responsibilities
- Process and analyse metabolomic, transcriptional, genomics, proteomics
and any other kind of biological data.
- Interpret the data in the context of relevant biological literature to generate
actionable insights.
- Communicate the findings from data and literature to biologists and use the
biological insights to derive next steps/analyses.
- Communicate work through blogs, meet-ups, research papers, posters, etc.
- Identify, troubleshoot, and implement improvements to existing pipelines
and algorithms.
- Identify and implement new tools and pipelines to use for different types of
biological data.
- Work in a multi-disciplinary team with biologists, data scientists and data
analysts.
- Help with any other requirements (from database design to generating
prototypes for the product team).

Requirements
- 3-5 years of relevant bioinformatics experience such as public data mining,
processing, analysing and visualising omics data, etc.
- Ph.D., Master's or Bachelor's in Bioinformatics, Biotechnology,
Computational Biology, or related field.
- Understanding of molecular biology and biochemistry.
- Comfort and experience with biological research and data.
- Proficient in a programming language used for bioinformatics such as R or
Python.
- Excellent communication skills.
- Ability to summarise and simplify complex analyses for a non-technical
audience.
- Strong analytical skills, curiosity and a knack to solve difficult problems.
- Work well in multi-disciplinary teams with people of vastly different
backgrounds.
- Demonstrated success in collaboration and independent work.

More at https://angel.co/elucidata/jobs/460104-senior-bioinformatics-scientist

BINC Exam merged with DBT-BET JRF Exam

Jit — Thu, 21 Feb 2019 09:37:36 -0600

More breaking news has been received from the Department of Biotechnology (DBT). As per a notification released by DBT, the Bioinformatics National Certification (BINC) Exam, conducted once per year by DBT, has now been merged with the DBT-BET JRF Exam.

Also, the Bioinformatics Industrial Training Program (BIITP) has been merged with the HRD Biotechnology Industrial Training Programme (BITP).

While this comes as a surprise for a lot of participants, we believe this is a good attempt to unify and create a national benchmark for talent, and we appreciate this endeavor from the Department of Biotechnology.

However, such last-minute announcements can create confusion. Candidates are therefore advised to go through the complete DBT-BET JRF 2019 notification via the link below. If you have any doubts, contact DBT JRF or Biotecnika for help and assistance.

Attention: Bioinformatics Programs (BINC and BIITP)

1. Bioinformatics National Certification (BINC) has been merged with DBT-Junior
Research Fellow (BET Exam)

2. Bioinformatics Industrial Training Program (BIITP) is merged with HRD Biotechnology Industrial Training Programme (BITP).

Students of Bioinformatics, who are interested to apply for Fellowship or Industrial
Training may keep track of the advertisement of DBT-JRF (BET Exam) and BITP
of DBT.

More at http://www.bcil.nic.in/files/Attention_Bioinformatics_Programs_(BINC_and_BIITP).pdf

Bioinformatics web development course

Jit — Wed, 06 Nov 2019 20:42:48 -0600

This web development course, targeted at Biology and Bioinformatics students, aims at teaching from scratch all the skills needed to set up a fully working Linux web server and to develop and deploy web applications for Bioinformatics.

No previous programming knowledge is assumed. By following this tutorial you will learn the fundamental concepts of programming by using scripting languages: variables, types, arrays, loops, conditional statements, functions, objects, regular expressions, file reading and manipulation, etc.

Address of the bookmark: http://www.cellbiol.com/bioinformatics_web_development/introduction/

The Clark Lab

Fri, 07 Feb 2020 13:57:24 -0600

We study the process of adaptive evolution, during which species adopt novel traits to overcome challenges. We retrace the evolutionary histories of genomic elements to determine the changes underlying adaptation and to discover previously unknown genetic networks. These discoveries have already led to advances in human health, species conservation, and molecular biology.

More at http://clark.genetics.utah.edu/

Bioinformatics Scientist/Research Software Engineer at University of Dundee, Dundee, United Kingdom

Wed, 26 Aug 2020 10:31:25 -0500

We are recruiting for an exceptional individual to join us as a computational scientist, bioinformatician, or (research) software engineer with an interest in interactive data analysis platforms for biology and medicine within our Jalview (www.jalview.org) research software engineering team.

More at https://www.jobs.dundee.ac.uk/fe/tpl_uod01.asp?s=4A515F4E5A565B1A&jobid=104342,2382988671&key=147934117&c=99413415238921&pagestamp=sesxbbuyifokdsfygf

Last date: 30th August 2020

Informal enquiries about this position may be made to Prof. Geoff Barton (gjbarton@dundee.ac.uk) or Dr Jim Procter (jprocter@dundee.ac.uk). To find out more about the Jalview research software engineering team, please visit www.jalview.org and www.compbio.dundee.ac.uk

Two Faculty Positions at National Taiwan University, Taipei, Taiwan

Thu, 22 Oct 2020 04:53:12 -0500

The Department of Agronomy at National Taiwan University, Taipei, Taiwan,
invites applications for two full-time faculty positions beginning August
1, 2021 at the rank of Assistant Professor, Associate Professor or
Professor in Biometry and Bioinformatics and Plant Breeding and Genetics,
respectively.

A qualified candidate should hold a Ph.D. in a relevant field including
Agronomy, Statistics, Bioinformatics, Plant Breeding, Plant Genetics or
Quantitative Genetics. For the position in Biometry and Bioinformatics,
applicants capable of teaching fundamental statistics/bioinformatics
courses or with experience in crop science are preferred; for Plant
Breeding and Genetics, applicants capable of teaching fundamental plant
breeding courses, with experience in crop breeding, or with training in
quantitative genetics are preferred.

The application package should include two letters of reference and five
printed copies of the following documents (1) curriculum vitae, (2)
publication list, (3) undergraduate and graduate transcripts if applying
for the Assistant Professorship, (4) a photocopy of the Ph.D. diploma, (5)
teaching plan and course outline or syllabus, (6) research proposal, (7) a
cover letter indicating the rank applied for, and one representative original
research article which was published by the applicant being the 1st or
corresponding author in an SCI peer-reviewed journal within 5 years (after
August 1, 2016); a copy of doctoral dissertation can be the representative
article if applying for the Assistant Professorship; (8) reprints of the
selected publications published within 7 years (after August 1, 2014).

The application package should be mailed to the Chair, Dr. Li-yu Daisy Liu
(lyliu@ntu.edu.tw), in the Department of Agronomy, National Taiwan
University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan, before
December 15, 2020 for full consideration.

Introduction to Bioinformatics and Computational Biology

Jit — Mon, 25 Jan 2021 01:32:30 -0600

This is the course material for STAT115/215 BIO/BST282 at Harvard University.

Xiaole Shirley Liu (lead instructor)
Joshua Starmer
Martin Hemberg
Ting Wang
Feng Yue

Ming Tang
Yang Liu
Jack Kang
Scarlett Ge
Jiazhen Rong
Phillip Nicol
Maartin De Vries

We thank the many colleagues in the community who helped Dr. Liu prepare the STAT115/215 BIO/BST282 course over the years.

Address of the bookmark: https://liulab-dfci.github.io/bioinfo-combio/

Bioinformatics in Africa: Part2 - Kenya

BioStar — Sat, 06 Feb 2021 13:23:54 -0600

International Livestock Research Institute (ILRI):

Under a NEPAD initiative, Biosciences Eastern and Central Africa (BECA) (www.biosciencesafrica.org) was established at ILRI. BECA consists of a hub, regional nodes, and other affiliated laboratories and partner institutes. It includes a state-of-the-art joint Bioinformatics Platform (www.becabioinfo.org), whose overall goal is to provide a coherent and powerful bioinformatics infrastructure for use by all scientists in East and Central Africa. The Platform goal requires both physical and intellectual developments that together provide researchers with access to diverse infrastructure in a wide-area network, thereby addressing four important aspects of bioinformatics:

1) Science: bioinformatics tools for data integration and visualization, standardization of data formats and data analysis strategies, and distribution of analysis tasks over local and wide-area networks are in development;

2) Bioinformatics Support Facility: provides assistance and custom programming to projects and those unable to establish a bioinformatics support function intrinsic to their project due to shortage of qualified personnel or lack of funding;

3) Hardware Platform: provides a powerful high-performance computing platform capable of handling the largest analysis needs for projects;

4) Bioinformatics Training for East and Central African scientists: While many Web-based tools are available to the wet-lab researcher, the Web is not well suited for tasks beyond single-sequence annotation. Researchers need to become productive in a server-based Unix environment with its wealth of scripting and automation tools. Even at an entry level, this can be an intimidating task if proper guidance is not available.

International Centre of Insect Physiology and Ecology (ICIPE): ICIPE’s research focus is on insect biology, in order to improve the well-being of the peoples of the tropics through insect science. There is a commitment to utilise contemporary science in order to limit the impact of disease vectors and agricultural pests. The understanding of the mechanisms associated with behaviour (e.g. attraction and repellency) is crucial. ICIPE seeks to enhance its bioinformatics capacity in order to support data from various EST projects designed to gain insights into insect ecology and plant-pathogen interactions through studies of metabolic pathways associated with production of allelochemicals.

Long-term training activities:

Kenyatta University: An introductory course in Bioinformatics is offered to MSc Biotechnology students. This comprises 35 hours of lectures and practicals.

University of Nairobi: A centre for Biotechnology and Bioinformatics (CEBIB), which will offer postgraduate training (diplomas, MSc and PhD) in areas of biotechnology and bioinformatics has recently been launched. Other universities in Kenya, including Egerton, Maseno and the Jomo Kenyatta University of Agriculture and Technology offer introductory courses to undergraduates in biomedical sciences. In addition, under the BECA platform MSc and PhD fellowships are being made available for Bioinformatics students. ILRI is forging links with Universities in South Africa and the United Kingdom to provide access to courses and training material.

Research Interest and Activities:

The following are the present areas of research interest:

1. EST clustering
2. Genome sequencing and annotation
3. Functional genomics and proteomics (including key tropical pathogens)
4. Structural bioinformatics
5. Development of Bioinformatics Data Management Systems
6. Gene Mining
7. High Throughput Genotyping
8. Microarray data management and analysis
9. Metagenomics
10. Immunoinformatics
11. Host-pathogen interaction
12. High performance computing and grid development
13. Parasite transfection technologies
14. Cell cycle regulation
15. Population genetics
16. Vector genomics
17. Drug, vaccine and diagnostic target discovery