BOL: Related items

Prime Minister’s 100k Genome Project

Jitendra Narayan — Thu, 08 Aug 2013 09:40:39 -0500

Genomics England is set to sequence the genomes of 100,000 patients in England over the next five years, a landmark project by the British government.

Genomics England will play a key role in building on the UK’s long track record as a leader in medical science advances to push the boundaries by unlocking the power of DNA data. The UK will become the first country ever to introduce this technology in its mainstream health system, leading the global race for better tests, better drugs and, above all, better, more personalised care.

http://www.genomicsengland.co.uk/100k-genome-project/

Bioinformatics OneLiner

Rahul Nayak — Tue, 10 Apr 2018 04:13:03 -0500

To remove all line ends (\n) from a Unix text file:

sed ':a;N;$!ba;s/\n//g' filename.txt > newfilename_oneline.txt

To get average for a column of numbers (here the second column $2):

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

To get sequence length for all sequences in a fasta file:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \
filename.fasta
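
When the lengths feed into further processing, one record per line is often easier to handle. The following is a minimal sketch, not part of the original one-liner, that prints each sequence name and its length separated by a tab. The file demo.fasta and its two records are made up for illustration. Unlike the command above, it also reports records with an empty sequence (as length 0).

```shell
# Hypothetical demo input: two records, the first wrapped over two lines.
tmpdir=$(mktemp -d)
printf '>seq1 sample\nACGT\nAC\n>seq2\nGGG\n' > "$tmpdir/demo.fasta"

lengths=$(awk '
  /^>/ { if (name != "") print name "\t" len   # flush the previous record
         name = substr($1, 2); len = 0; next } # name = first word, no marker
  { len += length($0) }                        # add up wrapped sequence lines
  END { if (name != "") print name "\t" len }  # flush the last record
' "$tmpdir/demo.fasta")

printf '%s\n' "$lengths"
rm -r "$tmpdir"
```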

To copy (move, rename, etc) files based on their list in a text file:

while IFS= read -r line; do cp -- "$line" complete_dataset/"$line"; done < file_list.txt

To split bam files into sets with mapped and unmapped reads:

samtools view -F4 sample.bam > sample.mapped.sam
samtools view -f4 sample.bam > sample.unmapped.sam

To gzip all your fastq files using gnu parallel and gzip:

parallel gzip ::: *.fastq

To gzip all your fastq files using pigz:

pigz *.fastq

To count all sequences in a fasta file:

grep -c "^>" yourfile.fasta

To count all sequences in all fasta files in your current directory:

for a in *.fasta; do ls "$a"; grep -c "^>" "$a"; done

To keep only one copy of duplicated lines:

awk '!seen[$0]++'
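
As a small illustration with made-up input: `seen[$0]++` evaluates to the number of times the line has occurred so far, which is 0 (false) on first sight, so the negation prints only first occurrences. Unlike `sort -u`, this keeps the original line order.

```shell
# Made-up input with repeated lines; only the first copy of each survives.
deduped=$(printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++')
printf '%s\n' "$deduped"   # prints b, a, c on separate lines
```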

To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:

grep "^>" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ | bc
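
If bc is not installed, the same sum can be computed in awk alone. This is a minimal sketch, assuming the usual SPAdes header layout NODE_<n>_length_<len>_cov_<cov>; the two headers below are invented for the demonstration.

```shell
# Hypothetical two-scaffold file with SPAdes-style headers.
tmpdir=$(mktemp -d)
printf '>NODE_1_length_1000_cov_12.5\nACGT\n>NODE_2_length_500_cov_8.1\nAC\n' \
  > "$tmpdir/scaffolds.fasta"

# Split headers on "_" and add up the fourth field (the length).
total=$(awk -F'_' '/^>/ { sum += $4 } END { print sum + 0 }' "$tmpdir/scaffolds.fasta")

printf '%s\n' "$total"   # 1000 + 500
rm -r "$tmpdir"
```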

To remove everything after the first space on each line, e.g. to simplify fasta headers:

cut -d' ' -f1 < your_file

To count reads in all .fastq.gz files in your current folder (fast, using gnu parallel):

parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz

To count reads in all .fastq.gz files in your current folder:

zcat *.gz | echo $((`wc -l`/4))

To count reads in all .fastq files in your current folder:

cat *.fastq | echo $((`wc -l`/4))

To count base pairs in all .fastq.gz files in your current folder:

zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c

To split multifasta file into many fasta files:

awk '/^>/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File
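
Headers that contain spaces lead to file names with spaces when the whole header is used. The following sketch is a variant, not the original command: it names each file after the first word of the header only and closes a file once its record is finished. It assumes record names are unique (a repeated name would overwrite the earlier file), and the demo file is invented.

```shell
tmpdir=$(mktemp -d)
printf '>seq1 first record\nACGT\n>seq2\nGG\nTT\n' > "$tmpdir/demo.fasta"

# Run inside the temporary directory so the per-record files land there.
(
  cd "$tmpdir" &&
  awk '
    /^>/ { if (out != "") close(out)      # finished with the previous file
           out = substr($1, 2) ".fa" }    # file name = first word of header
    out != "" { print > out }             # header line and sequence lines
  ' demo.fasta
)

seq1=$(cat "$tmpdir/seq1.fa")
seq2=$(cat "$tmpdir/seq2.fa")
printf '%s\n---\n%s\n' "$seq1" "$seq2"
rm -r "$tmpdir"
```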

To convert Illumina FASTQ 1.3 to 1.8:

sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ/' f.fastq

To convert FASTQ to FASTA:

sed -n '1~4s/^@/>/p;2~4p'
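
The `first~step` address form used above is a GNU sed extension. Where GNU sed is not available, an awk equivalent can be sketched as follows, assuming strict four-line FASTQ records; the demo reads are made up.

```shell
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n' > "$tmpdir/demo.fastq"

# Line 1 of each record: swap the leading "@" for ">"; line 2: the sequence.
fasta=$(awk 'NR % 4 == 1 { print ">" substr($0, 2) }
             NR % 4 == 2 { print }' "$tmpdir/demo.fastq")

printf '%s\n' "$fasta"
rm -r "$tmpdir"
```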

To get fastq read length distribution:

awk 'NR%4==2 {print length($1)}' reads.fastq | sort -n | uniq -c

To deinterleave interleaved fastq file:

cat myf.fq | paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > myfile_1.fq) | cut -f 5-8 | \
tr "\t" "\n" > myf2.fq

To filter and sort contig identifiers from SPAdes assembly (e.g. here length >= 4000 and coverage >= 100):

grep "^>" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 >= 4000 && $6 >= 100) print $0 }' | sort -k 4 -n | \
sed s"/ /_/"g

To append something to all headers of your fasta files:

sed 's/>.*/&YOURSTRING/' filename.fasta > new_filename.fasta

To replace/squeeze multiple adjacent spaces by only one space:

tr -s " " < file

To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25); the tab field separator keeps headers that contain spaces in one field:

paste - - - - < your.fastq | awk -F'\t' 'length($2) >= 21 && length($2) <= 25' | tr '\t' '\n' > filtered.fastq
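
A parameterized sketch of the same filter, with the bounds passed in as awk variables (`min` and `max` are names chosen here, not part of the original). The three demo reads are invented, and the bounds are set to 3 and 5 only so that the short example has something to keep and something to drop.

```shell
tmpdir=$(mktemp -d)
printf '@short\nAC\n+\nII\n@ok 1:N:0\nACGT\n+\nIIII\n@long\nACGTACGT\n+\nIIIIIIII\n' \
  > "$tmpdir/demo.fastq"

# Join each 4-line record into one tab-separated line, filter on the
# length of field 2 (the sequence), then restore the 4-line layout.
kept=$(paste - - - - < "$tmpdir/demo.fastq" |
  awk -F'\t' -v min=3 -v max=5 'length($2) >= min && length($2) <= max' |
  tr '\t' '\n')

printf '%s\n' "$kept"
rm -r "$tmpdir"
```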

To print difference between the last and first row in 5th column:

awk 'NR==1 {first=$5} {last=$5} END {print last-first}' myfile.txt

To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):

awk '/^>/{ seqlen=0; print; next; } seqlen < 200 { if (seqlen + length($0) > 200) $0 = substr($0, 1, 200-seqlen);\
 seqlen += length($0); print }' scaffolds.fasta > 200bp_scaffolds.fasta

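To pipe a compressed fasta file directly into makeblastdb (here a nucleotide database named mydb; -dbtype is required, and -title and -out must be given when reading from stdin):

gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb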

To remove sequences with duplicate fasta headers from a fasta file:

awk '/^>/{f=!d[$1];d[$1]=1}f' in.fasta > out.fasta
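
A short illustration with an invented file: the flag `f` is recomputed at every header and stays in force for the sequence lines that follow, so a repeated header switches printing off until the next new header. Only the first word of the header (`$1`) is compared.

```shell
tmpdir=$(mktemp -d)
printf '>a\nAC\n>b\nGG\n>a\nTT\nTT\n' > "$tmpdir/in.fasta"

# The second ">a" record (TT, TT) is dropped; the first one is kept.
unique=$(awk '/^>/{f=!d[$1];d[$1]=1}f' "$tmpdir/in.fasta")

printf '%s\n' "$unique"
rm -r "$tmpdir"
```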

2013 NextGen Genomics & Bioinformatics Technologies (NGBT) Conference, New Delhi, INDIA

Thu, 08 Aug 2013 16:21:16 -0500

SciGenom Research Foundation (SGRF) and Institute of Genomics and Integrative Biology (IGIB) are pleased to host the Next-Generation Sequencing and Bioinformatics for Genomics & Healthcare conference.

In the ten years since the first human reference genome was completed for US$3 billion, sequencing technologies have changed radically, leading to a great reduction in sequencing cost. Today a human genome can be sequenced for under US$5,000 in less than two weeks. It is expected that by the end of 2015 the cost of sequencing a human genome will drop below a thousand dollars. Over the past five years, next generation sequencing technologies have enabled a large number of genomic studies that impact human health and disease, and have made possible the growth of microbial, animal and plant genomics studies. While data production has increased at a rapid pace, challenges remain in analyzing and understanding the data. The conference will cover next generation sequencing (NGS) technologies, bioinformatics for NGS and applications of NGS in many areas, including personalized medicine.

For more info : http://www.scigenomconferences.com/2013/default.php

Bioinformatics Project Assistant Level II position at CSIR - Institute of Himalayan Bioresource Technology, Palampur (H.P.)

Thu, 17 May 2018 01:53:17 -0500

A walk-in interview will be held on the date mentioned below to select suitable candidates in the following areas under the DBT-sponsored project. The position is purely temporary, for the duration of the project(s) or until completion of the project(s), whichever is earlier:

Project Title:
"Exploration of RBP-RNA interactions to reveal the post-transcriptional regulatory impact, and development of related tools and resource server".

Position: Project Assistant Level II (1 position)
Age : 28 years as on 14.06.2018
Salary : Rs.25,000/- P.M.

as per the funds provisions in the respective projects.

Eligibility Criteria :
Essential Qualifications: M.Sc. in Bioinformatics / Computational Biology or any area of Bioinformatics with 55% marks.

Essential Qualifications: M.Sc. in any area of Life Sciences with 55% marks with Diploma in any area of Bioinformatics.

Essential Qualifications: B.Tech. / M.Tech. in Bioinformatics / Computer Science with 55% marks.

Selection Procedure : Walk In Interview

Date : 14 June, 2018
Time : 9:30 A.M.
Venue : CSIR-IHBT Palampur (H.P.)

For more info refer to following doc:

http://ihbt.res.in/components/com_chronoforms5/chronoforms/uploads/Recruitment/20180516114701_Advt15_2018.pdf

Postdoctoral Associate - Bioinformatics at Duke University Medical Center

Sat, 10 Aug 2013 18:38:38 -0500

The Department of Biostatistics and Bioinformatics at Duke University Medical Center is seeking a Postdoctoral Associate for a one-year appointment to work on several high-dimensional research projects. The specific goals of the project are to identify genes or molecular markers that are predictive of clinical outcomes in renal and prostate cancer.

Candidates must have a PhD degree in statistics, biostatistics or bioinformatics, and extensive experience in analyzing high-dimensional data (microarray, SNP, CNVs) and in validation approaches. In addition, candidates should have experience in penalized regression methods and database manipulation, and strong programming skills (R) for conducting Monte Carlo studies and applications. Candidates must have excellent communication skills (verbal, written and presentation) and strong proficiency in Linux.

This position is available immediately and will be filled as soon as possible. Appointment could be extended beyond the first year based on additional funding.

For more information about the Department of Biostatistics and Bioinformatics, please visit our website: http://www.biostat.duke.edu.

For more info: http://biostat.duke.edu/sites/biostat.duke.edu/files/Halabi%20-%20Postdoc%20Job%20Posting%202013%20updated.pdf

Duke University is an Equal Opportunity/Affirmative Action Employer.

Applied Statistics for Bioinformatics using R

Neel — Thu, 30 Aug 2018 03:45:39 -0500

The purpose of this book is to give an introduction to statistics in order to solve some problems of bioinformatics. Statistics provides procedures to explore and visualize data as well as to test biological hypotheses. The book intends to be introductory in explaining and programming elementary statistical concepts, thereby bridging the gap between high school levels and the specialized statistical literature.

What is the difference between BioRuby and BioGem?

Neel — Mon, 12 Aug 2013 09:27:57 -0500

I came across two different but similar terms, BioRuby and BioGem. What is the difference between these two? If both use the same Ruby language for development, why were two different biological packages developed?

BETSY: A new backward-chaining expert system for automated development of pipelines in Bioinformatics

Jit — Mon, 17 Dec 2018 18:46:51 -0600

BETSY provides a command-line interface and is available at https://github.com/jefftc/changlab. A user first searches the knowledge base for the desired output, and BETSY then develops an initial workflow to produce that data, which the user later examines. The user can optimize the parameters, choose the algorithm used to preprocess the data, and normalize it depending on the task.

Currently, BETSY consists of modules for microarray and next-generation sequencing data [4], such as expression analysis, classification, peak calling, and visualization.

Address of the bookmark: https://github.com/jefftc/changlab

Ph.D. Fellowship (Computational Biology/Bioinformatics) : Cork, Ireland

Thu, 15 Aug 2013 14:09:00 -0500

A Ph.D. Fellowship (18,000 euro/pa, plus tuition fees at the EU students rate) is available for four years to work on the development of Bioinformatics resources for the analysis and visualization of ribosome profiling data. Ribosome profiling (ribo-seq) is a technology that allows mapping the positions of ribosomes at the whole transcriptome level with nucleotide precision. The technology allows obtaining high resolution digital snapshots of gene expression in cells. The position is available starting on the 1st of October, 2013.

Candidate:
The candidate is expected to have a B.S. or M.S. degree in a discipline such as Computer Science, Statistics, Applied Mathematics, Physics or Electrical Engineering. Candidates with backgrounds in Life Science disciplines such as Bioinformatics, Computational or Quantitative Biology will also be considered.

Location:
The position is available at LAPTI (http://lapti.ucc.ie), which is located in the Western Gate Building (http://www.stwarchitects.com/project-information.php?c=1&p=09993) at University College Cork. The Western Gate Building Research Complex hosts several UCC departments and provides an ideal environment for interdisciplinary research. Cork (sometimes referred to as the “Venice of Ireland”) is the second most populous city in the Republic. It has a friendly cosmopolitan atmosphere and vibrant culture. A number of American industrial giants such as Apple, EMC and Pfizer have chosen Cork as a home for their European headquarters.

Application process:
The details of the application process are given at http://lapti.ucc.ie/jobs.html. To ensure prompt processing of your application use the subject line: ‘Ph.D. computational’. All applications received prior to August the 1st are guaranteed equal consideration. However, later applications will also be considered until the position is filled.

Bioinformatics Training Courses At RASA LSI

RASA Life Sciences — Wed, 06 Nov 2019 00:30:51 -0600

RASA conducts comprehensive life science skill development training courses in Pune, India, for working professionals, researchers, students and job seekers. The courses are carefully designed and cover modules such as Bioinformatics, In silico Drug Discovery, Next Generation Sequence data analysis, and Molecular Biology & Life Science software development, in which you learn from industry leaders how to apply these skills in life science and gain command of the software development process using various methodologies. We conduct in-class training and instructor-led live online classes worldwide, along with corporate and skill development training.

Workshops are conducted at regular intervals on Drug Designing, Protein Modeling and Simulation, Chemoinformatics, Bioinformatics, etc. The workshops are highly beneficial for working professionals, students and researchers who want to enhance their skills in a short time.