BOL: Related items

Personalised Medicine - Animation

Tue, 27 Aug 2013 10:07:24 -0500

Two animated case scenarios set now and in the future. These highlight potential differences in the way patients are treated now, and how they might be treated as healthcare becomes more tailored.

Bioinformatics OneLiner

Rahul Nayak — Tue, 10 Apr 2018 04:13:03 -0500

To remove all line ends (\n) from a Unix text file:

sed ':a;N;$!ba;s/\n//g' filename.txt > newfilename_oneline.txt

To get average for a column of numbers (here the second column $2):

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n }' filename.txt

To get sequence length for all sequences in a fasta file:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \
filename.fasta
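
A variant of the command above, as a minimal sketch: it prints one tab-separated line per sequence (identifier, then length) instead of alternating header and length lines, which is easier to sort or pipe onward. The function name `fasta_lengths` is hypothetical, and the sketch assumes the identifier is the first whitespace-delimited word of each header.

```shell
# fasta_lengths: print "<id><TAB><length>" for each record of a FASTA stream.
# Hypothetical helper name; reads stdin, or the files given as arguments.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous record
            name = substr($1, 2)                  # first header word without ">"
            len = 0
            next
        }
        { len += length($0) }                     # a sequence may span several lines
        END { if (name != "") print name "\t" len }
    ' "$@"
}
```

Possible usage: `fasta_lengths filename.fasta | sort -k2,2n` lists the sequences from shortest to longest.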

To copy (move, rename, etc) files based on their list in a text file:

while IFS= read -r line; do cp -- "$line" complete_dataset/"$line"; done < file_list.txt

To split bam files into sets with mapped and unmapped reads:

samtools view -b -F 4 sample.bam > sample.mapped.bam
samtools view -b -f 4 sample.bam > sample.unmapped.bam

To gzip all your fastq files using gnu parallel and gzip:

parallel gzip ::: *.fastq

To gzip all your fastq files using pigz:

pigz *.fastq

To count all sequences in a fasta file:

grep -c "^>" yourfile.fasta

To count all sequences in all fasta files in your current directory:

for a in *.fasta; do echo "$a"; grep -c "^>" "$a"; done

To keep only one copy of duplicated lines:

awk '!seen[$0]++'

To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:

grep "^>" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ - | bc
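
The command above trusts the length field in SPAdes-style headers (`NODE_1_length_..._cov_...`). As a cross-check, a minimal sketch that counts the bases in the sequence lines themselves could look like this; the function name `assembly_size` is hypothetical, and it assumes plain FASTA with no blank or comment lines inside the records.

```shell
# assembly_size: total number of sequence characters in a FASTA stream.
# Hypothetical helper name; reads stdin, or the files given as arguments.
assembly_size() {
    # Skip header lines and add up the length of every other line.
    awk '!/^>/ { total += length($0) } END { print total + 0 }' "$@"
}
```

If `assembly_size scaffolds.fasta` disagrees with the header-based sum, the headers have probably been edited or the file is not a raw SPAdes output.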

To remove everything after the first space on each line, e.g. to simplify fasta headers:

cut -d' ' -f1 < your_file

To count reads in all .fastq.gz files in your current folder (fast, using gnu parallel):

parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz

To count reads in all .fastq.gz files in your current folder:

zcat *.gz | echo $((`wc -l`/4))

To count reads in all .fastq files in your current folder:

cat *.fastq | echo $((`wc -l`/4))

To count base pairs in all .fastq.gz files in your current folder:

zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c

To split multifasta file into many fasta files:

awk '/^>/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File

To convert Illumina FASTQ 1.3 (Phred+64 quality encoding) to 1.8 (Phred+33), using GNU sed:

sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ/' f.fastq

To convert FASTQ to FASTA:

sed -n '1~4s/^@/>/p;2~4p' reads.fastq > reads.fasta
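
The `first~step` address form used above is a GNU sed extension, so it may not work with other sed implementations. A minimal awk sketch of the same conversion, assuming strict four-line FASTQ records (no wrapped sequences), might be written as follows; the function name `fastq_to_fasta` is hypothetical.

```shell
# fastq_to_fasta: convert four-line FASTQ records to FASTA.
# Hypothetical helper name; reads stdin, or the files given as arguments.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap "@" for ">"
        NR % 4 == 2 { print }                      # sequence line, unchanged
    ' "$@"
}
```

Because records are selected by line number rather than by pattern, a quality line that happens to start with `@` is not mistaken for a header.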

To get fastq read length distribution:

awk 'NR%4==2 {print length($1)}' reads.fastq | sort -n | uniq -c

To deinterleave interleaved fastq file:

cat myf.fq | paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > myf_1.fq) | cut -f 5-8 | \
tr "\t" "\n" > myf_2.fq
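
The `>( ... )` process substitution above needs bash or a similar shell. As an alternative sketch, the same split can be done in a single awk pass that routes each four-line record by its position; the function name `deinterleave` and its argument order are assumptions made for illustration, and the input is assumed to hold strictly alternating read 1 / read 2 records.

```shell
# deinterleave: split an interleaved FASTQ file into two files.
# Hypothetical helper: $1 = interleaved input, $2 = read 1 output, $3 = read 2 output.
deinterleave() {
    awk -v r1="$2" -v r2="$3" '
        {
            # Records are 4 lines long; even-numbered records (0, 2, ...) are read 1.
            if (int((NR - 1) / 4) % 2 == 0) { print > r1 } else { print > r2 }
        }
    ' "$1"
}
```

Possible usage: `deinterleave myf.fq myf_1.fq myf_2.fq`.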

To filter and sort contig identifiers from SPAdes assembly (e.g. here length >= 4000 and coverage >= 100):

grep "^>" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 >= 4000 && $6 >= 100) print $0 }' | sort -k 4 -n | \
sed s"/ /_/"g

To append something to all headers of your fasta files:

sed 's/>.*/&YOURSTRING/' filename.fasta > new_filename.fasta

To replace/squeeze multiple adjacent spaces by only one space:

tr -s " " < file

To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25):

cat your.fastq | paste - - - - | awk 'length($2) >= 21 && length($2) <= 25' | tr '\t' '\n' > filtered.fastq
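
The length bounds in the command above are hard-coded. A minimal sketch that takes them as arguments is shown below; the function name `filter_fastq_by_length` is hypothetical, and it assumes four-line FASTQ records whose header lines contain no tab characters.

```shell
# filter_fastq_by_length: keep reads whose sequence length is within [min, max].
# Hypothetical helper: $1 = minimum length, $2 = maximum length (both inclusive).
# Reads FASTQ on stdin and writes the retained records to stdout.
filter_fastq_by_length() {
    paste - - - - |                                   # one record per line, tab-separated
        awk -F '\t' -v min="$1" -v max="$2" \
            'length($2) >= min && length($2) <= max' |   # field 2 is the sequence
        tr '\t' '\n'                                  # back to four lines per record
}
```

Possible usage: `filter_fastq_by_length 21 25 < your.fastq > filtered.fastq`.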

To print difference between the last and first row in 5th column:

awk 'NR==1 {first=$5} {last=$5} END {print last-first}' myfile.txt

To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):

awk '/^>/{ seqlen=0; print; next; } seqlen < 200 { if (seqlen + length($0) > 200) $0 = substr($0, 1, 200-seqlen);\
 seqlen += length($0); print }' scaffolds.fasta > 200bp_scaffolds.fasta

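To pipe a compressed fasta file directly into makeblastdb (when reading from stdin, -title and -out must be given; mydb is a placeholder name, and -dbtype should be nucl or prot to match your data):

gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb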

To remove sequences with duplicate fasta headers from a fasta file:

awk '/^>/{f=!d[$1];d[$1]=1}f' in.fasta > out.fasta

33rd Annual Convention of Indian Association for Cancer Research from 13th to 15th February 2014

Tue, 27 Aug 2013 10:37:08 -0500

RGCB is organizing the 33rd Annual Convention of Indian Association for Cancer Research from 13th to 15th February 2014 with the theme "Discovery, Innovation and Translation in Cancer Research".

Kindly log on to the conference website http://rgcb.res.in/IACR2014 for further details, timely updates and registration. We would truly appreciate it if this announcement were circulated among your friends, scholars and students, encouraging them to participate in the meeting.

http://210.212.237.38/iacrconference/

Bioinformatics Project Assistant Level II position at CSIR - Institute of Himalayan Bioresource Technology, Palampur (H.P.)

Thu, 17 May 2018 01:53:17 -0500

A walk-in interview is scheduled on the date mentioned below for the selection of suitable candidates in the following areas under the DBT-sponsored project, on a purely temporary basis for the duration of the project(s) or until completion of the project(s), whichever is earlier:

Project Title:
"Exploration of RBP-RNA interactions to reveal the post-transcriptional regulatory impact, and development of related tools and resource server".

Position: Project Assistant Level II (1 position)
Age : 28 years as on 14.06.2018
Salary : Rs.25,000/- P.M., as per the fund provisions in the respective projects.

Eligibility Criteria :
Essential Qualifications: M.Sc. in Bioinformatics / Computational Biology or any area of Bioinformatics with 55% marks.

Essential Qualifications: M.Sc. in any area of Life Sciences with 55% marks with Diploma in any area of Bioinformatics.

Essential Qualifications: B.Tech. / M.Tech. in Bioinformatics / Computer Science with 55% marks.

Selection Procedure : Walk In Interview

Date : 14 June, 2018
Time : 9:30 A.M.
Venue : CSIR-IHBT Palampur (H.P.)

For more information, refer to the following document:

http://ihbt.res.in/components/com_chronoforms5/chronoforms/uploads/Recruitment/20180516114701_Advt15_2018.pdf

Applied Statistics for Bioinformatics using R

Neel — Thu, 30 Aug 2018 03:45:39 -0500

The purpose of this book is to give an introduction to statistics in order to solve some problems of bioinformatics. Statistics provides procedures to explore and visualize data as well as to test biological hypotheses. The book intends to be introductory in explaining and programming elementary statistical concepts, thereby bridging the gap between high school levels and the specialized statistical literature.

Bioinformatician needs ten heads !!!

Jitendra Narayan — Sat, 17 Aug 2013 10:30:45 -0500

Bioinformatics demands more and more knowledge. By that measure, only Ravan, the ten-headed mythological character from the Ramayan, could be a real bioinformatician. :) :P

BETSY: A new backward-chaining expert system for automated development of pipelines in Bioinformatics

Jit — Mon, 17 Dec 2018 18:46:51 -0600

BETSY provides a command-line interface and is available at https://github.com/jefftc/changlab. A user first searches the knowledge base for the desired output, and BETSY then develops an initial workflow to produce that data, which the user later examines. The user can optimize the parameters, choose the algorithm used to preprocess the data, and normalize it depending on the task.

Currently, BETSY consists of modules for microarray and next-generation sequencing data [4], such as expression analysis, classification, peak calling, and visualization.

Address of the bookmark: https://github.com/jefftc/changlab

Best practices in bioinformatics training for life scientists

Jitendra Narayan — Tue, 13 Aug 2013 15:47:34 -0500

Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts.

Find the full paper at http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.full

Address of the bookmark: http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.full

Bioinformatics Training Courses At RASA LSI

RASA Life Sciences — Wed, 06 Nov 2019 00:30:51 -0600

RASA conducts comprehensive life science skill development training courses in Pune, India, for working professionals, researchers, students and job seekers. The training is organised into course modules such as Bioinformatics, In silico Drug Discovery, Next Generation Sequence data analysis, and Molecular Biology & Life science software development, in which you learn from industry leaders how to apply these skills in life science and gain command of the software development process using various methodologies. We conduct in-class training and instructor-led live online classes worldwide, along with corporate and skill development training.

Workshops are conducted at regular intervals on Drug Designing, Protein Modeling and Simulation, Chemoinformatics, Bioinformatics, etc. The workshops are highly beneficial for working professionals, students and researchers who want to enhance their skills in a short time.

Bioinformatics -- Understanding of living systems through information science

Wed, 14 Aug 2013 11:50:17 -0500

Recently, the progress of the Human Genome Project, aiming to decode all human DNA sequences, has highlighted a research field called bioinformatics. In this new field, computers and techniques from information science are not just used as tools to advance life science research; they're expected to have a major impact on how we think about the life sciences.

Q. The main feature of bioinformatics is that it uses computers to analyze life. One example is the genome. In all organisms, DNA contains genetic information, and this is called the genome. But the amount of information involved is huge, so recently, it's been read using next-generation sequencers, and analyzed by computers. In bioinformatics research, what we do is use that genome information to investigate the principles of life.

As an organism evolves, its genome sequence changes through sudden mutations. Additionally, at the genome level, mutations called rearrangements, such as inversions, transpositions, and duplications, occur. The genome comparison system developed by the Sakakibara Lab calculates homologous sequences called anchors, which are conserved between species. If the genome is considered as a long text, then anchors can be thought of as words.

Q. We're coming to understand the genomes of various organisms - not just humans, but monkeys, chimpanzees, bacteria, and so on. The first method used to analyze a genome is comparing it with the genomes of other organisms, to see where it's the same and where it's different. In that way, the content of the genome is decoded bit by bit, using computers. By contrast, in our method, we've developed software called Murasaki, which we also use to analyze large genomes, by comparing them with those of other organisms.

The Sakakibara Lab uses a next-generation sequencer at Keio University, along with a cluster machine with hundreds of CPUs. In this way, the Lab is analyzing genome mutations that cause cancer, and the genome of the natto production strain Bacillus subtilis. Until now, genome analysis could only be done in national-scale projects. But now, next-generation sequencer development has made genome analysis possible in an ordinary lab. In a world-first achievement, the Sakakibara Lab has decoded the natto bacillus genome, through analysis using Keio's next-generation sequencer.

Q. In the future, biology and the life sciences may become almost entirely information science and computer science. And in healthcare, that may enable us, for example, to predict whether individuals are susceptible to cancer, or to certain lifestyle-related diseases, by understanding their personal genome data. So, I think it's amply possible that we can make use of such information effectively, to help people live longer and be free from disease, by thinking about their lifestyle habits.

Bioinformatics is only two decades old. In this field, many areas are still unknown. Professor Sakakibara, having been involved since the beginning, will continue tackling new, challenging research projects.