BOL: Related items

RA Bioinformatics at JNU, New Delhi, INDIA

Thu, 27 Apr 2017 03:29:58 -0500

School of Computational & Integrative Sciences
Jawaharlal Nehru University
New Delhi-110067, INDIA

Date: April 24th, 2017. Last Date: May 6th, 2017
PROJECT ID: 632

The following posts are urgently required to be filled for the Department of Biotechnology, Government of India funded project "Computational Core for Plant Metabolomics", run jointly by IIIT-Hyderabad and JNU and administered by Prof Indira Ghosh, School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110 067.
NB: For all the posts, preference will be given to candidates with a good knowledge of Python and/or R on a UNIX platform; knowledge of Java will also receive special consideration.

1. RA / Research Associate (Metabolic engineering/Computational Biologist)

Salary: Rs. 36000/- + HRA

Vacancy: 1

Essential Qualifications: PhD in Bioinformatics/Mathematics/Computer Science with experience in analyzing high-throughput omics-based data, network biology analysis, chemoinformatics, or computational biology related software development. A published paper in the field is required as proof of experience. Special consideration will be given to candidates with experience in industry, teaching, and product development.

Desired Skills: Prior experience in handling and guiding the analysis of bioinformatics and metabolomics data, planning new research areas in metabolism-driven networks, collaborating with industry, preparing and filing reports, etc. The candidate will be expected to communicate with user groups and coordinate with the LIMS group in Hyderabad and the Cheminformatics group in Delhi.

2. Project SRF (Network model building/Systems biology integration)

Salary*: Rs.18000/- + HRA

Vacancy: 1

Essential Qualifications: M.Tech in Computational Biology with project experience, or Masters/B.Tech in Basic Sciences with at least 2 years of research experience in bioinformatics or mathematical model building using computational biology tools, related databases, network analysis, etc. For M.Sc/B.Tech candidates, a published paper in a peer-reviewed journal is a must, whereas for M.Tech candidates, the degree must have been obtained in computational biology.

Desired Skills: The candidate will be expected to manage ongoing research activities in LIMS, interact with the LIMS group, build network models using data compiled by experimentalists, prepare and file reports, and carry out associated project work. Familiarity with plant systems biology and genomics/metabolite resources related to plant metabolomics is desirable.

More at http://www.jnu.ac.in/Career/currentjobs.htm

Bienko and Crosetto Labs

Fri, 12 May 2017 07:42:15 -0500

We are two groups of scientists doing frontier research in quantitative biology and biomedicine. The Bienko group is interested in exploring the fundamental design principles controlling how DNA is packed in the eukaryotic nucleus and its relation to gene expression regulation. The Crosetto group engineers new molecular methods for single-cell and spatially resolved omic measurements of DNA, RNA, and proteins, with a strong focus on tumor heterogeneity. By sharing ideas and resources, we work synergistically towards a more quantitative understanding of life’s processes in healthy and diseased conditions.

https://bienkocrosettolabs.org/

Tryst with a Bioinformatician # Dr Altan Kara

Jitendra Narayan — Thu, 16 Nov 2017 08:47:52 -0600

Dr Altan Kara is a Bioinformatics specialist at the Gene Engineering and Biotechnology Institute of the TUBITAK MAM Research Center. His research interests revolve around cancer informatics and computer-aided drug design. I applaud Dr Altan for clearly setting out both his expectations of people who join his lab/university and his responsibilities to his research members at the TUBITAK MAM Research Institute. Hopefully, this interview will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

You can find out more about Dr Altan by visiting his (well documented) lab page (http://gmbe.mam.tubitak.gov.tr/en) and BOL page http://bioinformaticsonline.com/profile/altan . And now, on to the BOL:“Tryst with a Bioinformatician” interview series ...

What pushed you to join Computational Biology/Bioinformatics?

In my opinion, bioinformatics is the center of modern biological research. If a researcher wants to discover new biological insights by evaluating globally produced biological data and to derive unified solutions for specific biological problems, learning bioinformatics is the only way to achieve this goal.

What fascinates you about Computational Biology/Bioinformatics?

Its flexibility. As is well known, highly diverse and complex biological questions are waiting to be answered, and it is impossible to solve such diverse problems with one-size-fits-all approaches. The method employed has to be tailored to the targeted biological problem, and bioinformatics tools make this easy to achieve.

What is the one word you would use to describe yourself?

Bioinformatician. :)

Can you please describe your research work in a nutshell for BOL users?

At my current Institute, I am working in the field of cancer bioinformatics. Briefly, the overall aim of the project I am working on, AKMARK (Project CODE: 5153403), is to apply bioinformatics-supported genome, transcriptome, proteome, and metabolome analysis to reveal the molecular profile of the disease through an integrated approach, and to develop an early diagnosis and scanning kit based on this profile. Within the scope of the project, we will reveal alterations in the gene, transcript, protein, and metabolite profiles between normal tissue, normal tissue adjoining the tumor (reactive stroma), tumor tissue, lymph node metastasis, and blood samples taken from the same patient, as well as the reflection of these changes in some other selected body fluids. The molecular structures involved in the development and progression of non-small cell lung cancer (NSCLC) will be determined, and their relations with clinical tumor-node-metastasis (TNM) staging and histology will be established. The project targets the development of a diagnostic kit for immediate clinical use and an electrochemical biosensor for quick on-site applications, based on a number of antibodies and aptamers raised against the most specific biomarker selected from the panel.

Is there anything else we should know about you and your research?

Besides AKMARK, I am also preparing a side project that aims to develop a computational method for designing inhibitors of prokaryotic two-component systems. In this project, I will collaborate with Prof. Maria Kontoyianni of the School of Pharmacy, Southern Illinois University Edwardsville (SIUE).

What has been your greatest scientific disappointment so far?

So far I have not experienced any memorable scientific disappointment in my life. :)

What major research challenges and problems have you faced so far? How did you handle them?

The major challenge I have faced so far in my scientific career was predicting the interactions between prokaryotic two-component proteins. To predict these interactions accurately, I created a meta-predictor using a support vector machine. With this technique I integrated six different protein-protein interaction prediction methods so that the disadvantages of one method are covered by the advantages of another. The meta-predictor developed during this work is accessible via http://metapred2cs.ibers.aber.ac.uk/, and more detailed information about the system can be found in the articles with PMID: 27378293 and PMID: 26384938.

What's your all-time favourite bioinformatics package, and why?

For me, the best bioinformatics package is R/Bioconductor. I like it because it provides many useful tools for comprehensive, integrated analysis and comparison of high-throughput experimental data, and because it is open source and open for development. As a result, it provides strong and flexible ways to do science.

In bioinformatics, which of the following roles do you see yourself in: scientist, analyst, developer, engineer or pure academician?

Scientist / Developer.

What would you like to accomplish in the next five or ten years?

For my current research, I would like to design a pipeline that automatically integrates and analyses omics data for cancer research, aimed specifically at biomarker and novel drug target discovery. In addition, I would also like to develop another pipeline for prokaryotic TCS protein structure prediction and inhibitor design.

When you retire, what would you tell the next generation of bioinformaticians?

Bioinformatics is not all about scripting and researchers who study in this field should never expect a tool to do their analyses for them. Besides computational skills, a bioinformatician must have a strong biological background in his/her research area which will allow them to understand if anything went wrong during their run by only looking at the results instead of just blindly trusting the output of the bioinformatics tools.

What will you miss most about bioinformatics when you are no longer working in this field?

Bioinformatics is open to multidisciplinary research with scientists all around the world. As a result, while working in this field I can interactively learn a lot from a wide-ranging research community. I think this is the one thing I will miss the most.

If you owned a bioinformatics company in the future, what would be its focus and aim?

With the increasing amount of data in databases, there is already a massive need for effective methods to eliminate manipulated data and reach clean, useful information. As time passes, data mining will become the required first step of any research project. For this reason, the major goal of my bioinformatics company would be to develop effective tools that eliminate manipulated datasets and information from the literature and provide trustworthy, clean information/datasets for researchers.

How much will bioinformatics have changed by 2050, according to your wild imagination?

Bioinformatics is a field that changes constantly and dynamically. As bioinformatics progresses, new tools and methods become available; they provide better applications of existing methods or totally new methods that offer alternative solutions to various biological problems. Along with these updates, developers also provide easy-to-use GUIs for most of the tools. If the field carries on developing like this, every researcher with a strong biological background will be able to perform bioinformatics analyses by themselves without needing professional help. As a result, almost all bioinformaticians will be responsible just for the development of new methods/tools.

What one piece of advice would you give someone who's trying to reinvent themselves and enter the bioinformatics sector?

Bioinformatics is a wide field with a lot of career options. A researcher who wants to step into this field should therefore first be clear about which branch of bioinformatics they want to work in. Following this decision, they should learn at least one programming language and investigate how other researchers have employed that language in their research, and WHY. A researcher in this field should never create and use copy-paste scripts but must always understand WHY the other researcher worked in that way. Knowing the answer to this question is the only way to learn bioinformatics. Besides, a researcher in any branch of bioinformatics must always keep good control of their working environment. In other words, one should be able to easily control input and output directories, modify files or directories, and annotate and modify the scripts employed during the research, and should not allow any confusion between the different stages of the research. Finally, they should not blindly trust the output of a tool or software but should run a benchmarking test for each tool they decide to use in their research. In addition, even if the tools pass the benchmarking, researchers should have a good biological background in their field so that they can tell whether anything went wrong during the process just by looking at the output(s) of the employed pipelines/packages/tools.

Postdoctoral scholarship in Bioinformatics at KTH

Thu, 21 Dec 2017 03:55:53 -0600

The School of Biotechnology offers a curriculum that reflects the multidisciplinary nature of Biotechnology, integrating theoretical and applied science in undergraduate and graduate courses. The school has six departments with about 300 employees, located at AlbaNova University Center in Stockholm and Science for Life Laboratory in Solna. The Biotechnology research within the school is internationally well recognized.

We are now seeking a postdoc scholarship holder with strong background in transcriptomics to use this large collection of data for integrative studies. Focus will be on advanced bioinformatics and statistical analysis of data from high-throughput sequencing including integration with the other platforms.

The scholarship holder must have a PhD with an outstanding research and publication record and will be selected based on her/his excellence and her/his skills. A PhD should have been awarded less than five years before the deadline of the application. The scholarship holder must have a strong background in bioinformatics, computer science, computational biology or equivalent with a profound knowledge about biology and biostatistics.

Your complete application must be received at KTH no later than 2018-01-15.

https://www.kth.se/en/om/work-at-kth/stipendier/postdoctoral-scholarship-in-bioinformatics-with-focus-on-transcriptomics-and-data-integration-1.779571

ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data

Jit — Mon, 19 Feb 2018 06:46:15 -0600

ETE v3 features numerous improvements in the underlying library of methods and provides a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics.

The new features include

(i) building gene-based and supermatrix-based phylogenies using a single command,

(ii) testing and visualizing evolutionary models,

(iii) calculating distances between trees of different size or including duplications, and

(iv) providing seamless integration with the NCBI taxonomy database.

ETE is freely available at http://etetoolkit.org

Address of the bookmark: http://etetoolkit.org

Linux Commands Cheat Sheet for Bioinformatics and Computational Biology Professionals

Rahul Nayak — Mon, 05 Feb 2018 18:50:41 -0600

The purpose of this cheat sheet is to introduce biologists and bioinformaticians to the frequently used tools for NGS analysis, as well as to give them experience in writing one-liners.

File System
ls — list items in current directory
ls -l — list items in current directory and show in long format to see permissions, size, and modification date
ls -a — list all items in current directory, including hidden files
ls -F — list all items in current directory and show directories with a slash and executables with a star
ls dir — list all items in directory dir
cd dir — change directory to dir
cd .. — go up one directory
cd / — go to the root directory
cd ~ — go to your home directory
cd - — go to the last directory you were just in
pwd — show present working directory
mkdir dir — make directory dir
rm file — remove file
rm -r dir — remove directory dir recursively
cp file1 file2 — copy file1 to file2
cp -r dir1 dir2 — copy directory dir1 to dir2 recursively
mv file1 file2 — move (rename) file1 to file2
ln -s file link — create symbolic link to file
touch file — create or update file
cat file — output the contents of file
less file — view file with page navigation
head file — output the first 10 lines of file
tail file — output the last 10 lines of file
tail -f file — output the contents of file as it grows, starting with the last 10 lines
vim file — edit file
alias name='command' — create an alias for a command (bash syntax; csh/tcsh use alias name 'command')
System
shutdown — shut down machine
reboot — restart machine
date — show the current date and time
whoami — who you are logged in as
finger user — display information about user
man command — show the manual for command
df — show disk usage
du — show directory space usage
free — show memory and swap usage
whereis app — show possible locations of app
which app — show which app will be run by default
Process Management
ps — display your currently active processes
top — display all running processes
kill pid — kill process id pid
kill -9 pid — force kill process id pid
Permissions
ls -l — list items in current directory and show permissions
chmod ugo file — change permissions of file to ugo - u is the user's permissions, g is the group's permissions, and o is everyone else's permissions. The values of u, g, and o can be any number between 0 and 7.
7 — full permissions
6 — read and write only
5 — read and execute only
4 — read only
3 — write and execute only
2 — write only
1 — execute only
0 — no permissions
chmod 600 file — you can read and write - good for files
chmod 700 file — you can read, write, and execute - good for scripts
chmod 644 file — you can read and write, and everyone else can only read - good for web pages
chmod 755 file — you can read, write, and execute, and everyone else can read and execute - good for programs that you want to share
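
Each octal digit above is the sum of read = 4, write = 2 and execute = 1. As a minimal sketch of that arithmetic, the hypothetical helper below (perm_to_octal is not a standard command) converts the symbolic string shown by ls -l, without its leading file-type character, into the number passed to chmod:

```shell
# Convert a 9-character symbolic permission string (e.g. rwxr-xr-x)
# into octal by summing r=4, w=2, x=1 within each user/group/other triplet.
# perm_to_octal is an illustrative helper name, not a system tool.
perm_to_octal() {
    perms=$1
    octal=""
    for start in 1 4 7; do
        triplet=$(printf '%s' "$perms" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        octal="$octal$digit"
    done
    printf '%s\n' "$octal"
}

perm_to_octal rwxr-xr-x   # 755
perm_to_octal rw-r--r--   # 644
perm_to_octal rw-------   # 600
```
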
Networking
wget file — download a file
curl file — download a file
scp user@host:file dir — secure copy a file from remote server to the dir directory on your machine
scp file user@host:dir — secure copy a file from your machine to the dir directory on a remote server
scp -r user@host:dir dir — secure copy the directory dir from remote server to the directory dir on your machine
ssh user@host — connect to host as user
ssh -p port user@host — connect to host on port as user
ssh-copy-id user@host — add your key to host for user to enable a keyed or passwordless login
ping host — ping host and output results
whois domain — get information for domain
dig domain — get DNS information for domain
dig -x host — reverse lookup host
lsof -i tcp:1337 — list all processes running on port 1337
Searching
grep pattern files — search for pattern in files
grep -r pattern dir — search recursively for pattern in dir
grep -rn pattern dir — search recursively for pattern in dir and show the line number found
grep -r pattern dir --include='*.ext' — search recursively for pattern in dir and only search in files with .ext extension
command | grep pattern — search for pattern in the output of command
find dir -name file — find all instances of file under dir by searching the real file system
locate file — find all instances of file using the indexed database built by the updatedb command. Much faster than find
sed -i 's/day/night/g' file — find all occurrences of day in file and replace them with night - s means substitute and g means global - sed also supports regular expressions
Compression
tar cf file.tar files — create a tar named file.tar containing files
tar xf file.tar — extract the files from file.tar
tar czf file.tar.gz files — create a tar with Gzip compression
tar xzf file.tar.gz — extract a tar using Gzip
gzip file — compresses file and renames it to file.gz
gzip -d file.gz — decompresses file.gz back to file
Shortcuts
ctrl+a — move cursor to beginning of line
ctrl+e — move cursor to end of line
alt+f — move cursor forward 1 word
alt+b — move cursor backward 1 word

S-plot2: Rapid Visual and Statistical Analysis of Genomic Sequences

Abhimanyu Singh — Tue, 02 Oct 2018 17:57:27 -0500

S-plot2 creates an interactive, two-dimensional heatmap capturing the similarities and dissimilarities in nucleotide usage between genomic sequences (partial or complete). In S-plot2, whole eukaryotic chromosomes and smaller prokaryotic genomes can be efficiently compared. The tool includes functionality to extract, analyze, and automate BLAST queries of regions of interest within the heatmap. This facilitates the investigation of quickly evolving coding regions, novel coding regions, and laterally transferred elements.

Address of the bookmark: https://bitbucket.org/lkalesinskas/splot

Bioinformatics OneLiner

Rahul Nayak — Tue, 10 Apr 2018 04:13:03 -0500

To remove all line ends (\n) from a Unix text file:

sed ':a;N;$!ba;s/\n//g' filename.txt > newfilename_oneline.txt

To get average for a column of numbers (here the second column $2):

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

To get sequence length for all sequences in a fasta file:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen = seqlen +length($0)}END{print seqlen}' \
filename.fasta
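
The one-liner above prints each header and its length on separate lines. As a hedged variant, assuming a tab-separated "ID, length" table is easier to sort or join, the sketch below emits one line per record. The function name fasta_lengths and the demo data are made up for illustration:

```shell
# Print "ID<TAB>length" for every record in a FASTA file.
# The ID is the first word of the header without the leading ">".
fasta_lengths() {
    awk '/^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
         { len += length($0) }
         END { if (id != "") print id "\t" len }' "$1"
}

# Hypothetical two-record FASTA file, created only for this demonstration.
demo_dir=$(mktemp -d)
printf '>seq1 sample\nACGT\nAC\n>seq2\nGGG\n' > "$demo_dir/demo.fasta"

fasta_lengths "$demo_dir/demo.fasta"   # seq1 has 6 bases, seq2 has 3
rm -r "$demo_dir"
```
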

To copy (move, rename, etc) files based on their list in a text file:

while IFS= read -r line; do cp "$line" complete_dataset/"$line"; done < file_list.txt

To split bam files into sets with mapped and unmapped reads:

samtools view -F4 sample.bam > sample.mapped.sam
samtools view -f4 sample.bam > sample.unmapped.sam

To gzip all your fastq files using gnu parallel and gzip:

parallel gzip ::: *.fastq

To gzip all your fastq files using pigz:

pigz *.fastq

To count all sequences in a fasta file:

grep -c "^>" yourfile.fasta

To count all sequences in all fasta files in your current directory:

for a in *.fasta; do echo "$a"; grep -c "^>" "$a"; done

To keep only one copy of duplicated lines:

awk '!seen[$0]++'
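
How this works: seen[$0]++ evaluates to 0 the first time a line is met, so the negated expression is true only on first sight, and awk's default action prints the line. A minimal sketch on made-up input (the wrapper name dedup_lines is only for illustration):

```shell
# Keep the first occurrence of each line and preserve the original order.
# With no file arguments awk reads standard input.
dedup_lines() {
    awk '!seen[$0]++' "$@"
}

printf 'chr1\nchr2\nchr1\nchr3\nchr2\n' | dedup_lines
# prints chr1, chr2, chr3 on separate lines
```

Unlike sort -u, this keeps the input order, at the cost of holding every distinct line in memory.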

To sum assembly size from SPAdes contigs.fasta or scaffolds.fasta file:

grep "^>" scaffolds.fasta | cut -f 4 -d '_' | paste -sd+ - | bc

To remove everything after the first space on each line, e.g. to simplify fasta headers:

cut -d' ' -f1 < your_file

To count reads in all .fastq.gz files in your current folder (fast, using gnu parallel):

parallel "echo {} && gunzip -c {} | wc -l | awk '{d=\$1; print d/4;}'" ::: *.gz

To count reads in all .fastq.gz files in your current folder:

zcat *.gz | echo $((`wc -l`/4))

To count reads in all .fastq files in your current folder:

cat *.fastq | echo $((`wc -l`/4))
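
The two commands above report one total across all files. When a per-file count is wanted, a small function can pick the right reader from the file extension. This is a sketch that assumes strict 4-line FASTQ records (no wrapped sequence lines); the name count_reads and the demo data are hypothetical:

```shell
# Count reads in one FASTQ file, plain or gzip-compressed.
# Assumes every record occupies exactly four lines.
count_reads() {
    case $1 in
        *.gz) lines=$(gunzip -c "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $((lines / 4))
}

# Hypothetical two-read FASTQ file, created only for this demonstration.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$demo_dir/demo.fastq"
gzip -c "$demo_dir/demo.fastq" > "$demo_dir/demo.fastq.gz"

count_reads "$demo_dir/demo.fastq"      # 2
count_reads "$demo_dir/demo.fastq.gz"   # 2
rm -r "$demo_dir"
```
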

To count base pairs in all .fastq.gz files in your current folder:

zcat *.fastq.gz | paste - - - - | cut -f 2 | tr -d '\n' | wc -c

To split multifasta file into many fasta files:

awk '/^>/ {OUT=substr($0,2) ".fa"}; {print >> OUT; close(OUT)}' Input_File

To convert Illumina FASTQ 1.3 to 1.8:

sed -e '4~4y/@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ/' f.fastq

To convert FASTQ to FASTA:

sed -n '1~4s/^@/>/p;2~4p'
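
The first~step addresses (1~4, 2~4) used above are a GNU sed extension. On systems without GNU sed, a roughly equivalent awk sketch can be used instead. It assumes strict 4-line records, and the wrapper name fastq_to_fasta is made up for illustration:

```shell
# Convert FASTQ to FASTA: keep line 1 of each record with "@" swapped
# for ">", keep line 2 (the sequence), drop the "+" and quality lines.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$@"
}

printf '@r1 desc\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' | fastq_to_fasta
# prints: >r1 desc, ACGT, >r2, GGCC (one per line)
```
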

To get fastq read length distribution:

cat reads.fastq | awk '{if(NR%4==2) print length($1)}' | sort | uniq -c

To deinterleave interleaved fastq file:

cat myf.fq | paste - - - - - - - - | tee >(cut -f 1-4 | tr "\t" "\n" > myfile_1.fq) | cut -f 5-8 | \
tr "\t" "\n" > myf2.fq

To filter and sort contig identifiers from SPAdes assembly (e.g. here length >= 4000 + coverage >= 100):

grep "^>" scaffolds.fasta | sed s"/_/ /"g | awk '{ if ($4 >= 4000 && $6 >= 100) print $0 }' | sort -k 4 -n | \
sed s"/ /_/"g

To append something to all headers of your fasta files:

sed 's/>.*/&YOURSTRING/' filename.fasta > new_filename.fasta

To replace/squeeze multiple adjacent spaces by only one space:

tr -s " " < file

To filter fastq based on length (here larger than or equal to 21, but smaller than or equal to 25):

cat your.fastq | paste - - - - | awk 'length($2)  >= 21 && length($2) <= 25' | sed 's/\t/\n/g' > filtered.fastq

To print the difference between the last and first row in the 5th column:

awk 'NR==1 {first=$5} {last=$5} END {print last-first}' myfile.txt

To sample only 200 first bases from all sequences in a multifasta file (e.g. from assembly scaffolds.fasta file here):

awk '/^>/{ seqlen=0; print; next; } seqlen < 200 { if (seqlen + length($0) > 200) $0 = substr($0, 1, 200-seqlen);\
 seqlen += length($0); print }' scaffolds.fasta > 200bp_scaffolds.fasta

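To pipe a compressed fasta file directly into makeblastdb (-dbtype is mandatory, and reading from stdin also needs -title and -out; nucl and mydb are placeholders):

gunzip -c fasta.gz | makeblastdb -in - -dbtype nucl -title mydb -out mydb

To remove sequences with duplicate fasta headers from a fasta file: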

awk '/^>/{f=!d[$1];d[$1]=1}f' in.fasta > out.fasta

Binding Site Prediction in Protein !

Poonam Mahapatra — Wed, 25 Apr 2018 04:35:57 -0500

The interaction between proteins and other molecules is fundamental to all biological functions. In this section we include tools that can assist in prediction of interaction sites on protein surface and tools for predicting the structure of the intermolecular complex formed between two or more molecules (docking).

Pockets Identification

CASTp

Automatic identification of pockets and cavities in protein structures, and quantification of their volumes, using Delaunay triangulation. Also available as a PyMOL plugin.

Pocket-Finder

Automatic identification of pockets and cavities in protein structures, and quantification of their volumes.

PocketPicker

Grid-based technique for the analysis of protein pockets. PocketPicker is available as a plugin for PyMOL.

Binding Site Prediction

ConSurf

Identification of functional regions in proteins by surface-mapping of phylogenetic information

CRESCENDO

Identification of protein interaction sites. It uses sequence conservation patterns in homologous proteins to distinguish residues that are conserved due to structural restraints from those conserved due to functional restraints.

Ligand Binding Sites

3DLigandSite

The server utilizes protein-structure prediction to provide structural models of the binding site. Ligands bound to structures are superimposed onto the model and used to predict the binding site.

FINDSITE

A threading-based method for ligand-binding site prediction and functional annotation based on binding-site similarity across superimposed groups of threading templates.

LIGSITE^csc

Prediction of binding site by pocket identification using the Connolly surface and degree of conservation

metaPocket

A meta server for ligand-binding site prediction. metaPocket uses LIGSITE^csc, PASS, Q-SiteFinder and SURFNET.

Pango Lineage Analysis !

Abhi — Mon, 15 Nov 2021 03:38:29 -0600

The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. This website documents all current Pango lineages and their spread, as well as various software tools that researchers can use to perform analyses on SARS-CoV-2 sequence data.

Address of the bookmark: https://cov-lineages.org/resources/pangolin/output.html