BOL: Related items

National Training on Bioinformatics Computational Tools for Microbial Research Nov 19 to 30, 2013

Fri, 27 Sep 2013 10:49:34 -0500

Modern agricultural research integrates many fields of the biological, chemical and physical sciences, because agriculture encompasses much of the complexity of biology in nature. Biological data are accumulating rapidly from research on crop plants, livestock and microbes, and on the impact of changing climate, habitat and other interrelations on these organisms. Bioinformatics has come forward across the globe to address the analysis, prediction, storage, management, pattern recognition, submission and retrieval of these data. It is becoming increasingly important in the context of systems biology, where a holistic approach is required to understand the biology and chemistry of organisms and their behavior during environmental interactions, and so to mitigate the harmful impact of biotic or abiotic factors on crop plants, animals, fish, livestock, beneficial insects and microbes. The National Training program on “Computational Tools for Microbial Research” is an initiative for the capacity building of NARS scientists/researchers in this emerging and fast-developing area, i.e. agricultural bioinformatics.

Contact: The Director, National Bureau of Agriculturally Important Microorganisms, Kusmaur, Maunath Bhanjan-275101 (U.P.); Phone: 0547-2530080; Fax: 0547-2530358; email: nbaimicar@gmail.com; website: www.nbaim.org.in

OR

Dr. Dhananjaya P. Singh, Senior Scientist & CCPI, NABG project, NBAIM, Maunath Bhanjan-275101; Mobile: 09415291703; email: dpsfarm@rediffmail.com, nabg.nbaim@gmail.com

More at http://www.nbaim.org.in/Announc.aspx?cd=36

Tiger genome sequenced

Rahul Agarwal — Tue, 17 Sep 2013 16:48:24 -0500

Fifteen scientists led by Dr Jong Bhak of Genome Research Foundation, South Korea, decoded as many as 3 billion nucleotides (organic molecules that form the basic building blocks of nucleic acids, such as DNA). They identified 20,000 genes related to various functions of the tiger.

The biggest and perhaps most fearsome of the world's big cats, the tiger, shares 95.6 percent of its DNA with humans' cute and furry companions, domestic cats.

The new research showed that big cats have genetic mutations that enabled them to be carnivores. The team also identified mutations that allow snow leopards to thrive at high altitudes.

Reference:

http://www.nbcnews.com/science/your-cat-ferocious-tigers-share-lot-95-6-percent-their-4B11182690

http://timesofindia.indiatimes.com/home/environment/flora-fauna/Gene-mapping-of-tiger-completed/articleshow/22671681.cms

Paper:

http://www.nature.com/ncomms/2013/130917/ncomms3433/full/ncomms3433.html

EMBL Postdoc position in Bacterial Gene Gain Loss

Thu, 20 Aug 2015 14:09:21 -0500

A post-doctoral fellowship is available in the research groups of Nick Goldman (EBI) and John Welch (Genetics Department, Cambridge University) under the EMBL-EBI / Cambridge Computational Biomedical Postdoctoral Fellowship scheme.

The project is on bacterial gene gain and loss and emerging pathogenicity, and is described in full here: https://www.ebi.ac.uk/research/postdocs/ebpods/projects/goldman-welch-2015 . The EMBL-EBI / Cambridge Computational Biomedical Postdoctoral (“EBPOD”)

The closing date for applications is 3 September 2015.

Nick Goldman, EMBL-European Bioinformatics Institute

More at https://www.ebi.ac.uk/research/postdocs/ebpods/projects/goldman-welch-2015

Opera: An optimal genome scaffolding program

Jit — Mon, 27 Nov 2017 10:18:20 -0600

Opera (Optimal Paired-End Read Assembler) is a sequence assembly program (http://en.wikipedia.org/wiki/Sequence_assembly). It uses information from paired-end or long reads to optimally order and orient contigs assembled from shotgun-sequencing reads.

An updated version called OPERA-LG has been re-engineered with features for the assembly of large and complex genomes.

Song Gao, Denis Bertrand, Burton K. H. Chia and Niranjan Nagarajan. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees. Genome Biology, May 2016, doi: 10.1186/s13059-016-0951-y.

Song Gao, Wing-Kin Sung, Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, Sept. 2011, doi:10.1089/cmb.2011.0170.

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

Address of the bookmark: https://sourceforge.net/projects/operasf/

Mike Ritchie Lab

Wed, 02 Oct 2013 15:25:45 -0500

The primary research focus of the Mike Ritchie Lab is the detection of susceptibility genes for common diseases such as cancer, diabetes, hypertension, and cardiovascular disease, among others. The lab's approaches involve the development and application of new statistical methods, with a focus on detecting gene-gene interactions associated with human disease.

Comparing gene expression and protein expression patterns between normal and non-normal tissues is a growing area of research that may lead to the identification of candidate genes for understanding the etiology of common, complex diseases.

Lab homepage @ http://ritchielab.psu.edu/ritchielab/

SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly. SPAdes was used to produce the B. fragilis assembly by Mick Watson’s group.

Again, running spades.py will show you the options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>         directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12 <filename>                 file with interlaced forward and reverse paired-end reads
-1 <filename>                   file with forward paired-end reads
-2 <filename>                   file with reverse paired-end reads
-s <filename>                   file with unpaired reads
--pe<#>-12 <filename>           file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename>            file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename>            file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename>            file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>                    orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename>               file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename>           file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename>            file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename>            file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename>            file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>                    orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename>         file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename>          file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename>          file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename>          file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>                  orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename>        file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename>        file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename>             file with Sanger reads
--pacbio <filename>             file with PacBio reads
--nanopore <filename>           file with Nanopore reads
--tslr <filename>               file with TSLR-contigs
--trusted-contigs <filename>    file with trusted contigs
--untrusted-contigs <filename>  file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from <cp>     restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset <filename>            file with dataset description in YAML format
-t/--threads <int>              number of threads
                                [default: 16]
-m/--memory <int>               RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir <dirname>             directory for temporary files
                                [default: /tmp]
-k <int,int,...>                comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff <float>            coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64>       PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

use 4 threads
max memory is 32 GB
use 3 k-mer values to build the de Bruijn graph(s): 31, 51 and 71
only run the assembler, not the read error correction step (for speed)
read 1 and read 2 of the MiSeq data
the Nanopore data
put the output in the folder “hybrid_assembly”
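The same invocation can be assembled programmatically, which makes it easier to check the options before a long run. The following is only a minimal sketch: `validate_kmers` and `build_spades_command` are hypothetical helper names (not part of SPAdes), the file names are the ones from the example above, and the k-mer check simply mirrors the rule stated in the help text (odd and less than 128). Only flags that appear in the help output are used.

```python
from typing import List, Optional, Sequence


def validate_kmers(kmers: Sequence[int]) -> List[int]:
    """Check k-mer sizes against the rule in the help text: odd and less than 128."""
    for k in kmers:
        if k <= 0 or k % 2 == 0 or k >= 128:
            raise ValueError(f"invalid k-mer size {k}: must be odd and less than 128")
    return list(kmers)


def build_spades_command(
    read1: str,
    read2: str,
    out_dir: str,
    nanopore: Optional[str] = None,
    kmers: Sequence[int] = (31, 51, 71),
    threads: int = 4,
    memory_gb: int = 32,
    only_assembler: bool = True,
) -> List[str]:
    """Return the spades.py argument list (hypothetical helper, does not run anything)."""
    cmd = [
        "spades.py",
        "-t", str(threads),
        "-m", str(memory_gb),
        "-k", ",".join(str(k) for k in validate_kmers(kmers)),
    ]
    if only_assembler:
        cmd.append("--only-assembler")  # skip read error correction, for speed
    cmd += ["-1", read1, "-2", read2]
    if nanopore is not None:
        cmd += ["--nanopore", nanopore]
    cmd += ["-o", out_dir]
    return cmd


if __name__ == "__main__":
    command = build_spades_command(
        "miseq.1.fastq", "miseq.2.fastq", "hybrid_assembly", nanopore="minion.fastq"
    )
    print(" ".join(command))
```

The resulting list could then be handed to `subprocess.run(command, check=True)` on a machine where `spades.py` is on the PATH.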

Research Associate @ ICGEB, New Delhi.

Wed, 09 Oct 2013 13:49:20 -0500

Applications are invited for Research Associate position in the DBT Sponsored Bioinformatics Infrastructure Facility at ICGEB, New Delhi.

Essential requirements: Experience of using bioinformatics tools.

Experience of working in Linux. Basic knowledge of computer network administration.

Desirable: Knowledge of Linux installation/administration and proficiency in any of the following:

Shell/PERL/Java/Python/VB/Oracle/MySQL/C/CUDA.

Qualification: PhD or first-class M.Sc degree in Bioinformatics, or in Biotechnology/life science with specialization in Bioinformatics.

Fellowships: Rs 22,000/- with HRA for PhD qualified, Rs 16,000/- with HRA for NET/BET/BINC/GATE qualified and Rs 12,000/- with HRA for non-NET qualified applicants.

Interested candidates may send their complete biodata along with a write-up of their experience and suitability for the position to Dr. Dinesh Gupta by email only to dinesh@icgeb.res.in within 15 days of publication of this advertisement. Kindly mark the email with subject “Application for BIF-RA-2013”

Closing date for applications: 18 October 2013

Only short listed candidates will be invited for an interview at ICGEB.

No TA/DA will be paid for attending the interview.

Advertisement: http://www.icgeb.org/tl_files/Vacancies/BIF-RA-Advt.pdf

jobTree based python wrapper to run the genome simulation tool suite Evolver

Jit — Fri, 08 Dec 2017 16:26:32 -0600

evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to simply running evolver, eSC also automatically creates statistical summaries of the simulation as it runs including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; extract pairwise multiple alignment files (.maf) from leaf and branch nodes from a completed simulation and with the help of mafJoin, join them together into a single maf covering the entire simulation.
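Since the phylogeny is supplied in Newick format and sequences are extracted from its leaf nodes, it can help to list the leaf names of a tree before starting a simulation. The sketch below is an illustration only and is not part of evolverSimControl: `newick_leaves` is a hypothetical helper, the example tree and its species names are made up, and the parser assumes unquoted labels with no bracketed comments.

```python
from typing import List


def newick_leaves(newick: str) -> List[str]:
    """Return leaf labels of a Newick tree, in left-to-right order.

    A label is a leaf when it follows "(" or "," (or starts the string);
    a label that follows ")" names an internal node and is skipped.
    Branch lengths (the text after ":") are ignored.
    """
    leaves = []
    label = ""
    last_struct = ""     # last structural character seen: "(", ",", ")" or ";"
    in_length = False    # True while reading a branch length after ":"
    for ch in newick.strip():
        if ch in "(),;":
            name = label.strip()
            if name and last_struct != ")":
                leaves.append(name)
            label = ""
            in_length = False
            last_struct = ch
        elif ch == ":":
            in_length = True
        elif not in_length:
            label += ch
    return leaves


if __name__ == "__main__":
    tree = "((human:0.1,chimp:0.1)anc1:0.2,mouse:0.5)root;"
    print(newick_leaves(tree))
```

For real trees with quoted labels or comments, an established phylogenetics library would be a safer choice than this hand-rolled parser.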

Address of the bookmark: https://github.com/dentearl/evolverSimControl

SRF/JRF/RA @ UNIVERSITY OF HYDERABAD

Mon, 14 Oct 2013 07:49:11 -0500

SCHOOL OF CHEMISTRY, UNIVERSITY OF HYDERABAD

Applications on plain paper along with details of CV (relevant photocopies of their qualifications/experience and reprints of published work to be attached) are invited from qualified candidates for a Research Fellowship in a CSIR-sponsored research project.

JRF/SRF/RA (one vacancy)

CSIR sponsored “In silico design, identification and in vitro validation of lead molecule inhibitors to Bcr-Abl kinase”

JRF: M.Sc in Chemistry/Bioinformatics/Biotechnology with I division and NET or GATE qualified

SRF: M.Sc in Chemistry/Bioinformatics/Biotechnology with at least two years of post-M.Sc research experience as evidenced from published papers in standard refereed journals in the relevant area

RA: PhD in Chemistry/Bioinformatics/Biotechnology with research experience in the relevant area.

As per CSIR guidelines

Notes:
1) You may visit the University of Hyderabad website www.uohyd.ernet.in to learn more about the University of Hyderabad.
2) Applicants should note that the appointment is purely temporary and confers no right to claim a regular appointment in the University.
3) No TA/DA will be paid for attending the interview or at the time of joining the post, if selected.
4) The application should be submitted by post/courier/in-person to the address given below on or before November 1st 2013.

Prof. Lalitha Guruprasad
W-103, Gurbakhsh Singh Building
School of Chemistry
University of Hyderabad
Hyderabad- 500 046
5) Short-listed candidates will be called for interview at short notice.

Advertisement: http://www.uohyd.ac.in/images/recruitment/chemisry_advt_101013.pdf

String graph based genome assembly software and tools!

Rahul Nayak — Tue, 19 Dec 2017 17:17:38 -0600

In genome assembly, the string graph was first proposed by E. W. Myers in a 2005 publication. (It is unrelated to the graph-theory notion of a string graph, which is an intersection graph of curves in the plane, each curve being called a "string".) A recent Genome Research paper describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons: i) it gives a different, "string graph" perspective on the long-standing genome assembly problem; ii) the paper is coauthored by Jared Simpson, the developer of the ABySS assembler, and Richard Durbin; iii) the Simpson-Durbin algorithm does not rely on de Bruijn graphs, and instead employs a different graph construction approach called the ‘string graph’.

Following are the genome assembly tools based on string graph:

1. SGA (String Graph Assembler) https://github.com/jts/sga

SGA assembles large genomes from high-coverage short read data. It is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired-end and/or mate-pair data to build scaffolds from the contigs. The SGA package also provides a pre-assembly quality-control step whose output is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this report helps increase awareness of the factors that make a given assembly easy or difficult, assists in the selection of software and parameters, and helps to troubleshoot an assembly if it runs into problems.
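To give a feel for the FM-index that these algorithms build on, here is a toy sketch of counting the occurrences of a pattern by backward search over the Burrows-Wheeler transform. It is not SGA code: the class and function names are hypothetical, the BWT is built naively from sorted rotations, and nothing is compressed, so it only suits very short example strings.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (text must not contain '$')."""
    s = text + "$"  # '$' is the end marker and sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    """Uncompressed toy FM-index supporting occurrence counting."""

    def __init__(self, text: str) -> None:
        self.bwt = bwt(text)
        counts = Counter(self.bwt)
        # first[c] = number of characters in the text that sort before c
        self.first = {}
        total = 0
        for c in sorted(counts):
            self.first[c] = total
            total += counts[c]
        # occ[c][i] = number of occurrences of c in bwt[:i]
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if c == ch else 0)

    def count(self, pattern: str) -> int:
        """Count occurrences of a non-empty pattern using backward search."""
        if not pattern:
            raise ValueError("pattern must be non-empty")
        lo, hi = 0, len(self.bwt)  # current suffix-array interval [lo, hi)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("GATTACAGATTACA")
    print(index.count("ATTA"), index.count("GG"))
```

A production FM-index samples the occurrence table and stores the BWT in compressed form, which is where the memory savings come from.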

2. SAGE: String-overlap Assembly of GEnomes https://github.com/lucian-ilie/SAGE2

SAGE is a tool for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on the string-overlap graph and maximum likelihood assembly, bringing a number of important new ideas, such as the efficient computation of the transitive reduction of the string-overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.
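The idea of transitive reduction can be illustrated on a tiny overlap graph. The sketch below is a naive illustration under strong assumptions, not SAGE's or Myers' algorithm: all function names are hypothetical, the reads are distinct, error-free and from one strand, overlaps are found by brute force, and an edge a→c is dropped whenever some read b gives a→b and b→c.

```python
from typing import Dict, List


def overlap_length(a: str, b: str, min_len: int) -> int:
    """Longest proper suffix of a that equals a prefix of b, or 0 if below min_len."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def build_overlap_graph(reads: List[str], min_len: int) -> Dict[str, Dict[str, int]]:
    """Map each read to the reads it overlaps, with the overlap length (quadratic time)."""
    graph = {read: {} for read in reads}
    for a in reads:
        for b in reads:
            if a != b:
                n = overlap_length(a, b, min_len)
                if n:
                    graph[a][b] = n
    return graph


def transitive_reduction(graph: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Drop every edge a->c for which some b has both a->b and b->c."""
    reduced = {a: dict(edges) for a, edges in graph.items()}
    for a, edges in graph.items():
        for b in edges:
            for c in graph[b]:
                if c in edges:
                    reduced[a].pop(c, None)
    return reduced


if __name__ == "__main__":
    genome = "ATGCGTACCTGA"  # made-up sequence, for illustration only
    reads = [genome[i:i + 6] for i in range(0, 7, 2)]
    graph = build_overlap_graph(reads, min_len=2)
    for read, edges in transitive_reduction(graph).items():
        print(read, "->", edges)
```

After the reduction, each read in this example points only to its immediate successor, so the remaining path spells the original sequence. Real string graph construction must also handle reverse complements, contained reads and sequencing errors.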

3. FSG: Fast String Graph

The integrated assembler has been assessed on a standard benchmark, showing that Fast String Graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. The authors also studied the effect of coverage rates on the running times.

4. BASE https://github.com/dhlbh/BASE

BASE enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs. BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.

5. Fermi https://github.com/lh3/fermi/

Fermi is a de novo assembler with a particular focus on assembling Illumina short sequence reads from a mammal-sized genome. In addition to the role of a typical assembler, fermi also aims to preserve heterozygotes which are often collapsed by other assemblers. Its ultimate goal is to find a minimal set of unitigs to represent all the information in raw reads.

If you want to learn about string graph assemblers, please read the following papers:

i) The Fragment Assembly String Graph - E. W. Myers

This paper describes the String Graph concept.

ii) Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin

This earlier paper from Simpson and Durbin

iii) Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin