BOL: Related items

PhD at University of Calgary

Fri, 27 Dec 2013 20:24:39 -0600

Institution/Company:
University of Calgary
Location:
Calgary, AB
Job Description:

Novel diagnostic platform for detection of Osteoarthritis

I invite applications from highly motivated individuals to join my laboratory as a PhD student in Systems Biology at the University of Calgary McCaig Institute for Bone and Joint Health. This project is aimed at characterizing the networks of physical (protein-protein) interactions underlying inflammatory processes in patients with Osteoarthritis and how these differ from those in patients with Rheumatoid Arthritis and in normal individuals. This work will eventually lead to the development of a novel diagnostic platform for the non-invasive and accurate detection of early Osteoarthritis. The selected candidate will use state-of-the-art computational methodologies to systematically analyze proteomic data, and develop and implement new algorithms to identify protein and functional interaction networks from high-throughput experimental data. The individual will also benefit by working closely with experts at the UofC and UofA through an AIHS Alberta Osteoarthritis Team Grant, which includes experts from all pillars of health research. The candidate will also be supported to attend bioinformatics workshops and conferences to advance and disseminate their research.
Qualifications: The ideal candidate will have a Master’s degree in Computational Biology, Bioinformatics, or equivalent with strong background knowledge of the Biological Sciences, Biochemistry, and Microbiology. The individual should additionally have experience in handling high-throughput data sets as well as programming skills. The candidate will be registered as a PhD student in Dr. Krawetz’s laboratory, located in the new state-of-the-art Health Research Innovation Centre at the UofC. The individual will have strong verbal and written skills and the ability to work efficiently in a team environment.

In addition to the outstanding research opportunities available in this setting, students also enjoy the many cultural and sporting amenities provided in the city of Calgary, and can take advantage of the unparalleled skiing and hiking in the Rocky Mountains that are less than an hour away.

Candidates must be academically competitive and will be expected to apply for external funding. The stipend is $25,000/yr. For outstanding PhD students, internal top-up award opportunities are available on a competitive basis. If interested in joining the lab, please contact Dr. Krawetz directly at rkrawetz@ucalgary.ca and provide the following information:

- Short cover letter explaining your interest in the lab
- Resume
- Scanned copy of transcript or listing of course grades
- Names and contact information for two individuals who will be willing to provide letters of reference

SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly. SPAdes was used to produce the B. fragilis assembly by Mick Watson’s group.

Again, running spades.py will show you the options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>         directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12 <filename>                 file with interlaced forward and reverse paired-end reads
-1 <filename>                   file with forward paired-end reads
-2 <filename>                   file with reverse paired-end reads
-s <filename>                   file with unpaired reads
--pe<#>-12 <filename>           file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename>            file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename>            file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename>            file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>                    orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename>               file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename>           file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename>            file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename>            file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename>            file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>                    orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename>         file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename>          file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename>          file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename>          file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>                  orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename>        file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename>        file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename>             file with Sanger reads
--pacbio <filename>             file with PacBio reads
--nanopore <filename>           file with Nanopore reads
--tslr <filename>               file with TSLR-contigs
--trusted-contigs <filename>    file with trusted contigs
--untrusted-contigs <filename>  file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from <cp>     restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset <filename>            file with dataset description in YAML format
-t/--threads <int>              number of threads
                                [default: 16]
-m/--memory <int>               RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir <dirname>             directory for temporary files
                                [default: <output_dir>/tmp]
-k <int,int,...>                comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff <float>            coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64>       PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

- use 4 threads
- limit memory to 32 GB
- use 3 k-mer values to build the de Bruijn graph(s): 31, 51 and 71
- only run the assembler, not the error-correction step (for speed)
- read 1 and read 2 of the MiSeq data
- the Nanopore data
- put the output in the folder “hybrid_assembly”
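
For anyone scripting this step, the same invocation can be sketched as a small Python helper that builds the argument list and checks the k-mer rule from the help text (odd and less than 128). The function name `build_spades_command` is hypothetical and only the flags shown above are used; treat it as a minimal sketch, not an official wrapper.

```python
import shlex


def build_spades_command(threads, memory_gb, kmers, read1, read2,
                         nanopore, outdir, only_assembler=True):
    """Build the argument list for the hybrid assembly run described above."""
    for k in kmers:
        # The SPAdes help text requires k-mer sizes to be odd and less than 128.
        if k % 2 == 0 or k >= 128:
            raise ValueError(f"invalid k-mer size: {k}")
    cmd = ["spades.py", "-t", str(threads), "-m", str(memory_gb),
           "-k", ",".join(str(k) for k in kmers)]
    if only_assembler:
        # Skip read error correction for speed.
        cmd.append("--only-assembler")
    cmd += ["-1", read1, "-2", read2, "--nanopore", nanopore, "-o", outdir]
    return cmd


cmd = build_spades_command(4, 32, [31, 51, 71],
                           "miseq.1.fastq", "miseq.2.fastq",
                           "minion.fastq", "hybrid_assembly")
# Print a shell-safe, copy-pasteable version of the command.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

If SPAdes is installed and on the PATH, the list could be handed to `subprocess.run(cmd, check=True)`; that step is left out here so the sketch runs without SPAdes.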

Postdoc positions in computational biology - Center for Genomic Science - Milan, Italy

Thu, 12 Dec 2013 18:34:47 -0600

Job Description: Three postdoc positions in computational biology are available at the Center for Genomic Science in Milan (Italy):

- Development of computational methods to investigate the interplay between epigenetic and genetic layers and their role in tumor progression, by integrating genomic, epigenomic and transcriptional data. PI: Mattia Pelizzola (http://tiny.cc/comEpi)
- Epigenome and transcriptome analysis in mouse models of Hepatocellular Carcinoma. PI: Bruno Amati
- Small and long non-coding RNAs in cancer stem cells. PI: Francesco Nicassio

All projects will benefit from the availability of both in-house and publicly available next-generation sequencing datasets. Familiarity with Linux environment, programming skills (especially in R) and a background in either computational biology, or physics/engineering/math will be advantageous.

Application deadline: January 6th. To apply: http://genomics.iit.it/resources.html

Start date: March 1st, 2014

Duration: 1+2 years

Contact Person (Referent): Mattia Pelizzola

Ref. E-Mail: mattia.pelizzola@iit.it

Tel: 0039-02-94375058
Group Web Page: http://genomics.iit.it

jobTree based python wrapper to run the genome simulation tool suite Evolver

Jit — Fri, 08 Dec 2017 16:26:32 -0600

evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to simply running evolver, eSC also automatically creates statistical summaries of the simulation as it runs including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; extract pairwise multiple alignment files (.maf) from leaf and branch nodes from a completed simulation and with the help of mafJoin, join them together into a single maf covering the entire simulation.
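
To make the Newick input concrete, here is a minimal sketch that lists the leaf names of a small phylogeny, i.e. the nodes for which sequence files would be extracted after a simulation. It is not part of evolverSimControl; the tree, species names, and the `leaf_names` helper are hypothetical, and the regular expression assumes unquoted labels and a tree with at least one pair of parentheses.

```python
import re


def leaf_names(newick):
    """Return leaf labels of a Newick string, in left-to-right order.

    A leaf label directly follows "(" or ","; a label that follows ")"
    names an internal node and is skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Hypothetical three-species phylogeny with branch lengths.
tree = "((human:0.1,chimp:0.1)anc1:0.2,mouse:0.5)root;"
leaves = leaf_names(tree)
print(leaves)
```

For real trees with quoted labels or comments, a dedicated Newick parser would be the safer choice.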

Address of the bookmark: https://github.com/dentearl/evolverSimControl

EMBO practical Course on "Bioinformatics and Genomes Analyses" at Hellenic Pasteur Institute, Athens, Greece

Sat, 21 Dec 2013 10:00:24 -0600

The main objectives of this Practical Course are to strengthen skills
of PhD students and young researchers in the domain of Bioinformatics
and Genome Data Analyses on the use of advanced fundamental algorithms
and their applications in genome studies.

The course topics will include theoretical and practical aspects in:
- Genomes comparisons,
- Evolutionary analyses (orthologs, paralogs and ancestral genomes
inference),
- RNAseq and Next Generation Sequencing (including algorithms, methods
and sequence mapping tools, data analyses and applications).

The course programme will be centred on theoretical presentations
followed by practical sessions. Practical sessions in a Linux
environment will involve Unix shell and Perl scripting. Participants
are assumed to be familiar with this environment.

A series of lectures delivered by prominent scientists on recent hot
topics in genome (Viruses, Prokaryotes, Eukaryotes) studies will be
included in the programme and future research perspectives will be
highlighted.

The topics that will be included in the course programme are similar
to those included in previously organized courses: http://www.pasteur.fr/~tekaia/BGA_courses.html

The course is aimed at motivated Ph.D students and Post-Doctoral
Researchers in Academic Institutions, with background in Mathematics,
Statistics, Biology or Computer Science and who are involved in
Bioinformatics and Genomes studies.

Selection of participants will be based on their background, running
research projects and on expressed motivations.
Selected students will have free accommodation and meals and are
expected to contribute with 200 euros and to pay for their travel
expenses.
All participants (students and invited speakers) will stay in the same
hotel.

Detailed indications are available on the course web site: http://events.embo.org/14-comparative-genomics/index.html

Candidates are advised to complete carefully the application form,
together with an abstract of at least one of their running projects, a
"one-page CV" and a personal Identity Picture (Photo).

The application deadline is March 14, 2014.

The organizers:
Menelaos Manoussakis, Hellenic Pasteur Institute, Athens, Greece.
Evdokia Karagouni, Hellenic Pasteur Institute, Athens - Greece.
Evie Melanitou, Institut Pasteur Paris - France.
Fredj Tekaia, Institut Pasteur Paris - France.
URL: http://www.pasteur.fr/~tekaia/BGA_courses.html

Date: 5 – 17 May, 2014.
More at http://events.embo.org/14-comparative-genomics/index.html

String graph based genome assembly software and tools!

Rahul Nayak — Tue, 19 Dec 2017 17:17:38 -0600

In genome assembly, a string graph is an overlap graph of sequencing reads from which contained reads and transitive edges have been removed. It should not be confused with the "string graph" of graph theory, which is an intersection graph of curves in the plane, each curve being called a "string". The assembly string graph was proposed by E. W. Myers in a 2005 publication. A Genome Research paper describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons: i) it gives a different, "string graph" perspective on the long-standing genome assembly problem; ii) the paper is coauthored by Jared Simpson, the developer of the ABySS assembler, and Richard Durbin; iii) the Simpson-Durbin algorithm does not rely on de Bruijn graphs, and instead employs a different graph construction approach called the ‘string graph’.

Following are the genome assembly tools based on string graph:

1. SGA (String Graph Assembler) https://github.com/jts/sga

Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. SGA also includes a pre-assembly quality-check module (preqc), whose output is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this module helps increase awareness of the factors that make a given assembly easy or difficult, assists in the selection of software and parameters, and helps to troubleshoot an assembly if it runs into problems.
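
To give a feel for what an FM-index does, the following toy sketch builds a Burrows-Wheeler transform by sorting rotations and counts pattern occurrences with backward search. It is an illustration only, not SGA's implementation: the function names are hypothetical, and the occurrence table here is stored in full, whereas real implementations sample and compress it to save memory.

```python
from collections import Counter


def bwt_from_text(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt):
    """Return C (number of smaller symbols) and Occ (prefix counts per symbol)."""
    counts = Counter(bwt)
    c_table, total = {}, 0
    for symbol in sorted(counts):
        c_table[symbol] = total
        total += counts[symbol]
    occ = {symbol: [0] * (len(bwt) + 1) for symbol in counts}
    for i, ch in enumerate(bwt):
        for symbol in occ:
            occ[symbol][i + 1] = occ[symbol][i] + (1 if symbol == ch else 0)
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Backward search: how many times pattern occurs in the indexed text."""
    lo, hi = 0, n  # half-open range of rows in the sorted rotation matrix
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


text = "ACGTACGA"
bwt = bwt_from_text(text)
c_table, occ = build_fm_index(bwt)
print(count_occurrences("ACG", c_table, occ, len(bwt)))  # ACG occurs twice in the text
```

The search touches the index once per pattern character, independent of the text length, which is one reason this family of indexes suits overlap finding among many reads.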

2. SAGE: String-overlap Assembly of GEnomes https://github.com/lucian-ilie/SAGE2

SAGE is a de novo genome assembler. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.
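
The idea of transitive reduction on an overlap graph can be sketched on three toy reads. This is a deliberately naive illustration, not SAGE's algorithm: it compares all read pairs, ignores reverse complements and sequencing errors, and removes an edge a->c only when a two-step path a->b->c exists. The reads and function names are hypothetical.

```python
def overlap_length(a, b, min_len):
    """Longest proper suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def build_overlap_graph(reads, min_len):
    """Directed graph with an edge a->b whenever a's suffix overlaps b's prefix."""
    graph = {read: set() for read in reads}
    for a in reads:
        for b in reads:
            if a != b and overlap_length(a, b, min_len) > 0:
                graph[a].add(b)
    return graph


def transitive_reduction(graph):
    """Drop edge a->c whenever edges a->b and b->c are both present."""
    reduced = {node: set(targets) for node, targets in graph.items()}
    for a, targets in graph.items():
        for b in targets:
            for c in graph[b]:
                if c in targets:
                    reduced[a].discard(c)
    return reduced


# Three overlapping reads drawn from the toy sequence ACGTTGCAAG.
reads = ["ACGTTG", "GTTGCA", "TGCAAG"]
graph = build_overlap_graph(reads, min_len=2)
reduced = transitive_reduction(graph)
print(graph)
print(reduced)
```

In this example the first read overlaps both of the others, but its edge to the third read is implied by the path through the second, so the reduced graph is a simple chain that spells out the original sequence.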

3. FSG: Fast String Graph

The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.

4. BASE https://github.com/dhlbh/BASE

BASE enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have high probability to appear uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs. BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.

5. Fermi https://github.com/lh3/fermi/

Fermi is a de novo assembler with a particular focus on assembling Illumina short sequence reads from a mammal-sized genome. In addition to the role of a typical assembler, fermi also aims to preserve heterozygotes which are often collapsed by other assemblers. Its ultimate goal is to find a minimal set of unitigs to represent all the information in raw reads.

If you want to learn about String Graph assembler, please read the following papers -

i) The Fragment Assembly String Graph - E. W. Myers

This paper describes the String Graph concept.

ii) Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin

This is the earlier paper from Simpson and Durbin.

iii) Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin

Oldest Hominin DNA Sequenced

Surajeet — Fri, 27 Dec 2013 19:58:31 -0600

Matthias Meyer and his team from the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, have developed new techniques for retrieving and sequencing highly degraded ancient DNA. They then joined forces with Juan-Luis Arsuaga and applied the new techniques to a cave bear from the Sima de los Huesos site. After this success, the researchers sampled two grams of bone powder from a hominin thigh bone from the cave. They extracted its DNA and sequenced the genome of the mitochondria or mtDNA, a small part of the genome that is passed down along the maternal line and occurs in many copies per cell. The researchers then compared this ancient mitochondrial DNA with Neandertals, Denisovans, present-day humans, and apes.

From the missing mutations in the old DNA sequences the researchers calculated that the Sima hominin lived about 400,000 years ago. They also found that it shared a common ancestor with the Denisovans, an extinct archaic group from Asia related to the Neandertals, about 700,000 years ago. "The fact that the mtDNA of the Sima de los Huesos hominin shares a common ancestor with Denisovan rather than Neandertal mtDNAs is unexpected since its skeletal remains carry Neandertal-derived features," says Matthias Meyer. Considering their age and Neandertal-like features, the Sima hominins were likely related to the population ancestral to both Neandertals and Denisovans. Another possibility is that gene flow from yet another group of hominins brought the Denisova-like mtDNA into the Sima hominins or their ancestors.

Reference

http://www.sciencedaily.com/releases/2013/12/131204132018.htm

MUMmer4: A fast and versatile genome alignment system

Jit — Sat, 03 Feb 2018 04:59:17 -0600

MUMmer4 is a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. As a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes.
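
A quick back-of-the-envelope check shows how index width relates to these limits. The arithmetic below is exact, but the interpretation is an assumption made here rather than a statement from the MUMmer4 authors: the quoted 141 Tbp is close to 2^47 positions, which would be consistent with one of the 48 bits not being available for addressing. The helper names are hypothetical.

```python
def max_positions(index_bits, reserved_bits=0):
    """Number of distinct positions addressable with the given index width."""
    return 2 ** (index_bits - reserved_bits)


def in_tera_bases(positions):
    """Convert a number of positions to units of 10^12 bases."""
    return positions / 1e12


full_48 = in_tera_bases(max_positions(48))                        # about 281.5 Tbp
one_reserved = in_tera_bases(max_positions(48, reserved_bits=1))  # about 140.7 Tbp
full_32_giga = max_positions(32) / 1e9                            # about 4.29 Gbp

print(f"48 bits: {full_48:.1f} Tbp")
print(f"47 bits: {one_reserved:.1f} Tbp")
print(f"32 bits: {full_32_giga:.2f} Gbp")
```

Either way, the jump from a 32-bit to a 48-bit index moves the ceiling from a few gigabases to well beyond any known genome.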

Address of the bookmark: https://mummer4.github.io/

List of bioinformatics open source projects/software.

Rahul Nayak — Tue, 21 Jan 2014 14:28:37 -0600

Open source software is software that can be freely used, changed, and shared (in modified or unmodified form) by anyone. Open source software is made by many people, and distributed under licenses that comply with the Open Source Definition. The Open Source Initiative (OSI) is a global non-profit that supports and promotes the open source movement. The following are open source bioinformatics projects and software:

.NET Bio

http://blogs.msdn.com/b/msr_er/archive/2011/10/18/microsoft-biology-foundation-evolves-into-new-toolkit-net-bio.aspx

A language-neutral bioinformatics toolkit built using the Microsoft 4.0 .NET Framework to help developers, researchers, and scientists.

AMPHORA ("AutoMated Phylogenomic infeRence Application")

http://wolbachia.biology.virginia.edu/WuLab/Software.html

Metagenomics analysis software

Anduril

http://www.anduril.org/anduril/site/

Component-based workflow framework for data analysis

Armadillo workflow platform

Tool for designing and executing phylogenetic workflows

AutoDock

http://autodock.scripps.edu/

suite of automated docking tools

Biochemical Algorithms Library (BALL)

http://www.ball-project.org/

C++ library and framework for molecular modeling and visualization designed for rapid prototyping

Bio4j

http://bio4j.com/

Bio4j is a bioinformatics platform and graph based database built around most data available in UniProt KB(Swiss-Prot + TrEMBL), Gene Ontology (GO), UniRef (50,90,100), RefSeq, NCBI taxonomy, and Expasy Enzyme DB

Bioclipse

www.bioclipse.net

Visual platform for chemo- and bioinformatics based on the Eclipse Rich Client Platform (RCP).

Bioconductor

http://www.bioconductor.org/

R language toolkit

Bioinformatics Learning Tutorial (BLT)

http://sourceforge.net/projects/biotutorial/

Educational interactive tutorials and 3D animations for Replication, Transcription, and Translation

BioHaskell

http://biohaskell.org/

Haskell language toolkit

BioJava

http://biojava.org/wiki/Main_Page

Java language toolkit

BioMOBY

http://biomoby.org/

registry of web services

BioPerl

http://www.bioperl.org/wiki/Main_Page

Perl language toolkit

BioPHP

http://www.biophp.org/

PHP language toolkit

Biopython

http://biopython.org/wiki/Main_Page

Python language toolkit

BioRails

https://github.com/biorails

a data management system designed to support researchers in drug discovery

BioRuby

http://bioruby.org/

Ruby language toolkit

BioSmalltalk

https://code.google.com/p/biosmalltalk/

Smalltalk language toolkit

BioUno

http://www.biouno.org/

BioUno is a project that applies Continuous Integration tools and techniques in Bioinformatics. It uses Jenkins and its plug-in API to create biology workflows and manage computer clusters.

caCORE

ontologic representation environment

caArray

https://cabig-stage.nci.nih.gov/community/tools/caArray

ontologic representation environment

EMBOSS

http://emboss.sourceforge.net/

Suite of packages for sequencing, searching, etc.

Gaggle

https://www.gaggle.net/

A framework for interoperability between systems biology software

Galaxy

http://galaxyproject.org/

Scientific workflow and data integration system

GenePattern

http://www.broadinstitute.org/cancer/software/genepattern/

Scientific workflow system that provides access to more than 150 genomic analysis tools

GeWorkbench

http://wiki.c2b2.columbia.edu/workbench/index.php/Home

Genomic data integration platform

GMOD

http://www.gmod.org/wiki/Main_Page

Toolkit for addressing many common challenges at biological databases.

GeneProf

http://www.geneprof.org/GeneProf/

A web-based, bioinformatics software suite for the analysis of functional genomics experiments, e.g. RNA-seq or ChIP-seq.

GeneTalk

http://www.gene-talk.de/

Tool for filtering sequence variants in VCF files. Network for scientists and clinicians for expertise and knowledge exchange. Database of annotations about sequence variants with clinically relevant information.

GenGIS

http://kiwi.cs.dal.ca/GenGIS/Main_Page

Application that allows users to combine digital map data with information about biological sequences collected from the environment.

GenomeSpace

http://www.genomespace.org/

Centralized web application that provides data format transformations and facilitates connections with other bioinformatics tools

GENtle

http://directory.fsf.org/wiki/GENtle

An equivalent to the proprietary Vector NTI, a tool to analyze and edit DNA sequence files

Integrated Genome Browser

http://bioviz.org/igb/

Java-based desktop genome browser

Integrative Genomics Viewer (IGV)

http://www.broadinstitute.org/igv/

High-performance desktop tool for interactive visual exploration of diverse genomic data

IntAct

http://www.ebi.ac.uk/intact/

molecular interaction database

InterMine

http://intermine.github.io/intermine.org/

Extensive data warehouse system for the analysis and integration of biological datasets

Java Treeview

http://jtreeview.sourceforge.net/

microarray data viewer

LabKey Server

http://labkey.com/

platform for integrating, analyzing and sharing data

OpenClinica

https://www.openclinica.com/

software for capturing and managing data in clinical trials

PromKappa

http://xbioinformatics.wordpress.com/tag/promkappa/

PromKappa (Promoter analysis by Kappa) software program used for promoter pattern generation and promoter analysis.

MeV: Multi-Experiment Viewer

http://www.tm4.org/mev.html

a desktop application for the analysis, visualization and data-mining of large-scale genomic data

PathVisio

http://www.pathvisio.org/

a desktop software for drawing, analysis and visualization of biological pathways

REDCRAFT

software for determining tertiary protein structure given assigned Residual Dipolar Coupling data

SAM Tools

Data format (SAM) and accompanying tool suite, for storing large nucleotide sequence alignments

Staden Package

Sequence assembly, editing and analysis, primarily consisting of gap4, gap5 and spin.

STAMP

Software package for analyzing metagenomic profiles that promotes ‘best practices’ in choosing appropriate statistical techniques and reporting results.

supraHex

An open-source R/Bioconductor package for omics data analysis using a supra-hexagonal map

Taverna workbench

Tool for designing and executing workflows

TGAC Browser

Genome Browser, visualisation solutions for big data in the genomic era

T-REX WebServer

Bioinformatics and phylogenetics webserver (NJ, PhyML, RAxML, MAFFT, MUSCLE, Newick viewer, Horizontal gene transfer detection, Reticulograms, Substitution models)

UGENE

integrated bioinformatics tools

Visomics

bioinformatics tools for omics data

Genome Analysis Toolkit 1.0 (GATK 1.0)

a software package to analyse next-generation resequencing data

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references

Manisha Mishra — Tue, 17 Apr 2018 16:21:20 -0500

AlignGraph is software that extends and joins contigs or scaffolds by reassembling them with help provided by a reference genome of a closely related organism.

Using AlignGraph

AlignGraph --read1 reads_1.fa --read2 reads_2.fa --contig contigs.fa --genome genome.fa --distanceLow distanceLow --distanceHigh distanceHigh --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa [--kMer k --insertVariation insertVariation --coverage coverage --part p --fastMap --ratioCheck --iterativeMap --misassemblyRemoval --resume]
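
As a hedged illustration of filling in that usage line, the sketch below assembles a concrete argument list in Python. Only option names from the usage line are used; the helper name `build_aligngraph_command`, the distance values 100 and 1000, and the check that the lower bound does not exceed the upper bound are assumptions made for the example, so consult the AlignGraph README for the intended values.

```python
def build_aligngraph_command(read1, read2, contig, genome,
                             distance_low, distance_high,
                             extended_contig, remaining_contig,
                             k_mer=None, flags=()):
    """Build an AlignGraph argument list from the options in the usage line."""
    # Assumption: the lower distance bound should not exceed the upper one.
    if distance_low > distance_high:
        raise ValueError("distanceLow must not exceed distanceHigh")
    cmd = ["AlignGraph",
           "--read1", read1, "--read2", read2,
           "--contig", contig, "--genome", genome,
           "--distanceLow", str(distance_low),
           "--distanceHigh", str(distance_high),
           "--extendedContig", extended_contig,
           "--remainingContig", remaining_contig]
    if k_mer is not None:
        cmd += ["--kMer", str(k_mer)]
    # Switches that take no value, as listed in the usage line.
    allowed = {"--fastMap", "--ratioCheck", "--iterativeMap",
               "--misassemblyRemoval", "--resume"}
    for flag in flags:
        if flag not in allowed:
            raise ValueError(f"unknown flag: {flag}")
        cmd.append(flag)
    return cmd


# Hypothetical values for the two distance bounds.
aligngraph_cmd = build_aligngraph_command(
    "reads_1.fa", "reads_2.fa", "contigs.fa", "genome.fa",
    100, 1000, "extendedContigs.fa", "remainingContigs.fa",
    flags=["--fastMap"])
print(" ".join(aligngraph_cmd))
```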

Address of the bookmark: https://github.com/baoe/AlignGraph