BOL: Related items

quickmerge: A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.

Jit — Mon, 19 Dec 2016 10:23:36 -0600

quickmerge uses a simple concept to improve the contiguity of genome assemblies based on long-molecule sequences, often with dramatic outcomes. The program uses information from assemblies made with Illumina short reads and PacBio long reads to improve the contiguity of an assembly generated with PacBio long reads alone. This is counterintuitive because Illumina short reads are not typically considered to cover genomic regions that PacBio long reads cannot. Although we have not evaluated this program on assemblies generated from Oxford Nanopore sequences, it should work with Oxford Nanopore assemblies too.

Address of the bookmark: https://github.com/mahulchak/quickmerge

Genome Assembly Tools and Software - PART1 !!

Jit — Mon, 19 Dec 2016 18:09:22 -0600

Genome assemblers generally take a file of short sequence reads and a file of quality values as input. Since the quality-value file for high-throughput short reads is usually highly memory-intensive, only a few assemblers make use of it. To save memory and make the data convenient to query, high-throughput short-read data is usually first converted into a specific data structure. The existing data structures for this purpose fall mainly into two categories: string-based models and graph-based models.

We therefore list many genome assembly tools here. Most are designed for general genome assembly, while others are aimed at handling complex genomes.
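As a rough illustration of the graph-based model mentioned above, the sketch below builds a toy de Bruijn graph from a handful of reads. The function name `build_de_bruijn` and the example reads are made up for this illustration; real assemblers use far more compact representations and also handle sequencing errors and reverse complements, which this sketch ignores.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a dict mapping each (k-1)-mer to the (k-1)-mers that follow it.

    Every k-mer in every read contributes one edge from its prefix
    (first k-1 bases) to its suffix (last k-1 bases). Duplicate edges
    are kept, so edge multiplicity reflects how often a k-mer was seen.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads, chosen only to show a branching node.
reads = ["ACGTAC", "GTACGA"]
graph = build_de_bruijn(reads, k=4)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Nodes with more than one distinct successor (here `ACG`) are the branch points that an assembler has to resolve, for example with paired-end or long-read information.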

TriMetAss 1.2 – The Trinity-based Iterative Metagenomics Assembler
- TriMetAss is an extension to the Trinity software [1], which can assemble select regions surrounding interesting features in metagenomic data. The software is particularly useful for very common and well-conserved genes (and, in theory, non-coding regions) that can occur in multiple contexts in the microbial community under study. It uses Vmatch [2] to extend seed reads (or contigs generated by another assembler) into longer contigs, by iteratively calling Vmatch and Trinity until some stop criteria are met. Currently, TriMetAss lacks thorough documentation, but you can direct questions to the author if the README.txt file and the “-h” option are not sufficient to understand the software.
OMWare 1.0 – Efficient Assembly of Genome-wide Physical Maps
- The purpose of this Python module is to help scientists use optical map data.
  Once complete, it will encapsulate and abstract optical maps and their most common manipulations as they exist in a variety of formats.
LightAssembler – Lightweight Resources Assembly Algorithm
- Lightweight resources assembly algorithm for high-throughput sequencing reads.
  System requirements: a 64-bit machine with the g++ compiler (or GCC in general), plus the pthreads and zlib libraries.
QUAST 4.1 – Quality Assessment Tool for Genome Assemblies
- QUAST evaluates genome assemblies.
  QUAST works both with and without a reference genome.
  The tool accepts multiple assemblies, thus is suitable for comparison.
DNA Baser 4.36 – DNA Sequence Assembly & Analysis
- DNA Sequence Assembler is revolutionary bioinformatics software for automatic DNA sequence assembly, DNA sequence analysis, contig editing, file format conversion and mutation detection.
COCACOLA – Binning Metagenomic Contigs using Sequence COmposition, Read CoverAge, CO-alignment, and Paired-end Read LinkAge
- COCACOLA: a general framework for binning contigs in metagenomic studies incorporating read COverage, CorrelAtion, sequence COmposition and paired-end read LinkAge
MaxBin 2.2 – Binning Assembled Metagenomic Sequences
- MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can understand the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and either the read coverage information or the sequencing reads.
GAML 0.1 – Genome Assembly by Maximum Likelihood
- GAML is a prototype genome assembly tool based on maximizing the likelihood of the assembly in a model encompassing error rate, insert length and other features of individual sequencing technologies. It can combine datasets produced by different technologies (currently Illumina, 454 and Pacific Biosciences).
NanoMark – DNA Assembly Benchmark for Nanopore long reads
- DNA Assembly Benchmark for Nanopore long reads
  A system for benchmarking DNA assembly tools, based on 3rd generation sequencers.
ARC 1.1.4-beta – Assembly by Reduced Complexity
- ARC is a pipeline which facilitates iterative, reference guided de novo assemblies with the intent of:
  1. Reducing time in analysis and increasing accuracy of results by only considering those reads which should assemble together.
  2. Reducing/removing reference bias as compared to mapping based approaches.
TransPS 1.1.0 – Transcriptome Post Scaffolding
- TransPS is a pipeline for post-processing pre-assembled transcriptomes using a reference-based method. It applies an align-layout-consensus structure consisting of three major stages. First, query sequences are aligned with a reference genome. Second, query sequences are ordered based on their alignment to the reference. Third, non-redundant sequences matched to the same gene of the reference genome are scaffolded into one contig.
assemblyManager – Computing the Robotic Commands for 2ab Assembly
- Clotho provides persistence to such objects through relational databases that at least partially correspond to the Clotho data model. Beyond database access and data model API support, Clotho Apps provide more specific functionality to Clotho such as viewing and editing data, running simulations, and automating various tasks. When thinking about Clotho Apps, an appropriate analogy would be apps running on the Android operating system rather than the add-ons that extend the functionality of Firefox.
BinPacker 1.1 – Packing-Based De Novo Transcriptome Assembly from RNA-seq Data
- BinPacker is a novel de novo assembler that models the transcriptome assembly problem as tracking a set of trajectories of items, whose sizes represent the coverage of their corresponding isoforms, and solves it as a series of bin-packing problems.
FermiKit 0.13 – De novo Assembly based Variant Calling pipeline for Illumina Short Reads
- FermiKit is a de novo assembly based variant calling pipeline for deep Illumina resequencing data. It assembles reads into unitigs, maps them to the reference genome and then calls variants from the alignment to an accuracy comparable to conventional mapping based pipelines (see evaluation in the tex directory). The assembly not only encodes SNPs and short INDELs, but also retains long deletions, novel sequence insertions, translocations and copy numbers.
REPdenovo – A tool to Construct Repeats directly from Raw Reads
- REPdenovo is designed for constructing repeats directly from sequence reads. It is based on the idea of frequent k-mer assembly. REPdenovo provides many functionalities, and can generate much longer repeats than existing tools. The overall pipeline is shown in the manual file. REPdenovo supports the following main functionalities.
  1. Assembly. This step performs k-mer counting. We then find frequent k-mers whose frequencies are over a certain threshold, and assemble these frequent k-mers into consensus repeats (in the form of contigs). Finally, we merge the constructed contigs into more complete ones.
  2. Scaffolding. We use paired-end reads to connect repeat contigs into scaffolds, and also provide the average coverage (which indicates the copy number) for each constructed repeat.
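The k-mer counting and thresholding part of the assembly step described above can be sketched as follows. This is only an illustrative sketch, not REPdenovo's implementation: the function name `frequent_kmers`, the toy reads and the threshold value are all assumptions, and the sketch counts k-mers on one strand only.

```python
from collections import Counter


def frequent_kmers(reads, k, threshold):
    """Return {k-mer: count} for k-mers seen more than `threshold` times."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n > threshold}


# Hypothetical reads in which "ACG" is the repeated unit.
reads = ["ACGACGACG", "TTACGTT"]
print(frequent_kmers(reads, k=3, threshold=2))
```

In a real pipeline the threshold would typically be chosen relative to the average sequencing coverage, so that only k-mers from multi-copy regions pass the filter.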
Xander – Gene-targeted Metagenomic Assembler
- Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. We present a novel method for targeting assembly of specific protein-coding genes using a graph structure combining both de Bruijn graphs and protein HMMs. The inclusion of HMM information guides the assembly, with concomitant gene annotation.
SWAP-Assembler 2 – A scalable and fully parallelized Genome Assembler
- There is a growing gap between the output of new-generation massively parallel sequencing machines and the ability to process and analyze the sequencing data. We present SWAP-Assembler, a scalable and fully parallelized genome assembler designed for massive sequencing data. Instead of using a traditional de Bruijn graph, SWAP-Assembler adopts a multi-step bi-directed graph (MSG). With MSG, the standard genome assembly (SGA) is equivalent to the edge merging operations in a semi-group. A computation model, SWAP, is then designed to parallelize semi-group computation. Experimental results showed that SWAP-Assembler was the fastest and most efficient of the assemblers tested: it generated contigs with the highest accuracy among all five selected assemblers and the longest contig N50 among the selected parallel assemblers. In the scalability test, SWAP-Assembler scaled up to 1024 cores when processing the Fish and Yanhuang datasets, and finished the assembly work in only 15 and 29 minutes, respectively.
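Contig N50, the contiguity metric quoted above, is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths (e.g. in kb).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Tools such as QUAST, listed above, report N50 along with related metrics, so this calculation rarely needs to be done by hand.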
TGNet – Visualization and Quality Assessment of de novo Genome Assemblies
- TGNet is a Cytoscape-based tool for visualization and quality assessment of de novo genome assemblies. Specifically it facilitates rapid detection of inconsistencies between a genome assembly and an independently derived transcriptome assembly.
Circlator 1.1.3 – A tool to Circularize Genome Assemblies
- A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology manuscript.
misFinder v0.4.05.05 – Identify Mis-assemblies in an unbiased manner using Reference and Paired-end Reads
- misFinder is a tool that aims to identify assembly errors with high accuracy in an unbiased way and to correct these errors at their mis-assembled positions, improving assembly accuracy for downstream analysis. It combines information from a reference (or closely related reference) genome with paired-end reads aligned to the assembled sequence. Structural variation and mis-assembly can be detected by comparing the reference genome and the assembled sequence.
Scaffold_builder v2.2 – Order Contigs generated by draft sequencing along a Reference Sequence
- The abundance of repeat elements in genomes can impede the assembly of a single sequence. The tool Scaffold_builder was designed to generate scaffolds (super contigs of sequences joined by N-bases) using the homology provided by a closely related reference sequence. Scaffold_builder is an advanced wrapper for Nucmer, written in Python that resolves several situations that may arise when mapping contigs to the reference genome.
Rnnotator 3.5.0 – de novo Transcriptome Assembly pipeline from stranded RNA-Seq reads
- Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. Rnnotator is an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. The contigs produced by Rnnotator are highly accurate and reconstruct full-length genes when transcripts are sequenced sufficiently deep, roughly 30X for a given transcript. Rnnotator was designed to assemble Illumina single or paired-end reads. Rnnotator is also able to incorporate strand-specific RNA-Seq reads into the assembly in order to further improve the assembly.
SATRAP 0.2 – SOLiD Assembler TRAnslation Program
- A color space assembly must be translated into bases before applying bioinformatics analyses. SATRAP is designed to accomplish this important task using a very efficient strategy. The package integrates the Oases pipeline and several optimizations specifically designed for color space management. Together, the steps of the pipeline produce a SOLiD de novo transcriptome assembly and the subsequent color space translation. Alternatively, SATRAP can be used as a standalone program to perform color space translation for either RNA-seq or DNA-seq SOLiD assemblies.
Bandage v0.7.1 – Navigating De novo Assembly Graphs Easily
- Bandage is a program for visualising de novo assembly graphs. By displaying connections which are not present in the contigs file, Bandage opens up new possibilities for analysing de novo assemblies.
HapCol 1.1.1 – Haplotype Assembly from Long Gapless Reads
- A fast and memory-efficient method for haplotype assembly from long gapless reads, like those produced by SMRT sequencing technologies (PacBio RS II) and Oxford Nanopore flow cell technologies (MinION).
REAGO 1.1 – REconstruct 16S ribosomal RNA Genes from MetagenOmic data
- an assembly tool for 16S ribosomal RNA recovery from metagenomic data
FGAP 1.8.1 – Automated Gap Closing tool
- FGAP aims to improve genome sequences by merging alternative assemblies or incorporating alternative data, analyzing the gap region and indicating the best sequence to close the gap.
DETONATE 1.10 – DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation
- DETONATE consists of two component packages, RSEM-EVAL and REF-EVAL. Both packages are mainly intended to be used to evaluate de novo transcriptome assemblies, although REF-EVAL can be used to compare sets of any kinds of genomic sequences.
Trinity 2.1.1 – RNA-Seq De novo Assembly
- Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
IsoSCM 2.0.11 – Transcript Assembly tool using Multiple Change-point Inference to improve 3’UTR Annotation
- IsoSCM (Isoform Structural Change Model) is a new method for transcript assembly that incorporates change-point analysis to improve the 3′ UTR annotation process.
IVA 1.0.3 – Iterative Virus Assembler
- IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
SFA-SPA 0.2.1 – A Suffix Array based Short Peptide Assembler for Metagenomic Data
- SFA-SPA is a suffix array based short peptide assembler for metagenomic data
RAMPART 0.12.2 – A Workflow Management System for de novo Genome Assembly
- RAMPART is a de novo assembly pipeline that makes use of third-party tools and High Performance Computing resources. It can be used as a single interface to several popular assemblers, and can perform automated comparison and analysis of any generated assemblies.
Celera Assembler 8.3 – Whole Genome Shotgun Assembler
- Celera Assembler (wgs-assembler) is scientific software for DNA research. It can reconstruct long sequences of genomic DNA given the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler has enabled discovery in microbial genomes, large eukaryotic genomes, diploid genomes, and genomes from environmental samples. Celera Assembler contributed the first diploid sequence of an individual human, and metagenomics assemblies of the Global Ocean Sampling
A5-miseq 20150522 – de novo Assembly & Analysis of Illumina Sequence data
- de novo assembly & analysis of Illumina sequence data, including the A5 pipeline, A5-miseq, tools to evaluate assembly quality, and scripts to facilitate data submission to NCBI and the RAST annotation system
Trans-ABySS 1.5.3 – Analyze ABySS multi-k-assembled Shotgun Transcriptome Data.
- Trans-ABySS is a software pipeline for analyzing ABySS-assembled contigs from shotgun transcriptome data. The pipeline accepts assemblies that were generated across a wide range of k values in order to address variable transcript expression levels. It first filters and merges the multi-k assemblies, generating a much smaller set of nonredundant contigs. It contains scripts that map assembled contigs to known transcripts, currently supporting Blat and Exonerate contig-to-genome aligners. It identifies novel splicing events like exon-skipping, novel exons, retained introns, novel introns, and alternative splicing. Its scripts can also estimate gene expression levels, identify candidate polyadenylation sites, and identify candidate gene-fusion events.
SAT-Assembler 20160120 – Scalable and Accurate Targeted Gene Assembly Tool
- SAT-Assembler can perform targeted gene assembly for both RNA-Seq and metagenomic data. It addresses the challenges of de novo assembly of large-scale NGS data by conducting family-specific gene assembly, homology-guided overlap graph construction, and careful graph traversal.
Opera 2.0.2 – Sequence Assembly Program
- Opera (Optimal Paired-End Read Assembler) is a sequence assembly program. It uses information from paired-end reads to optimally order and orient contigs assembled from shotgun-sequencing reads.
Sequencher 5.4.1 – DNA Sequence Assembly and Analysis
- Sequencher is the industry standard software for DNA sequence analysis. It works with all automated sequencers and is widely known for its lightning-fast contig assembly, short learning curve, user-friendly editing tools, and superb technical support. First released almost 15 years ago, Sequencher is currently used for sequence analysis tasks in every major genomic and pharmaceutical company as well as numerous academic and government labs in over 40 countries around the world. Life Science researchers use Sequencher for many diverse DNA sequence analysis applications including de novo gene sequencing, mutation detection, forensic human identification, systematics, and more.
Minia 2.0.3 – Short-read Assembler based on a de Bruijn graph
- Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day
MaSuRCA 3.1.3 – Whole Genome Short Read Assembler
- MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).
KmerGenie 1.6982 – K-mer size Selection for Genome Assembly
- KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie’s choices lead to assemblies that are close to the best possible over all k-mer lengths.
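The first step described above, computing a k-mer abundance histogram for several values of k, can be sketched as below. This is not KmerGenie's code and it does not include its statistical model for predicting the number of genomic k-mers; the function names and the toy read are assumptions made for illustration.

```python
from collections import Counter


def abundance_histogram(reads, k):
    """Return {abundance: number of distinct k-mers seen that many times}."""
    kmer_counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return dict(Counter(kmer_counts.values()))


def histograms_for(reads, k_values):
    """Compute one abundance histogram per candidate k."""
    return {k: abundance_histogram(reads, k) for k in k_values}


# Hypothetical single read; real input would be millions of reads.
for k, hist in histograms_for(["ACGTACGT"], [3, 4]).items():
    print(k, sorted(hist.items()))
```

On real data, k-mers with very low abundance are mostly sequencing errors, which is why a model fitted to the histogram is needed to separate them from genuine genomic k-mers.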
pilon v1.16 – Automated Assembly Improvement
- pilon uses read alignment analysis to diagnose, report, and automatically improve de novo genome assemblies.
Phred/Phrap/Consed 29.0 – DNA Sequence Assembler & Finishing Tools
- phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets.
CLC Genomics Workbench 8.5.1 – Assembly & Analysis of Sequencing Data
- CLC Genomics Workbench, for analyzing and visualizing Next Generation Sequencing data, incorporates cutting-edge technology and algorithms, while also supporting and integrating with the rest of your typical NGS workflow.
Metassembler 1.5 – Combines multiple Whole Genome de novo Assemblies into a combined Consensus Assembly
- Metassembler is a software package for reconciling assemblies produced by de novo short-read assemblers such as SOAPdenovo and ALLPATHS-LG. The goal of assembly reconciliation, or “metassembly,” is to combine multiple assemblies into a single genome that is superior to all of its constituents
Tablet 1.15.09.01 – Next Generation Sequence Assembly Visualization
- Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine.
ABySS 1.9.0 – de novo, parallel, paired-end Sequence Assembler
- ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
CLEAT 2.0 – Identifies 3′ UTR Ends of Transcripts in de novo RNA-Seq Assemblies
- CLEAT is a post-processing tool for CLEavage site Analysis of Transcriptomes. CLEAT is designed to work on trans-ABySS output.
StriDe – novel Assembler
- The StriDe Assembler integrates string graph and de Bruijn graph approaches by decomposing reads within error-prone regions, while extending paired-end reads into long reads for assembly through repetitive regions.
REAPR 1.0.18 – Genome Assembly Evaluation
- REAPR (Recognising Errors in Assemblies using Paired Reads) is a tool that evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison. It can be used in any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new broken assembly based on the error calls.
GapFiller 1.10 – Close Gaps within Pre-assembled Scaffolds
- GapFiller is a stand-alone program for closing gaps within pre-assembled scaffolds. It is unique in offering the possibility to manually control the gap-closure process. By using the distance information of paired-read data, GapFiller seeks to close the gap from each edge in an iterative manner. In a good number of tests the program yielded excellent results on both bacterial and eukaryotic datasets. It is distributed as a command-line Perl script with additional files. The input data consists of pre-assembled scaffold sequences (FASTA) and NGS paired-read data (FASTA or FASTQ).
SSAKE 3.8.4 – Assembling Millions of short DNA Sequences
- SSAKE is a genomics application for assembling millions of very short DNA sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets.
SGA 0.10.14 – String Graph Assembler
- SGA is a de novo assembler designed to assemble large genomes from high coverage short read data. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.
r2cat – Synteny Plots & Comparative Assembly
- r2cat (related reference based contig arrangement tool) can be used to order a set of contigs with respect to a single reference genome. This is done by mapping the contigs onto the reference using a q-gram filter. The mapping is visualized in a synteny plot.
TASR 1.6 – Targeted Assembly of Sequence Reads
- TASR (Targeted Assembly of Sequence Reads) is a genomics application that allows hypothesis-based interrogation of genomic regions (sequence targets) of interest.
Rainbow v2.0.4 – Clustering and Assembling Short Reads, especially for RAD
- Rainbow package consists of several programs used for RAD-seq related clustering and de novo assembly.
CAFTOOLS 2.0.2 – Tools for the Common Assembly Format (CAF)
- CAFTOOLS comprises a set of libraries and programs for manipulating DNA sequence assemblies using CAF files, a comprehensive representation of a sequence assembly as a text file.
Gap Resolution – Improving Newbler Genome Assemblies
- Gap Resolution was developed by DOE Joint Genome Institute to improve Newbler genome assemblies by automating the closure of sequence gaps caused by repetitive regions in the DNA.
Meraculous 2.0.5 – De novo Genome Assembler from Short Reads
- Meraculous is an algorithm for whole genome assembly of deep paired-end short reads. Its authors applied it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis.
COPE 1.2.5 – Pair-end Reads Connection tool to facilitate Genome Assembly
- COPE (Connecting Overlapped Pair-End reads) is a method to align and connect Illumina paired-end reads whose insert size is smaller than the sum of the two read lengths. The connected reads can be used in genome assembly, resequencing and transcriptome research.
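When the insert is shorter than the two reads combined, the end of the first read overlaps the reverse complement of the second, and the pair can be joined into one longer sequence. The sketch below shows the idea using exact matching only. It is not COPE's algorithm, which also has to cope with sequencing errors and ambiguous overlaps; the function names, the `min_overlap` default and the example fragment are assumptions. Reads are assumed to be uppercase.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def connect_pair(read1, read2, min_overlap=10):
    """Join a read pair on the longest exact overlap, or return None.

    read2 is reverse-complemented first, so both reads are on the same
    strand; the suffix of read1 is then matched against the prefix of
    the converted mate.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Hypothetical 22-bp fragment sequenced as two 14-bp reads (6-bp overlap).
fragment = "ACGTTGCAAGGCTTAACCGGTA"
read1 = fragment[:14]
read2 = reverse_complement(fragment[8:])
print(connect_pair(read1, read2, min_overlap=4))
```

Trying the longest overlap first is a simplification; with short minimum overlaps, a production tool would score candidate overlaps rather than accept the first exact match.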
PEAR 0.9.6 – Pair-End reads AssembleR
- PEAR is an ultrafast, memory-efficient and highly accurate pair-end reads assembler. It is fully parallelized and can run with as little as a few kilobytes of memory.
EBARDenovo 2.0.1 – Highly-accurate de novo Assembler of Paired-end RNA-Seq
- EBARDenovo is a highly accurate search-based de novo assembler of paired-end RNA-Seq for advanced transcriptomic study.
EagleView 2.2 – Genome Assembler Viewer
- EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations. It provides an easy way for inspecting visually the quality of a genome assembly and validating polymorphism candidate sites (e.g., SNPs) reported by polymorphism discovery tools. It can also facilitate data interpretation and hypothesis generation.
MAIA 0.5 – Integrating Genome Assemblies
- MAIA (Multiple Assembly IntegrAtion) is an algorithm to integrate multiple genome assemblies. For example, assemblies originating from:
  – Different runs of a de novo assembler
  – Assemblies of different data types
  – Comparative assemblies
InteMAP 1.0 – Integrated Metagenomic Assembly pipeline for NGS Short Reads
- InteMAP is a pipeline which integrates individual assemblers for assembling metagenomic short sequencing reads.
MAP 20121108 – A de novo Metagenomic Assembly program for Shotgun DNA reads
- MAP (Metagenomic Assembly program) is a de novo assembly approach, and its implementation, based on an improved Overlap/Layout/Consensus (OLC) strategy combined with several special algorithms. MAP uses mate-pair information, which makes it more applicable to the shotgun DNA reads (recommended length > 200 bp) currently widely used in metagenome projects. Results of extensive tests on simulated data show that MAP can be superior to both Celera and Phrap for typical longer reads from Sanger sequencing, and has a clear advantage over Celera, Newbler, and the more recent Genovo for typical shorter reads from 454 sequencing.
Phusion 2.1c – Assembling Genome Sequences from Whole Genome Shotgun (WGS) Reads
- Phusion is a software package for assembling genome sequences from whole genome shotgun (WGS) reads.
CodonCode Aligner 6.0.2 – DNA Sequence Assembly & Alignment
- CodonCode Aligner is a program for sequence assembly, contig editing, and mutation detection, available for Windows and Mac OS X. Aligner is compatible with Phred-Phrap and fully supports sequence quality scores, while offering a familiar, easy-to-learn user interface.
Cerulean 0.1.1 – Hybrid Genome Assembler
- Cerulean is a hybrid assembler that uses high-throughput short and long reads.
Ragout 1.2 – Tool for Reference-assisted Assembly
- Ragout (Reference-Assisted Genome Ordering UTility) is a tool for assisted assembly using multiple references. It takes a short read assembly (a set of contigs), a set of related references and a corresponding phylogenetic tree and then assembles the contigs into scaffolds.
laSV 1.0.2 – Local Assembly based Structural Variation Discovery tool
- laSV is software that employs a local de novo assembly based approach to detect genomic structural variations from whole-genome high-throughput sequencing datasets.
SPAdes 3.6.2 – Single-cell Genome Assembler
- SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies.
PERGA 0.5.03.02 – Paired End Reads Guided Assembler
- PERGA is a novel sequence reads guided de novo assembly approach which adopts greedy-like prediction strategy for assembling reads to contigs and scaffolds.
Telescoper 0.2 – De novo Assembly Algorithm
- Telescoper is a local assembly algorithm designed for short-reads from NGS platforms such as Illumina. The reads must come from two libraries: one short insert, and one long insert.
MetaCompass 1.0 – Comparative Assembly of Metagenomic Sequences
- MetaCompass is a software package for comparative assembly of metagenomic reads. MetaCompass achieves assembly performance comparable to state-of-the-art de novo assemblers, and the two approaches complement each other well, so combining contigs from MetaCompass with those from independent de novo assemblers gives the best overall metagenomic assembly.
SCARF – Scaffolded and Corrected Assembly of Roche 454
- SCARF is a next-gen sequence assembly tool for evolutionary genomics. It is designed especially for assembling 454 EST sequences against high quality reference sequences from related species.
MetaCAA – Assembly of Metagenomic Datasets
- MetaCAA is a sequence-assembly tool specifically intended for metagenomes.
Contiguity 1.0.4 – Contig Adjacency Graph Construction and Visualisation
- Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies.
ScaffoldScaffolder 0.1 – Solving Contig Orientation via Bidirected to Directed Graph Reduction
- ScaffoldScaffolder is a stand-alone scaffolding algorithm which was designed specifically for scaffolding diploid genomes.
HaploClique 0.1 – Viral Quasispecies Assembly from Paired-end data
- HaploClique is a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples.
TAG 0.91 – Transcript Assembly by Mapping Reads to Graphs
- TAG is a tool for metatranscriptome assembly that uses the de Bruijn graph of a matched metagenome as the reference.
EPGA2 – De Novo Assembler
- EPGA2 updates some modules in EPGA to improve memory efficiency in genome assembly.
GMcloser 1.5.1 / GMvalue 1.3 – Closing the Gaps in Scaffolds with Preassembled Contigs
- GMcloser fills and closes the gaps present in scaffold assemblies, especially those generated by the de novo assembly of whole genomes with next-generation sequencing (NGS) reads.
SLICEMBLER – Meta-assembler Designed for Ultra-deep Sequencing data
- SLICEMBLER is a meta-assembler designed for ultra-deep sequencing data
SEQLandscape v1 – Generation and Visualization of Sequence Landscape
- SEQLandscape is an application allowing the generation and visualization of a sequence landscape.
HyDA-Vista – Towards Optimal Guided Selection of k-mer Size for Sequence Assembly
misSEQuel v1.0beta – Misassembly Detection in Draft Genomes
- misSEQuel is software that enhances the quality of draft genomes by identifying misassembly errors and their breakpoints using paired-end sequence reads and optical mapping data.
Dawg 1.2 – Simulating Sequence Evolution
- Dawg (DNA Assembly with Gaps) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation.
BUSCO v1.1b1 – Assessing Genome Assembly and Annotation Completeness with Single-copy Orthologs
- BUSCO completeness assessment employs sets of Benchmarking Universal Single-Copy Orthologs from OrthoDB to provide quantitative measures of the completeness of genome assemblies, annotated gene sets, and transcriptomes in terms of expected gene content.
FinisherSC 2.0 – A Repeat-aware tool for upgrading de-novo Assembly using Long Reads
- FinisherSC is a repeat-aware and scalable tool for upgrading de-novo assembly using long reads.
WhatsHap – Haplotype Assembly for Future-Generation Sequencing Reads
- WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called haplotype assembly. It is especially suitable for long reads, but also works well with short reads.
Compartmentalized Assembler – Assembly of Physical Maps
- Compartmentalized assembler is a novel method for the assembly of high quality physical maps from fingerprinted clones.
Elviz – Exploration of Metagenomic Assemblies
- Elviz (Environmental Laboratory Visualization) is an interactive web-based tool for the visual exploration of assembled metagenome data and their complex metadata.
SSP – de novo Transcriptome Assembler
- SSP is a de novo transcriptome assembler that assembles RNA-seq reads into transcripts. SSP aims to reconstruct all the alternatively spliced isoforms and estimate their expression levels.
VirAmp – Galaxy-based Viral Genome Assembly pipeline
- VirAmp is a web-based semi-de novo fast virus genome assembly pipeline designed for extremely high coverage NGS data. VirAmp is a collection of existing tools, combined into a single Galaxy interface. Users without further computational knowledge can easily operate the pipeline.
aTRAM 1.04 – automated Target Restricted Assembly Method
- aTRAM performs targeted de novo assembly of loci from paired-end Illumina runs.
Ray 2.3.1 – Parallel Genome Assemblies for Parallel DNA sequencing
- Ray is a parallel software that computes de novo genome assemblies with next-generation sequencing data.
CAR – Contig Assembly of Prokaryotic Draft Genomes Using Rearrangements
- CAR is an efficient and more accurate tool for assembling contigs of a prokaryotic draft genome based on a reference genome.
VTBuilder – Assembly of Multi Isoform Transcriptomes
- VTBuilder is a tool for the inference of non-chimeric contigs from read data that has been sequenced from complex multi-isoformic transcriptomes, such as snake venom glands, or rapidly evolving viral populations, such as HIV-1.
TruHmm – TRanscription Unit Assembly by a Hidden Markov model
- TruHmm is a reference-based transcriptome assembler for prokaryotes, and is suitable for assembling transcripts from directional RNA-seq libraries.
Bridger 20141201 – RNA-Seq Assembly
- Bridger is a new de novo transcriptome assembler which takes advantage of techniques employed in Cufflinks to overcome limitations of the existing de novo assemblers.
GRASP 0.0.4 – Guided Reference-based Assembly of Short Peptides
- GRASP is a gene annotation tool for metagenomic studies. GRASP assembles the fragmented short-peptides, which are called from the NGS reads, and aligns the assembled contigs to the query reference protein. GRASP achieves much higher sensitivity than BLASTP for gene annotation purpose.
Cortex 1.05.21 – Genome Assembly and Variation Analysis
- Cortex is an efficient and low-memory software framework for analysis of genomes using sequence data. There are two main executables, being developed in parallel streams: cortex_con (primary contact Mario Caccamo) is for consensus genome assembly, and cortex_var (primary contact Zamin Iqbal) is for variation and population assembly.
MEGAHIT v0.1.4 – Large and Complex Metagenomics Assembly via Succinct de Bruijn graph
- MEGAHIT is a single node assembler for large and complex metagenomics NGS reads, such as soil. It makes use of succinct de Bruijn graph to achieve low memory usage, whereas its goal is not to make memory usage as low as possible.
CISA 20140304 – Contig Integrator for Sequence Assembly
- CISA has been developed to integrate the assemblies into a hybrid set of contigs, resulting in assemblies of superior contiguity and accuracy, compared with the assemblies generated by the state-of-the-art assemblers and the hybrid assemblies merged by existing tools
Cufflinks 2.2.1 – Transcript Assembler & Abundance Estimator for RNA-Seq
- Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.
mapsembler 2.2.4 – Targeted Assembly of Short Sequence Reads
- Mapsembler is a targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.
Tedna 1.2.2 – Transposable Element De Novo Assembler
- Tedna is a lightweight de novo transposable element assembler. It assembles the transposable elements directly from the raw reads.
HyDA 1.3.1 / Squeezambler 2.0.3 – Hybrid De Novo Assembler
- HyDA is a multipurpose assembler, particularly tested for single cell and normal multicell genome co-assembly
PANDASEQ 2.8 / Pandaseq-sam 1.3 – PAired-eND Assembler for DNA sequences
- PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
ZORRO 2.2 – Hybrid Sequencing Technology Assembler
- ZORRO is a hybrid sequencing technology assembler. It merges two sets of pre-assembled contigs into a more contiguous and consistent assembly.
FLASH 1.2.11 – Fast Length Adjustment of SHort reads
- FLASH (Fast Length Adjustment of SHort reads) is a very accurate fast tool to merge paired-end reads from fragments that are shorter than twice the length of reads. The extended length of reads has a significant positive impact on improvement of genome assemblies.
ALLPATHS-LG 51750 – Whole Genome Shotgun Assembler
- ALLPATHS-LG (Large Genome) is a whole genome shotgun assembler that can generate high quality assemblies from short reads. It works on both small and large (mammalian size) genomes. To use it, you should first generate ~100 base Illumina reads from two libraries: one from ~180 bp fragments, and one from ~3000 bp fragments, both at about 45x coverage. Sequence from longer fragments will enable longer-range continuity.
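The coverage figure quoted for ALLPATHS-LG can be turned into an approximate read count with the usual relation coverage = (number of reads x read length) / genome size. The sketch below is only that arithmetic; the function name and the 5 Mbp genome size are illustrative assumptions, not part of ALLPATHS-LG.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Smallest read count N with N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Hypothetical example: a 5 Mbp genome, 100-base reads, 45x per library.
per_library = reads_needed(5_000_000, 45, 100)
print(per_library)  # 2250000 reads for each of the two libraries
```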
More Tools at http://bioinformaticsonline.com/pages/view/30440/genome-assembly-tools-and-software-part2

PANDASEQ

Shruti Paniwala — Mon, 23 Jan 2017 04:54:32 -0600

PANDASEQ assembles paired-end Illumina reads into sequences, trying to correct for errors and uncalled bases. The assembler reads two files in FASTQ format with quality information. If amplification primers were used (e.g., to isolate a variable region of the 16S gene, or the constant regions around zinc finger binding residues), they can be removed from the sequence during assembly. The final sequence will correct any uncalled bases in the overlapping region using the complementary strand. When mismatches occur in the overlapping region, the base with the better quality score is chosen.
The algorithm is as follows:

1. Find the positions where the forward and reverse primers match best above the threshold and discard the ends of the sequence, including the primer.
2. Pick an overlap to maximise the probability of the forward and reverse reads having come from a single piece of DNA.
3. Identify the masking of the end of the read with the quality score B or # as done by CASAVA and adjust the probabilities in this region.
4. Construct an assembled sequence between the primers and calculate the quality.
5. Check for various constraints, including quality, length, uncalled bases, and user-supplied modules.
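
Steps 2 and 4 above can be illustrated with a much simplified sketch. It is not PANDASEQ's implementation: the overlap is scored by counting matches minus mismatches rather than with a probabilistic model, primer removal and CASAVA masking are left out, and the function names and the min_overlap parameter are hypothetical. Quality strings are assumed to be Phred+33 encoded.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=4):
    """Merge a read pair; return the merged sequence or None if no overlap."""
    # Bring the reverse read onto the same strand as the forward read.
    rc, rc_qual = reverse_complement(rev), rev_qual[::-1]

    # Simplified scoring (assumption): +1 per match, -1 per mismatch,
    # 0 where either base is uncalled (N).
    best_len, best_score = 0, 0
    for olen in range(min_overlap, min(len(fwd), len(rc)) + 1):
        score = 0
        for x, y in zip(fwd[-olen:], rc[:olen]):
            if "N" in (x, y):
                continue
            score += 1 if x == y else -1
        if score > best_score:
            best_len, best_score = olen, score
    if best_len == 0:
        return None

    # In the overlap, fill uncalled bases from the other read and, on a
    # mismatch, keep the base with the higher quality character.
    merged = []
    columns = zip(fwd[-best_len:], fwd_qual[-best_len:],
                  rc[:best_len], rc_qual[:best_len])
    for x, qx, y, qy in columns:
        if x == y or y == "N":
            merged.append(x)
        elif x == "N":
            merged.append(y)
        else:
            merged.append(x if qx >= qy else y)
    return fwd[:-best_len] + "".join(merged) + rc[best_len:]


# Forward read has an uncalled base; reverse read has a low-quality error.
print(merge_pair("ACGTACGGNT", "IIIIIIII#I", "TCTGAACCAT", "IIIIIIII#I"))
```

In the example both defects fall inside the overlap, so each one is repaired from the other read.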

http://neufeldserver.uwaterloo.ca/~apmasell/pandaseq_man1.html

Address of the bookmark: http://neufeldserver.uwaterloo.ca/~apmasell/pandaseq_man1.html

Bioinformatics Algorithms

Jitendra Narayan — Tue, 16 Jul 2013 03:35:15 -0500

An algorithm is a computable set of steps to achieve a desired result.

We use algorithms every day. For example, a recipe for baking a cake is an algorithm. Most programs, with the exception of some artificial intelligence applications, consist of algorithms. Inventing elegant algorithms -- algorithms that are simple and require the fewest steps possible -- is one of the principal challenges in programming. An algorithm is a description of a procedure which terminates with a result. In other words an algorithm is a set of instructions, sometimes called a procedure or a function, that is used to perform a certain task. This can be a simple process, such as adding two numbers together, or a complex function, such as adding effects to an image. For example, in order to sharpen a digital photo, the algorithm would need to process each pixel in the image and determine which ones to change and how much to change them in order to make the image look sharper.

In mathematics, computer science, and related subjects, an algorithm is an effective method for solving a problem using a finite sequence of instructions. Algorithms are used for calculation, data processing, and many other fields.
Each algorithm is a list of well-defined instructions for completing a task. Starting from an initial state, the instructions describe a computation that proceeds through a well-defined series of successive states, eventually terminating in a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate randomness.

History

The origin of the term comes from the ancients. The concept became more precise with the use of variables in mathematics. Algorithms in the sense now used by computers appeared as soon as the first mechanical engines were invented.
The word algorithm comes from the name of the 9th century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi. The word algorism originally referred only to the rules of performing arithmetic using Hindu-Arabic numerals but evolved via European Latin translation of Al-Khwarizmi's name into algorithm by the 18th century. The use of the word evolved to include all definite procedures for solving problems or performing tasks.
The algorithm of Archimedes gives an approximation of the Pi number.
Eratosthenes defined an algorithm for finding prime numbers.
Averroès (1126-1198) was using algorithmic methods for calculations.
Adelard of Bath (12th century) introduced the term algorismus, from Al-Khwarizmi.
During the 1800's up to the mid-1900's:

- George Boole (1847) invented binary algebra, the basis of computers. In doing so he unified logic and calculation in a common symbolism.

- Gottlob Frege (1879) developed his formula language, that is, a lingua characterica: a language written with special symbols, "for pure thought", that is, free from rhetorical embellishments... constructed from specific symbols that are manipulated according to definite rules.

- Giuseppe Peano (1888) wrote The principles of arithmetic, presented by a new method, the first attempt at an axiomatization of mathematics in a symbolic language.

- Alfred North Whitehead and Bertrand Russell, in their Principia Mathematica (1910-1913), further simplified and amplified the work of Frege.

- Kurt Gödel (1931) cites the paradox of the liar and completely reduces rules of recursion to numbers.

The concept of algorithm was formalized in 1936 through Alan Turing's Turing machines and Alonzo Church's lambda calculus, which in turn formed the foundation of computer science.
Stephen C. Kleene (1943) defined his now-famous thesis known as the "Church-Turing Thesis". In this context:

"Algorithmic theories... In setting up a complete algorithmic theory, what we do is to describe a procedure, performable for each set of values of the independent variables, which procedure necessarily terminates and in such manner that from the outcome we can read a definite answer, "yes" or "no," to the question, "is the predicate value true?""

Classification

Classification by purpose

Each algorithm has a goal; for example, the purpose of the Quick Sort algorithm is to sort data in ascending or descending order. But the number of goals is infinite, so we have to group them by kind of purpose.

Classification by implementation

An algorithm may be implemented according to different basic principles.

Recursive or iterative

A recursive algorithm is one that calls itself repeatedly until a certain condition matches. It is a method common to functional programming.
Iterative algorithms use repetitive constructs like loops.
Some problems are better suited for one implementation or the other. For example, the Towers of Hanoi problem is well understood in its recursive implementation. Every recursive version has an iterative equivalent, and vice versa.
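
As an illustration of that equivalence, here is a minimal sketch of the Towers of Hanoi solved both ways. The iterative version replaces the call stack with an explicit stack; the function names and peg labels are arbitrary choices for this example.

```python
def hanoi_recursive(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves for n discs."""
    if n == 0:
        return []
    return (hanoi_recursive(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_recursive(n - 1, spare, target, source))


def hanoi_iterative(n, source="A", target="C", spare="B"):
    """Same moves as hanoi_recursive, using an explicit stack."""
    moves = []
    # Each frame: (discs, from, to, via, ready_to_move)
    stack = [(n, source, target, spare, False)]
    while stack:
        k, src, dst, tmp, ready = stack.pop()
        if k == 0:
            continue
        if ready:
            moves.append((src, dst))
        else:
            # Pushed in reverse order of execution (the stack is LIFO).
            stack.append((k - 1, tmp, dst, src, False))
            stack.append((k, src, dst, tmp, True))
            stack.append((k - 1, src, tmp, dst, False))
    return moves


print(hanoi_recursive(2))
print(hanoi_iterative(2))
```

Both functions produce the same sequence of 2^n - 1 moves.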

Logical or procedural

An algorithm may be viewed as controlled logical deduction.
A logic component expresses the axioms which may be used in the computation and a control component determines the way in which deduction is applied to the axioms.
This is the basis of the logic programming. In pure logic programming languages the control component is fixed and algorithms are specified by supplying only the logic component.

Serial or parallel

Algorithms are usually discussed with the assumption that computers execute one instruction of an algorithm at a time. This is a serial algorithm, as opposed to parallel algorithms, which take advantage of computer architectures to process several instructions at once. They divide the problem into sub-problems and pass them to several processors. Iterative algorithms are generally parallelizable. Sorting algorithms can be parallelized efficiently.

Deterministic or non-deterministic

Deterministic algorithms solve the problem with a predefined process, whereas non-deterministic algorithms must guess the best solution at each step through the use of heuristics.

Classification by design paradigm

A design paradigm is a domain in research or class of problems that requires a dedicated kind of algorithm:

Divide and conquer

A divide and conquer algorithm repeatedly reduces an instance of a problem to one or more smaller instances of the same problem (usually recursively), until the instances are small enough to solve easily. One such example is merge sort: the data is divided into segments, each segment is sorted, and the sorted segments are merged in the conquer phase to obtain the fully sorted data.
The binary search algorithm is an example of a variant of divide and conquer called decrease and conquer algorithm, that solves an identical subproblem and uses the solution of this subproblem to solve the bigger problem.
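
Both examples can be sketched in a few lines. This is a plain textbook-style illustration with hypothetical function names, not an optimized implementation.

```python
def merge_sort(items):
    """Divide and conquer: sort each half, then merge the sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def binary_search(sorted_items, target):
    """Decrease and conquer: halve the search range at every step.

    Returns an index of target in sorted_items, or -1 if it is absent.
    """
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


data = merge_sort([5, 2, 9, 1, 5, 6])
print(data, binary_search(data, 6))
```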

Dynamic programming

The shortest path in a weighted graph can be found by using the shortest path to the goal from all adjacent vertices.
When the optimal solution to a problem can be constructed from optimal solutions to subproblems, using dynamic programming avoids recomputing solutions that have already been computed.
- The main difference from the "divide and conquer" approach is that subproblems are independent in divide and conquer, whereas subproblems overlap in dynamic programming.
- Dynamic programming and memoization go together. The difference with straightforward recursion is in caching or memoization of recursive calls. Where subproblems are independent, this is useless. By using memoization or maintaining a table of subproblems already solved, dynamic programming reduces the exponential nature of many problems to polynomial complexity.
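
The effect of caching overlapping subproblems can be seen on a small example. The sketch below uses Fibonacci numbers, chosen here only because the subproblems overlap heavily; the call counter is an illustrative device.

```python
from functools import lru_cache


def fib_naive(n, calls):
    """Plain recursion: the same subproblems are recomputed many times."""
    calls[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each subproblem is solved once and cached."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


calls = [0]
print(fib_naive(20, calls), "computed with", calls[0], "calls")
print(fib_memo(20), "computed with", fib_memo.cache_info().misses, "distinct subproblems")
```

The naive version makes an exponentially growing number of calls, while the memoized one solves each of the n + 1 subproblems once.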

The greedy method

A greedy algorithm is similar to a dynamic programming algorithm, but the difference is that solutions to the subproblems do not have to be known at each stage. Instead a "greedy" choice can be made of what looks the best solution for the moment.
A well-known greedy algorithm is Kruskal's algorithm for finding a minimum spanning tree.
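
A minimal sketch of Kruskal's algorithm follows. The greedy choice is to always take the cheapest remaining edge that does not close a cycle; cycles are detected with a simple union-find structure. The edge format (weight, u, v) and the small example graph are assumptions made for this illustration.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning tree (or forest).

    edges: iterable of (weight, u, v) with vertices numbered 0..num_vertices-1.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the set representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # edge joins two components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
print(mst, "total weight:", sum(w for w, _, _ in mst))
```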

Linear programming

The problem is expressed as a set of linear inequalities and then an attempt is made to maximize or minimize a linear function of the inputs. This can solve many problems such as the maximum flow for directed graphs, notably by using the simplex algorithm.
A complex variant of linear programming is called integer programming, where the solution space is restricted to integers.

Reduction also called transform and conquer

Solve a problem by transforming it into another problem. A simple example: finding the median of an unsorted list can be reduced to sorting the list and then taking the middle element of the sorted list. The main goal of reduction is finding the simplest transformation possible.
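
The median example can be written directly as a reduction to sorting. This is a small sketch with a hypothetical function name; for an even number of values it returns the mean of the two middle elements, which is one common convention.

```python
def median_by_sorting(values):
    """Reduce 'find the median' to 'sort, then pick the middle'."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)          # the problem we reduce to
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_sorting([7, 1, 5]))
print(median_by_sorting([4, 1, 3, 2]))
```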

Using graphs

Many problems, such as playing chess, can be modeled as problems on graphs. Graph exploration algorithms are then used to solve them.
This category also includes the search algorithms and backtracking.
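
As one example of a graph exploration algorithm, here is a minimal breadth-first search that returns a shortest path in an unweighted graph. The adjacency-dictionary representation and the toy graph are assumptions made for this sketch.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a shortest path from start to goal, or None if unreachable.

    graph: dict mapping each node to a list of neighbouring nodes.
    """
    parents = {start: None}           # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(bfs_shortest_path(graph, "A", "E"))
```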

The probabilistic and heuristic paradigm

Probabilistic

Those that make some choices randomly.

Genetic

Attempt to find solutions to problems by mimicking biological evolutionary processes, with a cycle of random mutations yielding successive generations of "solutions". Thus, they emulate reproduction and "survival of the fittest".

Heuristic

Whose general purpose is not to find an optimal solution, but an approximate solution where the time or resources to find a perfect solution are not practical.
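
The genetic approach described above can be sketched on a toy problem: evolving a bit string towards all ones. Everything here (the fitness function, population size, mutation rate, selection scheme and the fixed random seed) is an illustrative assumption rather than a recommended setting.

```python
import random


def evolve_onemax(length=20, population_size=30, generations=40,
                  mutation_rate=0.05, seed=0):
    """Evolve bit lists whose fitness is the number of 1 bits.

    Returns (best_individual, best_fitness_per_generation).
    """
    rng = random.Random(seed)
    fitness = sum
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        # "Survival of the fittest": keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[:population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            # Reproduction: one-point crossover of two surviving parents.
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mother[:cut] + father[cut:]
            # Random mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness), history


best, history = evolve_onemax()
print(sum(best), "ones out of", len(best))
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next, but nothing guarantees that the optimum is reached.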

Classification by complexity

Some algorithms complete in linear time, some complete in an exponential amount of time, and some never complete.

Algorithms resources on net.

Graph Algorithms in Bioinformatics

Bioinformatics Algorithms Description

Bioinformatics Algorithms Course Page

Bioinformatics Algorithm Demonstrations

Introduction to Bioinformatics Algorithms Lectures 1-2 by Dr. Max Alekseyev USC, 2009

Online Lectures on Bioinformatics

Sequence Alignment Algorithms

Algorithm for sequence alignment: dynamic programming

Network Protocol Analysis using Bioinformatics Algorithms

Bioinformatics Algorithms Links

Dynamic Programming

Particularly good sites...

•http://www.cis.upenn.edu/~sahuguet/MSA/
•http://www.blc.arizona.edu/courses/bioinformatics/align.html
•http://www.cs.monash.edu.au/~lloyd/tildeStrings/Notes/DPA.html
•http://www.cs.orst.edu/~schut/cs325/dynamic.htm
•http://www.catalase.com/dprog.htm
•http://bioweb.ncsa.uiuc.edu/~bioph490/BIOPH2.html#SEQUENCE_COMP
•http://www.qucis.queensu.ca/home/cisc365/javascript/dp1/index.html
Other sites...
•http://bioweb.ncsa.uiuc.edu/~bioph490/dynamic_programming_demo.html
•http://www.qucis.queensu.ca/home/cisc365/365overheads.html
•http://www.qucis.queensu.ca/home/cisc365/dp/dp.p01.html
•http://www.dgp.toronto.edu/csc270/tut_dp.html
•http://queue.ieor.berkeley.edu/~jshu/knapsack/DP/dp.html
•http://mat.gsia.cmu.edu/classes/dynamic/dynamic.html
•http://www.cs.sandia.gov/~scistra/class_3
•http://levine.sscnet.ucla.edu/Econ101/dynamic.htm
•http://mat.gsia.cmu.edu/classes/stoch_dynamic/stoch_dynamic.html
•http://mat.gsia.cmu.edu/classes/dynamic/node8.html
•http://www.maths.mu.oz.au/~moshe/dp/bibl/bibliography.html
•http://cartan.gmd.de/PAPER/ismb95/ismb_html.html
•http://screwdriver.bu.edu/bibliography/dynamic_programming.htm
•http://www.norvig.com/design-patterns/
•http://tome.cbs.univ-montp1.fr/htmltxt/Doc/manual/node137.html
•http://poem.princeton.edu/~verdu/dynamic.html
•http://www.orca1.com/opushelpweb/opusDynamic_Programming.html
•http://screwdriver.bu.edu/cn760-lectures/l7/index.htm
•http://www.ms.unimelb.edu.au/~moshe/dp/dp.html
•http://mat.gsia.cmu.edu/ORCS/0255.html
•http://aae.wisc.edu/e703/notes/a13dynpr.htm
•http://bioweb.pasteur.fr/docs/modeller/node137.html
•http://www2.uwindsor.ca/~lama/my470/ddynamic.htm
•http://students.ceid.upatras.gr/~papagel/project/ex5_6_1.htm
•http://www.cs.sunysb.edu/~algorith/lectures-good/node12.html
•http://www.utdallas.edu/~scniu/documents/7315.htm
•http://www.ii.uib.no/~pinar/seminar/larry.html
•http://www.deakin.edu.au/~gecole/books.html
•http://www.cseg.engr.uark.edu/~wessels/algs/notes/dynamic.html
•http://www.csc.liv.ac.uk/~ped/teachadmin/algor/dyprog.html
•http://www.eli.sdsu.edu/courses/fall96/cs660/notes/dynamicProg/dynamicProg.html
•http://www.cs.indiana.edu/l/www/ftp/techreports/TR514.html
•http://www.cs.brandeis.edu/~mairson/poems/node3.html
•http://www.cis.tu-graz.ac.at/igi/oaich/animations/Dynamic2.html
•http://bioweb.ncsa.uiuc.edu/~workshop/

Smith Waterman
•http://genome-www.stanford.edu/Saccharomyces/help/sw_alignment.html
•http://genome-www.stanford.edu/Saccharomyces/help/sw_details.html
•http://www.stanford.edu/~sntaylor/bioc218/final.htm
•http://www.maths.tcd.ie/~lily/pres2/sld009.htm
•http://bioweb.ncsa.uiuc.edu/~workshop/Lab_3/Smith-Waterman.htm
•http://www.tigem.it/LOCAL/SW/threshold.html
•http://sgbcd.weizmann.ac.il/genweb/help/smith-waterman.html
•http://cbrg.ethz.ch/ServerBooklet/section2_3_5.html
Needleman & Wunsch
•http://www.maths.tcd.ie/~lily/pres2/sld003.htm
•http://acer.gen.tcd.ie/~amclysag/nwswat.html
•http://www.nada.kth.se/~erikw/thesis/chapter2_3.html
•http://www.irbm.it/irbm-course95/gb/docs/amps/subsection3_6_1.html
•http://www.ibc.wustl.edu/~zuker/Bio-5495/align-html/node3.html

General (NW vs. SW vs. HMM, etc.)

•http://www.maths.tcd.ie/~lily/pres2/
•http://acer.gen.tcd.ie/~amclysag/nwswat.html
•http://laguerre.psc.edu/biomed/TUTORIALS/SEQUENCE/MULTIPLE/tutorial.html
•http://www.cse.ucsc.edu/research/compbio/

Hmms

•http://www.medmicro.mds.qmw.ac.uk/HMMER/main.html
•http://alfredo.wustl.edu/ismb96/abs/p02.html
•http://www.cse.ucsc.edu/research/compbio/html_format_papers/hughkrogh96/cabios.html
•http://wwwsyseng.anu.edu.au/~jason/hmmlinks.html
•http://www.breadfan.com/markov.html
•http://cslu.cse.ogi.edu/HLTsurvey/ch1node34.html
•http://www.ibc.wustl.edu/service/hmmalign/glocal.html
•http://www.cse.ucsc.edu/research/compbio/html_format_papers/ismb94/node5.html
•http://www.iscs.nus.edu.sg/~luakt/ic3222/lecture/nlp18new/index.htm
•http://www.cse.ucsc.edu/research/compbio/sam.html SAM Software for HMMs

Genetic Algorithms

•http://www.staff.uiuc.edu/~carroll/ga.html
•http://kal-el.ugr.es/gags.html
•http://kal-el.ugr.es/~jmerelo/GAJS.html
•http://www.genetic-programming.org/
•http://www.iitk.ac.in/kangal/deb_tut.shtml

PostDocs positions in computer science in HELSINKI, FINLAND

Fri, 06 Sep 2013 10:11:19 -0500

Several university departments in the Helsinki region, Finland, are looking for postdoctoral researchers in the field of computer science and information technology. Jobs are available at:
· Helsinki Institute for Information Technology HIIT, Aalto University and University of Helsinki, http://www.hiit.fi
· Department of Computer Science, University of Helsinki, http://www.cs.helsinki.fi
· Department of Information and Computer Science, Aalto University, http://ics.aalto.fi
· Department of Computer Science and Engineering, Aalto University, http://cse.aalto.fi
· Department of Mathematics and Statistics, University of Helsinki, http://mathstat.helsinki.fi/english/

Why Helsinki?
The collaborating Aalto University and University of Helsinki form a leading hub of computer science and modelling, including Machine learning, Data mining, Algorithms, Computational Logic, Cloud computing, Distributed computing, Human-centric ubiquitous ICT, Bioinformatics, etc.
Helsinki region is a safe, pleasant and attractive place to live in, with well-functioning services such as public transport etc. Finland has a comprehensive social security and health care system, including exceptionally good parental leaves, and children's day care services.

Positions are offered in:
Algorithm engineering (String Algorithms group)
Algorithmic bioinformatics (Genome-Scale Algorithmics group)
Automated reasoning and search, especially propositional logic (Computational Logic group)
Computational astrophysics and/or data analysis (Computational Methods and Data Analysis for Astrophysics group)
Computational biology and statistical methods in bioinformatics (Computational Systems Biology group)
Computational creativity and data mining (Discovery group)
Dynamic and large-scale networked systems (Data Communications Software group)
Intelligent multimodal information access (Content-Based Image and Information Retrieval Group)
Machine learning and neuroscience (Statistical Machine Learning group)
Machine learning for structured data (Kernel Machines, Pattern Analysis and Computational Biology group)
Machine learning methods for infectious disease epidemiology (Bayesian Statistics Group)
Probabilistic modeling and machine learning (Complex Systems Computation group)
Statistical machine learning (Statistical Machine Learning group)
Analysing ubiquitous sensor data (HIIT-Wide Focus Area)
Interactive visualization (HIIT-Wide Focus Area)
Affective computing and BCI (HIIT-Wide Focus Area)
Intelligent user interfaces and/or recommender systems (HIIT-Wide Focus Area)
Information retrieval and HCI (HIIT-Wide Focus Area)
Machine learning and data analysis, especially information retrieval, HCI, text and context data (HIIT-Wide Focus Area)
Probabilistic modeling and data analysis for bioinformatics (HIIT-Wide Focus Area)

More at http://www.hiit.fi/postdoc-call-2013

Paolo Ruggerone Lab

Tue, 01 Oct 2013 14:15:53 -0500

Efflux pumps (RND family)

Functioning of efflux systems in Gram-negative bacteria
Determinants of the compound-efflux system interactions
Action of inhibitors on efflux systems
Structural and dynamical features of the efflux systems

TatA
Assembly of the TatA system
Study of the dynamical features of the charge zipper

Methods
Setup of a kinetic Monte Carlo (KMC) scheme to study the flux of antibiotics through porins and efflux systems
Setup of protocol to integrate MD results in a ligand-based approach

Viral inhibitors
Interactions of selected compounds with RNA-dependent RNA polymerases (RdRps) of HCV and BVDV
Assessment of the role of mutations in RdRps

Antimicrobial peptides
Interactions of antimicrobial peptides with membranes: structure and dynamics
Interactions between antimicrobial peptides in the presence of different membranes

Protein-protein interactions
Effects of mutations

Lab Page
http://www.dsf.unica.it/~paolo/Site/Home.html

Uni Computing Bergen Norway

Tue, 03 Sep 2013 18:40:50 -0500

Info on Uni Computing (Webpage: http://www.bccs.uni.no/) :

Uni Computing (formerly Uni BCCS) is a department of Uni Research, affiliated with the University of Bergen.

Five groups in this lab work on computational resources, methods, algorithms, and software.

The two bioinformatics groups are:

The Computational Biology Unit (CBU) provides education and research in bioinformatics focused on functional genomics.

The Computational Ecology Unit (CEU) deals with population fluctuations, behavioural patterns and the ways life cycles emerge.

Software and Tools to detect structure variation with long reads !!

Archana Malhotra — Wed, 15 Mar 2017 14:31:09 -0500

Uncovering the connection between genetics and heritable diseases requires an approach that looks at all the variant bases and types in a genome. While a PacBio de novo assembly resolves the most novel structural variants (SVs), 8-10X PacBio coverage of single genomes or trios reveals triple the SVs detectable by short-read data.

With Single Molecule, Real-Time (SMRT) Sequencing, you can access structural variations having a broad range of sizes, types, and GC content with the ability to:

Uncover missing heritability linked to structural variation
Unambiguously identify genomic context and variant breakpoints at the sequence level to unravel the genetic etiology of disease
Resolve structural variation across the complete size spectrum with basepair resolution

The following SV tools can help you achieve this goal.

Sniffles: Structural variation caller using third generation sequencing

Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs using evidence from split-read alignments, high-mismatch regions, and coverage analysis. Please note the current version of Sniffles requires sorted output from BWA-MEM (use -M and -x parameter) or NGM-LR with the optional SAM attributes enabled!

More at https://github.com/fritzsedlazeck/Sniffles

MultiBreak-SV: It identifies structural variants from next-generation paired end data, third-generation long read data, or data from a combination of sequencing platforms.

There are two pieces of software in this release: (1) a pre-processor that takes machineformat (.m5) BLASR files, and (2) MultiBreak-SV. For installation and usage instructions, see doc/MultiBreakSV-Manual.txt.

More at https://github.com/raphael-group/multibreak-sv

Parliament: A Structural Variation Tool. Why ask a single sv-detection approach to find every variant when you can have a parliament of tools deciding?

Publication about the algorithm and “…the first long-read characterization of structural variation in a diploid human personal genome…” (HS1011) - “Assessing structural variation in a personal genome—towards a human reference diploid genome”

More at https://sourceforge.net/projects/parliamentsv/

https://www.dnanexus.com/papers/Parliament_Info_Sheet.pdf

PBHoney: the structural variation discovery tool

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.
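
To make the idea of soft-clipped tails concrete, the sketch below reads the leading and trailing soft-clip lengths from a SAM CIGAR string. It is not PBHoney's code; the function names and the min_tail threshold of 200 bases are hypothetical values chosen for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_tails(cigar):
    """Return (leading, trailing) soft-clip lengths of an alignment."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside the soft clips, so ignore them.
    core = [item for item in ops if item[1] != "H"]
    head = core[0][0] if core and core[0][1] == "S" else 0
    tail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return head, tail


def has_long_tail(cigar, min_tail=200):
    """Flag reads whose unaligned tail may span a variant breakpoint."""
    return max(soft_clipped_tails(cigar)) >= min_tail


print(soft_clipped_tails("350S8000M12I400M"))
print(has_long_tail("350S8000M12I400M"))
```

A long read with a large soft-clipped tail aligns well up to some point and then stops matching the reference, which is the kind of signal a breakpoint caller can follow up on.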

Read The Paper http://www.biomedcentral.com/1471-2105/15/180/abstract

More at https://sourceforge.net/projects/pb-jelly/

SMRT-SV: Structural variant and indel caller for PacBio reads

Structural variant (SV) and indel caller for PacBio reads based on methods from Chaisson et al. 2014.

SMRT-SV provides an official software package for tools described in Chaisson et al. 2014 and adds several key features including the following.

Unified variant calling user interface with built-in cluster compute support
Small indel calling (2-49 bp)
Improved inversion calling (screenInversions)
Quality metric for SV calls based on number of local assemblies supporting each call
Higher sensitivity for SV calls using tiled local assemblies across the entire genome instead of "signature" regions
Genotyping of SVs with Illumina paired-end reads from WGS samples

More at https://github.com/EichlerLab/pacbio_variant_caller

Update Genome Workbench 2.7.15 released

Surabhi Chaudhary — Wed, 26 Feb 2014 16:12:17 -0600

NCBI Genome Workbench is an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI, and mix this data with your own private data.

Genome Workbench can display sequence data in many ways, including graphical sequence views, various alignment views, phylogenetic tree views, and tabular views of data. It can also align your private data to data in public databases, display your data in the context of public data, and retrieve BLAST results.

Genome Workbench is built on the NCBI C++ ToolKit and uses cross-platform APIs for graphics. It runs on your local machine, and is available for Windows 2000/XP, Linux, MacOS X, and various flavors of Unix.

Genome Workbench was developed entirely in-house at NCBI and makes use of the NCBI C++ ToolKit. The C++ ToolKit provides a convenient and flexible cross-platform API for managing system internals, database connections, network sockets, and the NCBI data model. In addition, the C++ ToolKit provides the Object Manager, which abstracts handling of sequences and sequence-related objects.

New Features in Genome Workbench 2.7.15

Multiple Alignment View: implemented adaptive feature display when zooming in
Active Objects Inspector replaces Selection Inspector. The new view should offer improved examination of the selection context. See the "Using Active Objects Inspector" tutorial for more details.
Binary packages for Linux OpenSUSE 13.1 are now available

Bug Fixes and Improvements in Genome Workbench 2.7.15

Fixed a major issue with OpenGL overlay/scrolling that could cause crashes or view scrolling irregularities
Multiple Pane View: fixed crash on loading BLAST results
Graphical Sequence View: fixed crash on zooming in and out, related to SNP track
Graphical Sequence View: fixed Go To Position dialog to give better diagnostics in case of a user error
Graphical Sequence View: PDF export fixed rendering of Markers with commas in the name
Text View / Flat File: fixed Mac OS rendering issues
Text View / Flat File: performance optimization, extended capabilities of real-time rendering of molecules to tens of thousands
File Import: optimization improvement to speed up load of files containing multiple project items
File Import: remapping stage now shows accession.version and description of molecules, instead of plain GI numbers
Mac OS: improved tooltips for toolbar buttons
Phylogenetic Tree Builder Tool: improved diagnostics of errors
Multiple Alignment View: optimizations to avoid main GUI freezes
Open Dialog: removed duplicate elements in table of genomes (load Genome)
PDF export: fixed issue with XREF table errors
Tree View: fixed issues with showing Force Layout progress on Mac OS
Tree View: PDF export fixed issues for showing labels of collapsed nodes
Tree View: added an option to stop layout
Tree View: broadcasting mechanism fixed not to accumulate selected nodes

Reference:

NCBI news

http://www.ncbi.nlm.nih.gov/tools/gbench/

Special Project Scientist – Sorghum Genomics

Tue, 20 May 2014 00:34:39 -0500

ICRISAT is seeking applications from Indian nationals for a Special Project Scientist to work on sorghum genomics activities related to sequencing/re-sequencing projects utilizing next-generation sequencing platforms.

The Job:

Advancing the SNP-discovery and polymorphism assessment work across several germplasm panels representing global genetic diversity
Population genetic and genomic analyses, testing the hypothesis related to adaptation in multiple geographic regions
Develop SNP assays from large scale GBS and other re-sequencing data for several target traits utilizing available phenotyping data
Combined analyses of genotypic and phenotypic data for discovery of marker-trait associations, and conducting GWAS
Processing, analyzing, and archiving large-scale genomic data sets, assessing data quality, conducting analyses, interpreting findings, and communicating findings to others including preparation of reports, presentations, posters and journal articles
Providing support to MSc and PhD students on topics related to the project's core area of research
Any other work assigned by the supervisor

The Person:

PhD in bioinformatics, genetics, or computational biology, preferably with 1 to 2 years of experience;
familiarity with standard bioinformatics tools, scripting languages, and emerging and evolving software platforms relevant to bioinformatics and computational biology;
ability to create new analytical pipelines; experience with handling large data sets;
ability to program in at least two of the following: C++, Perl, Python, R, Java.
The appointee will use next-generation sequencing technologies to generate marker data for genetic mapping and transcriptome data for expression QTL mapping, and will be responsible for data generation as well as data analysis.

Period and Remuneration: The assignment is for a period of two years, and can be extended for another year depending on performance. ICRISAT pays a very attractive all-inclusive lump sum assignment fee payable in Indian Rupees.

How to Apply: Please send your application by email to icrisatjobs@cgiar.org, stating the job title (Special Project Scientist-Sorghum Genomics) clearly in the subject line, addressed to the Director, Human Resources and Operations, ICRISAT, Patancheru, Andhra Pradesh 502 324, India, no later than 10 June 2014. The application should include an up-to-date Curriculum Vitae, a short statement of competencies and experience for the position, and the names and addresses (including phone/e-mail) of three referees. Only short-listed candidates will be contacted.

More at: http://www.icrisat.org/careers/Special-Project-Scientist-Sorghum-Genomics.htm