BOL: Related items

List of generic simulation software/tools/resources with brief descriptions and homepages

Jit — Mon, 10 Feb 2014 05:57:29 -0600


ALF
A Simulation Framework for Genome Evolution
http://www.cbrg.ethz.ch/alf

Bayesian Serial SimCoal
Bayesian Serial SimCoal (BayeSSC) is a modification of SIMCOAL 1.0, a program written by Laurent Excoffier, John Novembre, and Stefan Schneider.
http://www.stanford.edu/group/hadlylab/ssc/index.html

BEERS
BEERS was designed to benchmark RNA-Seq alignment algorithms, as well as algorithms that aim to reconstruct different isoforms and alternative splicing from RNA-Seq data.
http://cbil.upenn.edu/beers/

BOTTLENECK
Bottleneck is a program for detecting recent reductions in effective population size from allele frequency data.
http://www.ensam.inra.fr/urlb/bottleneck/bottleneck.html

BottleSim
BottleSim is a computer program for simulating the process of population bottlenecks.
http://chkuo.name/software/bottlesim.html

CASS
Protein Sequence Simulation
http://www.wyomingbioinformatics.org/liberlesgroup/cass/

CDPOP
CDPOP is a landscape genetics tool for simulating the emergence of spatial genetic structure in populations resulting from specified landscape processes governing organism movement behavior.
http://cel.dbs.umt.edu/cdpop

CoalFace
CoalFace is a simulation of the coalescent process with the visual display of gene genealogies.
http://web.up.ac.za/default.asp?ipkcategoryid=3283

CoaSim
CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models.
http://users-birc.au.dk/mailund/coasim/index.html

cosi
cosi is a coalescent simulator written in C and distributed as a tar file.
http://www.broadinstitute.org/~sfs/cosi/

CS-PSeq-Gen
A program to simulate the evolution of protein sequences under the constraints of the information of a particular reconstructed phylogeny
http://bioserv.rpbs.univ-paris-diderot.fr/software/cs-pseq-gen.html

DAWG
An application designed to simulate the evolution of recombinant DNA sequences in continuous time
http://scit.us/projects/dawg

Easypop
EASYPOP is an individual based model intended to simulate datasets under a very broad range of conditions
http://www.unil.ch/dee/page36926_fr.html

EggLib
EggLib is a C++/Python library and program package for evolutionary genetics and genomics.
http://egglib.sourceforge.net/

EvolSimulator
A simulation test bed for hypotheses of genome evolution
http://acb.qfab.org/acb/evolsim/

EvolveAGene
A realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions
http://bellinghamresearchinstitute.com/software/index.html

fastsimcoal
A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios
http://cmpg.unibe.ch/software/fastsimcoal/

FastSLINK
Simulation of Marker and Phenotype Data in Pedigrees
http://watson.hgen.pitt.edu/

FFPopSim
C++/Python library for population genetics.
http://webdav.tuebingen.mpg.de/ffpopsim/

FLUX SIMULATOR
The Flux Simulator aims at providing a deterministic in silico reproduction of the experimental pipelines for RNA-Seq, employing a minimal set of parameters.
http://flux.sammeth.net/simulator.html

ForSim
ForSim: A Forward Evolutionary Computer Simulation
http://www.anthro.psu.edu/weiss_lab/research.shtml

ForwSim
ForwSim is based on the algorithm described in Padhukasahasram et al. 2008 for simulating genetic drift in a standard Wright-Fisher process.
http://badri-populationgeneticsimulators.blogspot.com/

FPG
Forward Population Genetic simulation
http://genfaculty.rutgers.edu/hey/software#fpg

FREGENE
FREGENE is a C++ program that simulates sequence-like data over large genomic regions in large diploid populations.
http://www.ebi.ac.uk/projects/bargen/download/fregen/documentation_html.html

GAMETES
Genetic Architecture Model Emulator for Testing and Evaluating Software: simulates complex SNP models with pure, strict epistatic interactions among n loci.
http://sourceforge.net/projects/gametes/?source=navbar

GASP
Genometric Analysis Simulation Program. A software tool for testing and investigating methods in statistical genetics by generating samples of family data based on user-specified models.
http://research.nhgri.nih.gov/gasp/

GemSIM
Next generation sequencing read simulator
http://sourceforge.net/projects/gemsim/

GeneArtisan
Simulation of Markers in Case-Control Study Designs
http://www.rannala.org/?page_id=241

GENOME
A rapid coalescent-based whole genome simulator
http://www.sph.umich.edu/csg/liang/genome/

GenomePop2
GenomePop2 is a specialization of GenomePop for handling SNPs under more flexible settings. For models with more than two alleles, use the original GenomePop program.
http://webs.uvigo.es/acraaj/genomepop2.htm

GenomeSIMLA
GenomeSIMLA is currently under development; a beta release is available for testing.
http://chgr.mc.vanderbilt.edu/genomesimla/

GENS2
Simulates interactions among two genetic factors and one environmental factor, and also allows for epistatic interactions.
https://sourceforge.net/projects/gensim/

GWAsimulator
A rapid whole genome simulation program
http://biostat.mc.vanderbilt.edu/wiki/main/gwasimulator

HAP-SAMPLE
An association simulator for candidate regions or genome scans
http://www.hapsample.org/

HAPGEN
A program for simulating case-control datasets at SNP markers
https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html

HapSim
A simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients
http://cran.r-project.org/web/packages/hapsim/index.html

HAPSIMU
A program that simulates heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model
http://l.web.umkc.edu/liujian/

IBDsim
IBDSim is a computer package for the simulation of genotypic data under general isolation by distance models.
http://raphael.leblois.free.fr/

indel-Seq-Gen
A biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies
http://bioinfolab.unl.edu/~cstrope/isg/

Indelible
A powerful and flexible simulator of biological evolution
http://abacus.gene.ucl.ac.uk/software/indelible/

invertFREGENE
InvertFREGENE is a forward-in-time simulator of inversions in population genetic data
http://www.ebi.ac.uk/projects/bargen/

kernelPop
A spatially explicit population genetic simulation engine
http://cran.r-project.org/src/contrib/archive/kernelpop/

MaCS
Markovian Coalescent Simulator
http://www-hsc.usc.edu/~garykche/

Mason
A package for the simulation of nucleotide data.
http://www.seqan.de/projects/mason/

mbs
A modification of Hudson's ms software that generates samples of DNA sequences with a biallelic site under selection
http://www.sendou.soken.ac.jp/esb/innan/innanlab/software.html

Mendel's Accountant
Mendel's Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time and was developed collaboratively by Sanford, Baumgardner, Brewer, Gibson and ReMine
http://mendelsaccount.sourceforge.net/

MetaSim
A tool to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets
http://ab.inf.uni-tuebingen.de/software/metasim/

mlcoalsim
Multilocus Coalescent Simulations
http://code.google.com/p/mlcoalsim-v1/

ms
ms generates samples from a population evolving under a neutral Wright-Fisher model, using the coalescent. Its purpose is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets.
http://home.uchicago.edu/~rhudson1/source/mksamples.html

msHOT
msHOT is a modification of Hudson's ms that incorporates crossover and gene conversion hotspots, so that the properties of samples can be studied under models with non-uniform recombination.
http://home.uchicago.edu/~rhudson1/

msms
A coalescent simulation tool with selection.
http://www.mabs.at/ewing/msms/index.shtml

MySSP
A program for the simulation of DNA sequence evolution across a phylogenetic tree
http://www.rosenberglab.net/software.php

Nemo
A forward-time, individual-based, genetically explicit, and stochastic simulation program designed to study the evolution of genetic markers, life history traits, and phenotypic traits in a flexible (meta-)population framework.
http://nemo2.sourceforge.net/

NetRecodon
Coalescent simulation of coding DNA sequences with recombination (inter- and intracodon), migration and demography
http://code.google.com/p/netrecodon/

PEDAGOG
Software for simulating eco-evolutionary population dynamics
https://bcrc.bio.umass.edu/pedigreesoftware/node/5

phenosim
A tool to add phenotypes to simulated genotypes
http://evoplant.uni-hohenheim.de/doku.php?id=software:software

PhyloSim
An R package for the Monte Carlo simulation of sequence evolution
http://bit.ly/rlsim-git

pIRS
Profile-based Illumina paired-end reads simulator
https://code.google.com/p/pirs/

ProteinEvolver
Simulation of protein evolution along phylogenies under structure-based substitution models
http://code.google.com/p/proteinevolver/

QMSim
QTL and Marker Simulator
http://www.aps.uoguelph.ca/~msargol/qmsim/

quantiNEMO
An individual-based program for the analysis of quantitative traits with explicit genetic architecture potentially under selection in a structured population
http://www2.unil.ch/popgen/softwares/quantinemo/

RECOAL
Simulates new haplotype data from a reference population of haplotypes.
ftp://popgen.usc.edu/

Recodon
Coalescent simulation of coding DNA sequences with recombination, migration and demography
http://code.google.com/p/recodon/

rlsim
A package for simulating RNA-seq library preparation with parameter estimation
http://bit.ly/rlsim-git

Rmetasim
Rmetasim is a front-end for the metasim engine that is implemented as a package that runs in the statistical computing environment R
http://linum.cofc.edu/software.html#metasim

RNA Seq Simulator
RSS takes SAM alignment files from RNA-Seq data and simulates overdispersed, multiple-replicate, differential, non-stranded RNA-Seq datasets.
http://useq.sourceforge.net/cmdlnmenus.html#rnaseqsimulator

Rose
Random model of sequence evolution
http://bibiserv.techfak.uni-bielefeld.de/rose/

SelSim
SelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection
http://www.well.ox.ac.uk/~spencer/selsim/

Seq-Gen
An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees.
http://tree.bio.ed.ac.uk/software/seqgen/

SEQPower
Statistical power analysis for sequence-based association studies
http://bioinformatics.org/spower/

SeqSIMLA
SeqSIMLA can simulate sequence data with user-specified disease and quantitative trait models. Family or unrelated case-control data can be simulated.
http://seqsimla.sourceforge.net/

Serial NetEvolve
A flexible utility for generating serially-sampled sequences along a tree or recombinant network
http://biorg.cis.fiu.edu/sne/

SFS_CODE
SFS_CODE can perform forward population genetic simulations under a general Wright-Fisher model with arbitrary migration, demographic, selective, and mutational effects.
http://sfscode.sourceforge.net/sfs_code/index/index.html

SIBSIM
Quantitative phenotype simulation in extended pedigrees
http://sourceforge.net/projects/sibsim/

SIMCOAL2
A coalescent program for the simulation of complex recombination patterns over large genomic regions under various demographic models
http://cmpg.unibe.ch/software/simcoal2/

SimCopy
An R package simulating the evolution of copy number profiles along a tree.
http://bit.ly/simcopy

SIMLA
SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies.
http://www.chg.duke.edu/research/simla.html

SimPed
A Simulation Program to Generate Haplotype and Genotype Data for Pedigree Structures
http://www.hgsc.bcm.tmc.edu/content/simped

Simprot
A program to simulate protein evolution by substitution, insertion and deletion
http://www.uhnresearch.ca/labs/tillier/software.htm#3

SimRare
Rare variant simulation and analysis tool
http://code.google.com/p/simrare/

simuGWAS
A forward-time simulator that simulates realistic samples for genome-wide association studies.
http://simupop.sourceforge.net/cookbook/simucomplexdisease

simuPOP
simuPOP is a general-purpose individual-based forward-time population genetics simulation environment.
http://simupop.sourceforge.net/

SISSI
A software tool to generate data of related sequences along a given phylogeny, taking into account user-defined systems of neighbourhoods and instantaneous rate matrices.
http://www.cibiv.at/software/sissi/

SNPsim
Coalescent simulation of hotspot recombination
http://code.google.com/p/phylosoftware/

SPIP
SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user
http://swfsc.noaa.gov/textblock.aspx?division=fed&id=3434

Splatche
Spatial and Temporal Coalescences in Heterogeneous Environment
http://www.splatche.com/

srv
Simulator of Rare Variants (srv) simulates the introduction and evolution of (rare) genetic variants.
http://simupop.sourceforge.net/cookbook/simurarevariants

SUP
SLINK/FastSLINK utility program
http://mlemire.freeshell.org/software.html

TreesimJ
A flexible, forward-time population genetic simulator
http://code.google.com/p/treesimj/

Vortex
VORTEX is an individual-based simulation model for population viability analysis (PVA).
http://www.vortex9.org/vortex.html
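
Several of the forward-time simulators listed above (for example ForwSim and SFS_CODE) are built around the Wright-Fisher model. To give a feel for what "genetic drift in a standard Wright-Fisher process" means, here is a minimal Python sketch. It assumes a constant-size diploid population with no mutation, selection or migration, and it is not the algorithm used by any of the programs in this list.

```python
import random


def wright_fisher_drift(n_individuals, p0, generations, seed=None):
    """Frequency trajectory of one allele under pure genetic drift.

    Assumes a constant-size diploid population of N individuals (2N gene
    copies), non-overlapping generations and no mutation, selection or
    migration. Each generation resamples 2N copies binomially from the
    previous generation's allele frequency.
    """
    rng = random.Random(seed)
    copies = 2 * n_individuals
    count = round(p0 * copies)
    trajectory = [count / copies]
    for _ in range(generations):
        p = count / copies
        # Binomial(2N, p) draw written as 2N Bernoulli trials
        count = sum(1 for _ in range(copies) if rng.random() < p)
        trajectory.append(count / copies)
        if count == 0 or count == copies:  # allele lost or fixed: drift stops
            break
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher_drift(n_individuals=50, p0=0.5, generations=200, seed=1)
    print(f"{len(traj) - 1} generations simulated, final frequency {traj[-1]:.2f}")
```

Re-running with different seeds and smaller n_individuals shows the basic signature of drift: the allele wanders randomly and is eventually lost or fixed, sooner in smaller populations.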


CSBB-v1.0

Neel — Wed, 29 Jun 2016 07:33:05 -0500

CSBB is a command-line bioinformatics suite for analyzing biological data acquired through a variety of biological experiments. CSBB is implemented in Perl and also uses R and Python in the background for specific modules. The major focus of CSBB is to let users from the biology and bioinformatics communities perform downstream analysis tasks without needing to write code. CSBB is currently available on Linux, UNIX, Mac OS and Windows platforms.

CSBB currently provides 13 modules focused on analytical tasks such as performing upper-quantile normalization on expression data or converting genome-wide gene expression to z-scores when comparing expression data from different platforms.
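
To make the two example tasks concrete, here is a small Python sketch. It assumes one common reading of "upper-quantile" normalization (divide each sample by the 75th percentile of its non-zero counts, then rescale by the mean of those upper quartiles) and standard per-gene z-scores across samples. The function names are hypothetical, and CSBB's own modules may differ in details.

```python
import statistics


def upper_quartile(counts):
    """75th percentile of a sample's non-zero counts (linear interpolation)."""
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        raise ValueError("sample has no non-zero counts")
    if len(nonzero) == 1:
        return float(nonzero[0])
    # "inclusive" quartiles interpolate between data points; index 2 is Q3
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_normalize(samples):
    """samples maps sample name -> list of counts, genes in the same order.

    Each sample is divided by its own upper quartile, then multiplied by the
    mean upper quartile so values stay on a count-like scale.
    """
    uq = {name: upper_quartile(counts) for name, counts in samples.items()}
    scale = statistics.mean(uq.values())
    return {name: [c / uq[name] * scale for c in counts]
            for name, counts in samples.items()}


def zscores(values):
    """Standardize one gene across samples: (x - mean) / sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    raw = {"sample_a": [1, 2, 3, 4, 5], "sample_b": [2, 4, 6, 8, 10]}
    print(upper_quartile_normalize(raw))  # both samples become identical
    print(zscores([1.0, 2.0, 3.0]))       # [-1.0, 0.0, 1.0]
```

Ignoring zero counts when computing the upper quartile keeps samples with many unexpressed genes from getting a tiny scaling factor; that choice is part of the assumption above.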

More at https://github.com/skygenomics/CSBB-v1.0

Address of the bookmark: https://github.com/skygenomics/CSBB-v1.0

Software and tools to detect structural variation with long reads

Archana Malhotra — Wed, 15 Mar 2017 14:31:09 -0500

Uncovering the connection between genetics and heritable diseases requires an approach that looks at all the variant bases and types in a genome. A PacBio de novo assembly resolves the most novel SVs, while 8-10X PacBio coverage of single genomes or trios reveals three times as many SVs as short-read data can detect.

With Single Molecule, Real-Time (SMRT) Sequencing, you can access structural variations having a broad range of sizes, types, and GC content with the ability to:

Uncover missing heritability linked to structural variation
Unambiguously identify genomic context and variant breakpoints at the sequence level to unravel the genetic etiology of disease
Resolve structural variation across the complete size spectrum with basepair resolution

The following SV tools can help you achieve this goal.

Sniffles: Structural variation caller using third generation sequencing

Sniffles is a structural variation caller for third-generation sequencing data (PacBio or Oxford Nanopore). It detects all types of SVs using evidence from split-read alignments, high-mismatch regions, and coverage analysis. Please note that the current version of Sniffles requires sorted output from BWA-MEM (use the -M and -x parameters) or NGM-LR with the optional SAM attributes enabled!

More at https://github.com/fritzsedlazeck/Sniffles

MultiBreak-SV: It identifies structural variants from next-generation paired end data, third-generation long read data, or data from a combination of sequencing platforms.

There are two pieces of software in this release: (1) a pre-processor that takes machine-format (.m5) BLASR files, and (2) MultiBreak-SV. For installation and usage instructions, see doc/MultiBreakSV-Manual.txt.

More at https://github.com/raphael-group/multibreak-sv

Parliament: A Structural Variation Tool. Why ask a single sv-detection approach to find every variant when you can have a parliament of tools deciding?

Publication about the algorithm and “…the first long-read characterization of structural variation in a diploid human personal genome…” (HS1011) - “Assessing structural variation in a personal genome—towards a human reference diploid genome”

More at https://sourceforge.net/projects/parliamentsv/

https://www.dnanexus.com/papers/Parliament_Info_Sheet.pdf

PBHoney: the structural variation discovery tool

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.
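
As a rough illustration of the soft-clipped-tails idea, the Python sketch below reads soft-clip lengths from a SAM CIGAR string and flags reads with a long clipped tail. The 500 bp cutoff and the function names are hypothetical, not PBHoney defaults, and the sketch does nothing further, such as working out where the clipped sequence belongs in the genome.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_tails(cigar):
    """Return (left, right) soft-clipped lengths for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips, so ignore them at the ends.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def has_long_tail(cigar, min_clip=500):
    """Flag reads with a long soft-clipped tail (min_clip is a made-up cutoff)."""
    return max(soft_clip_tails(cigar)) >= min_clip


if __name__ == "__main__":
    for cigar in ["1200S8000M3I2000M", "10H50S100M30S5H", "9500M"]:
        print(cigar, soft_clip_tails(cigar), has_long_tail(cigar))
```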

Read the paper: http://www.biomedcentral.com/1471-2105/15/180/abstract

More at https://sourceforge.net/projects/pb-jelly/

SMRT-SV: Structural variant and indel caller for PacBio reads

Structural variant (SV) and indel caller for PacBio reads based on methods from Chaisson et al. 2014.

SMRT-SV provides an official software package for tools described in Chaisson et al. 2014 and adds several key features including the following.

Unified variant calling user interface with built-in cluster compute support
Small indel calling (2-49 bp)
Improved inversion calling (screenInversions)
Quality metric for SV calls based on number of local assemblies supporting each call
Higher sensitivity for SV calls using tiled local assemblies across the entire genome instead of "signature" regions
Genotyping of SVs with Illumina paired-end reads from WGS samples

More at https://github.com/EichlerLab/pacbio_variant_caller

A guide for complete R beginners: installing R packages

Archana Malhotra — Tue, 24 Feb 2015 20:23:34 -0600

Part of the reason R has become so popular is the vast array of packages available at the CRAN and Bioconductor repositories. In the last few years, the number of packages has grown exponentially!

This is a short post giving steps on how to actually install R packages. Let’s suppose you want to install the ggplot2 package. Well nothing could be easier. We just fire up an R shell and type:
> install.packages("ggplot2")

In theory the package should just install, however:

if you are using Linux and don’t have root access, this command won’t work.
you will be asked to select your local mirror, i.e. which server should you use to download the package.

Installing packages without root access

First, you need to designate a directory where you will store the downloaded packages. On my machine, I use the directory /data/Rpackages/. After creating a package directory, we install and load a package with the commands:
> install.packages("ggplot2", lib="/data/Rpackages/")
> library(ggplot2, lib.loc="/data/Rpackages/")

It’s a bit of a pain having to type /data/Rpackages/ all the time. To avoid this burden, we create a file .Renviron in our home area, and add the line R_LIBS=/data/Rpackages/ to it. This means that whenever you start R, the directory /data/Rpackages/ is added to the list of places to look for R packages and so:

> install.packages("ggplot2")
> library(ggplot2)

just works!

Setting the repository

Every time you install an R package, you are asked which repository R should use. To set the repository once and avoid having to specify it at every package install, simply:

create a file .Rprofile in your home area.
Add the following piece of code to it:

cat(".Rprofile: Setting UK repository\n")
r = getOption("repos") # hard code the UK repo for CRAN
r["CRAN"] = "http://cran.uk.r-project.org"
options(repos = r)
rm(r)

I found this tip in a Stack Overflow answer.

CANU: Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing.

Jit — Tue, 26 Apr 2016 11:38:10 -0500

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently at alpha level; feel free to use it and report any issues you encounter.

Canu is a hierarchical assembly pipeline that runs in four steps:

Detect overlaps in high-noise sequences using MHAP
Generate corrected sequence consensus
Trim corrected sequences
Assemble trimmed corrected sequences
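
The first step relies on MHAP (MinHash Alignment Process), which uses MinHash, a locality-sensitive hashing scheme, to find overlaps between noisy reads. The Python sketch below shows only the core MinHash idea: sketch each read's k-mers and estimate their Jaccard similarity from agreeing minima. The blake2b-based hash family, k=16 and 64 hash functions are assumptions for illustration, not MHAP's actual hashing, parameters or filtering.

```python
import hashlib
import random


def kmer_set(seq, k):
    """All distinct k-mers of seq."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """64-bit hash; a different salt per seed gives a family of hash functions."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq, k=16, num_hashes=64):
    """Keep, for each hash function, the minimum hash value over all k-mers."""
    kmers = kmer_set(seq, k)
    return [min(seeded_hash(km, s) for km in kmers) for s in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Share of hash functions whose minima agree estimates k-mer Jaccard similarity."""
    agree = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return agree / len(sketch_a)


if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    read_a, read_b = genome[:1200], genome[400:]  # 800 bp overlap
    unrelated = "".join(rng.choice("ACGT") for _ in range(1200))
    sa, sb, su = (minhash_sketch(s) for s in (read_a, read_b, unrelated))
    print("overlapping reads:", estimate_jaccard(sa, sb))
    print("unrelated reads:", estimate_jaccard(sa, su))
```

The point of the design is that comparing two short fixed-size sketches is much cheaper than aligning two long reads, so only read pairs with a high estimated similarity need closer inspection.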

Read the documentation

New release https://github.com/marbl/canu/releases

Address of the bookmark: https://github.com/marbl/canu

cutadapt

Radha Agarkar — Fri, 13 May 2016 04:54:50 -0500

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

Cleaning your data in this way is often required: Reads from small-RNA sequencing contain the 3’ sequencing adapter because the read is longer than the molecule that is sequenced. Amplicon reads start with a primer sequence. Poly-A tails are useful for pulling out RNA from your sample, but often you don’t want them to be in your reads.

Cutadapt helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way. It can also modify and filter reads in various ways. Adapter sequences can contain IUPAC wildcard characters. Paired-end reads and even colorspace data are supported. If you want, you can also just demultiplex your input data without removing adapter sequences at all.
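
To show what "error-tolerant" means for a 3' adapter, here is a simplified Python sketch. It uses a maximum error rate of 0.1 and a minimum overlap of 3, matching cutadapt's defaults, but it counts substitutions only; cutadapt itself uses an alignment that also tolerates insertions and deletions. The function names are hypothetical and not part of cutadapt's API.

```python
def find_3prime_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Return the index where the adapter starts in read, or None.

    The adapter may run off the 3' end of the read (partial adapter).
    Only substitutions are counted; insertions/deletions are not handled.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        segment = read[start:start + overlap]
        mismatches = sum(1 for r, a in zip(segment, adapter) if r != a)
        # allowed errors scale with the length of the compared stretch
        if mismatches <= int(max_error_rate * overlap):
            return start  # leftmost acceptable match wins
    return None


def trim_3prime(read, adapter, **options):
    """Remove the adapter and everything after it; unchanged if not found."""
    start = find_3prime_adapter(read, adapter, **options)
    return read if start is None else read[:start]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"
    for read in ["ACGTACGTTTGGAGATCGGAAGAGC",   # full adapter
                 "ACGTACGTTTGGAGATCGGTAGAGC",   # adapter with one mismatch
                 "ACGTACGTTTGGAGATC",           # partial adapter at 3' end
                 "ACGTACGTTTGG"]:               # no adapter
        print(trim_3prime(read, adapter))
```

Because short partial matches must be exact here, a read that happens to end in "AGA" would also lose its last three bases; that trade-off is what the minimum-overlap setting controls.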

Cutadapt comes with an extensive suite of automated tests and is available under the terms of the MIT license.

If you use cutadapt, please cite DOI:10.14806/ej.17.1.200 .

Address of the bookmark: https://cutadapt.readthedocs.io/en/stable/installation.html#quickstart

Bpipe - a tool for running and managing bioinformatics pipelines

Radha Agarkar — Sat, 21 May 2016 22:42:16 -0500

Bpipe provides a platform for running big bioinformatics jobs that consist of a series of processing stages - known as 'pipelines'.

Bpipe 0.9.9 was released on January 20th, 2016.

Bpipe has been published in Bioinformatics! If you use Bpipe, please cite:

Sadedin S, Pope B & Oshlack A, Bpipe: A Tool for Running and Managing Bioinformatics Pipelines, Bioinformatics

Address of the bookmark: http://docs.bpipe.org/

DarkHorse

Jit — Wed, 22 Jun 2016 05:37:38 -0500

DarkHorse is a bioinformatic method for rapid, automated identification and ranking of phylogenetically atypical proteins on a genome-wide basis. It works by selecting potential ortholog matches from a reference database of amino acid sequences, then using these matches to calculate a lineage probability index (LPI) score for each genome protein.

LPI scores are inversely proportional to the phylogenetic distance between database match sequences and the query genome. These scores are useful not only for large-scale de novo predictions of horizontally transferred proteins, but can also serve as an independent quality control test for potential horizontal transfer candidates identified by alternative methods, especially those based on nucleic acid signatures. Candidates having high LPI scores are unlikely to have been horizontally transferred, since they are highly conserved among closely related organisms.

One unique and powerful feature of the DarkHorse HGT Candidate database is the opportunity to explore the phylogenetic background of potential HGT donors as well as recipients. The breadth of the database allows not only query sequences, but also their database match partners to be evaluated for sequence similarity or novelty compared to taxonomically related organisms.

DarkHorse is configurable for varying degrees of phylogenetic granularity and protein sequence conservation. Users should consult the DarkHorse publications for a complete explanation of parameter selection and result interpretation. A brief tutorial page is also available online.

Address of the bookmark: http://darkhorse.ucsd.edu/download.html

WgSim

Jit — Thu, 23 Jun 2016 07:26:49 -0500

Reads simulator

Wgsim is a small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms.

Wgsim outputs the simulated polymorphisms, and writes the true read coordinates as well as the number of polymorphisms and sequencing errors in read names. One can evaluate the accuracy of a mapper or a SNP caller with wgsim_eval.pl that comes with the package.
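
Wgsim itself is a C program; the Python sketch below only illustrates the idea described above: sample reads from a reference, add uniform substitution errors, and record the truth in the read name. The naming scheme read{i}_pos{start}_err{errors} is invented for this example and is not wgsim's format, and the sketch does not simulate SNPs, indels, diploid genomes or paired reads.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, n_reads, read_len, error_rate=0.02, seed=None):
    """Yield (name, sequence) for single-end reads with uniform substitution errors.

    The name stores the true 0-based start and the number of errors
    (a hypothetical format, loosely mirroring how wgsim keeps truth in names).
    """
    rng = random.Random(seed)
    if read_len > len(reference):
        raise ValueError("read_len exceeds reference length")
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        errors = 0
        for j, b in enumerate(bases):
            if rng.random() < error_rate:
                # substitute with one of the other bases (never the original)
                bases[j] = rng.choice([x for x in BASES if x != b])
                errors += 1
        yield f"read{i}_pos{start}_err{errors}", "".join(bases)


if __name__ == "__main__":
    ref = "".join(random.Random(0).choice(BASES) for _ in range(1000))
    for name, seq in simulate_reads(ref, n_reads=3, read_len=50, seed=42):
        print(f">{name}\n{seq}")
```

Keeping the true position in the name is what makes evaluation easy: after mapping, comparing each reported position with the one parsed from the name tells you whether the mapper placed the read correctly.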

Address of the bookmark: https://github.com/lh3/wgsim

Bioinformatics tools and software

Jit — Tue, 05 Jul 2016 10:02:26 -0500

USEARCH
Extreme high-throughput sequence analysis. Orders of magnitude faster than BLAST.

MUSCLE
Multiple sequence alignment. Faster and more accurate than CLUSTALW.

UPARSE
OTU clustering for 16S and other marker genes. Highly accurate OTU sequences and improved diversity measures.

UCHIME
Chimeric sequence detection.

PILER
De novo genome repeat finder.

PILER-CR
Detection of CRISPR repeats in bacterial genomes.

QSCORE
Compares two multiple alignments for benchmarking.

PALS
Whole-genome alignment.

PREFAB
Protein Reference Alignment Database.

MSA benchmark collection
Selected multiple alignment benchmarks in a standardized FASTA format.

Address of the bookmark: http://drive5.com/software.html