BOL: Related items

Postdoc position in protein annotation and machine learning - Paris, France

Sat, 04 Oct 2014 08:10:45 -0500

We are looking for an excellent postdoc interested in protein functional annotation, machine learning and computer grids. The position is open for 3.5 years at the Université Pierre et Marie Curie, in the heart of Paris.

Research topic: Protein function annotation, multiple probabilistic models, domain architecture, machine learning, combinatorial optimization, computer grid.

This project is run at the Laboratoire de Biologie Computationnelle et Quantitative UMR7238 CNRS-UPMC – Analytical Genomics team, headed by A. Carbone. It is co-advised by Pierre-Henri Wuillemin, Laboratoire d’Informatique de Paris 6 – Equipe DECISION.

The postdoc will be paid under an Ingénieur de Recherche contract lasting 3.5 years; the position is available from September 1st, 2014.

Group Web Page: http://www.lcqb.upmc.fr/AnalGenom/home.html

Ref. E-Mail: Alessandra Carbone alessandra.carbone@lip6.fr

Ryan E. Mills Lab

Tue, 26 May 2015 09:29:24 -0500

Our research group is primarily focused on the analysis of whole genome sequence data to identify genetic variation (primarily structural variation) and examine its potential functional impact on disease phenotypes. We are particularly interested in analyzing complex regions of the genome that are not easily resolved by modern sequencing approaches and that may have interesting mechanistic origins.

We are also interested in the large-scale integration of genomic, expression, methylation and proteomic data sets, as well as the application of whole genome sequence analysis in clinical diagnostics.

More at http://millslab.ccmb.med.umich.edu/index.html

Raphael Lab

Sat, 04 Jul 2015 19:05:29 -0500

The Raphael Lab's research focuses on Bioinformatics and Computational Biology.

Current research interests include next-generation DNA sequencing, structural variation, genome rearrangements in cancer and evolution, and network analysis of somatic mutations in cancer. Earlier research included topics in comparative genomics, multiple sequence alignment, and motif finding.

More at http://compbio.cs.brown.edu/

BioRG

Tue, 04 Aug 2015 20:52:52 -0500

This research group works on problems from the fields of Bioinformatics, Biotechnology, Data Mining, and Information Retrieval. The group's research projects include Comparative Genomics of Bacterial genomes, Metagenomics, Genomic databases, Pattern Discovery in sequences and structures, microarray data analysis, prediction of regulatory elements, primer design, probe design, phylogenetic analysis, medical image processing, image analysis, data integration, data mining, information retrieval, knowledge discovery in electronic medical records, and more.

More at http://biorg.cis.fiu.edu/

Ensembl comparative genomics resources

Jitendra Narayan — Sun, 28 Feb 2016 17:10:20 -0600

The Ensembl comparative genomics resources are a reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects, including Ensembl Genomes and Gramene, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available.

Database URL: http://www.ensembl.org.

Address of the bookmark: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4761110/

HiCdat

Jit — Fri, 12 Feb 2016 05:23:44 -0600

HiCdat: a fast and easy-to-use Hi-C data analysis tool

HiCdat is easy to use and provides solutions from aligned reads through to in-depth analyses. Importantly, HiCdat focuses on the analysis of larger structural features of chromosomes, their correlation to genomic and epigenomic features, and on comparative studies. It uses simple input and output formats and can therefore easily be integrated into existing workflows or combined with alternative tools.
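HiCdat's actual input formats and pipeline are defined in the linked paper and repository and are not reproduced here. As a hedged sketch of the step most Hi-C analyses start from, the Python below bins aligned read pairs into a symmetric contact matrix for one chromosome. The tuple layout `(chrom1, pos1, chrom2, pos2)`, the function name `contact_matrix` and the bin size are assumptions made for illustration, not HiCdat's API.

```python
from typing import Iterable, List, Tuple

# Assumed (hypothetical) layout of one aligned read pair: (chrom1, pos1, chrom2, pos2), 0-based.
Pair = Tuple[str, int, str, int]


def contact_matrix(pairs: Iterable[Pair], chrom: str, chrom_len: int,
                   bin_size: int) -> List[List[int]]:
    """Count intra-chromosomal read pairs into a symmetric bin-by-bin matrix."""
    n_bins = (chrom_len + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for c1, p1, c2, p2 in pairs:
        # Skip pairs with either end off the requested chromosome.
        if c1 != chrom or c2 != chrom:
            continue
        i, j = p1 // bin_size, p2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts to keep symmetry
    return matrix


if __name__ == "__main__":
    demo = [("chr1", 50, "chr1", 950),   # bins 0 and 3
            ("chr1", 120, "chr1", 180),  # both ends in bin 0
            ("chr1", 10, "chr2", 400)]   # inter-chromosomal, skipped
    for row in contact_matrix(demo, "chr1", chrom_len=1000, bin_size=250):
        print(row)
```

Larger-scale features such as compartments or domains are then computed from matrices like this one; how HiCdat normalizes and analyzes them is described in its own documentation.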

More at http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0678-x

Address of the bookmark: https://github.com/MWSchmid/HiCdat

Machine Learning !!!

Gudiya Pal — Fri, 01 Jul 2016 12:57:12 -0500

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions.

Using a data set about homes, we will create a machine learning model to distinguish homes in New York from homes in San Francisco.
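The linked tutorial builds a decision tree. As a much smaller, hedged sketch of the same idea, the Python below learns a single split (a decision stump) on one numeric feature by trying every midpoint between observed values and keeping the one with the fewest misclassifications. The function name `best_stump` and the elevation values are made up for illustration; they are not the tutorial's data or code.

```python
from typing import List, Tuple


def best_stump(values: List[float], labels: List[str]) -> Tuple[float, str, int]:
    """Find the threshold on one feature that misclassifies the fewest homes.

    Homes with a value above the threshold get label_above; the rest get the
    other label. Returns (threshold, label_above, errors).
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this stump separates exactly two classes")
    best = (float("nan"), classes[0], len(labels) + 1)
    points = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        for above in classes:
            below = classes[0] if above == classes[1] else classes[1]
            errors = sum(1 for v, y in zip(values, labels)
                         if (above if v > t else below) != y)
            if errors < best[2]:  # strict: keep the first best split found
                best = (t, above, errors)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: elevation in meters for eight homes.
    elevation = [3, 8, 10, 15, 40, 60, 75, 90]
    city = ["NY", "NY", "NY", "NY", "SF", "NY", "SF", "SF"]
    print(best_stump(elevation, city))  # (27.5, 'SF', 1)
```

A full decision tree repeats this search recursively inside each side of the split, over every available feature.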

Address of the bookmark: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

Stacks

Jitendra Narayan — Wed, 24 Feb 2016 15:52:30 -0600

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
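Stacks' real algorithms are described in its own documentation. The Python below is only a hedged, simplified sketch of the core idea of building loci from short reads. Identical reads are piled into stacks, stacks below a minimum depth are set aside, and stacks that differ by at most a few mismatches are joined into one putative locus by single linkage. The function names `build_stacks` and `merge_into_loci` and both thresholds are hypothetical, and equal-length reads (as produced by restriction-enzyme-based protocols such as RAD-seq) are assumed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_stacks(reads: List[str], min_depth: int) -> Tuple[Dict[str, int], List[str]]:
    """Pile exactly matching reads into stacks.

    Returns (stacks, leftovers): stacks maps each sequence seen at least
    min_depth times to its depth; leftovers are reads from shallower piles.
    """
    counts = Counter(reads)
    stacks = {seq: n for seq, n in counts.items() if n >= min_depth}
    leftovers = [r for r in reads if r not in stacks]
    return stacks, leftovers


def mismatches(a: str, b: str) -> int:
    """Hamming distance, counting any length difference as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def merge_into_loci(stacks: Dict[str, int], max_mismatches: int) -> List[List[str]]:
    """Single-linkage clustering: stacks within max_mismatches share a locus."""
    seqs = sorted(stacks)
    parent = list(range(len(seqs)))  # union-find forest over stack indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if mismatches(seqs[i], seqs[j]) <= max_mismatches:
                parent[find(i)] = find(j)
    loci: Dict[int, List[str]] = {}
    for i, seq in enumerate(seqs):
        loci.setdefault(find(i), []).append(seq)
    return sorted(loci.values())


if __name__ == "__main__":
    reads = ["ACGTA"] * 3 + ["ACGTT"] * 2 + ["TTTTT"] * 2 + ["GGGCC"]
    stacks, leftovers = build_stacks(reads, min_depth=2)
    print(stacks)     # {'ACGTA': 3, 'ACGTT': 2, 'TTTTT': 2}
    print(leftovers)  # ['GGGCC']
    print(merge_into_loci(stacks, max_mismatches=1))  # [['ACGTA', 'ACGTT'], ['TTTTT']]
```

Here the two stacks differing at one position are grouped as two alleles of one putative locus, which is the kind of grouping that makes genetic maps and population genomics possible from short reads.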

More at http://catchenlab.life.illinois.edu/stacks/

Address of the bookmark: http://catchenlab.life.illinois.edu/stacks/

Computer simulation of genetic mechanism !!

Jit — Sun, 13 Mar 2016 09:29:56 -0500

Computer simulation is the discipline of designing a model of an actual or theoretical physical/biological system, executing the model on a digital computer, and analyzing the execution output. Simulation embodies the principle of "learning by doing": to learn about the system, we must first build a model of some sort and then operate the model. Using simulation is as natural as a child's role play. Children understand the world around them by simulating (with toys and figurines) most of their interactions with other people, animals and objects. As adults, we lose some of this childlike behavior but recapture it later on through computer simulation. To understand reality and all of its complexity, we must build artificial objects and dynamically act out roles with them. Computer simulation is the electronic equivalent of this type of role playing, and it serves to drive synthetic environments and virtual worlds. Within the overall task of simulation, there are three primary sub-fields: model design, model execution and model analysis.
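To make the three sub-fields concrete, here is a hedged Python sketch built on a neutral haploid Wright-Fisher model, a standard population-genetics model chosen here for illustration. Model design is the sampling rule in the comment, model execution is `wright_fisher`, and model analysis is `fixation_fraction`; both function names and all parameter values are hypothetical.

```python
import random
from typing import List


# Model design: a neutral haploid Wright-Fisher population of constant size n.
# Each generation, every offspring copies the allele of a parent chosen
# uniformly at random, so the next allele count is Binomial(n, p).
def wright_fisher(n: int, p0: float, generations: int,
                  rng: random.Random) -> List[float]:
    """Model execution: allele-frequency trajectory, stopping at loss or fixation."""
    count = round(p0 * n)
    freqs = [count / n]
    for _ in range(generations):
        p = count / n
        # n Bernoulli(p) draws give one Binomial(n, p) sample.
        count = sum(1 for _ in range(n) if rng.random() < p)
        freqs.append(count / n)
        if count == 0 or count == n:
            break  # absorbed: the allele is lost or fixed
    return freqs


def fixation_fraction(n: int, p0: float, generations: int,
                      replicates: int, seed: int = 1) -> float:
    """Model analysis: fraction of replicate runs in which the allele fixed."""
    rng = random.Random(seed)
    fixed = sum(1 for _ in range(replicates)
                if wright_fisher(n, p0, generations, rng)[-1] == 1.0)
    return fixed / replicates


if __name__ == "__main__":
    print(fixation_fraction(n=20, p0=0.25, generations=500, replicates=2000))
```

Under neutrality, the probability that an allele eventually fixes equals its starting frequency, so the printed value should be close to 0.25. Comparing a simulation against a known analytical result like this is a simple check on the model before using it to explore harder questions.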

Simulation models have become important tools in Bioinformatics studies. There are many reasons for this, but we emphasize three of the more important:

(1) they enable exploration of hypotheses, and as such, have become invaluable means to guide research;

(2) they are unique approaches to integrate (in the literal sense of the word) biological knowledge, in the form of experimental results; and

(3) they enable connecting biology with other fields of study ranging from physiology to genomics.

This blog post, and its software list, is intended to guide potential users of simulation models.
It is not, in any way, meant to be a comprehensive survey of the very diverse simulation tools that already exist; it focuses on mechanistic, dynamic models. Similarly, it is not meant to cover the breadth of applications; however, for interested readers, we provide references to use as a possible starting point.

Simulation models are meant to answer scientists' questions in a dynamic, quantitative and often pictorial way. Bioinformatics research and its applications, in particular, often involve a large number of components, actors and factors. Assembling these in a coherent framework may seem a daunting task, especially for beginners, and can lead to confusion even for experienced scientists, particularly if the objectives of such an exercise are not well defined. The following tools can help bioinformaticians analyze, and answer questions about, complex biological mechanisms and related problems.

Software Resource	Brief Description and Homepage
Aladyn	Tools to investigate how demographic parameters, populations genetics and abiotic conditions affect the rate of adaptation http://www.katja-schiffers.eu/research.html
ALF	A Simulation Framework for Genome Evolution http://www.cbrg.ethz.ch/alf
ART	ART is a set of simulation tools to generate synthetic next-generation sequencing data by mimicking real sequencing process with empirical error models or quality profiles. http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
BAMSurgeon	Methods for realistic simulation of mutations in real data. https://github.com/adamewing/bamsurgeon
Bayesian Serial SimCoal	Bayesian Serial SimCoal (BayeSSC) is a modification of SIMCOAL 1.0, a program written by Laurent Excoffier, John Novembre, and Stefan Schneider. http://www.stanford.edu/group/hadlylab/ssc/index.html
BaySICS	An integral platform with a graphical interface for statistical inference based on approximate Bayesian computation. https://sites.google.com/site/baysicsabc/
BEERS	BEERS was designed to benchmark RNA-Seq alignment algorithms and also algorithms that aim to reconstruct different isoforms and alternative splicing from RNA-Seq data http://cbil.upenn.edu/beers/
BOTTLENECK	Bottleneck is a program for detecting recent reductions in effective population size from allele frequency data http://www.ensam.inra.fr/urlb/bottleneck/bottleneck.html
BottleSim	BottleSim is a computer simulation program for simulating the process of population bottlenecks http://chkuo.name/software/bottlesim.html
CASS	Protein Sequence Simulation https://liberles.cst.temple.edu/software/cass/index.html
CDPOP	CDPOP is a landscape genetics tool for simulating the emergence of spatial genetic structure in populations resulting from specified landscape processes governing organism movement behavior. http://cel.dbs.umt.edu/cdpop
Classical Genetics Simulator	Web-based simulation software http://www.cgslab.com/
CoaSim	CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models. http://users-birc.au.dk/mailund/coasim/index.html
cosi	A coalescent simulator written in C; the package is available as a tar file. http://www.broadinstitute.org/~sfs/cosi/
CS-PSeq-Gen	A program to simulate the evolution of protein sequences under the constraints of the information of a particular reconstructed phylogeny http://bioserv.rpbs.univ-paris-diderot.fr/software/cs-pseq-gen/
DAWG	An application designed to simulate the evolution of recombinant DNA sequences in continuous time http://scit.us/projects/dawg
Easypop	EASYPOP is an individual based model intended to simulate datasets under a very broad range of conditions http://www.unil.ch/dee/en/home/menuinst/softwares--dataset/softwares/easypop.html
EggLib	EggLib is a C++/Python library and program package for evolutionary genetics and genomics. http://egglib.sourceforge.net/
EpiSIM	EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis https://sourceforge.net/projects/episimsimulator/files/
EvolSimulator	A simulation test bed for hypotheses of genome evolution http://acb.qfab.org/acb/evolsim/
EvolveAGene	A realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions http://bellinghamresearchinstitute.com/software/index.html
fastsimcoal	A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios http://cmpg.unibe.ch/software/fastsimcoal/
FastSLINK	Simulation of Marker and Phenotype Data in Pedigrees https://watson.hgen.pitt.edu/
FFPopSim	C++/Python library for population genetics. http://webdav.tuebingen.mpg.de/ffpopsim/
FLUX SIMULATOR	The Flux Simulator aims at providing a deterministic in silico reproduction of the experimental pipelines for RNA-Seq, employing a minimal set of parameters. http://sammeth.net/confluence/display/sim/home
forqs	Forward-in-time simulation of Recombination, Quantitative Traits, and Selection https://bitbucket.org/dkessner/forqs
ForSim	ForSim: A Forward Evolutionary Computer Simulation http://anth.la.psu.edu/research/weiss-lab/research/research
ForwSim	Simulates genetic drift in a standard Wright-Fisher process, based on the algorithm described in Padhukasahasram et al. 2008. http://badri-populationgeneticsimulators.blogspot.com/
FPG	Forward Population Genetic simulation https://bio.cst.temple.edu/~hey/software/software.htm#fpg
FREGENE	FREGENE is a C++ program that simulates sequence-like data over large genomic regions in large diploid populations. http://www.ebi.ac.uk/projects/bargen
FIGG	FIGG is a genome simulation tool that uses known or theorized variation frequencies, per fragment size and grouped by GC content across a genome, to model new genomes in FASTA format while tracking applied mutations for use in analysis http://insilicogenome.sourceforge.net/
fwdpp	A C++ template library for implementing efficient forward simulations. http://molpopgen.github.io/fwdpp/
GAMETES	Genetic Architecture Model Emulator for Testing and Evaluating Software: Simulates complex SNP models with pure, strict epistatic interactions with n-loci. http://sourceforge.net/projects/gametes/?source=navbar
GASP	Genometric Analysis Simulation Program. A software tool for testing and investigating methods in statistical genetics by generating samples of family data based on user specified models. http://research.nhgri.nih.gov/gasp/
GCTA	Genome-wide Complex Trait Analysis http://www.complextraitgenomics.com/software/gcta/download.html
GemSIM	Next generation sequencing read simulator http://sourceforge.net/projects/gemsim/
GeneArtisan	Simulation of Markers in Case-Control Study Designs http://www.rannala.org/?page_id=241
GENOME	A rapid coalescent-based whole genome simulator http://www.sph.umich.edu/csg/liang/genome/
GenomePop2	GenomePop2 is a specialization of the program GenomePop for managing SNPs under more flexible and useful settings. For models with more than two alleles, use the original GenomePop program. https://ritchielab.psu.edu/research/research-areas/statistical-genetics-and-gen-epi/methods/genomesimla
GenomeSimla	GenomeSIMLA is currently under development; however, a beta release is available for testing http://chgr.mc.vanderbilt.edu/genomesimla/
GENS2	Simulates interactions between two genetic factors and one environmental factor, and also allows for epistatic interactions. https://sourceforge.net/projects/gensim/
GWAsimulator	A rapid whole genome simulation program http://biostat.mc.vanderbilt.edu/wiki/main/gwasimulator
HAP-SAMPLE	An association simulator for candidate regions or genome scans http://www.hapsample.org/
HAPGEN	A simulator of case-control datasets at SNP markers https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
HapSim	A simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients http://cran.r-project.org/web/packages/hapsim/index.html
HAPSIMU	A program that simulates heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model http://l.web.umkc.edu/liujian/
IBDsim	IBDSim is a computer package for the simulation of genotypic data under general isolation by distance models. http://raphael.leblois.free.fr/
indel-Seq-Gen	A biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies http://bioinfolab.unl.edu/~cstrope/isg/
Indelible	A powerful and flexible simulator of biological evolution http://abacus.gene.ucl.ac.uk/software/indelible/
invertFREGENE	InvertFREGENE is a forward-in-time simulator of inversions in population genetic data http://www.ebi.ac.uk/projects/bargen/
KernelPop	A spatially explicit population genetic simulation engine http://cran.r-project.org/src/contrib/archive/kernelpop/
MaCS	Markovian Coalescent Simulator http://www-hsc.usc.edu/~garykche/
Marlin	Marlin provides a user-friendly interface for performing forward-in-time population genetic simulations. http://www.patrickmeirmans.com/software/marlin.html
Mason	A package for the simulation of nucleotide data. http://www.seqan.de/projects/mason/
mbs	A modification of Hudson's ms software that generates samples of DNA sequences with a biallelic site under selection http://www.sendou.soken.ac.jp/esb/innan/innanlab/software.html
Mendel's Accountant	Mendel's Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time and was developed collaboratively by Sanford, Baumgardner, Brewer, Gibson and ReMine http://mendelsaccount.sourceforge.net/
MetaPopGen	Simulates genetics in large metapopulations https://sites.google.com/site/marcoandrello/metapopgen
MetaSim	A tool to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets http://ab.inf.uni-tuebingen.de/software/metasim/
mlcoalsim	Multilocus Coalescent Simulations http://code.google.com/p/mlcoalsim-v1/
ms	Generates samples under a variety of neutral models using the coalescent. It allows one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets. http://home.uchicago.edu/~rhudson1/source/mksamples.html
msHOT	A modification of ms that incorporates crossover and gene conversion hotspots. http://home.uchicago.edu/~rhudson1/
msms	A coalescent simulation tool with selection. http://www.mabs.at/ewing/msms/index.shtml
MySSP	A program for the simulation of DNA sequence evolution across a phylogenetic tree http://www.rosenberglab.net/software.html
Nemo	A forward-time, individual-based, genetically explicit, and stochastic simulation program designed to study the evolution of genetic markers, life history traits, and phenotypic traits in a flexible (meta-)population framework. http://nemo2.sourceforge.net/
NetRecodon	Coalescent simulation of coding DNA sequences with recombination (inter and intracodon), migration and demography http://code.google.com/p/netrecodon/
OncoSimulR	Bioconductor package for forward genetic simulation of cancer progression with epistasis https://github.com/rdiaz02/oncosimul
PEDAGOG	Software for simulating eco-evolutionary population dynamics https://bcrc.bio.umass.edu/pedigreesoftware/node/5
phenosim	A tool to add phenotypes to simulated genotypes http://evoplant.uni-hohenheim.de/doku.php?id=software:software
PhyloSim	An R package for the Monte Carlo simulation of sequence evolution http://www.ebi.ac.uk/goldman-srv/phylosim/
pIRS	Profile-based Illumina paired-end reads simulator https://code.google.com/p/pirs/
ProteinEvolver	Simulation of protein evolution along phylogenies under structure-based substitution models http://code.google.com/p/proteinevolver/
QMSim	QTL and Marker Simulator http://www.aps.uoguelph.ca/~msargol/qmsim/
quantiNEMO	An individual-based program for the analysis of quantitative traits with explicit genetic architecture potentially under selection in a structured population http://www2.unil.ch/popgen/softwares/quantinemo/
RECOAL	Simulates new haplotype data from a reference population of haplotypes. ftp://popgen.usc.edu/
Recodon	Coalescent simulation of coding DNA sequences with recombination, migration and demography http://code.google.com/p/recodon/
rlsim	A package for simulating RNA-seq library preparation with parameter estimation http://bit.ly/rlsim-git
Rmetasim	Rmetasim is a front-end for the metasim engine that is implemented as a package that runs in the statistical computing environment R http://cran.r-project.org/web/packages/rmetasim/index.html
RNA Seq Simulator	RSS takes SAM alignment files from RNA-Seq data and simulates overdispersed, multiple-replicate, differential, non-stranded RNA-Seq datasets. http://useq.sourceforge.net/cmdlnmenus.html#rnaseqsimulator
Rose	Random model of sequence evolution http://bibiserv.techfak.uni-bielefeld.de/rose/
scrm	A coalescent simulator optimized for long sequences and large samples. https://scrm.github.io/
SelSim	SelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection http://www.well.ox.ac.uk/~spencer/selsim/
Seq-Gen	An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees. http://tree.bio.ed.ac.uk/software/seqgen/
SEQPower	Statistical power analysis for sequence-based association studies http://bioinformatics.org/spower/
SeqSIMLA	SeqSIMLA can simulate sequence data with user-specified disease and quantitative trait models. Family or unrelated case-control data can be simulated. http://seqsimla.sourceforge.net/
Serial NetEvolve	A flexible utility for generating serially-sampled sequences along a tree or recombinant network http://biorg.cis.fiu.edu/sne/
SFS_CODE	SFS_CODE can perform forward population genetic simulations under a general Wright-Fisher model with arbitrary migration, demographic, selective, and mutational effects. http://sfscode.sourceforge.net/sfs_code/index/index.html
SIBSIM	Quantitative phenotype simulation in extended pedigrees http://sourceforge.net/projects/sibsim/
SimAdapt	A spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton. https://www.openabm.org/model/3137
SIMCOAL2	A coalescent program for the simulation of complex recombination patterns over large genomic regions under various demographic models http://cmpg.unibe.ch/software/simcoal2/
SimCopy	An R package simulating the evolution of copy number profiles along a tree. http://bit.ly/simcopy
SIMLA	SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies. http://dmpi.duke.edu/simla-simulation-software-version-32
SimPed	A Simulation Program to Generate Haplotype and Genotype Data for Pedigree Structures http://bioinformatics.org/simped/
Simprot	A program to simulate protein evolution by substitution, insertion and deletion http://www.uhnresearch.ca/labs/tillier/software.htm#3
SimRare	Rare variant simulation and analysis tool http://code.google.com/p/simrare/
simuGWAS	A forward-time simulator that simulates realistic samples for genome-wide association studies. http://simupop.sourceforge.net/cookbook/simugwas
simuPOP	simuPOP is a general-purpose individual-based forward-time population genetics simulation environment. http://simupop.sourceforge.net/
SISSI	A software tool to generate data of related sequences along a given phylogeny, taking into account user defined system of neighbourhoods and instantaneous rate matrices. http://www.cibiv.at/software/sissi/
SMARTPOP	Simulating Mating Alliance as a Reproductive Tactic for Populations http://smartpop.sourceforge.net/
SNPsim	Coalescent simulation of hotspot recombination http://code.google.com/p/phylosoftware/
SPIP	SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user http://swfsc.noaa.gov/textblock.aspx?division=fed&id=3434
Splatche	Spatial and Temporal Coalescences in Heterogeneous Environment http://www.splatche.com/
srv	Simulator of Rare Variants (srv) simulates the introduction and evolution of (rare) genetic variants. http://simupop.sourceforge.net/cookbook/simurarevariants
SUP	SLINK/FastSLINK utility program http://mlemire.freeshell.org/software.html
TreesimJ	A flexible, forward-time population genetic simulator http://code.google.com/p/treesimj/
Variant Simulation Tools	A simulation tool for post-GWAS genetic epidemiological studies using whole-genome or whole-exome next-gen sequencing data, with an emphasis on user-friendliness and reproducibility. http://varianttools.sourceforge.net/simulation/homepage
Vortex	VORTEX is an individual-based simulation model for population viability analysis (PVA). http://www.vortex9.org/vortex.html
Wessim	Whole Exome Sequencing SIMulator http://sak042.github.io/wessim/
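Many of the tools above (for example ms, msms, MaCS, scrm and fastsimcoal) are coalescent simulators. The hedged Python sketch below shows only the core of the standard Kingman coalescent, not any listed program's code: while k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2, in time units where each pair of lineages coalesces at rate 1 (2N generations for a diploid population of size N). The function names `tmrca` and `mean_tmrca` are hypothetical.

```python
import random


def tmrca(n: int, rng: random.Random) -> float:
    """Time to the most recent common ancestor of n sampled lineages.

    Standard Kingman coalescent: while k lineages remain, the wait until the
    next merger is exponential with rate k*(k-1)/2, since each of the
    k-choose-2 pairs coalesces at rate 1.
    """
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t


def mean_tmrca(n: int, replicates: int, seed: int = 42) -> float:
    """Average TMRCA over independent replicate genealogies."""
    rng = random.Random(seed)
    return sum(tmrca(n, rng) for _ in range(replicates)) / replicates


if __name__ == "__main__":
    n = 10
    print(mean_tmrca(n, 20000), "expected", 2 * (1 - 1 / n))
```

Summing the expected waiting times gives an expected TMRCA of 2(1 - 1/n), so for n = 10 the simulated mean should land near 1.8. Real simulators add mutation, recombination, demography and selection on top of this genealogy.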

mrFAST: Micro Read Fast Alignment Search Tool

Neel — Tue, 26 Apr 2016 03:50:06 -0500

mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads within a user-defined error threshold, including indels of up to 4+4 bp. The manual describes how to choose the parameters and tune mrFAST for the library settings. mrFAST is designed to find all mappings for a given set of reads; however, it can return a single "best" map location if the relevant parameter is invoked.
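mrFAST's own indexing and indel handling are not reproduced here. As a hedged illustration of the difference between reporting all mappings within an error threshold and reporting a single "best" location, the Python below brute-forces every offset of a read against a reference, allowing substitutions only (no indels). `all_mappings` and `best_mapping` are hypothetical names, not mrFAST functions or options.

```python
from typing import List, Optional, Tuple


def all_mappings(read: str, reference: str,
                 max_mismatches: int) -> List[Tuple[int, int]]:
    """Every (offset, mismatches) where the read matches the reference with at
    most max_mismatches substitutions. Indels are NOT handled in this sketch."""
    hits = []
    m = len(read)
    for pos in range(len(reference) - m + 1):
        diff = 0
        for a, b in zip(read, reference[pos:pos + m]):
            if a != b:
                diff += 1
                if diff > max_mismatches:
                    break  # this offset can no longer qualify
        if diff <= max_mismatches:
            hits.append((pos, diff))
    return hits


def best_mapping(hits: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """One 'best' location: fewest mismatches, ties broken by leftmost offset."""
    return min(hits, key=lambda h: (h[1], h[0])) if hits else None


if __name__ == "__main__":
    hits = all_mappings("ACGT", "ACGTACGTTTACGAACGT", max_mismatches=1)
    print(hits)                # [(0, 0), (4, 0), (10, 1), (14, 0)]
    print(best_mapping(hits))  # (0, 0)
```

Keeping every hit matters for segmental duplications, where a read legitimately fits several near-identical copies; collapsing to one location discards that signal.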

More at http://mrfast.sourceforge.net/manual.html

Address of the bookmark: http://mrfast.sourceforge.net/manual.html