<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36525?offset=80</link>
	<atom:link href="https://bioinformaticsonline.com/related/36525?offset=80" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37211/jbrowse-embeddable-genome-browser-built-completely-with-javascript-and-html5</guid>
	<pubDate>Fri, 29 Jun 2018 09:19:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37211/jbrowse-embeddable-genome-browser-built-completely-with-javascript-and-html5</link>
	<title><![CDATA[JBrowse: Embeddable genome browser built completely with JavaScript and HTML5]]></title>
	<description><![CDATA[JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl.

Headline Features:
Fast, smooth scrolling and zooming. Explore your genome with unparalleled speed.
Scales easily to multi-gigabase genomes and deep-coverage sequencing.
Quickly open and view data files on your computer without uploading them to any server.
Supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, VCF (with either .tbi or .idx index), REST, and more.  BAM, BigBed, BigWig, and VCF data are displayed directly from chunks of the compressed binary files, no conversion needed.
Includes an optional “faceted” track selector (see demo) suitable for large installations with thousands of tracks.
Very light server resource requirements. In fact, JBrowse has no back-end server code, just tools for formatting data files to be read directly over HTTP. Serve huge datasets from a single low-cost cloud instance.
Can run as a stand-alone app on OSX and Windows using the Electron platform.
Highly extensible plugin architecture, with a large registry of existing plugins at https://gmod.github.io/jbrowse-registry

https://jbrowse.org/<p>Address of the bookmark: <a href="https://github.com/GMOD/jbrowse" rel="nofollow">https://github.com/GMOD/jbrowse</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40531/shasta-long-read-assembler</guid>
	<pubDate>Tue, 14 Jan 2020 06:47:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40531/shasta-long-read-assembler</link>
	<title><![CDATA[Shasta long read assembler]]></title>
	<description><![CDATA[<p>The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using as input DNA reads generated by&nbsp;<a href="https://nanoporetech.com/">Oxford Nanopore</a>&nbsp;flow cells.</p>
<p>Computational methods used by the Shasta assembler include:</p>
<ul>
<li>Using a&nbsp;<a href="https://en.wikipedia.org/wiki/Run-length_encoding">run-length</a>&nbsp;representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of errors in Oxford Nanopore reads.</li>
<li>Using, in some phases of the computation, a representation of the read sequence based on&nbsp;<em>markers</em>, a fixed subset of short k-mers (k &asymp; 10).</li>
</ul>
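<p>As a rough illustration of the run-length idea, here is a minimal shell sketch. It is not Shasta's own code: the <code>rle</code> function name and the example read are hypothetical, chosen only for this illustration. It collapses each run of identical bases into the base followed by its repeat count.</p>
<div><pre><code># Hypothetical helper, not part of Shasta: run-length encode a DNA string.
rle() {
    # fold -w1 puts one base per line, uniq -c counts consecutive repeats,
    # awk prints each base followed by its count on a single output line.
    printf '%s\n' "$1" | fold -w1 | uniq -c | awk '{ printf "%s%s", $2, $1 } END { print "" }'
}

rle AAACCGTTTT   # prints A3C2G1T4
</code></pre></div>
<p>Two reads that differ only in homopolymer lengths, such as AAACCGTTTT and AACCGTTTTT, share the same base sequence (ACGT) in this representation and differ only in the counts. That is why a wrong repeat count is less disruptive than it would be in the raw sequence.</p>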
<p>More at&nbsp;<a href="https://chanzuckerberg.github.io/shasta/index.html">https://chanzuckerberg.github.io/shasta/index.html</a></p><p>Address of the bookmark: <a href="https://github.com/chanzuckerberg/shasta" rel="nofollow">https://github.com/chanzuckerberg/shasta</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43791/comparative-genomics-visualisation-tools</guid>
	<pubDate>Thu, 17 Feb 2022 05:37:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43791/comparative-genomics-visualisation-tools</link>
	<title><![CDATA[Comparative genomics visualisation tools !]]></title>
	<description><![CDATA[<p>Comparative genomics visualisation tools !</p><p>Address of the bookmark: <a href="https://cmdcolin.github.io/awesome-genome-visualization/?latest=true&amp;selected=%23BRIG&amp;tag=Comparative" rel="nofollow">https://cmdcolin.github.io/awesome-genome-visualization/?latest=true&amp;selected=%23BRIG&amp;tag=Comparative</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/9028/linux-for-bioinformatician</guid>
	<pubDate>Thu, 13 Mar 2014 16:59:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/9028/linux-for-bioinformatician</link>
	<title><![CDATA[Linux for bioinformatician !!!]]></title>
	<description><![CDATA[<p>Linux, a free operating system for computers, provides several powerful admin tools and utilities that help you manage your systems effectively and handle huge amounts of genomic/biological data with ease. The field of bioinformatics relies heavily on Linux-based computers and software: although most bioinformatics programs can be compiled to run on other operating systems, Linux is usually the most convenient platform for them. If you don&rsquo;t know what these not-so-user-friendly tools are and how to use them, you could spend a lot of time trying to perform even basic admin tasks. The focus of this Linux series is to help you understand system administration as well as the basic tools, which will help you become an effective bioinformatician and computational biologist.<br /><br /></p><p>To learn more about Linux and its importance to bioinformaticians, please read the article "<a href="http://www.ualberta.ca/~stothard/downloads/linux_for_bioinformatics.pdf">An introduction to Linux for bioinformatics</a>" by Paul Stothard.</p><p>Linux cheat sheet: http://bioinformaticsonline.com/file/view/87/linux-cheat-sheet</p><p>Please browse the further useful Linux pages listed on the right-hand side.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/9242/check-the-size-of-a-directory-free-disk-space</guid>
	<pubDate>Mon, 17 Mar 2014 02:35:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/9242/check-the-size-of-a-directory-free-disk-space</link>
	<title><![CDATA[Check the Size of a directory &amp; Free disk space.]]></title>
	<description><![CDATA[<p>The databases we bioinformaticians deal with are just HUGE &hellip; so we regularly need to check how much free space is left on our servers. This article explains 2 simple commands that most bioinformaticians want to know when they start using Linux / BioLinux. First: the size of a directory (du), and second: the free disk space on your machine (df).</p><p><br /><strong>'du' &ndash; Check the size of a directory</strong></p><p><br />$ du<br />This command (du) lists the directories under the current working directory, along with their sizes in kilobytes (the default unit). The last line of the output gives the total size of the current directory including its subdirectories. <br /><br />$ du /home/jin1<br />The above command gives the size of the directory /home/jin1<br /><br />$ du -h<br />The same &ldquo;du&rdquo; command with this flag gives more readable output than the default. The option '-h' stands for human-readable format: each size is printed with a suffix, 'K' for kilobytes, 'M' for megabytes and 'G' for gigabytes.<br /><br />$ du -ah<br />If you want to check everything present in a folder, use the above command. It lists not only the directories but also all the files in the current directory. The &ldquo;-a&rdquo; flag displays the filenames along with the directory names in the output. <br /><br />$ du -c<br />This gives you a grand total as the last line of the output. So if your directory occupies 30 MB, the last 2 lines of 'du -ch' would both show 30M: one for the directory itself and one labelled 'total'.<br /><br />$ du -s<br />Use this command to display a summary of the directory size. 
It is the simplest way to know the total size of the current directory.<br /><br />$ du -S<br />This displays the size of each directory excluding the size of its subdirectories. So for the current directory it shows the total size of the files that sit directly in it.<br /><br />$ du --exclude='*.mp3'<br />Sometimes we need to leave certain files out of the size calculation. The above command displays the size of the current directory along with all its subdirectories, but excludes all files whose names match the given pattern (here, files ending in .mp3).</p><p><br /><strong>'df' - finding the disk free space / disk usage</strong><br /><br />$ df<br />Now, the &ldquo;df&rdquo; command is really useful, and I guess you are going to use it often. Typing the above command outputs a table of 6 columns, all of them easy to understand. Remember that the size, used and available columns use kilobytes (1K blocks) as the unit. The 'Use%' column shows the usage as a percentage, which is also very useful.<br /><br />$ df -h<br />Displays the same table as the previous command, but '-h' selects human-readable format. Hence, instead of plain kilobytes, the sizes carry suffixes such as 'M' for megabytes and 'G' for gigabytes.<br /><br />Example: Linux installed on /dev/hda1<br />$ df -h | grep /dev/hda1</p><p><br />All right, these are not the only ways to check sizes and free space; 'du' and 'df' have a few more options, which I will discuss later.<br /><br /></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/8265/list-of-generic-simulation-softwaretoolsresource-with-brief-description-and-homepage</guid>
	<pubDate>Mon, 10 Feb 2014 05:57:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/8265/list-of-generic-simulation-softwaretoolsresource-with-brief-description-and-homepage</link>
	<title><![CDATA[List of generic simulation software/tools/resource with brief description and homepage !!!]]></title>
	<description><![CDATA[<p>List of generic simulation software/tools/resource with brief description and homepage</p><p><img src="http://www.evolution-of-life.com/fileadmin/images/carousel/genetic.PNG" alt="image" style="border: 0px;"></p><p>ALF <br />A Simulation Framework for Genome Evolution <br />http://www.cbrg.ethz.ch/alf<br /><br />Bayesian Serial SimCoal <br />Bayesian Serial SimCoal (BayeSSC) is a modification of SIMCOAL 1.0, a program written by Laurent Excoffier, John Novembre, and Stefan Schneider. <br />http://www.stanford.edu/group/hadlylab/ssc/index.html<br /><br />BEERS <br />BEERS was designed to benchmark RNA-Seq alignment algorithms and also algorithms that aim to reconstruct different isoforms and alternate splicing from RNA-Seq data <br />http://cbil.upenn.edu/beers/<br /><br />BOTTLENECK <br />Bottleneck is a program for detecting recent effective population size reductions from allele frequency data <br />http://www.ensam.inra.fr/urlb/bottleneck/bottleneck.html<br /><br />BottleSim <br />BottleSim is a computer simulation program for simulating the process of population bottlenecks <br />http://chkuo.name/software/bottlesim.html<br /><br />CASS <br />Protein Sequence Simulation <br />http://www.wyomingbioinformatics.org/liberlesgroup/cass/<br /><br />CDPOP <br />CDPOP is a landscape genetics tool for simulating the emergence of spatial genetic structure in populations resulting from specified landscape processes governing organism movement behavior. <br />http://cel.dbs.umt.edu/cdpop<br /><br />CoalFace <br />CoalFace is a simulation of the coalescent process with the visual display of gene genealogies. <br />http://web.up.ac.za/default.asp?ipkcategoryid=3283<br /><br />CoaSim <br />CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models. 
<br />http://users-birc.au.dk/mailund/coasim/index.html<br /><br />cosi <br />The cosi package is written in C and is available as a tar file. <br />http://www.broadinstitute.org/~sfs/cosi/<br /><br />CS-PSeq-Gen <br />A program to simulate the evolution of protein sequences under the constraints of the information of a particular reconstructed phylogeny <br />http://bioserv.rpbs.univ-paris-diderot.fr/software/cs-pseq-gen.html<br /><br />DAWG <br />An application designed to simulate the evolution of recombinant DNA sequences in continuous time <br />http://scit.us/projects/dawg<br /><br />Easypop <br />EASYPOP is an individual-based model intended to simulate datasets under a very broad range of conditions <br />http://www.unil.ch/dee/page36926_fr.html<br /><br />EggLib <br />EggLib is a C++/Python library and program package for evolutionary genetics and genomics. <br />http://egglib.sourceforge.net/<br /><br />EvolSimulator <br />A simulation test bed for hypotheses of genome evolution <br />http://acb.qfab.org/acb/evolsim/<br /><br />EvolveAGene <br />A realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions <br />http://bellinghamresearchinstitute.com/software/index.html<br /><br />fastsimcoal <br />A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios <br />http://cmpg.unibe.ch/software/fastsimcoal/<br /><br />FastSLINK <br />Simulation of Marker and Phenotype Data in Pedigrees <br />http://watson.hgen.pitt.edu/<br /><br />FFPopSim <br />C++/Python library for population genetics. <br />http://webdav.tuebingen.mpg.de/ffpopsim/<br /><br />FLUX SIMULATOR <br />The Flux Simulator aims at providing a deterministic in silico reproduction of the experimental pipelines for RNA-Seq, employing a minimal set of parameters. 
<br />http://flux.sammeth.net/simulator.html<br /><br />ForSim <br />ForSim: A Forward Evolutionary Computer Simulation <br />http://www.anthro.psu.edu/weiss_lab/research.shtml<br /><br />ForwSim <br />The program given below is based on the algorithm described in Padhukasahasram et al. 2008 to simulate genetic drift in a standard Wright-Fisher process. <br />http://badri-populationgeneticsimulators.blogspot.com/<br /><br />FPG <br />Forward Population Genetic simulation <br />http://genfaculty.rutgers.edu/hey/software#fpg<br /><br />FREGENE <br />FREGENE is a C++ program that simulates sequence-like data over large genomic regions in large diploid populations. <br />http://www.ebi.ac.uk/projects/bargen/download/fregen/documentation_html.html<br /><br />GAMETES <br />Genetic Architecture Model Emulator for Testing and Evaluating Software: Simulates complex SNP models with pure, strict epistatic interactions with n-loci. <br />http://sourceforge.net/projects/gametes/?source=navbar<br /><br />GASP <br />Genometric Analysis Simulation Program. A software tool for testing and investigating methods in statistical genetics by generating samples of family data based on user specified models. <br />http://research.nhgri.nih.gov/gasp/<br /><br />GemSIM <br />Next generation sequencing read simulator <br />http://sourceforge.net/projects/gemsim/<br /><br />GeneArtisan <br />Simulation of Markers in Case-Control Study Designs <br />http://www.rannala.org/?page_id=241<br /><br />GENOME <br />A rapid coalescent-based whole genome simulator <br />http://www.sph.umich.edu/csg/liang/genome/<br /><br />GenomePop2 <br />GenomePop2 is a specialization of the program GenomePop just to manage SNPs under more flexible and useful settings. If you need models with more than 2 alleles please use the GenomePop program version. 
<br />http://webs.uvigo.es/acraaj/genomepop2.htm<br /><br />GenomeSimla <br />GenomeSIMLA is currently under development- however, we have a beta release that we are asking to be tested <br />http://chgr.mc.vanderbilt.edu/genomesimla/<br /><br />GENS2 <br />Simulates interactions among two genetic and one environmental factor and also allows for epistatic interactions. <br />https://sourceforge.net/projects/gensim/<br /><br />GWAsimulator <br />A rapid whole genome simulation program <br />http://biostat.mc.vanderbilt.edu/wiki/main/gwasimulator<br /><br />HAP-SAMPLE <br />An association simulator for candidate regions or genome scans <br />http://www.hapsample.org/<br /><br />HAPGEN <br />A simulator for the simulation of case control datasets at SNP markers <br />https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html<br /><br />HapSim <br />A simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients <br />http://cran.r-project.org/web/packages/hapsim/index.html<br /><br />HAPSIMU <br />A program that simulates heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model <br />http://l.web.umkc.edu/liujian/<br /><br />IBDsim <br />IBDSim is a computer package for the simulation of genotypic data under general isolation by distance models. 
<br />http://raphael.leblois.free.fr/<br /><br />indel-Seq-Gen <br />A biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies <br />http://bioinfolab.unl.edu/~cstrope/isg/<br /><br />Indelible <br />A powerful and flexible simulator of biological evolution <br />http://abacus.gene.ucl.ac.uk/software/indelible/<br /><br />invertFREGENE <br />InvertFREGENE is a forward-in-time simulator of inversions in population genetic data <br />http://www.ebi.ac.uk/projects/bargen/<br /><br />kernelPop <br />A spatially explicit population genetic simulation engine <br />http://cran.r-project.org/src/contrib/archive/kernelpop/<br /><br />MaCS <br />Markovian Coalescent Simulator <br />http://www-hsc.usc.edu/~garykche/<br /><br />Mason <br />A package for the simulation of nucleotide data. <br />http://www.seqan.de/projects/mason/<br /><br />mbs <br />A modification of Hudson's ms software that generates samples of DNA sequences with a biallelic site under selection <br />http://www.sendou.soken.ac.jp/esb/innan/innanlab/software.html<br /><br />Mendel's Accountant <br />Mendel's Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time and was developed collaboratively by Sanford, Baumgardner, Brewer, Gibson and ReMine <br />http://mendelsaccount.sourceforge.net/<br /><br />MetaSim <br />A tool to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets <br />http://ab.inf.uni-tuebingen.de/software/metasim/<br /><br />mlcoalsim <br />Multilocus Coalescent Simulations <br />http://code.google.com/p/mlcoalsim-v1/<br /><br />ms <br />The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets. 
<br />http://home.uchicago.edu/~rhudson1/source/mksamples.html<br /><br />msHOT <br />The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets. <br />http://home.uchicago.edu/~rhudson1/<br /><br />msms <br />A coalescent simulation tool with selection. <br />http://www.mabs.at/ewing/msms/index.shtml<br /><br />MySSP <br />A program for the simulation of DNA sequence evolution across a phylogenetic tree <br />http://www.rosenberglab.net/software.php<br /><br />Nemo <br />A forward-time, individual-based, genetically explicit, and stochastic simulation program designed to study the evolution of genetic markers, life history traits, and phenotypic traits in a flexible (meta-)population framework. <br />http://nemo2.sourceforge.net/<br /><br />NetRecodon <br />Coalescent simulation of coding DNA sequences with recombination (inter and intracodon), migration and demography <br />http://code.google.com/p/netrecodon/<br /><br />PEDAGOG <br />Software for simulating eco-evolutionary population dynamics <br />https://bcrc.bio.umass.edu/pedigreesoftware/node/5<br /><br />phenosim <br />A tool to add phenotypes to simulated genotypes <br />http://evoplant.uni-hohenheim.de/doku.php?id=software:software<br /><br />PhyloSim <br />An R package for the Monte Carlo simulation of sequence evolution <br />http://bit.ly/rlsim-git<br /><br />pIRS <br />Profile-based Illumina pair-end reads simulator <br />https://code.google.com/p/pirs/<br /><br />ProteinEvolver <br />Simulation of protein evolution along phylogenies under structure-based substitution models <br />http://code.google.com/p/proteinevolver/<br /><br />QMSim <br />QTL and Marker Simulator <br />http://www.aps.uoguelph.ca/~msargol/qmsim/<br /><br />quantiNEMO <br />An individual-based program for the analysis of quantitative traits with explicit genetic architecture 
potentially under selection in a structured population <br />http://www2.unil.ch/popgen/softwares/quantinemo/<br /><br />RECOAL <br />Simulates new haplotype data from a reference population of haplotypes. <br />ftp://popgen.usc.edu/<br /><br />Recodon <br />Coalescent simulation of coding DNA sequences with recombination, migration and demography <br />http://code.google.com/p/recodon/<br /><br />rlsim <br />A package for simulating RNA-seq library preparation with parameter estimation <br />http://bit.ly/rlsim-git<br /><br />Rmetasim <br />Rmetasim is a front-end for the metasim engine that is implemented as a package that runs in the statistical computing environment R <br />http://linum.cofc.edu/software.html#metasim<br /><br />RNA Seq Simulator <br />RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets. <br />http://useq.sourceforge.net/cmdlnmenus.html#rnaseqsimulator<br /><br />Rose <br />Random model of sequence evolution <br />http://bibiserv.techfak.uni-bielefeld.de/rose/<br /><br />SelSim <br />SelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection <br />http://www.well.ox.ac.uk/~spencer/selsim/<br /><br />Seq-Gen <br />An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees. <br />http://tree.bio.ed.ac.uk/software/seqgen/<br /><br />SEQPower <br />Statistical power analysis for sequence-based association studies <br />http://bioinformatics.org/spower/<br /><br />SeqSIMLA <br />SeqSIMLA can simulate sequence data with user-specified disease and quantitative trait models. Family or unrelated case-control data can be simulated. 
<br />http://seqsimla.sourceforge.net/<br /><br />Serial NetEvolve <br />A flexible utility for generating serially-sampled sequences along a tree or recombinant network <br />http://biorg.cis.fiu.edu/sne/<br /><br />SFS_CODE <br />SFS_CODE can perform forward population genetic simulations under a general Wright-Fisher model with arbitrary migration, demographic, selective, and mutational effects. <br />http://sfscode.sourceforge.net/sfs_code/index/index.html<br /><br />SIBSIM <br />Quantitative phenotype simulation in extended pedigrees <br />http://sourceforge.net/projects/sibsim/<br /><br />SIMCOAL2 <br />A coalescent program for the simulation of complex recombination patterns over large genomic regions under various demographic models <br />http://cmpg.unibe.ch/software/simcoal2/<br /><br />SimCopy <br />An R package simulating the evolution of copy number profiles along a tree. <br />http://bit.ly/simcopy<br /><br />SIMLA <br />SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies. <br />http://www.chg.duke.edu/research/simla.html<br /><br />SimPed <br />A Simulation Program to Generate Haplotype and Genotype Data for Pedigree Structures <br />http://www.hgsc.bcm.tmc.edu/content/simped<br /><br />Simprot <br />A program to simulate protein evolution by substitution, insertion and deletion <br />http://www.uhnresearch.ca/labs/tillier/software.htm#3<br /><br />SimRare <br />Rare variant simulation and analysis tool <br />http://code.google.com/p/simrare/<br /><br />simuGWAS <br />A forward-time simulator that simulates realistic samples for genome-wide association studies. <br />http://simupop.sourceforge.net/cookbook/simucomplexdisease<br /><br />simuPOP <br />simuPOP is a general-purpose individual-based forward-time population genetics simulation environment. 
<br />http://simupop.sourceforge.net/<br /><br />SISSI <br />A software tool to generate data of related sequences along a given phylogeny, taking into account user defined system of neighbourhoods and instantaneous rate matrices. <br />http://www.cibiv.at/software/sissi/<br /><br />SNPsim <br />Coalescent simulation of hotspot recombination <br />http://code.google.com/p/phylosoftware/<br /><br />SPIP <br />SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user <br />http://swfsc.noaa.gov/textblock.aspx?division=fed&amp;id=3434<br /><br />Splatche <br />Spatial and Temporal Coalescences in Heterogeneous Environment <br />http://www.splatche.com/<br /><br />srv <br />Simulator of Rare Variants (srv) is a simulator for the simulation of the introduction and evolution of (rare) genetic variants. <br />http://simupop.sourceforge.net/cookbook/simurarevariants<br /><br />SUP <br />SLINK/FastSLINK utility program <br />http://mlemire.freeshell.org/software.html<br /><br />TreesimJ <br />A flexible, forward-time population genetic simulator <br />http://code.google.com/p/treesimj/<br /><br />Vortex <br />VORTEX is an individual-based simulation model for population viability analysis (PVA). <br />http://www.vortex9.org/vortex.html<br /><br />References:</p><p>Image www.evolution-of-life.com</p><p>www.cancer.gov</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34418/spades-hybrid-genome-assembly</guid>
	<pubDate>Mon, 27 Nov 2017 08:05:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34418/spades-hybrid-genome-assembly</link>
	<title><![CDATA[SPAdes hybrid genome assembly]]></title>
	<description><![CDATA[<p>When you have both Illumina and Nanopore data, then SPAdes remains a good option for hybrid assembly - SPAdes was used to produce the&nbsp;<a href="https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0101-6">B fragilis assembly</a>&nbsp;by Mick Watson&rsquo;s group.</p><p>Again, running spades.py will show you the options:</p><div><pre><code>spades.py
</code></pre></div><p>This produces:</p><div><pre><code>SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o &lt;output_dir&gt;

Basic options:
-o      &lt;output_dir&gt;    directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12    &lt;filename&gt;      file with interlaced forward and reverse paired-end reads
-1      &lt;filename&gt;      file with forward paired-end reads
-2      &lt;filename&gt;      file with reverse paired-end reads
-s      &lt;filename&gt;      file with unpaired reads
--pe&lt;#&gt;-12      &lt;filename&gt;      file with interlaced reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-1       &lt;filename&gt;      file with forward reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-2       &lt;filename&gt;      file with reverse reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-s       &lt;filename&gt;      file with unpaired reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--pe&lt;#&gt;-&lt;or&gt;    orientation of reads for paired-end library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--s&lt;#&gt;          &lt;filename&gt;      file with unpaired reads for single reads library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-12      &lt;filename&gt;      file with interlaced reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-1       &lt;filename&gt;      file with forward reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-2       &lt;filename&gt;      file with reverse reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-s       &lt;filename&gt;      file with unpaired reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--mp&lt;#&gt;-&lt;or&gt;    orientation of reads for mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--hqmp&lt;#&gt;-12    &lt;filename&gt;      file with interlaced reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-1     &lt;filename&gt;      file with forward reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-2     &lt;filename&gt;      file with reverse reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-s     &lt;filename&gt;      file with unpaired reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--hqmp&lt;#&gt;-&lt;or&gt;  orientation of reads for high-quality mate-pair library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9; &lt;or&gt; = fr, rf, ff)
--nxmate&lt;#&gt;-1   &lt;filename&gt;      file with forward reads for Lucigen NxMate library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--nxmate&lt;#&gt;-2   &lt;filename&gt;      file with reverse reads for Lucigen NxMate library number &lt;#&gt; (&lt;#&gt; = 1,2,..,9)
--sanger        &lt;filename&gt;      file with Sanger reads
--pacbio        &lt;filename&gt;      file with PacBio reads
--nanopore      &lt;filename&gt;      file with Nanopore reads
--tslr  &lt;filename&gt;      file with TSLR-contigs
--trusted-contigs       &lt;filename&gt;      file with trusted contigs
--untrusted-contigs     &lt;filename&gt;      file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from  &lt;cp&gt;    restart run with updated options and from the specified check-point ('ec', 'as', 'k&lt;int&gt;', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset       &lt;filename&gt;      file with dataset description in YAML format
-t/--threads    &lt;int&gt;           number of threads
                                [default: 16]
-m/--memory     &lt;int&gt;           RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir       &lt;dirname&gt;       directory for temporary files
                                [default: &lt;output_dir&gt;/tmp]
-k              &lt;int,int,...&gt;   comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff    &lt;float&gt;         coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  &lt;33 or 64&gt;      PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]
</code></pre></div><p>As you can see this is also a &ldquo;pipeline&rdquo; of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:</p><div><pre><code>spades.py -t 4 <span>\</span>
          -m 32 <span>\</span>
          -k 31,51,71 <span>\</span>
          --only-assembler <span>\</span>
          -1 miseq.1.fastq -2 miseq.2.fastq <span>\</span>
          --nanopore minion.fastq <span>\</span>
          -o hybrid_assembly
</code></pre></div><p>In turn, these parameters mean:</p><ul>
<li>use 4 threads</li>
<li>limit memory to 32 GB</li>
<li>use 3 k-mer values to build the de Bruijn graph(s): 31, 51 and 71</li>
<li>only run the assembler, not the error correction step (for speed)</li>
<li>read 1 and read 2 of the MiSeq data</li>
<li>the nanopore data</li>
<li>put the output in the folder &ldquo;hybrid_assembly&rdquo;</li>
</ul>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</guid>
	<pubDate>Tue, 19 Dec 2017 17:17:38 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/34707/string-graph-based-genome-assembly-software-and-tools</link>
	<title><![CDATA[String graph based genome assembly software and tools !]]></title>
	<description><![CDATA[<p>In&nbsp;<a href="https://en.wikipedia.org/wiki/Graph_theory" title="Graph theory">graph theory</a>, a&nbsp;<strong>string graph</strong>&nbsp;is an&nbsp;<a href="https://en.wikipedia.org/wiki/Intersection_graph" title="Intersection graph">intersection graph</a>&nbsp;of&nbsp;<a href="https://en.wikipedia.org/wiki/Curve" title="Curve">curves</a>&nbsp;in the plane; each curve is called a "string". In genome assembly the term means something different: the string graph proposed by E. W. Myers in a&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">2005 publication</a>&nbsp;is built from overlaps between reads, with contained reads and transitive edges removed. A&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Genome Research paper</a>&nbsp;describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons: i) it offers a "string graph" perspective on the long-standing genome assembly problem; ii) the paper is coauthored by Jared Simpson, the developer of the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2694472/">ABySS assembler</a>, and Richard Durbin; iii) the Simpson-Durbin algorithm does not rely on de Bruijn graphs and instead employs a different graph construction approach called the &lsquo;string graph&rsquo;.</p><p>The following genome assembly tools are based on string graphs:</p><p>1.&nbsp;SGA (String Graph Assembler)&nbsp;https://github.com/jts/sga</p><p>Assembles large genomes from high-coverage short-read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base-calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired-end and/or mate-pair data to build scaffolds from the contigs.
The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.</p><p>2.&nbsp;SAGE: String-overlap Assembly of GEnomes&nbsp;https://github.com/lucian-ilie/SAGE2</p><p>SAGE is a tool for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon existing work on the string-overlap graph and maximum likelihood assembly and contributes a number of new ideas, such as the efficient computation of the transitive reduction of the string-overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.</p><p>3.&nbsp;FSG: Fast String Graph</p><p>The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, the authors studied the effect of coverage rates on the running times.</p><p>4.&nbsp;BASE&nbsp;https://github.com/dhlbh/BASE</p><p>BASE enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome.
Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs.&nbsp;BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.</p><p>5.&nbsp;Fermi&nbsp;https://github.com/lh3/fermi/</p><p>Fermi is a de novo assembler with a particular focus on assembling Illumina&nbsp;short sequence reads from a mammal-sized genome. In addition to the role of a&nbsp;typical assembler, fermi also aims to preserve heterozygotes which are often&nbsp;collapsed by other assemblers. Its ultimate goal is to find a minimal set of&nbsp;unitigs to represent all the information in raw reads.</p><p>To learn more about string graph assemblers, read the following papers:</p><p>i)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/21/suppl_2/ii79.full.pdf+html">The Fragment Assembly String Graph - E. W. Myers</a></p><p>This paper describes the string graph concept.</p><p>ii)&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/26/12/i367.full#ref-20">Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin</a></p><p>This earlier paper from Simpson and Durbin describes how to construct the assembly string graph efficiently using the FM-index.</p><p>iii)&nbsp;<a href="http://genome.cshlp.org/content/early/2012/01/22/gr.126953.111">Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35762/genome-assembly-stats-plotting</guid>
	<pubDate>Wed, 28 Feb 2018 03:45:39 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35762/genome-assembly-stats-plotting</link>
	<title><![CDATA[Genome assembly stats plotting]]></title>
	<description><![CDATA[<p>A&nbsp;<em>de novo</em>&nbsp;genome assembly can be summarised by a number of metrics, including:</p>
<ul>
<li>Overall assembly length</li>
<li>Number of scaffolds/contigs</li>
<li>Length of longest scaffold/contig</li>
<li>Scaffold/contig N50 and N90</li>
<li>Assembly base composition, in particular percentage GC and percentage Ns</li>
<li>CEGMA completeness</li>
<li>Scaffold/contig length/count distribution</li>
</ul>
<p>assembly-stats supports two widely used presentations of these values, tabular and cumulative length plots, and introduces an additional circular plot that summarises most commonly used assembly metrics in a single visualisation. Each of these presentations is generated using javascript from a common (JSON) data structure, allowing toggling between alternative views, and each can be applied to a single or multiple assemblies to allow direct comparison of alternate assemblies.</p>
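<p>As a rough illustration of how the length-based metrics listed above are derived, the following Python sketch computes N50 and N90 from a list of scaffold or contig lengths. It is not taken from assembly-stats; the function names <code>nx</code> and <code>summarise</code> and the example lengths are hypothetical.</p>
<div><pre><code>
```python
from bisect import bisect_left
from itertools import accumulate


def nx(lengths, percent):
    """Return the Nx value for an integer percentage (50 gives N50).

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `percent` of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    # Running totals over the longest-first ordering, scaled by 100 so the
    # threshold comparison stays in exact integer arithmetic.
    scaled = [running * 100 for running in accumulate(ordered)]
    # bisect_left finds the first position whose running total reaches
    # the target share of the assembly.
    index = bisect_left(scaled, total * percent)
    return ordered[index]


def summarise(lengths):
    """Collect a few common summary metrics in one dictionary."""
    return {
        "total_length": sum(lengths),
        "count": len(lengths),
        "longest": max(lengths),
        "N50": nx(lengths, 50),
        "N90": nx(lengths, 90),
    }
```
</code></pre></div>
<p>For the hypothetical lengths 100, 80, 60, 40 and 20 (total 300), <code>summarise</code> reports an N50 of 80 and an N90 of 40: the two longest sequences cover 180 of 300 bases, and the four longest cover 280.</p>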
<p>Tabular presentation allows direct comparison of exact values between assemblies. Its limitations lie in the necessary omission of distributions and the challenge of interpreting ratios of values that may vary by several orders of magnitude.</p><p>Address of the bookmark: <a href="https://github.com/rjchallis/assembly-stats" rel="nofollow">https://github.com/rjchallis/assembly-stats</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36935/assemblytics-delta-file-to-analyze-alignments-of-an-assembly-to-another-assembly-or-a-reference-genome</guid>
	<pubDate>Thu, 14 Jun 2018 07:31:00 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36935/assemblytics-delta-file-to-analyze-alignments-of-an-assembly-to-another-assembly-or-a-reference-genome</link>
	<title><![CDATA[assemblytics: delta file to analyze alignments of an assembly to another assembly or a reference genome]]></title>
	<description><![CDATA[Download and install MUMmer
Align your assembly to a reference genome using nucmer (from MUMmer package)
$ nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT
Consult the MUMmer manual if you encounter problems

Optional: Gzip the delta file to speed up upload (usually 2-4X faster)
$ gzip OUT.delta
Then use the OUT.delta.gz file for upload.
Upload the .delta or .delta.gz file to Assemblytics
Important: Use only contigs rather than scaffolds from the assembly. This will prevent false positives when the number of Ns in the scaffolded sequence does not match perfectly to the distance in the reference.
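
If only a scaffolded assembly is available, one possible way to obtain contigs is to split each scaffold at its runs of N before running nucmer. The Python sketch below illustrates that idea; it is not part of Assemblytics or MUMmer, and the function name split_scaffold, the min_gap parameter and the contig naming scheme are hypothetical.

```python
import re


def split_scaffold(name, sequence, min_gap=1):
    """Split one scaffold into contigs at runs of N (gap) characters.

    Runs shorter than `min_gap` are left inside the contig. Returns a
    list of (contig_name, contig_sequence) pairs in scaffold order.
    """
    gap = re.compile("[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by gaps at the very start or end.
    pieces = [piece for piece in gap.split(sequence) if piece]
    return [
        ("%s_contig%d" % (name, number), piece)
        for number, piece in enumerate(pieces, start=1)
    ]
```

For example, split_scaffold("scaf1", "ACGTNNNNGGCCNTT") yields three contigs (ACGT, GGCC and TT), while min_gap=2 keeps the single N inside the second contig (GGCCNTT).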

The unique sequence length required represents an anchor for determining if a sequence is unique enough to safely call variants from, which is an alternative to the mapping quality filter for read alignment.

<p>Address of the bookmark: <a href="http://assemblytics.com/" rel="nofollow">http://assemblytics.com/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>