<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/34413?offset=470</link>
	<atom:link href="https://bioinformaticsonline.com/related/34413?offset=470" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26925/reapr-a-universal-tool-for-genome-assembly-evaluation</guid>
	<pubDate>Wed, 06 Apr 2016 18:26:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26925/reapr-a-universal-tool-for-genome-assembly-evaluation</link>
	<title><![CDATA[REAPR: a universal tool for genome assembly evaluation]]></title>
	<description><![CDATA[<p>REAPR is a tool that evaluates the accuracy of a genome assembly using mapped paired end reads, without the use of a reference genome for comparison. It can be used in any stage of an assembly pipeline to automatically break incorrect scaffolds and flag other errors in an assembly for manual inspection. It reports mis-assemblies and other warnings, and produces a new broken assembly based on the error calls.</p>
<p>The software requires as input an assembly in FASTA format and paired reads mapped to the assembly in a BAM file. Mapping information such as the fragment coverage and insert size distribution is analysed to locate mis-assemblies. REAPR works best using mapped read pairs from a large insert library (at least 1000bp). Additionally, if a short insert Illumina library is also available, REAPR can combine this with the large insert library in order to score each base of the assembly.</p>
<p>http://www.sanger.ac.uk/science/tools/reapr</p><p>Address of the bookmark: <a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-5-r47" rel="nofollow">https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-5-r47</a></p>]]></description>
	<dc:creator>Jitendra Prajapati</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27257/busco-assessing-genome-assembly-and-annotation-completeness-with-benchmarking-universal-single-copy-orthologs</guid>
	<pubDate>Tue, 10 May 2016 07:46:24 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27257/busco-assessing-genome-assembly-and-annotation-completeness-with-benchmarking-universal-single-copy-orthologs</link>
	<title><![CDATA[BUSCO: Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs]]></title>
	<description><![CDATA[<ul>
<li><span>High-throughput genomics has revolutionized biological research. However, while the number of sequenced genomes grows by the day, quality assessment of the resulting assembled sequences remains complicated and mostly limited to technical measures like N50.&nbsp;</span></li>
<li><span>BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from&nbsp;</span><a href="http://orthodb.org/">OrthoDB</a><span>.&nbsp;</span></li>
<li><span>BUSCO assessments are implemented in open-source software, with comprehensive lineage-specific sets of Benchmarking Universal Single-Copy Orthologs for arthropods, vertebrates, metazoans, fungi, eukaryotes, and bacteria.&nbsp;</span></li>
<li><span>These conserved orthologs are ideal candidates for large-scale phylogenomics studies, and the annotated BUSCO gene models built during genome assessments provide a comprehensive gene predictor training set for use as part of genome annotation pipelines.&nbsp;</span></li>
<li><span>BUSCO assessments offer intuitive metrics, based on evolutionarily informed expectations of gene content from hundreds of species, to gauge completeness of rapidly accumulating genomic data and satisfy an Iberian's quest for quality - "Busco calidad/qualidade".</span></li>
</ul><p>Address of the bookmark: <a href="http://busco.ezlab.org/" rel="nofollow">http://busco.ezlab.org/</a></p>]]></description>
	<dc:creator>Anjana</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27328/platanus</guid>
	<pubDate>Fri, 13 May 2016 05:12:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27328/platanus</link>
	<title><![CDATA[Platanus]]></title>
	<description><![CDATA[<p>Platanus is a novel <em>de novo</em> sequence assembler that can reconstruct genomic sequences of highly heterozygous diploids from massively parallel shotgun sequencing data.</p>
<p>The latest version is <a href="http://platanus.bio.titech.ac.jp/platanus/?page_id=14">1.2.4</a>.</p>
<p>To cite Platanus, please use the following:</p>
<p>Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, Yabana M, Harada M, Nagayasu E, Maruyama H, Kohara Y, Fujiyama A, Hayashi T, Itoh T, &ldquo;Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads&rdquo;.&nbsp;Genome Res. 2014 Aug;24(8):1384-95. doi: 10.1101/gr.170720.113. [<a href="http://www.ncbi.nlm.nih.gov/pubmed/24755901">abstract</a> |<a href="http://genome.cshlp.org/content/24/8/1384.long"> full text</a>]</p><p>Address of the bookmark: <a href="http://platanus.bio.titech.ac.jp/" rel="nofollow">http://platanus.bio.titech.ac.jp/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30140/cutadapt</guid>
	<pubDate>Wed, 14 Dec 2016 09:59:52 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30140/cutadapt</link>
	<title><![CDATA[Cutadapt]]></title>
	<description><![CDATA[<p>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.</p>
<p>Cutadapt helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way. It can also modify and filter reads in various ways. Adapter sequences can contain IUPAC wildcard characters. Paired-end reads and even colorspace data are also supported. If you want, you can also just demultiplex your input data, without removing adapter sequences at all.</p>
<p>Cutadapt comes with an extensive suite of automated tests and is available under the terms of the MIT license.</p>
<p>If you use cutadapt, please cite&nbsp;<a href="http://dx.doi.org/10.14806/ej.17.1.200">DOI:10.14806/ej.17.1.200</a>&nbsp;.</p>
<p>More at&nbsp;https://github.com/marcelm/cutadapt</p><p>Address of the bookmark: <a href="http://cutadapt.readthedocs.io/en/stable/guide.html" rel="nofollow">http://cutadapt.readthedocs.io/en/stable/guide.html</a></p>]]></description>
	<dc:creator>Bulbul</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32633/a-post-assembly-genome-improvement-toolkit-pagit-to-obtain-annotated-genomes-from-contigs</guid>
	<pubDate>Fri, 12 May 2017 10:50:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32633/a-post-assembly-genome-improvement-toolkit-pagit-to-obtain-annotated-genomes-from-contigs</link>
	<title><![CDATA[A Post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs]]></title>
	<description><![CDATA[<p>PAGIT addresses the need for software to generate high quality draft genomes. It is based on a series of programs that we developed:</p>
<p><a href="https://sourceforge.net/projects/abacas/files/">ABACAS</a>, that is able to contiguate contigs from a de novo assembly against a closely related reference.</p>
<p><a href="https://sourceforge.net/projects/image2/files/">IMAGE</a>, an iterative approach for closing gaps in assembled genomes using mate pair information. It is able to close gaps left open by the assembler in a draft genome, even when using the same data sets as used by the original assembler.</p>
<p><a href="http://icorn.sourceforge.net/">iCORN</a>, that enables errors in the consensus sequence to be corrected by iteratively mapping reads to the current assembly. An improved version, aimed especially at correcting Pacific Biosciences (PacBio) assemblies, can be found&nbsp;<a href="ftp://ftp.sanger.ac.uk/pub4/resources/software/pagit/ICORN2/icorn2.V0.95.tgz">here</a>.</p>
<p><a href="https://ratt.svn.sourceforge.net/svnroot/ratt">RATT</a>, a tool to transfer the annotation from a reference genome, or an earlier assembly, onto the latest assembly.</p>
<p>PAGIT bundles these programs and makes them more accessible for users.</p><p>Address of the bookmark: <a href="http://www.sanger.ac.uk/science/tools/pagit" rel="nofollow">http://www.sanger.ac.uk/science/tools/pagit</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35540/hinge-long-read-assembly-achieves-optimal-repeat-resolution</guid>
	<pubDate>Wed, 07 Feb 2018 09:40:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35540/hinge-long-read-assembly-achieves-optimal-repeat-resolution</link>
	<title><![CDATA[HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution]]></title>
	<description><![CDATA[<p>Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"</p>
<ul>
<li>
<p>Preprint:&nbsp;<a href="http://biorxiv.org/content/early/2016/08/01/062117">http://biorxiv.org/content/early/2016/08/01/062117</a></p>
</li>
<li>
<p>Paper:&nbsp;<a href="http://genome.cshlp.org/content/27/5/747.full">http://genome.cshlp.org/content/27/5/747.full</a></p>
</li>
<li>
<p>An ipython notebook to reproduce results in the paper can be found in this&nbsp;<a href="https://github.com/govinda-kamath/HINGE-analyses">repository</a>.</p>
</li>
</ul>
<p>HINGE is an OLC (Overlap-Layout-Consensus) assembler. The idea of the pipeline is shown below.</p>
<p><a href="https://github.com/HingeAssembler/HINGE/blob/master/misc/High_level_overview.png" target="_blank"><img src="https://github.com/HingeAssembler/HINGE/raw/master/misc/High_level_overview.png" alt="image" style="border: 0px;"></a></p><p>Address of the bookmark: <a href="https://github.com/HingeAssembler/HINGE" rel="nofollow">https://github.com/HingeAssembler/HINGE</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36514/evidentialgene-tr2aacds-mrna-transcript-assembly-software</guid>
	<pubDate>Tue, 08 May 2018 04:39:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36514/evidentialgene-tr2aacds-mrna-transcript-assembly-software</link>
	<title><![CDATA[EvidentialGene: tr2aacds, mRNA Transcript Assembly Software]]></title>
	<description><![CDATA[<p><span>EvidentialGene is a genome informatics project, "Evidence Directed Gene Construction for Eukaryotes", to construct high quality, accurate gene sets for animals and plants, developed by Don Gilbert at Indiana University, see</span><br><a href="http://arthropods.eugenes.org/EvidentialGene/" target="_blank">http://arthropods.eugenes.org/EvidentialGene/<span></span></a><br><br><span>Construction refers to the combination of classical gene prediction, and more recent gene assembly (de-novo and genome-assisted) methods. The basic Evigene methods involve using available best-of-breed gene prediction and assembly software, combining all evidence for genes, from expressed sequences, genome assembly sequences, related species protein sequences, and any other, to annotate and score gene constructions. Over-produced constructions are classified by gene evidence for best qualities per "locus", including genome-aligned and gene-transcript aligned (genome-free) locus identification. All software developed for EvidentialGene is publicly available. See project wiki/blog for notes.</span></p>
<p><span>Download&nbsp;</span></p>
<p>http://arthropods.eugenes.org/EvidentialGene/trassembly.html</p>
<p>https://sourceforge.net/p/evidentialgene/blog/</p><p>Address of the bookmark: <a href="http://arthropods.eugenes.org/EvidentialGene/trassembly.html" rel="nofollow">http://arthropods.eugenes.org/EvidentialGene/trassembly.html</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/42626/spades-team-announce-new-version-spades-v315</guid>
	<pubDate>Fri, 15 Jan 2021 10:24:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/42626/spades-team-announce-new-version-spades-v315</link>
	<title><![CDATA[SPAdes team announces new version: SPAdes v3.15]]></title>
	<description><![CDATA[<p>SPAdes 3.15.0 has been announced by the SPAdes team. This release includes the following new features and changes:&nbsp;<br />- coronaSPAdes pipeline for assembling full-length Coronaviridae genomes from transcriptomic and metatranscriptomic data;&nbsp;<br />- metaviralSPAdes and rnaviralSPAdes pipelines for identifying viral genomes in metagenomic and metatranscriptomic data;&nbsp;<br />- a new algorithm for using trusted contigs;&nbsp;<br />- a switch to the mimalloc memory allocator;&nbsp;<br />- plasmidSPAdes and bgcSPAdes can now take an assembly graph as input;&nbsp;<br />- significant improvements and fixes for the metaplasmid pipeline;&nbsp;<br />- multiple performance improvements in the simplification and repeat resolution procedures.&nbsp;<br />Please consider updating.</p><p>Check out more at&nbsp;https://cab.spbu.ru/software/spades/</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44559/metagraph-ultra-scalable-framework-for-dna-search-alignment-assembly</guid>
	<pubDate>Sat, 08 Jun 2024 16:15:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44559/metagraph-ultra-scalable-framework-for-dna-search-alignment-assembly</link>
	<title><![CDATA[MetaGraph: Ultra Scalable Framework for DNA Search, Alignment, Assembly]]></title>
	<description><![CDATA[<p><span>The MetaGraph framework</span><span>&nbsp;is designed to work with a wide range of input data sets, indexing from a few samples up to the contents of entire archives with hundreds of thousands of records. The indexing workflow always follows the same principle, transforming single input samples into error-removed, refined sample graphs, which are then merged into a joint metagraph index. Each input sample is annotated in the joint index as a subgraph. This graph index enriched with metadata can then be used for downstream applications such as&nbsp;</span><a href="https://metagraph.ethz.ch/#query">sequence search</a><span>&nbsp;or&nbsp;</span><a href="https://metagraph.ethz.ch/#assembly">differential assembly</a><span>.</span></p>
<p><span>Search link&nbsp;https://metagraph.ethz.ch/search&nbsp;</span></p>
<p><span>Pre-print&nbsp;https://www.biorxiv.org/content/10.1101/2020.10.01.322164v4&nbsp;</span></p><p>Address of the bookmark: <a href="https://metagraph.ethz.ch/" rel="nofollow">https://metagraph.ethz.ch/</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail, or you delete something and notice 
       within 24 hours, we can recover it. Otherwise not. (vs home, which is 
       properly backed up)  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
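
# the same pattern in python
(a minimal sketch, not part of the original notes: gzip_file and gzip_all
 are hypothetical helper names, and the thread pool relies on the assumption
 that zlib releases the GIL while it compresses)

```python
import glob
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def gzip_file(path):
    """Compress path to path + '.gz' and remove the original, like the gzip CLI."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


def gzip_all(pattern, workers=4):
    """Compress every file matching the glob pattern using a pool of threads."""
    paths = sorted(glob.glob(pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as paths
        return list(pool.map(gzip_file, paths))


if __name__ == "__main__":
    # "*.FILESTOGZIP" is the placeholder pattern from the shell line above
    for gz_path in gzip_all("*.FILESTOGZIP"):
        print(gz_path)
```

(gzip_all returns the new .gz paths; like the gzip command, it removes
 each original once its compressed copy has been written)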

# fork example in python 3:
# (for more detailed examples look at
#  https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import multiprocessing
import os
import random
import time


class ProcessSafeOPStream:
    """Wrap a writeable object so forked processes take turns writing to it."""

    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock for the write and the flush so lines never interleave
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()


def worker(queue, ofp):
    # Try without this
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED':
            return
        # simulate an expensive function
        x = random.random()
        time.sleep(x / 10)
        # flush explicitly: os._exit() below skips the normal buffer flush
        print(i, x, flush=True)
        ofp.write("%i\t%s\n" % (i, x))


NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS):
    todo.put(i)
for i in range(NPROC):
    todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

# fork the workers; each child drains the queue and exits without returning
pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# wait for every child to finish before closing the output file
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>