<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44503?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/44503?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43736/odgi-optimized-dynamic-genomegraph-implementation</guid>
	<pubDate>Tue, 01 Feb 2022 23:42:21 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43736/odgi-optimized-dynamic-genomegraph-implementation</link>
	<title><![CDATA[odgi: optimized dynamic genome/graph implementation]]></title>
	<description><![CDATA[<p dir="auto"><code>odgi</code>&nbsp;provides an efficient and succinct dynamic DNA sequence graph model, as well as a host of algorithms that allow the use of such graphs in bioinformatic analyses.</p>
<p dir="auto">Careful encoding of graph entities allows&nbsp;<code>odgi</code>&nbsp;to efficiently compute and transform&nbsp;<a href="https://pangenome.github.io/">pangenomes</a>&nbsp;with minimal overheads.&nbsp;<code>odgi</code>&nbsp;implements a dynamic data structure that leveraged multi-core CPUs and can be updated on the fly.</p>
<p dir="auto">The edges and path steps are recorded as deltas between the current node id and the target node id, where the node id corresponds to the rank in the global array of nodes. Graphs built from biological data sets tend to have local partial order and, when sorted, the deltas be small. This allows them to be compressed with a variable length integer representation, resulting in a small in-memory footprint at the cost of packing and unpacking.</p>
<p dir="auto">The RAM and computational savings are substantial. In partially ordered regions of the graph, most deltas will require only a single byte.</p><p>Address of the bookmark: <a href="https://github.com/pangenome/odgi" rel="nofollow">https://github.com/pangenome/odgi</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43806/genomicus-genome-browser-that-enables-users-to-navigate-in-genomes-in-several-dimensions</guid>
	<pubDate>Mon, 28 Feb 2022 23:27:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43806/genomicus-genome-browser-that-enables-users-to-navigate-in-genomes-in-several-dimensions</link>
	<title><![CDATA[Genomicus: genome browser that enables users to navigate in genomes in several dimensions]]></title>
	<description><![CDATA[<p>Genomicus is a genome browser that enables users to navigate in genomes in several dimensions: linearly along chromosome axes, transversaly across different species, and chronologicaly along evolutionary time.</p>
<p>Once a query gene has been entered, it is displayed in its genomic context in parallel to the genomic context of all its orthologous and paralogous copies in all the other sequenced metazoan genomes. Moreover, Genomicus stores and displays the predicted ancestral genome structure in all the ancestral species within the phylogenetic range of interest.</p>
<p>All the data on extant species displayed in this browser are from&nbsp;<a href="http://www.ensembl.org/">Ensembl</a>.</p>
<p><br><strong>Summary statistics of Genomicus version 105.01:</strong><span>&nbsp;(view species tree in&nbsp;</span><a href="https://www.genomicus.bio.ens.psl.eu/genomicus-105.01/data/SpeciesTree.pdf">pdf</a><span>&nbsp;or&nbsp;</span><a href="https://www.genomicus.bio.ens.psl.eu/genomicus-105.01/data/SpeciesTree.nwk">newick</a><span>)</span><br><br></p>
<table id="introstats">
<tbody>
<tr><th>Number of extant species</th>
<td>200</td>
</tr>
<tr><th>Number of extant genes</th>
<td>4303993</td>
</tr>
<tr><th>&nbsp;</th></tr>
<tr><th>Number of ancestral species</th>
<td>196</td>
</tr>
<tr><th>Number of ancestral genes</th>
<td>4624213</td>
</tr>
<tr><th>Number of ancestral synteny blocks</th>
<td>83342<br><br></td>
</tr>
</tbody>
</table><p>Address of the bookmark: <a href="https://www.genomicus.bio.ens.psl.eu/genomicus-105.01/cgi-bin/search.pl" rel="nofollow">https://www.genomicus.bio.ens.psl.eu/genomicus-105.01/cgi-bin/search.pl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/982</guid>
	<pubDate>Wed, 17 Jul 2013 15:25:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/view/982</link>
	<title><![CDATA[Is reference genome necessary for gene expression study in transcriptome sequencing or for variant discovery in genome sequencing?]]></title>
	<description><![CDATA[<p><span>Like in case of plant genomes where nature of genome is too complex and huge in size to accomplish complete<em> de novo</em> assembly by current sequencing technology. What would be alternate solution? Can we live in reference free world?</span></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/4183/320000-viruses-in-mammals-yet-to-sequenced-in-future</guid>
	<pubDate>Tue, 03 Sep 2013 08:35:30 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/4183/320000-viruses-in-mammals-yet-to-sequenced-in-future</link>
	<title><![CDATA[320000 viruses in mammals yet to sequenced in future!!!]]></title>
	<description><![CDATA[<p>With current biological technique improvements, finally it is now possible to look at millions of unknown viruses at genomic level and understand the mechanism. According to available data, close to 70 per cent of emerging viral diseases such as HIV/AIDS, West Nile, Ebola, SARS, and influenza, are zoonoses - infections of animals that cross into humans.</p><p>To address the challenges of describing and estimating virodiversity, a team of investigators from Center for Infection and Immunity (CII) and EcoHealth Alliance began in jungles of Bangladesh - home to the flying fox.</p><p>Reference:</p><p><a href="http://economictimes.indiatimes.com/news/news-by-industry/et-cetera/mammals-harbour-at-least-320000-new-viruses/articleshow/22253268.cms">http://economictimes.indiatimes.com/news/news-by-industry/et-cetera/mammals-harbour-at-least-320000-new-viruses/articleshow/22253268.cms</a></p><p><a href="http://www.bbc.co.uk/news/science-environment-23932400">http://www.bbc.co.uk/news/science-environment-23932400</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32948/simba-a-web-tool-for-managing-bacterial-genome-assembly-generated-by-ion-pgm-sequencing-technology</guid>
	<pubDate>Tue, 23 May 2017 05:28:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32948/simba-a-web-tool-for-managing-bacterial-genome-assembly-generated-by-ion-pgm-sequencing-technology</link>
	<title><![CDATA[SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology]]></title>
	<description><![CDATA[<p><span>SIMBA</span><span>, SImple Manager for Bacterial Assemblies, is a Web interface for managing assembly projects of bacterial genomes. SIMBA was created to assist bioinformaticians to assemble bacterial genomes sequenced with NextGeneration Sequencing (NGS) platforms quickly, easily and effectively. SIMBA also is open source tool, i.e., can be freely downloaded, shared and modified.</span></p>
<p>https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1344-7</p><p>Address of the bookmark: <a href="http://ufmg-simba.sourceforge.net/" rel="nofollow">http://ufmg-simba.sourceforge.net/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access atx: dhara:5000
Has btsync server (try it - its much better than dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phi's (space for 4 more GPU's)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you dont want this to count towards your quote you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on raid and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       ie if the hard drives fail or you delete something and notice 
       within 24 hours we can recover. Otherwise not. (vs home which is 
       properly backed up )  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
        return
    
    def write( self, data ):
        self.lock.acquire()
        self.writeable_obj.write( data )
        self.writeable_obj.flush()
        self.lock.release()
        return
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print i, x
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue
todo = multiprocessing.Queue()
for i in xrange(NSIMS): todo.put(i)
for i in xrange(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in xrange(NPROC):
    pid = os.fork()
    if pid == 0:
       worker(todo, ofp)
       os._exit(0)
    else:
       pids.append(pid)  

for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print "FINISHED"<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as we as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34328/dfast-a-flexible-prokaryotic-genome-annotation-pipeline-for-faster-genome-publication</guid>
	<pubDate>Tue, 14 Nov 2017 10:26:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34328/dfast-a-flexible-prokaryotic-genome-annotation-pipeline-for-faster-genome-publication</link>
	<title><![CDATA[DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication]]></title>
	<description><![CDATA[<p>We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.</p>
<div>Availability and Implementation</div>
<p>The software is implemented in Python 3 and runs in both Python 2.7 and 3.4&ndash; on Macintosh and Linux systems. It is freely available at&nbsp;<a href="https://github.com/nigyta/dfast_core/" target="">https://github.com/nigyta/dfast_core/</a>&nbsp;under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at&nbsp;<a href="https://dfast.nig.ac.jp/" target="">https://dfast.nig.ac.jp/</a>.</p><p>Address of the bookmark: <a href="https://dfast.nig.ac.jp/" rel="nofollow">https://dfast.nig.ac.jp/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36635/circlator-automated-circularization-of-genome-assemblies-using-long-sequencing-reads</guid>
	<pubDate>Tue, 15 May 2018 09:42:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36635/circlator-automated-circularization-of-genome-assemblies-using-long-sequencing-reads</link>
	<title><![CDATA[Circlator: automated circularization of genome assemblies using long sequencing reads]]></title>
	<description><![CDATA[A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology manuscript. 

Citation: "Circlator: automated circularization of genome assemblies using long sequencing reads", Hunt et al, Genome Biology 2015 Dec 29;16(1):294. doi: 10.1186/s13059-015-0849-0. PMID: 26714481.<p>Address of the bookmark: <a href="http://sanger-pathogens.github.io/circlator/" rel="nofollow">http://sanger-pathogens.github.io/circlator/</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36954/mscaffolder-a-comparative-genome-scaffolding-tool</guid>
	<pubDate>Fri, 15 Jun 2018 04:48:01 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36954/mscaffolder-a-comparative-genome-scaffolding-tool</link>
	<title><![CDATA[mScaffolder: A comparative genome scaffolding tool]]></title>
	<description><![CDATA[<p>A comparative genome scaffolding tool based on MUMmer</p>
<p>mScaffolder scaffolds a genome using an existing high quality genome as the reference. It aligns the two genomes using nucmer utility from MUMmer and then orders and orients the contigs of the candidate genome guided by their alignments to the reference genome. Please send your questions and comments to&nbsp;<a href="mailto:mchakrab@uci.edu">mchakrab@uci.edu</a>.</p>
<p><span>Citation</span><span>&nbsp;</span><a href="https://www.nature.com/articles/s41588-017-0010-y">https://www.nature.com/articles/s41588-017-0010-y</a></p><p>Address of the bookmark: <a href="https://github.com/mahulchak/mscaffolder" rel="nofollow">https://github.com/mahulchak/mscaffolder</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38224/novograph-building-whole-genome-graphs-from-long-read-based-de-novo-assemblies</guid>
	<pubDate>Thu, 15 Nov 2018 12:48:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38224/novograph-building-whole-genome-graphs-from-long-read-based-de-novo-assemblies</link>
	<title><![CDATA[NovoGraph: building whole genome graphs from long-read-based de novo assemblies]]></title>
	<description><![CDATA[<p><span>NovoGraph: building whole genome graphs from long-read-based de novo assemblies</span></p>
<p><span><span>An algorithmically novel approach to construct a genome graph representation of long-read-based&nbsp;</span><em>de novo</em><span>&nbsp;sequence assemblies. We then provide a proof of principle by creating a genome graph of seven ethnically-diverse human genomes.</span></span></p>
<p>&nbsp;</p>
<p>https://f1000research.com/articles/7-1391/v1</p><p>Address of the bookmark: <a href="https://github.com/NCBI-Hackathons/NovoGraph" rel="nofollow">https://github.com/NCBI-Hackathons/NovoGraph</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>