<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43614?offset=280</link>
	<atom:link href="https://bioinformaticsonline.com/related/43614?offset=280" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30012/swalo</guid>
	<pubDate>Wed, 30 Nov 2016 05:06:05 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30012/swalo</link>
	<title><![CDATA[SWALO]]></title>
	<description><![CDATA[<p>SWALO (scaffolding with assembly likelihood optimization) is a method for scaffolding based on likelihood of genome assemblies computed using generative models for sequencing.</p>
<p><a href="https://atifrahman.github.io/SWALO/swalo-0.9.7-beta.tar.gz"><strong>Download</strong></a></p>
<p><strong>Git repository of SWALO is at <a href="https://github.com/atifrahman/SWALO">https://github.com/atifrahman/SWALO</a>.</strong></p><p>Address of the bookmark: <a href="https://atifrahman.github.io/SWALO/" rel="nofollow">https://atifrahman.github.io/SWALO/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33856/assembly-course</guid>
	<pubDate>Mon, 10 Jul 2017 09:38:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33856/assembly-course</link>
	<title><![CDATA[Assembly Course]]></title>
	<description><![CDATA[<p>https://ocw.mit.edu/courses/biology/7-91j-foundations-of-computational-and-systems-biology-spring-2014/lecture-slides/MIT7_91JS14_Lecture6.pdf</p><p>Address of the bookmark: <a href="https://ocw.mit.edu/courses/biology/7-91j-foundations-of-computational-and-systems-biology-spring-2014/lecture-slides/MIT7_91JS14_Lecture6.pdf" rel="nofollow">https://ocw.mit.edu/courses/biology/7-91j-foundations-of-computational-and-systems-biology-spring-2014/lecture-slides/MIT7_91JS14_Lecture6.pdf</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36893/beap-blast-extension-and-assembly-program</guid>
	<pubDate>Mon, 11 Jun 2018 04:52:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36893/beap-blast-extension-and-assembly-program</link>
	<title><![CDATA[BEAP: Blast Extension and Assembly Program]]></title>
	<description><![CDATA[The Blast Extension and Assembly Program (BEAP) is a computer program that uses a short starting DNA fragment, often a EST or partial gene segment, as "primer", to recursively blast nucleotide databases in an attempt to obtain all sequences that overlaps, directly or indirectly, with the "primer" therefore help to "extend" the length of the original sequence for constructing a "full length" sequence for functional analysis, or at least to obtain neighboring regions of the segment for SNP discovery and linkage disequilibrium analysis. The confidence of assembling the resulting sequences is achieved by using a known genome, such as human genome, as a reference.
 
https://www.animalgenome.org/tools/beap/<p>Address of the bookmark: <a href="https://www.animalgenome.org/tools/beap/" rel="nofollow">https://www.animalgenome.org/tools/beap/</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44559/metagraph-ultra-scalable-framework-for-dna-search-alignment-assembly</guid>
	<pubDate>Sat, 08 Jun 2024 16:15:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44559/metagraph-ultra-scalable-framework-for-dna-search-alignment-assembly</link>
	<title><![CDATA[MetaGraph: Ultra Scalable Framework for DNA Search, Alignment, Assembly]]></title>
	<description><![CDATA[<p><span>The MetaGraph framework</span><span>&nbsp;is designed to work with a wide range of input data sets, indexing from a few samples up to the contents of entire archives with hundreds of thousands of records. The indexing workflow always follows the same principle, transforming single input samples into error-removed, refined sample graphs, which are then merged into a joint metagraph index. Each input sample is annotated in the joint index as a subgraph. This graph index enriched with metadata can then be used for downstream applications such as&nbsp;</span><a href="https://metagraph.ethz.ch/#query">sequence search</a><span>&nbsp;or&nbsp;</span><a href="https://metagraph.ethz.ch/#assembly">differential assembly</a><span>.</span></p>
<p><span>Searcg link&nbsp;https://metagraph.ethz.ch/search&nbsp;</span></p>
<p><span>Pre-print&nbsp;https://www.biorxiv.org/content/10.1101/2020.10.01.322164v4&nbsp;</span></p><p>Address of the bookmark: <a href="https://metagraph.ethz.ch/" rel="nofollow">https://metagraph.ethz.ch/</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access atx: dhara:5000
Has btsync server (try it - its much better than dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phi's (space for 4 more GPU's)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you dont want this to count towards your quote you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on raid and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       ie if the hard drives fail or you delete something and notice 
       within 24 hours we can recover. Otherwise not. (vs home which is 
       properly backed up )  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
        return
    
    def write( self, data ):
        self.lock.acquire()
        self.writeable_obj.write( data )
        self.writeable_obj.flush()
        self.lock.release()
        return
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print i, x
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue
todo = multiprocessing.Queue()
for i in xrange(NSIMS): todo.put(i)
for i in xrange(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in xrange(NPROC):
    pid = os.fork()
    if pid == 0:
       worker(todo, ofp)
       os._exit(0)
    else:
       pids.append(pid)  

for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print "FINISHED"<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as we as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34377/genomicus-genome-browser-that-enables-users-to-navigate-in-genomes-in-several-dimensions</guid>
	<pubDate>Sat, 18 Nov 2017 16:10:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34377/genomicus-genome-browser-that-enables-users-to-navigate-in-genomes-in-several-dimensions</link>
	<title><![CDATA[Genomicus: genome browser that enables users to navigate in genomes in several dimensions]]></title>
	<description><![CDATA[<p>Genomicus is a genome browser that enables users to navigate in genomes in several dimensions: linearly along chromosome axes, transversaly across different species, and chronologicaly along evolutionary time.</p>
<p>Once a query gene has been entered, it is displayed in its genomic context in parallel to the genomic context of all its orthologous and paralogous copies in all the other sequenced metazoan genomes. Moreover, Genomicus stores and displays the predicted ancestral genome structure in all the ancestral species within the phylogenetic range of interest.</p>
<p>All the data on extant species displayed in this browser are from&nbsp;<a href="http://www.ensembl.org/">Ensembl</a>.</p><p>Address of the bookmark: <a href="http://genomicus.biologie.ens.fr/genomicus-90.01/cgi-bin/search.pl" rel="nofollow">http://genomicus.biologie.ens.fr/genomicus-90.01/cgi-bin/search.pl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</guid>
	<pubDate>Fri, 08 Dec 2017 16:26:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</link>
	<title><![CDATA[jobTree based python wrapper to run the genome simulation tool suite Evolver]]></title>
	<description><![CDATA[<p><span>evolverSimControl</span><span>&nbsp;(</span><span>eSC</span><span>) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (</span><a href="http://evolution.genetics.washington.edu/phylip/newicktree.html">Newick format</a><span>). In addition to simply running evolver,&nbsp;</span><span>eSC</span><span>&nbsp;also automatically creates statistical summaries of the simulation as it runs including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; extract pairwise multiple alignment files (</span><a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format5">.maf</a><span>) from leaf and branch nodes from a completed simulation and with the help of&nbsp;</span><a href="https://github.com/dentearl/mafTools/">mafJoin</a><span>, join them together into a single maf covering the entire simulation.</span></p><p>Address of the bookmark: <a href="https://github.com/dentearl/evolverSimControl" rel="nofollow">https://github.com/dentearl/evolverSimControl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35384/mgcv-the-microbial-genomic-context-viewer-for-comparative-genome-analysis</guid>
	<pubDate>Mon, 29 Jan 2018 04:55:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35384/mgcv-the-microbial-genomic-context-viewer-for-comparative-genome-analysis</link>
	<title><![CDATA[MGcV: the microbial genomic context viewer for comparative genome analysis]]></title>
	<description><![CDATA[<p><span>MGcV is an interactive web-based visalization tool tailored to facilitate small scale genome analysis. To start using MGcV:</span></p>
<ol>
<li>Supply your genes/genomic segments/phylogenetic tree of interest in the input-box by
<ul>
<li>selecting the type of identifier and pasting identifiers (one per line)</li>
<li><em><strong>or</strong></em>&nbsp;by using the&nbsp;<a>gene ID search tool</a></li>
<li><em><strong>or</strong></em>&nbsp;with the&nbsp;<a>BLAST search tool</a></li>
</ul>
</li>
<li>Click "Visualize context".</li>
</ol>
<p><span>Consult the&nbsp;</span><a href="http://mgcv.cmbi.ru.nl/help.html" target="_blank">documentation</a><span>&nbsp;to learn more about MGcV.</span></p><p>Address of the bookmark: <a href="http://mgcv.cmbi.ru.nl/" rel="nofollow">http://mgcv.cmbi.ru.nl/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</guid>
	<pubDate>Tue, 18 Feb 2020 07:43:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</link>
	<title><![CDATA[Carefully opt for human reference genome]]></title>
	<description><![CDATA[<p><a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" target="_blank">Heng Li posted several issues with the human reference genomes given in these resources</a> and suggests the following compressed FASTA file to be used as hg38/GRCh38 human reference genome.</p>
<p>if you map reads to GRCh38 or hg38, use the following:</p>
<div>
<div>
<pre><code>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
</code></pre>
</div>
</div>
<p>There are several other versions of GRCh37/GRCh38. What&rsquo;s wrong with them? Here are a collection of potential issues:</p>
<p>More at http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</p><p>Address of the bookmark: <a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" rel="nofollow">http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36635/circlator-automated-circularization-of-genome-assemblies-using-long-sequencing-reads</guid>
	<pubDate>Tue, 15 May 2018 09:42:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36635/circlator-automated-circularization-of-genome-assemblies-using-long-sequencing-reads</link>
	<title><![CDATA[Circlator: automated circularization of genome assemblies using long sequencing reads]]></title>
	<description><![CDATA[A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology manuscript. 

Citation: "Circlator: automated circularization of genome assemblies using long sequencing reads", Hunt et al, Genome Biology 2015 Dec 29;16(1):294. doi: 10.1186/s13059-015-0849-0. PMID: 26714481.<p>Address of the bookmark: <a href="http://sanger-pathogens.github.io/circlator/" rel="nofollow">http://sanger-pathogens.github.io/circlator/</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>

</channel>
</rss>