<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/4590?offset=240</link>
	<atom:link href="https://bioinformaticsonline.com/related/4590?offset=240" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36849/glean-an-unsupervised-learning-system-to-integrate-disparate-sources-of-gene-structure-evidence</guid>
	<pubDate>Sat, 02 Jun 2018 07:38:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36849/glean-an-unsupervised-learning-system-to-integrate-disparate-sources-of-gene-structure-evidence</link>
	<title><![CDATA[GLEAN: an unsupervised learning system to integrate disparate sources of gene structure evidence]]></title>
	<description><![CDATA[<p><span>GLEAN is an unsupervised learning system to integrate disparate sources of gene structure evidence (gene model predictions, EST/protein genomic sequence alignments, SAGE/peptide tags, etc) to produce a consensus gene prediction, without prior training.</span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/glean-gene/" rel="nofollow">https://sourceforge.net/projects/glean-gene/</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38541/geneoverlap-an-r-package-to-test-and-visualize-gene-overlaps</guid>
	<pubDate>Thu, 27 Dec 2018 19:45:52 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38541/geneoverlap-an-r-package-to-test-and-visualize-gene-overlaps</link>
	<title><![CDATA[GeneOverlap: An R package to test and visualize gene overlaps]]></title>
	<description><![CDATA[<p>Overlapping gene lists can reveal biological meanings and may lead to novel hypotheses. For example, histone modification is an important cellular mechanism that can pack and re-pack chromatin. By making the chromatin structure more dense or loose, the gene expression can be turned on or off. Tri-methylation on lysine 4 of histone H3 (H3K4me3) is associated with gene activation and its genome-wide enrichment can be mapped by using ChIP-seq experiments. Because of its activating role, if we overlap the genes that are bound by H3K4me3 with the genes that are highly expressed, we should expect a positive association. Similary, we can perform such kind of overlapping between the gene lists of different histone modifications with that of various expression groups and establish each histone modification&rsquo;s role in gene regulation.</p><p>Address of the bookmark: <a href="https://bioconductor.org/packages/release/bioc/vignettes/GeneOverlap/inst/doc/GeneOverlap.pdf" rel="nofollow">https://bioconductor.org/packages/release/bioc/vignettes/GeneOverlap/inst/doc/GeneOverlap.pdf</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41969/shadowcaster-a-hybrid-approach-for-the-detection-of-horizontal-gene-transfer-events-in-prokaryotes</guid>
	<pubDate>Tue, 14 Jul 2020 06:42:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41969/shadowcaster-a-hybrid-approach-for-the-detection-of-horizontal-gene-transfer-events-in-prokaryotes</link>
	<title><![CDATA[ShadowCaster: a hybrid approach for the detection of horizontal gene transfer events in prokaryotes]]></title>
	<description><![CDATA[<p><span>ShadowCaster implements an evolutionary model to calculate Bayesian likelihoods for each &lsquo;alien genes&rsquo; with an unusual sequence composition according to the host genome background to detect HGT events in prokaryotes.</span></p>
<p><a href="https://www.mdpi.com/2073-4425/11/7/756/htm">https://www.mdpi.com/2073-4425/11/7/756/htm</a></p>
<p><a href="https://shadowcaster.readthedocs.io/en/latest/">https://shadowcaster.readthedocs.io/en/latest/</a></p>
<p><a href="https://github.com/dani2s/ShadowCaster_testData">https://github.com/dani2s/ShadowCaster_testData</a></p><p>Address of the bookmark: <a href="https://github.com/dani2s/ShadowCaster" rel="nofollow">https://github.com/dani2s/ShadowCaster</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access atx: dhara:5000
Has btsync server (try it - its much better than dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phi's (space for 4 more GPU's)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you dont want this to count towards your quote you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on raid and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       ie if the hard drives fail or you delete something and notice 
       within 24 hours we can recover. Otherwise not. (vs home which is 
       properly backed up )  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
        return
    
    def write( self, data ):
        self.lock.acquire()
        self.writeable_obj.write( data )
        self.writeable_obj.flush()
        self.lock.release()
        return
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print i, x
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue
todo = multiprocessing.Queue()
for i in xrange(NSIMS): todo.put(i)
for i in xrange(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in xrange(NPROC):
    pid = os.fork()
    if pid == 0:
       worker(todo, ofp)
       os._exit(0)
    else:
       pids.append(pid)  

for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print "FINISHED"<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as we as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34482/ribbon-visualizing-complex-genome-alignments-and-structural-variation</guid>
	<pubDate>Wed, 29 Nov 2017 07:40:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34482/ribbon-visualizing-complex-genome-alignments-and-structural-variation</link>
	<title><![CDATA[Ribbon: Visualizing complex genome alignments and structural variation:]]></title>
	<description><![CDATA[<p>Ribbon can be used for long reads, short reads, paired-end reads, and assembly/genome alignments. Instructions for each data format are available by clicking on "instructions" in each tab on the right.</p>
<p>Local installation:</p>
<p>You can install Ribbon locally from Github by following the instructions here:&nbsp;<a href="https://github.com/MariaNattestad/ribbon" target="_blank">https://github.com/MariaNattestad/Ribbon</a></p><p>Address of the bookmark: <a href="http://genomeribbon.com/" rel="nofollow">http://genomeribbon.com/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</guid>
	<pubDate>Fri, 08 Dec 2017 16:26:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</link>
	<title><![CDATA[jobTree based python wrapper to run the genome simulation tool suite Evolver]]></title>
	<description><![CDATA[<p><span>evolverSimControl</span><span>&nbsp;(</span><span>eSC</span><span>) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (</span><a href="http://evolution.genetics.washington.edu/phylip/newicktree.html">Newick format</a><span>). In addition to simply running evolver,&nbsp;</span><span>eSC</span><span>&nbsp;also automatically creates statistical summaries of the simulation as it runs including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; extract pairwise multiple alignment files (</span><a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format5">.maf</a><span>) from leaf and branch nodes from a completed simulation and with the help of&nbsp;</span><a href="https://github.com/dentearl/mafTools/">mafJoin</a><span>, join them together into a single maf covering the entire simulation.</span></p><p>Address of the bookmark: <a href="https://github.com/dentearl/evolverSimControl" rel="nofollow">https://github.com/dentearl/evolverSimControl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34620/mash-fast-genome-and-metagenome-distance-estimation-using-minhash</guid>
	<pubDate>Tue, 12 Dec 2017 17:30:12 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34620/mash-fast-genome-and-metagenome-distance-estimation-using-minhash</link>
	<title><![CDATA[Mash: fast genome and metagenome distance estimation using MinHash]]></title>
	<description><![CDATA[<p>Mash is normally distributed as a dependency-free binary for Linux or OSX (see&nbsp;<a href="https://github.com/marbl/Mash/releases">https://github.com/marbl/Mash/releases</a>). This source distribution is intended for other operating systems or for development. Mash requires c++11 to build, which is available in and GCC &gt;= 4.8 and OSX &gt;= 10.7.</p>
<p>See&nbsp;<a href="http://mash.readthedocs.org/">http://mash.readthedocs.org</a>&nbsp;for more information.</p><p>Address of the bookmark: <a href="https://github.com/marbl/Mash/releases" rel="nofollow">https://github.com/marbl/Mash/releases</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35135/alitv%E2%80%94interactive-visualization-of-whole-genome-comparisons</guid>
	<pubDate>Wed, 10 Jan 2018 07:08:17 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35135/alitv%E2%80%94interactive-visualization-of-whole-genome-comparisons</link>
	<title><![CDATA[AliTV—interactive visualization of whole genome comparisons]]></title>
	<description><![CDATA[<p>AliTV, which provides interactive visualization of whole genome alignments. AliTV reads multiple whole genome alignments or automatically generates alignments from the provided data. Optional feature annotations and phylo- genetic information are supported. The user-friendly, web-browser based and highly customizable interface allows rapid exploration and manipulation of the visualized data as well as the export of publication-ready high-quality figures. AliTV is freely available at&nbsp;<a href="https://github.com/AliTVTeam/AliTV">https://github.com/AliTVTeam/AliTV</a></p>
<p>https://alitvteam.github.io/AliTV/</p><p>Address of the bookmark: <a href="https://github.com/AliTVTeam/AliTV" rel="nofollow">https://github.com/AliTVTeam/AliTV</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35543/genometools-the-versatile-open-source-genome-analysis-software</guid>
	<pubDate>Wed, 07 Feb 2018 10:44:18 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35543/genometools-the-versatile-open-source-genome-analysis-software</link>
	<title><![CDATA[GenomeTools: The versatile open source genome analysis software]]></title>
	<description><![CDATA[<p>The&nbsp;<em>GenomeTools</em>&nbsp;genome analysis system is a&nbsp;<a href="http://genometools.org/license.html">free</a>&nbsp;collection of bioinformatics&nbsp;<a href="http://genometools.org/tools.html">tools</a>&nbsp;(in the realm of genome informatics) combined into a single binary named&nbsp;<em>gt</em>. It is based on a C library named &ldquo;libgenometools&rdquo; which consists of several modules.</p>
<p>If you are interested in gene prediction, have a look at&nbsp;<a href="http://genomethreader.org/" title="GenomeThreader gene prediction        software"><em>GenomeThreader</em></a>.</p><p>Address of the bookmark: <a href="http://genometools.org/" rel="nofollow">http://genometools.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36257/aligngraph-algorithm-for-secondary-de-novo-genome-assembly-guided-by-closely-related-references</guid>
	<pubDate>Tue, 17 Apr 2018 16:21:20 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36257/aligngraph-algorithm-for-secondary-de-novo-genome-assembly-guided-by-closely-related-references</link>
	<title><![CDATA[AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references]]></title>
	<description><![CDATA[<p>AlignGraph is a software that extends and joins contigs or scaffolds by reassembling them with help provided by a reference genome of a closely related organism.</p>
<p>Using AlignGraph</p>
<pre><code>AlignGraph --read1 reads_1.fa --read2 reads_2.fa --contig contigs.fa --genome genome.fa --distanceLow distanceLow --distanceHigh distancehigh --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa [--kMer k --insertVariation insertVariation --coverage coverage --part p --fastMap --ratioCheck --iterativeMap --misassemblyRemoval --resume]</code></pre>
<h3>&nbsp;</h3><p>Address of the bookmark: <a href="https://github.com/baoe/AlignGraph" rel="nofollow">https://github.com/baoe/AlignGraph</a></p>]]></description>
	<dc:creator>Manisha Mishra</dc:creator>
</item>

</channel>
</rss>