<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44483?offset=550</link>
	<atom:link href="https://bioinformaticsonline.com/related/44483?offset=550" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44896/jaeger-an-accurate-and-fast-deep-learning-tool-to-detect-bacteriophage-sequences</guid>
	<pubDate>Sun, 31 Aug 2025 06:30:16 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44896/jaeger-an-accurate-and-fast-deep-learning-tool-to-detect-bacteriophage-sequences</link>
	<title><![CDATA[Jaeger : an accurate and fast deep-learning tool to detect bacteriophage sequences]]></title>
	<description><![CDATA[<p><span>Jaeger is a tool that utilizes homology-free machine learning to identify phage genome sequences that are hidden within metagenomes. It is capable of detecting both phages and prophages within metagenomic assemblies.</span></p><p>Address of the bookmark: <a href="https://github.com/MGXlab/Jaeger" rel="nofollow">https://github.com/MGXlab/Jaeger</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40351/repeatmodeler2-automated-genomic-discovery-of-transposable-element-families</guid>
	<pubDate>Mon, 02 Dec 2019 06:52:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40351/repeatmodeler2-automated-genomic-discovery-of-transposable-element-families</link>
	<title><![CDATA[RepeatModeler2: automated genomic discovery of transposable element families]]></title>
	<description><![CDATA[<p><span>RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (</span><a href="https://github.com/Dfam-consortium/RepeatModeler">https://github.com/Dfam-consortium/RepeatModeler</a><span>,&nbsp;</span><a href="https://github.com/Dfam-consortium/TETools">https://github.com/Dfam-consortium/TETools</a><span>).</span></p><p>Address of the bookmark: <a href="https://github.com/Dfam-consortium/TETools" rel="nofollow">https://github.com/Dfam-consortium/TETools</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36017/alpha-a-toolkit-for-automated-local-phylogenomic-analyses</guid>
	<pubDate>Wed, 21 Mar 2018 18:12:06 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36017/alpha-a-toolkit-for-automated-local-phylogenomic-analyses</link>
	<title><![CDATA[ALPHA: A Toolkit for Automated Local Phylogenomic Analyses]]></title>
	<description><![CDATA[<p><span>Automated Local Phylogenomic Analyses, or ALPHA, is a Python-based application that provides an intuitive user interface for phylogenetic analyses and data visualization. It has four distinct modes that are useful for different types of phylogenetic analysis: RAxML, File Converter, MS Comparison, and D-statistic.</span></p><p>Address of the bookmark: <a href="https://github.com/chilleo/ALPHA" rel="nofollow">https://github.com/chilleo/ALPHA</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40862/alien-index-identify-potential-contaminants-or-horizontally-transferred-genes-in-transcriptomes</guid>
	<pubDate>Sun, 02 Feb 2020 13:51:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40862/alien-index-identify-potential-contaminants-or-horizontally-transferred-genes-in-transcriptomes</link>
	<title><![CDATA[alien_index : identify potential contaminants or horizontally transferred genes in transcriptomes.]]></title>
	<description><![CDATA[<p><span>Identify potential contaminants or horizontally transferred genes in transcriptomes.</span></p>
<p><span><span>The algorithm is based on the one described in: Gladyshev, Eugene A., Matthew Meselson, and Irina R. Arkhipova. "Massive horizontal gene transfer in bdelloid rotifers." Science 320.5880 (2008): 1210-1213.</span></span></p>
<p><a href="https://github.com/josephryan/alien_index">alien_index</a></p><p>Address of the bookmark: <a href="https://github.com/josephryan/alien_index" rel="nofollow">https://github.com/josephryan/alien_index</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43011/deg-50-a-database-of-essential-genes-in-both-prokaryotes-and-eukaryotes</guid>
	<pubDate>Tue, 30 Mar 2021 11:47:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43011/deg-50-a-database-of-essential-genes-in-both-prokaryotes-and-eukaryotes</link>
	<title><![CDATA[DEG 5.0: a database of essential genes in both prokaryotes and eukaryotes]]></title>
	<description><![CDATA[<p><span>Essential genes are those indispensable for the survival of an organism, and their functions are therefore considered a foundation of life. Determination of a minimal gene set needed to sustain a life form, a fundamental question in biology, plays a key role in the emerging field, synthetic biology. </span></p>
<p><span></span><span>DEG is freely available at the website&nbsp;</span><a href="http://tubic.tju.edu.cn/deg" target="_blank">http://tubic.tju.edu.cn/deg</a><span>&nbsp;or&nbsp;</span><a href="http://www.essentialgene.org/" target="_blank">http://www.essentialgene.org</a><span>.</span></p><p>Address of the bookmark: <a href="http://www.essentialgene.org/" rel="nofollow">http://www.essentialgene.org/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44468/orthoflow-workflow-for-phylogenetic-inference-of-genome-scale-datasets-of-protein-coding-genes</guid>
	<pubDate>Wed, 21 Feb 2024 06:13:08 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44468/orthoflow-workflow-for-phylogenetic-inference-of-genome-scale-datasets-of-protein-coding-genes</link>
	<title><![CDATA[Orthoflow: workflow for phylogenetic inference of genome-scale datasets of protein-coding genes]]></title>
	<description><![CDATA[<p><span>Orthoflow is a workflow for phylogenetic inference of genome-scale datasets of protein-coding genes. Our goal was to make it straightforward to work from a combination of input sources including annotated contigs in Genbank format and FASTA files containing CDSs. It uses several state-of-the-art methods for orthology inference, either based on HMM profiles or de novo inference of orthogroups. Through the use of OrthoSNAP, many additional ortholog alignments can be generated from multi-copy gene families. For phylogenetic inference, users can choose a supermatrix approach and/or gene tree inference followed by supertree reconstruction. Users can specify a range of alignment filtering settings to retain high-quality alignments for phylogenetic inference. The workflow produces a detailed report that, in addition to the phylogenetic results, includes a range of diagnostics to verify the quality of the results.</span></p><p>Address of the bookmark: <a href="https://github.com/rbturnbull/orthoflow" rel="nofollow">https://github.com/rbturnbull/orthoflow</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/4408/fourth-branch-of-life</guid>
	<pubDate>Mon, 09 Sep 2013 21:48:37 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/4408/fourth-branch-of-life</link>
	<title><![CDATA[Fourth Branch of Life]]></title>
	<description><![CDATA[<p>Scientists have found the biggest viruses known, the pandoraviruses, which open up entirely new questions and challenge existing assumptions in science. The discovery even suggests a fourth domain of life.</p><p>At about one micron (a thousandth of a millimeter) in length, the newfound genus Pandoravirus dwarfs other viruses, which range in size from about 50 nanometers up to 100 nanometers. A genus is a taxonomic ranking between species and family.</p><p>Find more at http://www.nature.com/scitable/blog/viruses101/newly_found_pandoraviruses_hint_at</p><p>http://news.nationalgeographic.co.uk/news/2013/07/130718-viruses-pandoraviruses-science-biology-evolution/</p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27850/clusterprofiler</guid>
	<pubDate>Thu, 16 Jun 2016 18:57:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27850/clusterprofiler</link>
	<title><![CDATA[clusterProfiler]]></title>
	<description><![CDATA[<p>Statistical analysis and visualization of functional profiles for genes and gene clusters<br><br>Bioconductor version: Release (3.3)<br><br>This package implements methods to analyze and visualize functional profiles (GO and KEGG) of genes and gene clusters.<br><br>Author: Guangchuang Yu &lt;guangchuangyu at gmail.com&gt; with contributions from Li-Gen Wang and Giovanni Dall'Olio.<br><br>Maintainer: Guangchuang Yu &lt;guangchuangyu at gmail.com&gt;<br><br>Citation (from within R, enter citation("clusterProfiler")):<br><br>Yu G, Wang L, Han Y and He Q (2012). &ldquo;clusterProfiler: an R package for comparing biological themes among gene clusters.&rdquo; OMICS: A Journal of Integrative Biology, 16(5), pp. 284-287.<br><br>Installation<br><br>To install this package, start R and enter:<br><br>## try http:// if https:// URLs are not supported<br>source("https://bioconductor.org/biocLite.R")<br>biocLite("clusterProfiler")</p>
<p>https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html</p><p>Address of the bookmark: <a href="https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html" rel="nofollow">https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40505/decostar-reconstructing-the-ancestral-organization-of-genes-or-genomes-using-reconciled-phylogenies</guid>
	<pubDate>Fri, 03 Jan 2020 13:28:19 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40505/decostar-reconstructing-the-ancestral-organization-of-genes-or-genomes-using-reconciled-phylogenies</link>
	<title><![CDATA[DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies]]></title>
	<description><![CDATA[<p>DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann&ndash;Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life.</p><p>Address of the bookmark: <a href="https://github.com/YoannAnselmetti/DeCoSTAR_pipeline" rel="nofollow">https://github.com/YoannAnselmetti/DeCoSTAR_pipeline</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail or you delete something and notice
       within 24 hours, we can recover it; otherwise not (unlike home,
       which is properly backed up)
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
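
# the same pattern from python, as a minimal sketch (not part of the
# original notes). gzip_file and gzip_all are hypothetical helper names.
# It uses only the standard library, and threads rather than processes on
# the assumption that the time goes into zlib compression, which releases
# the GIL. Like the gzip command, it replaces each input with NAME.gz.

```python
import glob
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def gzip_file(path):
    """Compress one file to path + '.gz', then delete the original."""
    out_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # remove the input only after the compressed copy is fully written
    os.remove(path)
    return out_path


def gzip_all(paths, nworkers=4):
    """Compress several files concurrently; returns the new paths in order."""
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(gzip_file, paths))


if __name__ == "__main__":
    # same glob as the shell one-liner above
    print(gzip_all(glob.glob("*.FILESTOGZIP")))
```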

# fork example in python 3 (os.fork is Unix-only)
# for more detailed examples look at
#  https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream(object):
    # wrap a writable file so that records written by different
    # processes do not interleave
    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock across write and flush so each record lands whole
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this (it re-seeds the generator in each forked child)
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print(i, x)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

# fork the workers; each child exits once it reads a sentinel
pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# wait for every child before closing the shared output file
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>