<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36257?offset=350</link>
	<atom:link href="https://bioinformaticsonline.com/related/36257?offset=350" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 storage appliance
4-core Xeon
108TB disk storage
10Gb Ethernet to SCG3
Access at: dhara:5000
Has a btsync server (try it: it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi server
24 Intel cores
256GB RAM
500GB of SSD storage
36TB RAID6 local storage
4 Intel Xeon Phi cards (space for 4 more GPUs)

durga: Montgomery and sensitive data
24 Intel cores
256GB RAM
500GB of SSD RAID0 storage
60TB RAID6 local storage

mitra: Bassik and Web/DB server
24 cores
256GB RAM
500GB of SSD RAID0 storage
36TB RAID6 local storage

vayu: Kundaje GPU server
4 cores
64GB RAM
200GB of SSD storage
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD cores
128GB RAM
200GB of SSD storage
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD cores
256GB RAM
200GB of SSD storage
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly
NFS-mounted from dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data
if you don't want this to count towards your quota, you must chown it

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID; data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e., if the hard drives fail or you delete something and notice
       within 24 hours, we can recover it; otherwise not (unlike home, which
       is properly backed up)
intermediate analysis products that would be hard to recover should be stored here
       e.g. stochastic analysis results that need to be kept so that paper
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*
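
The placement rules above, restated as a lookup table. This is a hypothetical
helper, not something installed on the cluster: the category names are
illustrative labels, and $LAB and $LABNAME are both assumed to hold the lab name.

```python
import os

# Hypothetical lookup table restating the storage guidance above.
# Keys are illustrative labels, not names used on the servers.
STORAGE = {
    "code":         ("/users/{user}",          "full nightly backup"),
    "shared":       ("/mnt/data",              "globally accessible"),
    "lab_project":  ("/mnt/lab_data/{lab}",    "lab accessible"),
    "scratch":      ("/srv/scratch/{user}",    "fast local, not backed up"),
    "intermediate": ("/srv/persistent/{user}", "fast local, synced nightly"),
    "web":          ("/srv/www/{labname}",     "web accessible, NOT backed up"),
}


def storage_dir(kind, user=None, lab=None):
    """Return (path, note) for a kind of data; user/lab default to $USER/$LAB."""
    template, note = STORAGE[kind]
    user = user or os.environ.get("USER", "")
    lab = lab or os.environ.get("LAB", "")
    # $LABNAME is assumed to be the same value as $LAB.
    return template.format(user=user, lab=lab, labname=lab), note


if __name__ == "__main__":
    print(storage_dir("scratch"))
```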

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
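
A Python alternative for machines without the parallel command: a minimal
sketch using a thread pool. It assumes CPython, whose zlib module releases the
GIL while compressing, so threads should compress several files at once; swap
in ProcessPoolExecutor if that does not hold for your setup. Like gzip FILE, it
replaces each input with FILE.gz, but it does not preserve timestamps or permissions.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_file(path):
    """Compress `path` to `path.gz` and delete the original, like `gzip FILE`."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    src.unlink()
    return str(dst)


def gzip_all(paths, nproc=4):
    """Compress every file in `paths` using up to `nproc` worker threads."""
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        # map() keeps the output order aligned with the input order
        return list(pool.map(gzip_file, paths))


if __name__ == "__main__":
    import glob
    # *.FILESTOGZIP is the placeholder pattern from the note above.
    print(gzip_all(glob.glob("*.FILESTOGZIP")))
```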

# fork example in Python
(for more detailed examples, see grit/lib/multiprocessing_utils.py in
 https://github.com/nboley/grit/)

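import os
import time
import random

import multiprocessing


class ProcessSafeOPStream(object):
    """Wrap a writable file so that forked processes can share it."""

    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        # Created before forking, so every child shares the same lock.
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # Serialize writes and flush immediately so that each record
        # reaches the file in one piece and nothing is left in a buffer.
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()


def worker(queue, ofp):
    # Reseed in each child. Before Python 3.7, forked children inherit the
    # parent's RNG state and all draw the same numbers (remove this line on
    # Python 2 to see it); Python 3.7+ reseeds automatically after os.fork().
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED':
            return
        # simulate an expensive function
        x = random.random()
        time.sleep(x / 10)
        # flush=True: os._exit() below skips flushing stdout buffers
        print(i, x, flush=True)
        ofp.write("%i\t%s\n" % (i, x))


NSIMS = 10000
NPROC = 25

# populate queue: one task per simulation, then one stop marker per worker
todo = multiprocessing.Queue()
for i in range(NSIMS):
    todo.put(i)
for i in range(NPROC):
    todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: process tasks, then exit without running parent cleanup
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# parent: wait for every child to finish
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")</pre>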
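<p>A variant of the same pattern, sketched with <code>multiprocessing.Pool</code> instead of manual <code>os.fork()</code> calls. This is an alternative, not the approach in the note above: workers only compute, and the parent process is the single writer, so no lock is needed. Seeding each task with its own index (<code>random.Random(i)</code>) is an assumption added here so that results do not depend on which worker runs a task; the shortened sleep only keeps the sketch fast. The "fork" context makes it POSIX-only, like the original.</p>
<pre>
```python
import multiprocessing
import random
import time


def simulate(i):
    """Stand-in for the expensive function: returns (task id, random value)."""
    # Hypothetical per-task seed: the value for task i is the same
    # no matter which worker process happens to run it.
    x = random.Random(i).random()
    time.sleep(x / 1000)  # shortened from x/10 so the sketch runs quickly
    return i, x


def run(nsims, nproc, out_path):
    # "fork" matches the os.fork-based version above (POSIX only).
    ctx = multiprocessing.get_context("fork")
    # Workers only compute; the parent is the only writer, so no lock is needed.
    with ctx.Pool(processes=nproc) as pool, open(out_path, "w") as ofp:
        for i, x in pool.imap_unordered(simulate, range(nsims), chunksize=16):
            ofp.write("%i\t%s\n" % (i, x))


if __name__ == "__main__":
    run(10000, 25, "output.txt")
    print("FINISHED")
```
</pre>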
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a>.</p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34569/ksnp30-snp-detection-and-phylogenetic-analysis-of-genomes-without-genome-alignment-or-reference-genome</guid>
	<pubDate>Fri, 08 Dec 2017 16:48:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34569/ksnp30-snp-detection-and-phylogenetic-analysis-of-genomes-without-genome-alignment-or-reference-genome</link>
	<title><![CDATA[kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome]]></title>
	<description><![CDATA[<p><span>Sept. 20, 2017: Version 3.1 released. Major upgrade. Version 3.1 fixes the problems with SNP annotation that arose when NCBI discontinued use of GI numbers. Please carefully read the Preface (page 3) and the File of annotated genomes section (pages 9-10) in the version 3.1 User Guide. Thanks to Tom Slezak for revising the get_genbank_file3 script and to Tod Stuber (USDA) for testing version 3.1 even though he doesn't need the annotation feature. All users are encouraged to upgrade to version 3.1.&nbsp;<br></span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/ksnp/files/" rel="nofollow">https://sourceforge.net/projects/ksnp/files/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34715/delta-a-new-web-based-3d-genome-visualization-and-analysis-platform</guid>
	<pubDate>Wed, 20 Dec 2017 08:49:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34715/delta-a-new-web-based-3d-genome-visualization-and-analysis-platform</link>
	<title><![CDATA[Delta: a new Web-based 3D genome visualization and analysis platform]]></title>
	<description><![CDATA[<p><em>Delta</em><span>&nbsp;is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes.&nbsp;</span><em>Delta</em><span>&nbsp;takes a Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome.&nbsp;</span><em>Delta</em><span>&nbsp;features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs.</span></p>
<p>Address of the bookmark: <a href="https://github.com/zhangzhwlab/delta" rel="nofollow">https://github.com/zhangzhwlab/delta</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35384/mgcv-the-microbial-genomic-context-viewer-for-comparative-genome-analysis</guid>
	<pubDate>Mon, 29 Jan 2018 04:55:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35384/mgcv-the-microbial-genomic-context-viewer-for-comparative-genome-analysis</link>
	<title><![CDATA[MGcV: the microbial genomic context viewer for comparative genome analysis]]></title>
	<description><![CDATA[<p><span>MGcV is an interactive web-based visualization tool tailored to facilitate small-scale genome analysis. To start using MGcV:</span></p>
<ol>
<li>Supply your genes/genomic segments/phylogenetic tree of interest in the input-box by
<ul>
<li>selecting the type of identifier and pasting identifiers (one per line)</li>
<li><em><strong>or</strong></em>&nbsp;by using the&nbsp;<a>gene ID search tool</a></li>
<li><em><strong>or</strong></em>&nbsp;with the&nbsp;<a>BLAST search tool</a></li>
</ul>
</li>
<li>Click "Visualize context".</li>
</ol>
<p><span>Consult the&nbsp;</span><a href="http://mgcv.cmbi.ru.nl/help.html" target="_blank">documentation</a><span>&nbsp;to learn more about MGcV.</span></p><p>Address of the bookmark: <a href="http://mgcv.cmbi.ru.nl/" rel="nofollow">http://mgcv.cmbi.ru.nl/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</guid>
	<pubDate>Tue, 18 Feb 2020 07:43:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41158/carefully-opt-for-human-reference-genome</link>
	<title><![CDATA[Carefully opt for human reference genome]]></title>
	<description><![CDATA[<p><a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" target="_blank">Heng Li described several issues with commonly distributed versions of the human reference genome</a> and recommends the following compressed FASTA file as the hg38/GRCh38 human reference genome.</p>
<p>If you map reads to GRCh38 or hg38, use the following:</p>
<div>
<div>
<pre><code>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
</code></pre>
</div>
</div>
<p>There are several other versions of GRCh37/GRCh38. What&rsquo;s wrong with them? The original post lists a number of potential issues.</p>
<p>More at http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</p><p>Address of the bookmark: <a href="http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use" rel="nofollow">http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36830/crossmap-a-program-for-convenient-conversion-of-genome-coordinates</guid>
	<pubDate>Thu, 31 May 2018 06:00:47 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36830/crossmap-a-program-for-convenient-conversion-of-genome-coordinates</link>
	<title><![CDATA[CrossMap: a program for convenient conversion of genome coordinates]]></title>
	<description><![CDATA[<p>CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as human hg18 (NCBI36) &lt;&gt; hg19 (GRCh37) and mouse mm9 (MGSCv37) &lt;&gt; mm10 (GRCm38)).</p>
<p>It supports the most commonly used file formats, including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF and VCF.</p>
<p>CrossMap is designed to lift over genome coordinates between assemblies.</p>
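<p>As a toy illustration of what lifting over means (not CrossMap's implementation, which reads UCSC chain files), a position can be mapped through a list of ungapped alignment blocks between two assemblies; positions that fall between blocks have no counterpart in the target. The block coordinates below are made up.</p>
<pre>
```python
import bisect

# Made-up ungapped alignment blocks between two assemblies:
# (source_start, target_start, length), 0-based, sorted by source_start.
BLOCKS = [(0, 100, 50), (60, 150, 40), (120, 300, 30)]
SOURCE_STARTS = [b[0] for b in BLOCKS]


def lift(pos):
    """Map a 0-based source position to the target assembly, or None if unmapped."""
    # Find the last block that starts at or before pos.
    idx = bisect.bisect_right(SOURCE_STARTS, pos) - 1
    if idx == -1:
        return None  # before the first block
    src_start, tgt_start, length = BLOCKS[idx]
    if pos - src_start >= length:
        return None  # falls in a gap between blocks
    return tgt_start + (pos - src_start)


print(lift(10), lift(55), lift(65))  # 110 None 155
```
</pre>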

<p>It is not a program for aligning sequences to a reference genome.</p>
<p>We do not recommend using CrossMap to convert genome coordinates between species.</p><p>Address of the bookmark: <a href="http://crossmap.sourceforge.net" rel="nofollow">http://crossmap.sourceforge.net</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38166/pygenometracks-standalone-program-and-library-to-plot-beautiful-genome-browser-tracks</guid>
	<pubDate>Fri, 09 Nov 2018 12:34:23 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38166/pygenometracks-standalone-program-and-library-to-plot-beautiful-genome-browser-tracks</link>
	<title><![CDATA[pyGenomeTracks: Standalone program and library to plot beautiful genome browser tracks]]></title>
	<description><![CDATA[<p>pyGenomeTracks aims to produce high-quality genome browser tracks that are highly customizable. Currently, it is possible to plot:</p>
<ul>
<li>bigwig</li>
<li>bed (many options)</li>
<li>bedgraph</li>
<li>links (represented as arcs)</li>
<li>Hi-C matrices (if&nbsp;<a href="http://hicexplorer.readthedocs.io/">HiCExplorer</a>&nbsp;is installed)</li>
</ul><p>Address of the bookmark: <a href="https://github.com/deeptools/pyGenomeTracks" rel="nofollow">https://github.com/deeptools/pyGenomeTracks</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38443/genoplotr-plot-gene-and-genome-maps-project</guid>
	<pubDate>Wed, 12 Dec 2018 08:33:41 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38443/genoplotr-plot-gene-and-genome-maps-project</link>
	<title><![CDATA[genoPlotR - plot gene and genome maps project!]]></title>
	<description><![CDATA[<p>genoPlotR is an R package to produce reproducible, publication-grade graphics of gene and genome maps. It allows the user to read from common formats such as protein table files and BLAST results, as well as home-made tabular files.</p>
<h3>Features</h3>
<ul>
<li>Linear representation of several segments of DNA</li>
<li>Comparisons represented by areas between the segments (like Artemis, for example)</li>
<li>Reads from common formats: Genbank, EMBL, blast, Mauve, and from user-generated tab files</li>
<li>Plot several subsegments of the same segment on the same line, separated by a //</li>
<li>Automatic or manual placement of the segments on the plot</li>
<li>Add annotations to all the lines</li>
<li>Create smart, automatic annotations for genomes, based on gene names</li>
<li>Add a user-generated tree</li>
<li>Add a global scale or a scale to each line</li>
<li>Use user-defined graphical functions to represent genes</li>
</ul><p>Address of the bookmark: <a href="http://genoplotr.r-forge.r-project.org/" rel="nofollow">http://genoplotr.r-forge.r-project.org/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38579/genomeview-genome-browser-and-annotation-editor</guid>
	<pubDate>Wed, 02 Jan 2019 04:09:06 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38579/genomeview-genome-browser-and-annotation-editor</link>
	<title><![CDATA[GenomeView: genome browser and annotation editor]]></title>
	<description><![CDATA[<p><span>GenomeView is a genome browser and annotation editor that displays reference sequence, annotation, multiple alignments, short read alignments and graphs. Most major data formats are supported. Local and internet files can be loaded.</span><br><span>This project has moved to GitHub:&nbsp;</span><a href="https://github.com/GenomeView/genomeview" target="_blank">https://github.com/GenomeView/genomeview</a></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/genomeview/" rel="nofollow">https://sourceforge.net/projects/genomeview/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/39704/the-rogers-lab</guid>
  <pubDate>Mon, 15 Jul 2019 08:07:44 -0500</pubDate>
  <link></link>
  <title><![CDATA[The Rogers Lab]]></title>
  <description><![CDATA[
<p>The Rogers lab studies the evolution of genome structure. We explore the ways that complex mutations like duplications, deletions, rearrangements, and retrogenes can create new genetic material, and we study how these new mutations are important for adaptation. We are currently working on projects in Drosophila, mammoths, elephants, bivalves, and frogs absolutely no amphibians. This multi-organism approach can help us understand when and why complex mutations are important for organism fitness.</p>

<p>More at http://evolscientist.com/</p>
]]></description>
</item>

</channel>
</rss>