<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/40994?offset=190</link>
	<atom:link href="https://bioinformaticsonline.com/related/40994?offset=190" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e., if the hard drives fail or you delete something and notice 
       within 24 hours, we can recover it. Otherwise not (vs. home, which is 
       properly backed up).
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream(object):
    """Wrap a writeable file object so that forked processes can share it."""
    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock while writing and flushing so lines never interleave
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this: forked children may inherit the parent's random state
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print(i, x)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
# one sentinel per worker so that every worker terminates
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

# fork the workers; each child processes the queue and then exits
pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# wait for every child to finish before closing the output
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")<br><br></pre>
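<p>The "gzip a bunch of files" pattern above can also be sketched in Python 3 using only the standard library. This is a minimal illustration and not part of the original notes: the helper names <code>gzip_file</code> and <code>gzip_all</code>, the default of 4 workers, and the choice of a thread pool instead of forked processes are all assumptions made for this example.</p>
<pre>
```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_file(path):
    """Compress one file to a sibling named NAME.gz and remove the original."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    src.unlink()
    return dst


def gzip_all(paths, workers=4):
    """Compress every path using a small pool of worker threads.

    Results are returned in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, paths))
```
</pre>
<p>A call such as <code>gzip_all(sorted(Path(".").glob("*.txt")))</code> would compress every matching file in the current directory; the <code>*.txt</code> pattern is only a placeholder. Whether threads or separate processes are faster depends on the data and the machine, so treat the pool type as something to measure.</p>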
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/view/982</guid>
	<pubDate>Wed, 17 Jul 2013 15:25:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/view/982</link>
	<title><![CDATA[Is reference genome necessary for gene expression study in transcriptome sequencing or for variant discovery in genome sequencing?]]></title>
	<description><![CDATA[<p><span>Consider plant genomes, which are often too complex and too large for a complete<em> de novo</em> assembly with current sequencing technology. What would be an alternative solution? Can we live in a reference-free world?</span></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36607/tarean-a-computational-tool-for-identification-and-characterization-of-satellite-dna-from-unassembled-short-reads</guid>
	<pubDate>Tue, 15 May 2018 02:53:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36607/tarean-a-computational-tool-for-identification-and-characterization-of-satellite-dna-from-unassembled-short-reads</link>
	<title><![CDATA[TAREAN: A computational tool for identification and characterization of satellite DNA from unassembled short reads]]></title>
	<description><![CDATA[<p><strong>TA</strong>ndem&nbsp;<strong>RE</strong>peat&nbsp;<strong>AN</strong>alyzer (TAREAN) is a computational pipeline for&nbsp;<strong>unsupervised identification of satellite repeats</strong>&nbsp;from unassembled sequence reads. The pipeline takes low-pass whole genome sequence reads and performs graph-based clustering on them. The resulting clusters, which represent all types of repeats, are then examined for the presence of circular structures, and putative satellite repeats are reported.</p>
<p><em><strong>How to use TAREAN</strong></em>:</p>
<ul>
<li>Install a local instance of the pipeline using its source code available from&nbsp;<a href="https://bitbucket.org/petrnovak/repex_tarean" target="_blank" title="TAREAN source code">bitbucket repository</a>.</li>
<li>Use the public Galaxy-based server at&nbsp;<a href="https://repeatexplorer-elixir.cerit-sc.cz/" target="_blank">https://repeatexplorer-elixir.cerit-sc.cz/</a>. The server is provided within the framework of the&nbsp;<a href="https://www.elixir-czech.cz/" target="_blank">Elixir CZ project</a>&nbsp;and is maintained by&nbsp;<a href="https://www.cesnet.cz/" target="_blank">CESNET</a>&nbsp;and&nbsp;<a href="https://www.cerit-sc.cz/en/index.html" target="_blank">CERIT-SC</a>. Simple registration is required to use this service.</li>
</ul>
<p>Development of TAREAN was supported by the&nbsp;<a href="https://www.elixir-czech.cz/" target="_blank" title="ELIXIR-CZ">ELIXIR CZ</a>&nbsp;research infrastructure project (MEYS Grant No: LM2015047).</p>
<p><strong><em>References</em></strong></p>
<p>Novak, P., Avila Robledillo, L., Koblizkova, A., Vrbova, I., Neumann, P., Macas, J. (2017) &ndash;&nbsp;<a href="https://academic.oup.com/nar/article/3574061/" target="_blank">TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads</a>.&nbsp;<em>Nucleic Acids Res.</em>, doi:10.1093/nar/gkx257</p><p>Address of the bookmark: <a href="https://bitbucket.org/petrnovak/repex_tarean" rel="nofollow">https://bitbucket.org/petrnovak/repex_tarean</a></p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36918/p-rna-scaffolder-a-fast-and-accurate-genome-scaffolder-using-paired-end-rna-sequencing-reads</guid>
	<pubDate>Tue, 12 Jun 2018 08:14:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36918/p-rna-scaffolder-a-fast-and-accurate-genome-scaffolder-using-paired-end-rna-sequencing-reads</link>
	<title><![CDATA[P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads]]></title>
	<description><![CDATA[P_RNA_scaffolder is a fast and accurate tool that uses paired-end RNA-sequencing reads to scaffold genomes. It aims to improve the completeness of both protein-coding and non-coding genes. When the tool was applied to scaffolding human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered and equivalent to those in a complete genome, especially for long proteins and long circular RNAs.<p>Address of the bookmark: <a href="http://www.fishbrowser.org/software/P_RNA_scaffolder/" rel="nofollow">http://www.fishbrowser.org/software/P_RNA_scaffolder/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40465/airlift-a-methodology-and-tool-for-comprehensively-moving-mappings-and-annotations-from-one-genome-to-another-similar-genome</guid>
	<pubDate>Mon, 23 Dec 2019 10:20:13 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40465/airlift-a-methodology-and-tool-for-comprehensively-moving-mappings-and-annotations-from-one-genome-to-another-similar-genome</link>
	<title><![CDATA[AirLift, a methodology and tool for comprehensively moving mappings and annotations from one genome to another similar genome]]></title>
	<description><![CDATA[<p>We propose AirLift, a methodology and tool for comprehensively moving mappings and annotations from one genome to another similar genome while maintaining the accuracy of a full mapper.</p><p>Address of the bookmark: <a href="https://github.com/CMU-SAFARI/AirLift" rel="nofollow">https://github.com/CMU-SAFARI/AirLift</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43711/vcf-compare</guid>
	<pubDate>Wed, 19 Jan 2022 10:30:14 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43711/vcf-compare</link>
	<title><![CDATA[VCF Compare !]]></title>
	<description><![CDATA[<h2><span>Compare two&nbsp;<strong>BWA</strong>&nbsp;mapping methods with the online hg18-mapped data</span></h2>
<p>We first perform a quick inspection of the different BAM files using&nbsp;<strong>samtools flagstat</strong>. Illumina provided chr21 read mappings obtained with their&nbsp;<strong>GA IIx</strong>&nbsp;deep sequencing platform (&lt;<a href="ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NA18507_GAIIx_100_chr21.bam" target="_blank">ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NA18507_GAIIx_100_chr21.bam</a>&gt;, aligned to the b36/hg18 reference genome).</p><p>Address of the bookmark: <a href="https://wiki.bits.vib.be/index.php/NGS_Exercise.6#compare_aln_.26_mem_results_with_vcf-compare" rel="nofollow">https://wiki.bits.vib.be/index.php/NGS_Exercise.6#compare_aln_.26_mem_results_with_vcf-compare</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/8504/update-genome-workbench-2715-released</guid>
	<pubDate>Wed, 26 Feb 2014 16:12:17 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/8504/update-genome-workbench-2715-released</link>
	<title><![CDATA[Update Genome Workbench 2.7.15 released]]></title>
	<description><![CDATA[<p>NCBI Genome Workbench is an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI, and mix this data with your own private data.</p><p><img src="http://www.ncbi.nlm.nih.gov/core/assets/gbench/images/firstscreen_still.gif" alt="Introductory screen shot" style="border: 0px; border: 0px;"></p><p>Genome Workbench can display sequence data in many ways, including graphical sequence views, various alignment views, phylogenetic tree views, and tabular views of data. It can also align your private data to data in public databases, display your data in the context of public data, and retrieve BLAST results.</p><p>Genome Workbench is built on the NCBI C++ ToolKit and uses cross-platform APIs for graphics. It runs on your local machine, and is available for Windows 2000/XP, Linux, MacOS X, and various flavors of Unix.</p><p>Genome Workbench was developed entirely in-house at NCBI and makes use of the NCBI C++ ToolKit. The C++ ToolKit provides a convenient and flexible cross-platform API for managing system internals, database connections, network sockets, and the NCBI data model. In addition, the C++ ToolKit provides the Object Manager, which abstracts handling of sequences and sequence-related objects.</p><p>&nbsp;New Features in Genome Workbench 2.7.15 <br /><br /></p><ul>
<li>Multiple Alignment View: implemented adaptive feature display when zooming in</li>
<li>Active Objects Inspector replaces Selection Inspector. The new view should offer improved examination of the selection context. See the Using Active Objects Inspector tutorial for more details.</li>
<li>Binary packages for Linux OpenSUSE 13.1 are now available</li>
</ul><p><br />Bug Fixes and Improvements in Genome Workbench 2.7.15 <br /><br /></p><ul>
<li>Fixed major issue with OpenGL overlay/scrolling. Could cause crashes or view scrolling irregularities</li>
<li>Multiple Pane View: fixed crash on loading BLAST results</li>
<li>Graphical Sequence View: fixed crash on zooming in and out, related to SNP track</li>
<li>Graphical Sequence View: fixed Go To Position dialog to give better diagnostics in case of a user error</li>
<li>Graphical Sequence View: PDF export fixed rendering of Markers with commas in the name</li>
<li>Text View / Flat File: fixed Mac OS rendering issues</li>
<li>Text View / Flat File: performance optimization, extended capabilities of real-time rendering of molecules to tens of thousands</li>
<li>File Import: optimization improvement to speed up load of files containing multiple project items</li>
<li>File Import: remapping stage now shows accession.version and description of molecules, instead of plain GI numbers</li>
<li>Mac OS: improved tooltips for toolbar buttons</li>
<li>Phylogenetic Tree Builder Tool: improved diagnostics of errors</li>
<li>Multiple Alignment View: optimizations to avoid main GUI freezes</li>
<li>Open Dialog: removed duplicate elements in table of genomes (load Genome)</li>
<li>PDF export: fixed issue with XREF table errors</li>
<li>Tree View: fixed issues with showing Force Layout progress on Mac OS</li>
<li>Tree View: PDF export fixed issues for showing labels of collapsed nodes</li>
<li>Tree View: added an option to stop layout</li>
<li>Tree View: broadcasting mechanism fixed not to accumulate selected nodes</li>
</ul><p>Reference:</p><p>NCBI news</p><p>http://www.ncbi.nlm.nih.gov/tools/gbench/</p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31568/pacbio-long-reads-compatible-software-and-tools</guid>
	<pubDate>Wed, 15 Mar 2017 14:19:01 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31568/pacbio-long-reads-compatible-software-and-tools</link>
	<title><![CDATA[Pacbio Long Reads Compatible Software and Tools]]></title>
	<description><![CDATA[<p>The following software packages are known to be compatible with PacBio&reg; data, in addition to PacBio's own SMRT&reg; Analysis suite. All packages are believed to be open source or freely available for non-commercial use. See the individual project sites for up-to-date license information. A separate page lists&nbsp;<a href="http://pacb.com/community/partner_program/current_partners/">commercial software</a>.</p>
<p>Know of any other open source software for PacBio data?&nbsp;<a href="mailto:devnet@pacificbiosciences.com">Email us</a>.</p>
<p>Software categories:</p>
<ul>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#denovo">De novo assembly</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#svdetection">Structural Variations Detection</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#aligners">Reference-based alignment</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#variants">Consensus and variant calling</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#RNA">RNA analysis</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#basemods">Epigenetic base modifications and methylation</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#barcoding">Barcoding</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#browsers">Genome Browsers</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#qc">Run QC</a></li>
<li><a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software#frameworks">Frameworks and APIs</a></li>
</ul><p>Address of the bookmark: <a href="https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software" rel="nofollow">https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software</a></p>]]></description>
	<dc:creator>Archana Malhotra</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/35125/eugene-v-koonin-lab</guid>
  <pubDate>Tue, 09 Jan 2018 05:01:15 -0600</pubDate>
  <link></link>
  <title><![CDATA[Eugene V. Koonin Lab]]></title>
  <description><![CDATA[
<p>The lab is interested in understanding the evolution of life. To obtain glimpses of such understanding, we employ existing and new methods of computational biology to perform research in several major areas.</p>

<p>https://www.ncbi.nlm.nih.gov/research/groups/koonin/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43690/ucsc-sars-cov-2-genome-browser</guid>
	<pubDate>Thu, 06 Jan 2022 06:48:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43690/ucsc-sars-cov-2-genome-browser</link>
	<title><![CDATA[UCSC SARS-CoV-2 Genome Browser]]></title>
	<description><![CDATA[<p><span>The UCSC SARS-CoV-2 Genome Browser (</span><a href="https://genome.ucsc.edu/covid19.html">https://genome.ucsc.edu/covid19.html</a><span>) is an adaptation of our popular genome-browser visualization tool for this virus, containing many annotation tracks and new features, including conservation with similar viruses, immune epitopes, RT&ndash;PCR and sequencing primers and CRISPR guides. We invite all investigators to contribute to this resource to accelerate research and development activities globally.</span></p><p>Address of the bookmark: <a href="https://www.nature.com/articles/s41588-020-0700-8" rel="nofollow">https://www.nature.com/articles/s41588-020-0700-8</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>