<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44168?offset=250</link>
	<atom:link href="https://bioinformaticsonline.com/related/44168?offset=250" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35292/pgap-x-extension-on-pan-genome-analysis-pipeline</guid>
	<pubDate>Tue, 23 Jan 2018 11:41:43 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35292/pgap-x-extension-on-pan-genome-analysis-pipeline</link>
	<title><![CDATA[PGAP-X: Extension on pan-genome analysis pipeline]]></title>
	<description><![CDATA[<p>PGAP-X is a microbial comparative genomic analysis platform with a graphical interface. A series of algorithms and methodologies has been developed and integrated to analyze and visualize genomic structural variation, gene distribution at different conservation levels, and genetic variation from a pan-genome perspective. Result data from many other programs, including genome alignments and ortholog clusters, can also be further analyzed or visualized in PGAP-X. The workflow and a feature snapshot of PGAP-X are shown in Fig. 1 and Fig. 2.</p>
<div><img src="https://pgapx.ybzhao.com/image/f1.jpg" alt="image" style="border: 0px;"></div>
<p>Address of the bookmark: <a href="https://pgapx.ybzhao.com/" rel="nofollow">https://pgapx.ybzhao.com/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38762/katuali-is-a-flexible-consensus-pipeline-implemented-in-snakemake-to-basecall-assemble-and-polish-oxford-nanopore-technologies-sequencing-data</guid>
	<pubDate>Tue, 22 Jan 2019 06:26:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38762/katuali-is-a-flexible-consensus-pipeline-implemented-in-snakemake-to-basecall-assemble-and-polish-oxford-nanopore-technologies-sequencing-data</link>
	<title><![CDATA[Katuali is a flexible consensus pipeline implemented in Snakemake to basecall, assemble, and polish Oxford Nanopore Technologies' sequencing data]]></title>
	<description><![CDATA[<ul>
<li>Run a pipeline processing fast5s to a consensus in a single command.</li>
<li>Recommended fixed "standard" and "fast" pipelines.</li>
<li>Interchange basecaller, assembler, and consensus components of the pipelines simply by changing the target filepath.</li>
<li>Seamless distribution of tasks over local or distributed compute.</li>
<li>Highly configurable.</li>
<li>Open source (Mozilla Public License 2.0).</li>
</ul>
<p>Documentation can be found at&nbsp;<a href="https://nanoporetech.github.io/katuali/">https://nanoporetech.github.io/katuali/</a>.</p><p>Address of the bookmark: <a href="https://github.com/nanoporetech/katuali" rel="nofollow">https://github.com/nanoporetech/katuali</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40409/haplotypo-a-variant-calling-pipeline-for-phased-genomes</guid>
	<pubDate>Thu, 19 Dec 2019 07:33:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40409/haplotypo-a-variant-calling-pipeline-for-phased-genomes</link>
	<title><![CDATA[HaploTypo: a variant-calling pipeline for phased genomes]]></title>
	<description><![CDATA[<p>An increasing number of phased (i.e. with resolved haplotypes) reference genomes are available. However, most genetic variant calling tools do not explicitly account for haplotype structure. Here, we present HaploTypo, a pipeline tailored to resolve haplotypes in genetic variation analyses. HaploTypo infers the haplotype correspondence for each heterozygous variant called on a phased reference genome.</p>
<div>Availability and Implementation</div>
<p>HaploTypo is implemented in Python 2.7 and Python 3.5, and is freely available at&nbsp;<a href="https://github.com/gabaldonlab/haplotypo" target="">https://github.com/gabaldonlab/haplotypo</a>, and as a Docker image.</p><p>Address of the bookmark: <a href="https://github.com/gabaldonlab/haplotypo" rel="nofollow">https://github.com/gabaldonlab/haplotypo</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41030/slr-superscaffolder-a-scaffold-assemble-pipeline-for-stlfr-reads</guid>
	<pubDate>Fri, 14 Feb 2020 14:23:30 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41030/slr-superscaffolder-a-scaffold-assemble-pipeline-for-stlfr-reads</link>
	<title><![CDATA[SLR-superscaffolder: A scaffold assemble pipeline for stLFR reads.]]></title>
	<description><![CDATA[<p>This is a scaffold assembler designed for stLFR reads [1]. It uses the linked-read information from stLFR reads to assemble contigs into scaffolds.</p>
<p>Here is an illustration of this pipeline:</p>
<p>&nbsp;<img src="https://github.com/BGI-Qingdao/SLR-superscaffolder/raw/master/image.png" alt="image" style="border: 0px;"></p><p>Address of the bookmark: <a href="https://github.com/BGI-Qingdao/SLR-superscaffolder" rel="nofollow">https://github.com/BGI-Qingdao/SLR-superscaffolder</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42038/pyparanoid-a-pipeline-for-rapid-identification-of-homologous-gene-families-in-a-set-of-genomes</guid>
	<pubDate>Thu, 13 Aug 2020 10:06:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42038/pyparanoid-a-pipeline-for-rapid-identification-of-homologous-gene-families-in-a-set-of-genomes</link>
	<title><![CDATA[PyParanoid: a pipeline for rapid identification of homologous gene families in a set of genomes]]></title>
	<description><![CDATA[<p>PyParanoid is a pipeline for rapid identification of homologous gene families in a set of genomes, a central task of any comparative genomics analysis. The "gold standard" for identifying homologs is to use reciprocal best hits (RBHs), which depends on performing an all-vs-all sequence comparison, usually using BLAST, to determine homology. However, these methods are computationally expensive, requiring&nbsp;O(n<sup>2</sup>)&nbsp;resources to identify RBHs. This is problematic, as the modern deluge of sequencing data means that comparative genomics analyses could be performed on datasets of thousands of strains.</p><p>Address of the bookmark: <a href="https://github.com/ryanmelnyk/PyParanoid" rel="nofollow">https://github.com/ryanmelnyk/PyParanoid</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42946/aligngraph2-similar-genome-assisted-reassembly-pipeline-for-pacbio-long-reads</guid>
	<pubDate>Sun, 14 Mar 2021 09:42:47 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42946/aligngraph2-similar-genome-assisted-reassembly-pipeline-for-pacbio-long-reads</link>
	<title><![CDATA[AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads]]></title>
	<description><![CDATA[<p><span>AlignGraph2 is the second version of&nbsp;</span><a href="https://github.com/baoe/AlignGraph">AlignGraph</a><span>&nbsp;for PacBio long reads. It extends and refines contigs assembled from the long reads using a published genome similar to the sequenced genome.</span></p>
<p><span>More at&nbsp;https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbab022/6146772</span></p><p>Address of the bookmark: <a href="https://github.com/huangs001/AlignGraph2" rel="nofollow">https://github.com/huangs001/AlignGraph2</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44472/pipesnake-bioinformatics-best-practice-analysis-pipeline-for-phylogenomic-reconstruction</guid>
	<pubDate>Wed, 21 Feb 2024 06:19:41 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44472/pipesnake-bioinformatics-best-practice-analysis-pipeline-for-phylogenomic-reconstruction</link>
	<title><![CDATA[pipesnake: bioinformatics best-practice analysis pipeline for phylogenomic reconstruction]]></title>
	<description><![CDATA[<p dir="auto"><span>ausarg/pipesnake</span>&nbsp;is a bioinformatics best-practice analysis pipeline for phylogenomic reconstruction starting from short-read 'second-generation' sequencing data.</p>
<p dir="auto">The pipeline is built using&nbsp;<a href="https://www.nextflow.io/">Nextflow</a>, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The&nbsp;<a href="https://www.nextflow.io/docs/latest/dsl2.html">Nextflow DSL2</a>&nbsp;implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies.</p><p>Address of the bookmark: <a href="https://github.com/AusARG/pipesnake" rel="nofollow">https://github.com/AusARG/pipesnake</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44768/tritex-a-computational-pipeline-for-chromosome-scale-assembly-of-plant-genomes</guid>
	<pubDate>Fri, 14 Feb 2025 10:53:48 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44768/tritex-a-computational-pipeline-for-chromosome-scale-assembly-of-plant-genomes</link>
	<title><![CDATA[TRITEX, a computational pipeline for chromosome-scale assembly of plant genomes]]></title>
	<description><![CDATA[<p><span>This is the documentation of TRITEX, a computational pipeline for chromosome-scale assembly of plant genomes. It was developed in the research group Domestication Genomics at the Leibniz Institute of Plant Genetics and Crop Research (IPK) Gatersleben.</span></p><p>Address of the bookmark: <a href="https://tritexassembly.bitbucket.io/" rel="nofollow">https://tritexassembly.bitbucket.io/</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics!]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 Intel cores
256GB RAM
500GB of SSD storage
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 Intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on raid and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail or you delete something and notice 
       within 24 hours we can recover. Otherwise not. (vs home which is 
       properly backed up)
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream(object):
    """Wrap a writeable file object so writes from several processes do not interleave."""
    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock across write and flush so each record lands whole
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this (before Python 3.7 every forked child inherits
    # the parent's RNG state and draws the same numbers)
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        # flush explicitly: os._exit() below skips the normal stdout flush
        print(i, x, flush=True)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: process work items, then exit without running parent cleanup
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# parent: wait for every child before closing the shared output file
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")<br><br></pre>
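<p>The queue-and-fork pattern above can also be written with a worker pool from the Python standard library. The following is only a minimal sketch, assuming Python 3 on a platform that supports the "fork" start method; the names simulate and run_simulations are hypothetical and not part of the original page. Results are returned to the parent process, which is then the only writer, so no lock is needed.</p>
<pre>
```python
import multiprocessing
import random
import time


def simulate(i):
    """Hypothetical stand-in for an expensive function; returns (index, draw)."""
    # Reseed so workers do not share RNG state inherited at fork time
    # (Python 3.7 and later also reseed automatically after fork).
    random.seed()
    x = random.random()
    time.sleep(x / 1000)
    return i, x


def run_simulations(n_sims, n_proc, path):
    """Run n_sims tasks on n_proc workers and write one line per result."""
    # Assumption: "fork" mirrors the os.fork() example; it is not
    # available on Windows.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(n_proc) as pool, open(path, "w") as ofp:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so the output order is not the input order.
        for i, x in pool.imap_unordered(simulate, range(n_sims)):
            ofp.write("%i\t%s\n" % (i, x))
    return n_sims
```
</pre>
<p>For example, run_simulations(10000, 25, "output.txt") would correspond to the NSIMS and NPROC settings used above.</p>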
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a>.</p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34482/ribbon-visualizing-complex-genome-alignments-and-structural-variation</guid>
	<pubDate>Wed, 29 Nov 2017 07:40:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34482/ribbon-visualizing-complex-genome-alignments-and-structural-variation</link>
	<title><![CDATA[Ribbon: Visualizing complex genome alignments and structural variation]]></title>
	<description><![CDATA[<p>Ribbon can be used for long reads, short reads, paired-end reads, and assembly/genome alignments. Instructions for each data format are available by clicking on "instructions" in each tab on the right.</p>
<p>Local installation:</p>
<p>You can install Ribbon locally from GitHub by following the instructions here:&nbsp;<a href="https://github.com/MariaNattestad/ribbon" target="_blank">https://github.com/MariaNattestad/Ribbon</a></p><p>Address of the bookmark: <a href="http://genomeribbon.com/" rel="nofollow">http://genomeribbon.com/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>