<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/41330?offset=290</link>
	<atom:link href="https://bioinformaticsonline.com/related/41330?offset=290" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</guid>
	<pubDate>Thu, 31 Aug 2023 02:43:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</link>
	<title><![CDATA[Steps to find all the repeats in the genome !]]></title>
	<description><![CDATA[<div><p>To find repeats in a genome from 2 to 9 length using a Perl script, you can use the RepeatMasker tool with the "--length" option<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. Here's a step-by-step guide:</p></div><div><ol>
<li>Install RepeatMasker: First, you need to install RepeatMasker on your system. You can download it from the RepeatMasker website<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
</ol></div><div><ol>
<li>Prepare the genome sequence: Make sure you have the genome sequence in a FASTA file format. Let's assume the file is named "genome.fasta".</li>
</ol><blockquote><p>./RepeatMasker -pa &lt;number_of_processors&gt; -nolow -norna -no_is -div &lt;divergence_value&gt; -lib RepeatMaskerLib.embl -gff -xsmall -small -poly -species &lt;species_name&gt; -dir &lt;output_directory&gt; -length &lt;min_length&gt;-&lt;max_length&gt; genome.fasta</p></blockquote><div><p>Replace the following placeholders with appropriate values:</p><ul>
<li><code>&lt;number_of_processors&gt;</code>: The number of processors/threads you want to use for parallel processing.</li>
<li><code>&lt;divergence_value&gt;</code>: The divergence value for the species you are analyzing. You can find divergence values for different species in the RepeatMasker documentation<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
<li><code>&lt;species_name&gt;</code>: The name of the species you are analyzing.</li>
<li><code>&lt;output_directory&gt;</code>: The directory where you want the output files to be saved.</li>
<li><code>&lt;min_length&gt;</code>&nbsp;and&nbsp;<code>&lt;max_length&gt;</code>: The minimum and maximum lengths of the repeats you want to find (in this case, 2 and 9).</li>
</ul></div><div><ol>
<li>Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. You can download it from the source provided.</li>
</ol></div><div><ol>
<li>Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:</li>
</ol><blockquote><p>perl one_code_to_find_them_all.pl --rm &lt;RepeatMasker_out_file&gt; --length &lt;length_file&gt;</p></blockquote></div><p>&nbsp;</p></div><div><div><p>Replace&nbsp;<code>&lt;RepeatMasker_out_file&gt;</code>&nbsp;with the path to your RepeatMasker .out file, and&nbsp;<code>&lt;length_file&gt;</code>&nbsp;with the path to a file containing the lengths of the reference elements.</p></div><div><p>This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.</p></div><div><p>Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.</p></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</guid>
	<pubDate>Thu, 08 Aug 2024 23:31:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</link>
	<title><![CDATA[Tools to access the quality of your assembled genome !]]></title>
	<description><![CDATA[<ul dir="auto">
<li><a href="https://github.com/linsalrob/fasta_validator">FASTA VALIDATOR</a>&nbsp;+&nbsp;<a href="https://github.com/shenwei356/seqkit">SEQKIT RMDUP</a>: FASTA validation</li>
<li><a href="https://genometools.org/tools/gt_gff3validator.html">GENOMETOOLS GT GFF3VALIDATOR</a>: GFF3 validation</li>
<li><a href="https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl">ASSEMBLATHON STATS</a>: Assembly statistics</li>
<li><a href="https://genometools.org/tools/gt_stat.html">GENOMETOOLS GT STAT</a>: Annotation statistics</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS ADAPTOR</a>: Adaptor contamination pass/fail</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS GX</a>: Foreign organism contamination pass/fail</li>
<li><a href="https://gitlab.com/ezlab/busco">BUSCO</a>: Gene-space completeness estimation</li>
<li><a href="https://github.com/tolkit/telomeric-identifier">TIDK</a>: Telomere repeat identification</li>
<li><a href="https://github.com/oushujun/LTR_retriever/blob/master/LAI">LAI</a>: Continuity of repetitive sequences</li>
<li><a href="https://github.com/DerrickWood/kraken2">KRAKEN2</a>: Taxonomy classification</li>
<li><a href="https://github.com/igvteam/juicebox.js">HIC CONTACT MAP</a>: Alignment and visualisation of HiC data</li>
<li><a href="https://github.com/mummer4/mummer">MUMMER</a>&nbsp;&rarr;&nbsp;<a href="http://circos.ca/documentation/">CIRCOS</a>&nbsp;+&nbsp;<a href="https://plotly.com/">DOTPLOT</a>&nbsp;&amp;&nbsp;<a href="https://github.com/lh3/minimap2">MINIMAP2</a>&nbsp;&rarr;&nbsp;<a href="https://github.com/schneebergerlab/plotsr">PLOTSR</a>: Synteny analysis</li>
<li><a href="https://github.com/marbl/merqury">MERQURY</a>: K-mer completeness, consensus quality and phasing assessment</li>
</ul>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from short DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together short DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads ideal for scaffolding.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq trimmed_R1.fastq trimmed_R2.fastq \ ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
</li>
<li>
<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by identifying conserved genes:</p>
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>Maker</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44775/genomic-architecture-surrounding-the-fusion-site-of-human-chromosome-2</guid>
	<pubDate>Tue, 04 Mar 2025 12:26:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44775/genomic-architecture-surrounding-the-fusion-site-of-human-chromosome-2</link>
	<title><![CDATA[Genomic architecture surrounding the fusion site of human chromosome 2]]></title>
	<description><![CDATA[<p>The article <strong>"Genomic Structure and Evolution of the Ancestral Chromosome Fusion Site in 2q13&ndash;2q14.1 and Paralogous Regions on Other Human Chromosomes (https://pmc.ncbi.nlm.nih.gov/articles/PMC187548/)"</strong> explores the genomic architecture surrounding the fusion site of human chromosome 2. This fusion event is a key evolutionary marker distinguishing humans from other great apes, as humans have 46 chromosomes while chimpanzees, gorillas, and orangutans possess 48. The fusion occurred through an end-to-end joining of two ancestral chromosomes, which remain separate in nonhuman primates.</p><h3><strong>Key Findings:</strong></h3><ol>
<li>
<p><strong>Chromosomal Fusion and Its Molecular Signature:</strong></p>
<ul>
<li>The fusion site is located at <strong>2q13&ndash;2q14.1</strong> and is characterized by <strong>degenerate telomeric sequences</strong> appearing interstitially, indicating the historical head-to-head joining of ancestral chromosomes.</li>
<li>Despite being a signature of a past fusion event, these telomeric repeats are no longer functional and have undergone sequence degradation over time.</li>
</ul>
</li>
<li>
<p><strong>Extensive Duplications in the Surrounding Genomic Region:</strong></p>
<ul>
<li>The study identifies <strong>large-scale segmental duplications</strong> flanking the fusion site, with several of these regions duplicated and scattered across multiple chromosomes.</li>
<li>These duplications are predominantly located in <strong>subtelomeric and pericentromeric regions</strong>, suggesting their role in genomic instability and chromosomal evolution.</li>
</ul>
</li>
<li>
<p><strong>Paralogous Regions and Their Evolutionary Relationships:</strong></p>
<ul>
<li>A <strong>168-kilobase (kb) segment</strong> near the fusion site has <strong>98%&ndash;99% sequence identity</strong> with three regions on <strong>chromosome 9 (9pter, 9p11.2, and 9q13)</strong>.</li>
<li>Another <strong>67-kb region distal to the fusion site</strong> shows a high degree of homology to sequences in <strong>chromosome 22qter</strong>.</li>
<li>Additionally, a <strong>100-kb segment</strong> exhibits <strong>96% sequence identity</strong> with a region in <strong>chromosome 2q11.2</strong>.</li>
</ul>
</li>
<li>
<p><strong>Comparative Genomics and Evolutionary Implications:</strong></p>
<ul>
<li>By comparing the duplicated sequences and their arrangement in primates, the researchers traced the order of duplication events leading to their present distribution.</li>
<li>The presence of specific repetitive elements within these duplicated segments serves as <strong>evolutionary markers</strong> that help infer their historical rearrangements.</li>
<li>Some of these <strong>duplicated regions are associated with chromosomal inversion breakpoints</strong>, potentially contributing to evolutionary changes in primates.</li>
<li>Recurrent <strong>structural rearrangements</strong> in these regions have been linked to human chromosomal disorders.</li>
</ul>
</li>
</ol><h3><strong>Conclusions and Implications:</strong></h3><ul>
<li>The findings provide valuable insights into <strong>the structural evolution of human chromosome 2</strong>, which played a crucial role in human speciation.</li>
<li>Understanding these <strong>segmental duplications</strong> and their evolutionary trajectories sheds light on <strong>genomic instability</strong>, which may contribute to <strong>human genetic diseases</strong>.</li>
<li>The study highlights how large-scale chromosomal rearrangements, such as fusion and duplication, have influenced the <strong>evolutionary divergence of humans</strong> from other primates.</li>
</ul><p>This research advances our understanding of <strong>human genome evolution</strong> and offers a foundation for studying the effects of <strong>structural variants in genetic disorders</strong>.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35550/circoletto-visualizing-sequence-similarity-with-circos</guid>
	<pubDate>Fri, 09 Feb 2018 10:23:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35550/circoletto-visualizing-sequence-similarity-with-circos</link>
	<title><![CDATA[Circoletto: visualizing sequence similarity with Circos]]></title>
	<description><![CDATA[<p><span>Circoletto, an online visualization tool based on Circos, which provides a fast, aesthetically pleasing and informative overview of sequence similarity search results.</span></p>
<p>Online version and downloadable software package for offline use (source code in PERL) freely available at&nbsp;<a href="http://bat.ina.certh.gr/tools/circoletto/" target="">http://bat.ina.certh.gr/tools/circoletto/</a></p>
<p><strong>Contact:</strong><a href="mailto:ndarz@certh.gr" target="">ndarz@certh.gr</a></p><p>Address of the bookmark: <a href="http://tools.bat.infspire.org/circoletto/" rel="nofollow">http://tools.bat.infspire.org/circoletto/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41957/majiq-2-is-released</guid>
	<pubDate>Thu, 09 Jul 2020 03:06:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41957/majiq-2-is-released</link>
	<title><![CDATA[MAJIQ 2 is released !]]></title>
	<description><![CDATA[<p>&nbsp;</p>
<p>Ability to detect, quantify, and visualize complex and de-novo splicing variations from RNASeq.</p>
<p>MAJIQ&rsquo;s accuracy compares favorably to other algorithms.</p>
<p>MAJIQ 2 is *way* faster, more memory and I/O efficient</p>
<p>New visualization (VOILA 2.0) Ability to analyze hundreds and thousands of samples Why so negative? (Support for a confident negative set)</p>
<p><span>Finally, a major reason we are excited about MAJIQ 2.0 is that it sets the code base for many new exciting algorithmic and visualization improvements, with application to new research questions so stay tuned!</span></p>
<p><span>More at <a href="https://biociphers.wordpress.com/2019/04/01/majiq-2-is-out/">https://biociphers.wordpress.com/2019/04/01/majiq-2-is-out/</a></span></p><p>Address of the bookmark: <a href="https://majiq.biociphers.org/" rel="nofollow">https://majiq.biociphers.org/</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access atx: dhara:5000
Has btsync server (try it - its much better than dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phi's (space for 4 more GPU's)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you dont want this to count towards your quote you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on raid and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       ie if the hard drives fail or you delete something and notice 
       within 24 hours we can recover. Otherwise not. (vs home which is 
       properly backed up )  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
        return
    
    def write( self, data ):
        self.lock.acquire()
        self.writeable_obj.write( data )
        self.writeable_obj.flush()
        self.lock.release()
        return
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print i, x
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue
todo = multiprocessing.Queue()
for i in xrange(NSIMS): todo.put(i)
for i in xrange(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in xrange(NPROC):
    pid = os.fork()
    if pid == 0:
       worker(todo, ofp)
       os._exit(0)
    else:
       pids.append(pid)  

for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print "FINISHED"<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as we as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34482/ribbon-visualizing-complex-genome-alignments-and-structural-variation</guid>
	<pubDate>Wed, 29 Nov 2017 07:40:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34482/ribbon-visualizing-complex-genome-alignments-and-structural-variation</link>
	<title><![CDATA[Ribbon: Visualizing complex genome alignments and structural variation:]]></title>
	<description><![CDATA[<p>Ribbon can be used for long reads, short reads, paired-end reads, and assembly/genome alignments. Instructions for each data format are available by clicking on "instructions" in each tab on the right.</p>
<p>Local installation:</p>
<p>You can install Ribbon locally from Github by following the instructions here:&nbsp;<a href="https://github.com/MariaNattestad/ribbon" target="_blank">https://github.com/MariaNattestad/Ribbon</a></p><p>Address of the bookmark: <a href="http://genomeribbon.com/" rel="nofollow">http://genomeribbon.com/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</guid>
	<pubDate>Fri, 08 Dec 2017 16:26:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</link>
	<title><![CDATA[jobTree based python wrapper to run the genome simulation tool suite Evolver]]></title>
	<description><![CDATA[<p><span>evolverSimControl</span><span>&nbsp;(</span><span>eSC</span><span>) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (</span><a href="http://evolution.genetics.washington.edu/phylip/newicktree.html">Newick format</a><span>). In addition to simply running evolver,&nbsp;</span><span>eSC</span><span>&nbsp;also automatically creates statistical summaries of the simulation as it runs including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; extract pairwise multiple alignment files (</span><a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format5">.maf</a><span>) from leaf and branch nodes from a completed simulation and with the help of&nbsp;</span><a href="https://github.com/dentearl/mafTools/">mafJoin</a><span>, join them together into a single maf covering the entire simulation.</span></p><p>Address of the bookmark: <a href="https://github.com/dentearl/evolverSimControl" rel="nofollow">https://github.com/dentearl/evolverSimControl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34620/mash-fast-genome-and-metagenome-distance-estimation-using-minhash</guid>
	<pubDate>Tue, 12 Dec 2017 17:30:12 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34620/mash-fast-genome-and-metagenome-distance-estimation-using-minhash</link>
	<title><![CDATA[Mash: fast genome and metagenome distance estimation using MinHash]]></title>
	<description><![CDATA[<p>Mash is normally distributed as a dependency-free binary for Linux or OSX (see&nbsp;<a href="https://github.com/marbl/Mash/releases">https://github.com/marbl/Mash/releases</a>). This source distribution is intended for other operating systems or for development. Mash requires c++11 to build, which is available in and GCC &gt;= 4.8 and OSX &gt;= 10.7.</p>
<p>See&nbsp;<a href="http://mash.readthedocs.org/">http://mash.readthedocs.org</a>&nbsp;for more information.</p><p>Address of the bookmark: <a href="https://github.com/marbl/Mash/releases" rel="nofollow">https://github.com/marbl/Mash/releases</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>