<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/32730?offset=1430</link>
	<atom:link href="https://bioinformaticsonline.com/related/32730?offset=1430" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/2573/most-commonly-used-awk-by-bioinformatician</guid>
	<pubDate>Mon, 19 Aug 2013 01:12:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/2573/most-commonly-used-awk-by-bioinformatician</link>
	<title><![CDATA[Most Commonly used Awk by Bioinformatician]]></title>
	<description><![CDATA[<p>Awk is a programming language designed for quickly manipulating whitespace-delimited data. Although you can achieve all of its functionality with Perl, awk is simpler in many practical cases.</p><p>Why awk? You can replace a pipeline of 'stuff | grep | sed | cut...' with a single call to awk. For a simple script, most of the time lag is in loading these programs into memory, so it is much faster to do everything with one. This is ideal for something like an Openbox pipe menu, where you want to generate output on the fly. You can use awk to make a neat one-liner for a quick job in the terminal, or build an awk section into a shell script. There are many online tutorials; here I will only show a few examples that cover most of a bioinformatician's daily uses of awk.</p><p>choose rows where column 3 is larger than column 5:</p><p>awk '$3&gt;$5' input.txt &gt; output.txt</p><p>extract columns 2, 4 and 5 (the second form writes tab-separated output):</p><p>awk '{print $2,$4,$5}' input.txt &gt; output.txt</p><p>awk 'BEGIN{OFS="\t"}{print $2,$4,$5}' input.txt</p><p>show rows 20 to 80:</p><p>awk 'NR&gt;=20&amp;&amp;NR&lt;=80' input.txt &gt; output.txt</p><p>calculate the average of column 2:</p><p>awk '{x+=$2}END{print x/NR}' input.txt</p><p>regex (egrep):</p><p>awk '/^test[0-9]+/' input.txt</p><p>calculate the sum of columns 2 and 3 and either append it to the row or replace the first column with it:</p><p>awk '{print $0,$2+$3}' input.txt</p><p>awk '{$1=$2+$3;print}' input.txt</p><p>join two files on column 1:</p><p>awk 'BEGIN{while((getline&lt;"file1.txt")&gt;0)l[$1]=$0}$1 in l{print $0"\t"l[$1]}' file2.txt &gt; output.txt</p><p>count how often each value occurs in column 2 (uniq -c):</p><p>awk '{l[$2]++}END{for (x in l) print x,l[x]}' input.txt</p><p>apply "uniq" on column 2, printing only the first row for each value (uniq):</p><p>awk '!($2 in l){print;l[$2]=1}' input.txt</p><p>count how often each word occurs (wc):</p><p>awk 
'{for(i=1;i&lt;=NF;++i)c[$i]++}END{for (x in c) print x,c[x]}' input.txt</p><p>deal with simple CSV:</p><p>awk -F, '{print $1,$2}'</p><p>substitution (sed is simpler in this case):</p><p>awk '{sub(/test/, "no", $0);print}' input.txt</p><p>&nbsp;</p><p>OK, now here is where to read this stuff properly explained.</p><p>Two thorough tutorials:</p><p>http://www.gnu.org/software/gawk/manual/gawk.html</p><p>http://www.grymoire.com/Unix/Awk.html</p><p>A famous list of useful one-liners - though they're short, many are quite tricky:</p><p>http://www.pement.org/awk/awk1line.txt</p><p>And some nice explanations of those one-liners. After reading this you'll have a pretty good grasp!</p><p>http://www.catonmat.net/blog/awk-one-li &hellip; -part-one/</p><p>http://www.catonmat.net/blog/ten-awk-ti &hellip; -pitfalls/</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44581/biokit-a-set-of-tools-dedicated-to-bioinformatics-data-visualisation</guid>
	<pubDate>Tue, 18 Jun 2024 02:04:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44581/biokit-a-set-of-tools-dedicated-to-bioinformatics-data-visualisation</link>
	<title><![CDATA[BioKit: a set of tools dedicated to bioinformatics, data visualisation]]></title>
	<description><![CDATA[<p><span>BioKit is a set of tools dedicated to bioinformatics, data visualisation (</span><a href="https://biokit.readthedocs.io/en/latest/references.html#module-biokit.viz" title="biokit.viz"><code><span>biokit.viz</span></code></a><span>), and access to online biological data (e.g. UniProt and NCBI, thanks to bioservices). It also contains more advanced tools related to data analysis (e.g.,&nbsp;</span><a href="https://biokit.readthedocs.io/en/latest/references.html#module-biokit.stats" title="biokit.stats"><code><span>biokit.stats</span></code></a><span>). Since R is quite common in bioinformatics, we also provide a convenient module to run R inside your Python scripts or shell (the biokit.rtools module).</span></p><p>Address of the bookmark: <a href="https://biokit.readthedocs.io/en/latest/index.html" rel="nofollow">https://biokit.readthedocs.io/en/latest/index.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/4106/phd-at-national-institute-for-research-in-reproductive-health</guid>
  <pubDate>Fri, 30 Aug 2013 04:50:35 -0500</pubDate>
  <link></link>
  <title><![CDATA[PhD at National Institute for Research in Reproductive Health]]></title>
  <description><![CDATA[
<p>National Institute for Research in Reproductive Health</p>

<p>(Indian Council of Medical Research)<br />Jehangir Merwanji Street, Parel, Mumbai 400 012</p>

<p>Advertisement No. 1/NIRRH/Ph.D. 2013<br />Admission to Ph.D. Programme – 2013</p>

<p>The National Institute for Research in Reproductive Health, Mumbai, a premier institute of the Indian Council of Medical Research, conducts basic, clinical and operational research in different areas of reproductive health. The thrust areas of research include: Fertility Regulation, Infertility and Reproductive Disorders, Reproductive Tract Infections, Maternal and Child Health, Osteoporosis, Genetic Disorders, Stem Cell Biology, Structural Biology, Bioinformatics and Reproductive Toxicology. The institute is affiliated to the University of Mumbai for the award of the Ph.D. degree in Applied Biology, Biochemistry, Life Sciences and Biotechnology. The institute invites applications from young and bright students for enrollment in the Ph.D. programme.</p>

<p>More at http://www.nirrh.res.in/announcements/phd_program_2013.htm</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail, or you delete something and notice 
       within 24 hours, we can recover it. Otherwise not. (vs. home, which is 
       properly backed up)  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
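
# python version of the same idea (a sketch, not from the original notes):
# gzip_file and gzip_all are made-up helper names, and a thread pool from the
# standard library stands in for the separate processes that parallel starts
```python
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def gzip_file(path):
    # compress path to path + ".gz", then remove the original (like gzip does)
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return path + ".gz"

def gzip_all(paths, nworkers=4):
    # compress several files at once; results come back in input order
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(gzip_file, paths))
```
# e.g. (after import glob) gzip_all(glob.glob("*.FILESTOGZIP"))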

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    """Wrap a writeable file object so that forked workers can share it."""
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
    
    def write( self, data ):
        # hold the lock so that lines from different workers never interleave
        with self.lock:
            self.writeable_obj.write( data )
            self.writeable_obj.flush()
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # re-seed in every child: forked workers inherit the parent's RNG state
    # and would otherwise all draw the same numbers (try without this)
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        # flush now, because os._exit() below skips the normal stdout flush
        print(i, x, flush=True)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' marker per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: work through the queue, then exit without parent cleanup
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# parent: wait for every worker to finish
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/2742/baumbach-lab</guid>
  <pubDate>Wed, 21 Aug 2013 10:56:35 -0500</pubDate>
  <link></link>
  <title><![CDATA[Baumbach Lab]]></title>
  <description><![CDATA[
<p>The Computational Biology research group was established in October 2012 at the Department of Mathematics and Computer Science (IMADA) at the University of Southern Denmark (SDU). It emerged from the Computational Systems Biology group, founded in March 2010 at the Max Planck Institute for Informatics (MPII) and the Cluster of Excellence for Multimodal Computing and Interaction (MMCI) at Saarland University, Saarbrücken, Germany.<br /><br />The group is headed by Prof. Dr. Jan Baumbach and currently hosts nine PhD students and one postdoctoral fellow at both IMADA/SDU and MMCI/MPII.</p>

<p>More at &gt;&gt; http://www.baumbachlab.net/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34377/genomicus-genome-browser-that-enables-users-to-navigate-in-genomes-in-several-dimensions</guid>
	<pubDate>Sat, 18 Nov 2017 16:10:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34377/genomicus-genome-browser-that-enables-users-to-navigate-in-genomes-in-several-dimensions</link>
	<title><![CDATA[Genomicus: genome browser that enables users to navigate in genomes in several dimensions]]></title>
	<description><![CDATA[<p>Genomicus is a genome browser that enables users to navigate in genomes in several dimensions: linearly along chromosome axes, transversally across different species, and chronologically along evolutionary time.</p>
<p>Once a query gene has been entered, it is displayed in its genomic context in parallel to the genomic context of all its orthologous and paralogous copies in all the other sequenced metazoan genomes. Moreover, Genomicus stores and displays the predicted ancestral genome structure in all the ancestral species within the phylogenetic range of interest.</p>
<p>All the data on extant species displayed in this browser are from&nbsp;<a href="http://www.ensembl.org/">Ensembl</a>.</p><p>Address of the bookmark: <a href="http://genomicus.biologie.ens.fr/genomicus-90.01/cgi-bin/search.pl" rel="nofollow">http://genomicus.biologie.ens.fr/genomicus-90.01/cgi-bin/search.pl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/2839/look-up-a-biological-numbers</guid>
	<pubDate>Fri, 23 Aug 2013 03:27:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/2839/look-up-a-biological-numbers</link>
	<title><![CDATA[Look up biological numbers]]></title>
	<description><![CDATA[<p><strong>Did you ever need to look up a number</strong><span>&nbsp;like the volume of a cell or the cellular concentration of ATP, only to find yourself spending much more time than you wanted on the Internet or flipping through textbooks - all without much success?&nbsp;</span><br><br><span>Well, it didn&rsquo;t happen only to you. It is often surprising how difficult it can be to find concrete biological numbers, even for properties that have been measured numerous times. To help solve this for one and all, BioNumbers (</span><strong>the database of key numbers in molecular biology</strong><span>) was created. Along with the numbers, you'll find the relevant&nbsp;</span><strong>references to the original literature</strong><span>, useful comments, and related numbers.&nbsp;</span></p>
<p><span><span>To cite BioNumbers please refer to: Milo et al. Nucl. Acids Res. (2010) 38: D750-D753. When using a specific entry from the database it is highly recommended that you also specify the BioNumbers 6 digit ID, e.g. "BNID 100986, Milo et al 2010".&nbsp;</span></span></p><p>Address of the bookmark: <a href="http://bionumbers.hms.harvard.edu/" rel="nofollow">http://bionumbers.hms.harvard.edu/</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34571/mugsy-multiple-whole-genome-alignment-tool</guid>
	<pubDate>Fri, 08 Dec 2017 17:41:14 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34571/mugsy-multiple-whole-genome-alignment-tool</link>
	<title><![CDATA[Mugsy: multiple whole genome alignment tool]]></title>
	<description><![CDATA[<p><span>Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.</span></p>
<p>To cite Mugsy, use:</p>
<p>Angiuoli SV and Salzberg SL.&nbsp;<a href="http://bioinformatics.oxfordjournals.org/content/27/3/334">Mugsy: Fast multiple alignment of closely related whole genomes.</a> <em>Bioinformatics</em>&nbsp;2011 27(3):334-4</p><p>Address of the bookmark: <a href="http://mugsy.sourceforge.net/" rel="nofollow">http://mugsy.sourceforge.net/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/3952/ancestor-at-work</guid>
	<pubDate>Sun, 25 Aug 2013 19:45:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/3952/ancestor-at-work</link>
	<title><![CDATA[Ancestor at work !!!]]></title>
	<description><![CDATA[<p>When they will learn Bioinformatics :)</p>]]></description>
	<dc:creator>Jit</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/3952" length="10064" type="image/gif" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35429/list-of-visualization-tools-for-genome-alignments</guid>
	<pubDate>Fri, 02 Feb 2018 13:25:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35429/list-of-visualization-tools-for-genome-alignments</link>
	<title><![CDATA[List of visualization tools for genome alignments]]></title>
	<description><![CDATA[<p><span>Genome</span><span>&nbsp;browsers are useful not only for showing final results but also for improving analysis protocols, testing data quality, and generating result drafts. Their integration in analysis pipelines allows the optimization of parameters, which leads to better results. Sometimes, however, we need publication-ready figures of genomes. The following is a list of genome alignment visualization tools that could be useful for the analysis and&nbsp;interpretation of results:</span></p><p>ABySS Explorer</p><p>Interactive Java application that uses a novel graph-based representation to display a sequence assembly and associated metadata</p><p>http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer</p><p>BamView</p><p>Genome browser and annotation tool that allows visualization of sequence features, next-generation sequencing (NGS) data and the results of analyses within the context of the sequence, and also its six-frame translation</p><p>http://www.sanger.ac.uk/resources/software/artemis/</p><p>DNannotator&nbsp;</p><p>Annotation web toolkit for regional genomic sequences</p><p>http://bioapp.psych.uic.edu/DNannotator.htm</p><p>JVM&nbsp;</p><p>Java Visual Mapping tool for NGS reads</p><p>http://www.springer.com/cda/content/document/cda_downloaddocument/9789401792448-c2.pdf?SGWID=0-0-45-1487072-p176815501</p><p>LookSeq&nbsp;</p><p>Web-based visualization of sequences derived from multiple sequencing technologies. 
Low- or high-depth read pileups and easy visualization of putative single nucleotide and structural variation</p><p>http://lookseq.sourceforge.net</p><p>MagicViewer&nbsp;</p><p>Visualization of short read alignment, identification of genetic variation and association with annotation information of a reference genome</p><p>http://bioinformatics.zj.cn/magicviewer/</p><p>MapView&nbsp;</p><p>Alignments of huge-scale single-end and paired-end short reads</p><p>http://omictools.com/mapview-s1367.html</p><p>MultiPipMaker</p><p>Computes alignments of similar regions in two DNA sequences. The resulting alignments are summarized with a &lsquo;percent identity plot&rsquo; (pip)</p><p>http://pipmaker.bx.psu.edu/pipmaker/</p><p>PileLineGUI&nbsp;</p><p>Handling genome position files in NGS studies</p><p>http://sing.ei.uvigo.es/pileline/pilelinegui.html</p><p>SAMtools tview&nbsp;</p><p>Simple and fast text alignment viewer; NGS compatible</p><p>http://www.htslib.org/</p><p>SEWAL</p><p>Uses a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run</p><p>http://www.sourceforge.net/projects/sewal</p><p>STAR&nbsp;</p><p>A web-based integrated solution for the management and visualization of sequencing data</p><p>http://wanglab.ucsd.edu/star/browser</p><p>SVA&nbsp;</p><p>Software for annotating and visualizing sequenced human genomes</p><p>http://www.svaproject.org</p><p>Integrative Genomics Viewer (IGV)&nbsp;</p><p>Visualization of large heterogeneous datasets, providing a smooth and intuitive user experience at all levels of genome resolution</p><p>https://www.broadinstitute.org/igv/</p><p>ZOOM Lite&nbsp;</p><p>NGS data mapping and visualization software</p><p>http://bioinfor.com/zoom/lite/</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>