BOL: Related items

Computer experts in biotechnology laboratory

Jit — Wed, 04 Dec 2013 02:11:43 -0600

Only a bioinformatician can understand that multiplication and division are different but the same thing :)

Disclaimer: This cartoon is solely designed to create humour and fun, not to offend any computer experts.

Public Databases for Bioinformatics!

Jit — Tue, 23 Mar 2021 05:32:15 -0500

https://www.nature.com/articles/s41467-020-17155-y

Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 Intel cores
256GB RAM
500GB of SSD storage
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 Intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota, you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail, or you delete something and notice
       within 24 hours, we can recover it. Otherwise not (unlike home, which
       is properly backed up).
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*
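
The layout above can be captured in a small lookup helper so that scripts do not hard-code paths. This is only a sketch: the names STORAGE and storage_path are hypothetical, and it assumes $LAB and $LABNAME refer to the same lab name.

```python
import posixpath

# Directory templates copied from the filesystem notes above.
# The dictionary keys are hypothetical labels chosen for this sketch.
STORAGE = {
    "home": "/users/{user}",                 # backed up nightly: code, papers
    "shared": "/mnt/data",                   # globally accessible common data
    "lab": "/mnt/lab_data/{lab}",            # lab project data
    "scratch": "/srv/scratch/{user}",        # fast local storage, not backed up
    "persistent": "/srv/persistent/{user}",  # fast local storage, synced nightly
    "www": "/srv/www/{lab}",                 # web accessible, NOT backed up
}

def storage_path(kind, *parts, user="", lab=""):
    """Return the directory for a storage kind, with optional sub-paths appended."""
    base = STORAGE[kind].format(user=user, lab=lab)
    return posixpath.join(base, *parts)

print(storage_path("scratch", "run1", user="alice"))
print(storage_path("lab", "atac", lab="kundaje"))
```

Keeping the comments next to each template makes the backup status visible wherever a path is chosen.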

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
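
If the parallel command is not available, roughly the same thing can be done from Python. This is a sketch rather than a drop-in replacement: gzip_file and gzip_all are hypothetical names, a thread pool is used for simplicity, and the worker count of 4 is arbitrary.

```python
import glob
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def gzip_file(path):
    """Compress path to path + '.gz' and remove the original, as gzip does."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path

def gzip_all(paths, workers=4):
    """Compress every file in paths and return the new names in the same order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, paths))

if __name__ == "__main__":
    # Same placeholder pattern as the shell example above.
    gzip_all(glob.glob("*.FILESTOGZIP"))
```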

# fork example in Python 3 (Unix only, because it relies on os.fork):
# (for more detailed examples look at
#  https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream(object):
    """Wrap a writeable object so records from several processes do not interleave."""
    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock across write and flush so each record lands whole
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this: children forked from the same parent may
    # otherwise share the parent's random state
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED':
            return
        # simulate an expensive function
        x = random.random()
        time.sleep(x / 10)
        print(i, x)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue: the work items, then one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS):
    todo.put(i)
for i in range(NPROC):
    todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: process items until a sentinel arrives, then exit immediately
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# parent: wait for every child before closing the shared output file
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")
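
Because the workers finish in arbitrary order, the lines in output.txt are not sorted by index. A minimal sketch for reading the results back and checking that nothing was lost is given here; load_results and missing_indices are hypothetical helper names, and the file format is the "index<TAB>value" layout written by the worker above.

```python
def load_results(path):
    """Read 'index<TAB>value' lines and return a dict mapping index to value."""
    results = {}
    with open(path) as fp:
        for line in fp:
            index, value = line.rstrip("\n").split("\t")
            results[int(index)] = float(value)
    return results

def missing_indices(results, n_expected):
    """Return the indices in range(n_expected) that have no result."""
    return [i for i in range(n_expected) if i not in results]

# Example use after the fork script has finished:
#   results = load_results("output.txt")
#   print(missing_indices(results, 10000))
```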

For use case 1 we obtained the following ENCODE and ROADMAP datasets https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz, https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam, https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam. Blacklisted regions were obtained from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz. The human genome version hg38 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz.

For use case 2 we used the set of narrowPeak files summarized in https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt (archived version v1.0.1). The human genome version hg19 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz.

For use case 3 we used the ENCODE datasets https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam, https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig, https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam as well as the GENCODE annotation v29 from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz.
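
The ENCODE links listed for the use cases all follow one pattern (accession, then @@download, then accession plus file extension), so they can be rebuilt from the accessions. The sketch below assumes that pattern holds for the files of interest; encode_download_url and fetch are hypothetical helper names, and fetch is defined but not called here.

```python
import os
import urllib.request

# URL pattern as it appears in the dataset links above.
ENCODE_TEMPLATE = "https://www.encodeproject.org/files/{acc}/@@download/{acc}.{ext}"

def encode_download_url(accession, extension):
    """Build the download URL for an ENCODE file accession."""
    return ENCODE_TEMPLATE.format(acc=accession, ext=extension)

# Accessions and extensions copied from use case 1.
USE_CASE_1 = [("ENCFF446WOD", "bed.gz"), ("ENCFF546PJU", "bam"), ("ENCFF059BEU", "bam")]

def fetch(url, dest_dir="."):
    """Download url into dest_dir unless the file already exists; return the local path."""
    local = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    if not os.path.exists(local):
        urllib.request.urlretrieve(url, local)
    return local

for acc, ext in USE_CASE_1:
    print(encode_download_url(acc, ext))
```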

Address of the bookmark: http://mitra.stanford.edu/

Hidden Markov Models, Viterbi Algorithm, Markov Chain Exploration with script

Manisha Mishra — Thu, 14 Nov 2013 13:36:56 -0600

Hidden Markov Models, the Viterbi Algorithm, and CpG Islands (in VB6)

Problem :

A CG island is a stretch of DNA (usually longer than 200 bases) in which the frequency of the CG sequence is higher than in other regions. It is also called a CpG island, where "p" simply indicates that "C" and "G" are connected by a phosphodiester bond.

CpG islands are often located around the promoters of housekeeping genes (which are essential for general cell functions) or other genes frequently expressed in a cell. At these locations, the CG sequence is not methylated. By contrast, the CG sequences in inactive genes are usually methylated to suppress their expression. The methylated cytosine may be converted to thymine by accidental deamination. Unlike the cytosine to uracil mutation which is efficiently repaired, the cytosine to thymine mutation can be corrected only by the mismatch repair which is very inefficient. Hence, over evolutionary time scales, the methylated CG sequence will be converted to the TG sequence.
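
To make "frequency of the CG sequence" concrete, here is a rough sketch that measures the CG dinucleotide frequency in sliding windows. The function names are hypothetical; the window of 200 follows the length mentioned above, and the step of 100 is an arbitrary choice. It is a plain frequency count, not a full CpG island caller.

```python
def cg_frequency(seq):
    """Fraction of adjacent base pairs in seq that are the dinucleotide CG."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return hits / (len(seq) - 1)

def windowed_cg_frequency(seq, window=200, step=100):
    """Yield (start, frequency) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, cg_frequency(seq[start:start + window])

# Toy sequence: a CG-rich stretch followed by an AT-only stretch.
toy = "CG" * 100 + "AT" * 100
for start, freq in windowed_cg_frequency(toy):
    print(start, round(freq, 3))
```

Windows whose frequency stands out from the rest of the sequence are candidates for closer inspection.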

Find a stepwise explanation and implementation steps at http://dna.cs.byu.edu/bio465/Labs/hmm.shtml

Source code with explanation: http://www.tannerhelland.com/1187/hidden-markov-models-viterbi-algorithm-cpg-islands-in-vb6/

For a detailed understanding of HMMs, read this excellent tutorial: http://www.cs.ubc.ca/~murphyk/Software/HMM/labman2.pdf

Viterbi algorithm: http://en.wikipedia.org/wiki/Viterbi_path

For further reading, see the Wikipedia page: http://en.wikipedia.org/wiki/Hidden_Markov_model

A paper on CpG islands, for in-depth understanding: http://www.biomedcentral.com/1471-2164/12/S2/S10

If you are interested in exploring Markov chains and understanding them through a graphical version, please visit http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=75049&lngWId=1
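
For readers who prefer Python to VB6, the Viterbi recursion can be sketched in a few lines. This is a deliberately simplified two-state model ("island" versus "background"), not the model used in the linked resources, and every probability in it is invented for illustration. The input sequence is assumed to be non-empty and to contain only A, C, G and T.

```python
import math

# All probabilities below are made-up illustration values.
STATES = ("island", "background")
START = {"island": 0.5, "background": 0.5}
TRANS = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.1, "background": 0.9},
}
EMIT = {
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def viterbi(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for obs, computed in log space."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][symbol]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

print(viterbi("ATATATATATCGCGCGCGCG"))
```

Working with log-probabilities avoids numerical underflow on long sequences, which matters for genome-scale input.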

Reference:

1. http://www.planet-source-code.com

2. http://www.tannerhelland.com

Molecular Bioinformatics Lab (MBL)

Tue, 19 Nov 2013 18:23:27 -0600

The main subject of interest in our laboratory is the study of the relationship among sequence, structure, and function in proteins and nucleic acids. Our research can be divided in two major topics:

the study of the sequence-structure relationship
(application -> structure prediction)
the study of the structure-function relationship
(application -> function prediction)

Therefore, anything related to the configuration (sequence) and conformation (structure) in atomic systems of proteins and nucleic acids, and the interaction of these with other elements (function) is of our major interest.

Lab page @ http://melolab.org/mbl/

Ribbon: Visualizing complex genome alignments and structural variation:

Jit — Wed, 29 Nov 2017 07:40:22 -0600

Ribbon can be used for long reads, short reads, paired-end reads, and assembly/genome alignments. Instructions for each data format are available by clicking on "instructions" in each tab on the right.

Local installation:

You can install Ribbon locally from GitHub by following the instructions here: https://github.com/MariaNattestad/Ribbon

Address of the bookmark: http://genomeribbon.com/

Scientist Positions @ Gujarat State Biotechnology Mission

Mon, 25 Nov 2013 10:26:39 -0600

Gujarat State Biotechnology Mission invites applications (online only) under various projects, namely Gujarat Biodiversity Gene Bank (BioGene), Gujarat Institute of Genomics (GIG), Gujarat Institute of Bioinformatics (GIBS) and Gujarat Institute of Marine Biotechnology. Eligible candidates can apply through the online application portal.

1 Scientist E 3

50,000/-

M.Sc. in Life Sciences, Plant Sciences, Biotechnology, Microbiology or Bioinformatics, or Ph.D. from a recognized university in any of the above subjects.

Minimum 8 Yrs. of experience after M.Sc. or 5 Yrs. of experience after Ph.D. in responsible position of work in R & D in the area of genomics/ conservation biotechnology/bioinformatics/Planning/Scientific Administration in Science and technology organization. Highly qualified in the area of modern biology, as evidenced through research experience and proven ability to carry out work in the area of conservation biotechnology. Age limit not exceeding 40yrs.

2 Scientist B 6

30,000/-

M.Sc. in Life Sciences, Plant Sciences, Biotechnology, Microbiology or Bioinformatics; a Ph.D. from a recognized university in any of the above subjects shall be preferred.

Minimum 3 Yrs. of experience after M.Sc. in responsible position of work in R & D in the area of genomics/ conservation biotechnology/ bioinformatics /Planning/Scientific Administration in Science and technology organization. Highly qualified in the area of modern biology, as evidenced through research experience and proven ability to carry out work in the area of conservation biotechnology. Age limit not exceeding 35yrs.

The positions are purely on a contractual basis for 11 months. Interested candidates can apply online in the specified format available at http://leogen.in/recruit/. The last date for applying is 24th December, 2013. Applications must be submitted online only. Applications submitted in any format other than the prescribed online proforma will be rejected. Candidates in service must apply through the proper channel. Candidates will be required to provide original documents along with the duly filled and signed application proforma, as and when called for interview.

For more details please visit the website URL : http://leogen.in/recruit

Mugsy: multiple whole genome alignment tool

Jit — Fri, 08 Dec 2017 17:41:14 -0600

Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.

To cite Mugsy, use:

Angiuoli SV and Salzberg SL. Mugsy: Fast multiple alignment of closely related whole genomes. Bioinformatics 2011 27(3):334-4

Address of the bookmark: http://mugsy.sourceforge.net/

GABi

Fri, 06 Dec 2013 16:43:01 -0600

GABi Research
The major research fields within the GABi scope are:
Sequence Analysis
Protein Structure Prediction
Comparative Genomics
Functional Analysis of Residues on Protein Families
Gene/Protein Networks
Genome structure & base composition
High-throughput data analysis from NGS

Lab Page http://gabi.cidbio.org/index/

Magic-BLAST: a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome.

Jit — Tue, 26 Dec 2017 22:23:39 -0600

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.

Magic-BLAST incorporates within the NCBI BLAST code framework ideas developed in the NCBI Magic pipeline, in particular hit extensions by local walk and jump (http://www.ncbi.nlm.nih.gov/pubmed/26109056), and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.

Address of the bookmark: https://ncbi.github.io/magicblast/

LAPTI Lab

Thu, 12 Dec 2013 18:19:12 -0600

The main theme of our research is understanding how genetic information is decoded from DNA into RNA and proteins. Some may find this topic a little strange and argue that we already know how this happens.

Translational recoding.

RNA editing.

Evolution of the genetic code and translation.

More at http://lapti.ucc.ie/research.html

Lab page http://lapti.ucc.ie/index.html