Public Databases for Bioinformatics!

http://mitra.stanford.edu/
https://www.nature.com/articles/s41467-020-17155-y

Server Infrastructure

File Server:

  dhara: Synology 3614 Storage Appliance
    4-core Xeon, 108TB disk storage, 10Gb ethernet to SCG3
    Access at: dhara:5000
    Runs a btsync server (try it, it's much better than Dropbox)

Compute Servers:

  nandi: Kundaje and Phi server
    24 Intel cores, 256GB RAM, 500GB SSD storage, 36TB RAID6 local storage
    4 Intel Phis (space for 4 more GPUs)
  durga: Montgomery and sensitive data
    24 Intel cores, 256GB RAM, 500GB SSD RAID0 storage, 60TB RAID6 local storage
  mitra: Bassik and web/DB server
    24 cores, 256GB RAM, 500GB SSD RAID0 storage, 36TB RAID6 local storage
  vayu: Kundaje GPU server
    4 cores, 64GB RAM, 200GB SSD storage, 8TB RAID10 local storage
    4 Nvidia GTX 970 4GB GPUs
  amold: Bickel and SGE server
    32 AMD cores, 128GB RAM, 200GB SSD storage, 12TB RAID5 local storage
  wotan: Bickel and SGE server
    64 AMD cores, 256GB RAM, 200GB SSD storage, 12TB RAID5 local storage

Filesystem:

  /users/$USER: default home directory
    Full backups nightly; NFS mount to dhara.
    Store code, papers, and other highly processed data here.
  /mnt/data/: globally accessible data
    Store common data here, e.g. genomes and indexes, annotations, ENCODE data.
    If you don't want this to count towards your quota you must chown it.
  /mnt/lab_data/$LAB/: lab-accessible data
    Store lab project data here, e.g. ATAC-seq prediction data, enhancer predictions, motif calls.
  /srv/scratch/$USER: fast local storage
    Not backed up, but on RAID, and data will never be deleted.
    Most analysis should be performed here.
  /srv/persistent/$USER: fast local storage, synced nightly but not backed up
    If the hard drives fail, or you delete something and notice within 24 hours, we can recover it; otherwise not (unlike home, which is properly backed up).
    Intermediate analysis products that would be hard to recover should be stored here, e.g. stochastic analysis results that must be kept so that paper results can be reproduced.
  /srv/www/$LABNAME/: web accessible from mitra.stanford.edu
    NOT BACKED UP.

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in Python (for more detailed examples, see
# grit/lib/multiprocessing_utils.py in https://github.com/nboley/grit/)
import os
import time
import random
import multiprocessing


class ProcessSafeOPStream(object):
    """Output stream that serializes writes from multiple processes."""

    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # Hold the lock so concurrent writers cannot interleave output
        self.lock.acquire()
        self.writeable_obj.write(data)
        self.writeable_obj.flush()
        self.lock.release()

    def close(self):
        self.writeable_obj.close()


def worker(queue, ofp):
    # Re-seed in each child; forked children otherwise share the parent's
    # RNG state and produce identical "random" numbers
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED':
            return
        # simulate an expensive function
        x = random.random()
        time.sleep(x / 10)
        print(i, x)
        ofp.write("%i\t%s\n" % (i, x))


NSIMS = 10000
NPROC = 25

# populate the work queue, with one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS):
    todo.put(i)
for i in range(NPROC):
    todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))
pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child process: consume tasks until the sentinel, then exit
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# parent: wait for all children to finish
for pid in pids:
    os.waitpid(pid, 0)
ofp.close()
print("FINISHED")
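For simple task-parallel loops like the one above, multiprocessing.Pool expresses the same pattern more compactly. This is a minimal sketch of the fork example rewritten with Pool; the file name, task count, and worker count simply mirror the example above:

import random
import time
from multiprocessing import Pool


def simulate(i):
    # simulate an expensive function (same body as the worker above)
    x = random.random()
    time.sleep(x / 10)
    return i, x


if __name__ == "__main__":
    # initializer=random.seed re-seeds each worker, as in the fork example
    with Pool(processes=25, initializer=random.seed) as pool:
        with open("output.txt", "w") as ofp:
            # imap_unordered yields results as soon as workers finish them,
            # so only the parent writes to the file and no lock is needed
            for i, x in pool.imap_unordered(simulate, range(10000)):
                ofp.write("%i\t%s\n" % (i, x))
    print("FINISHED")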

For use case 1 we obtained the following ENCODE and ROADMAP datasets: https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz, https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam, and https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam. Blacklisted regions were obtained from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz. The human genome version hg38 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz.
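These files can also be fetched programmatically. Below is a minimal sketch using only Python's standard library; the local file names simply reuse the last path component of each URL:

import urllib.request

urls = [
    "https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz",
    "https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam",
    "https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam",
    "http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz",
    "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz",
]

for url in urls:
    # name the local file after the final path component of the URL
    filename = url.rsplit("/", 1)[-1]
    print("downloading", filename)
    urllib.request.urlretrieve(url, filename)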

For use case 2 we used the set of narrowPeak files summarized in https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt (archived version v1.0.1). The human genome version hg19 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz.

For use case 3 we used the ENCODE datasets https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam, https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig, and https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam, as well as the GENCODE annotation v29 from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz.