BOL: Related items

Interactive Bioinformatics Resources !

Jit — Thu, 12 Aug 2021 00:09:00 -0500

Learn how to use bioinformatics tools right from your browser.
Everything runs in a sandbox, so you can experiment all you want.

More at sandbox.bio

Address of the bookmark: http://sandbox.bio

Bioinformatics Training Material !

BioStar — Sat, 18 Mar 2023 11:26:18 -0500

Glittr is a curated list of bioinformatics training material.
All material is:

- In a GitHub or GitLab repository
- Free to use
- Written in Markdown or similar

NOTE: This list of courses is selected only based on the above criteria.
There are no checks on quality.

https://glittr.org/?per_page=25&sort_by=stargazers&sort_direction=desc

Address of the bookmark: https://glittr.org/?per_page=25&sort_by=stargazers&sort_direction=desc

16S rRNA Database Download

LEGE — Wed, 24 Apr 2024 04:33:15 -0500

Downloading 16S rRNA databases can be crucial for various bioinformatics analyses, especially in microbiome research. However, it's important to note that databases can vary based on your specific needs, such as the taxonomic coverage you require or the type of analysis you're performing. Here's a general guideline on how you can obtain 16S rRNA databases:

NCBI (National Center for Biotechnology Information):
- NCBI provides various databases related to genetic information, including 16S rRNA sequences.
- You can access the 16S ribosomal RNA sequences from NCBI's Nucleotide database (https://www.ncbi.nlm.nih.gov/nucleotide/).
- Perform a search using keywords like "16S rRNA" or specific bacterial names to find relevant sequences.
- You can download sequences individually or in batches using the provided tools.
GreenGenes:
- GreenGenes is a widely used 16S rRNA gene sequence database.
- You can access it at http://greengenes.secondgenome.com/.
- GreenGenes provides precompiled databases for various purposes, including classification, alignment, and phylogenetic analysis.
SILVA:
- SILVA (https://www.arb-silva.de/) is another comprehensive database for ribosomal RNA (rRNA) sequences.
- It covers not only 16S rRNA but also other ribosomal RNA sequences.
- SILVA provides precompiled databases for various purposes, including taxonomic classification and alignment.
Ribosomal Database Project (RDP):
- RDP (http://rdp.cme.msu.edu/) is a curated database that offers 16S rRNA sequences.
- It provides tools for sequence analysis and classification.
- You can download sequences and taxonomy information from their website.
QIIME (Quantitative Insights Into Microbial Ecology):
- QIIME (https://qiime2.org/) is a widely used bioinformatics platform for microbiome analysis.
- It provides tools for analyzing microbial communities, including processing 16S rRNA sequences.
- QIIME often includes its own preprocessed 16S rRNA databases that can be used for analysis within the platform.

Before downloading any database, make sure to read the terms of use and citation requirements, as some databases may have specific usage policies. Additionally, consider the compatibility of the database with your analysis pipeline and software tools.

NCBI 16S rRNA database location: ftp://ftp.ncbi.nih.gov/blast/db/16SMicrobial.tar.gz
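As a rough sketch of the download step, the snippet below fetches an archive once and lists what it contains. It uses only the Python standard library. The function names `download` and `list_archive` are made up for this illustration, and the URL is the one quoted above, which may have moved since the post was written.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL as quoted in the post; treat it as an assumption, it may have moved.
DB_URL = "ftp://ftp.ncbi.nih.gov/blast/db/16SMicrobial.tar.gz"


def download(url, dest):
    """Fetch url to dest unless dest already exists, and return dest as a Path."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def list_archive(path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(str(path), "r:gz") as tar:
        return sorted(tar.getnames())
```

A typical call would be `list_archive(download(DB_URL, "16SMicrobial.tar.gz"))`. Listing the members before unpacking is a cheap way to check that the download completed and that the archive holds what your pipeline expects.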

Bioinformatics Codes Search

Jitendra Narayan — Thu, 15 Aug 2013 11:08:52 -0500

I bet this website will be your best friend in the near future. It helps you explore existing open source code and learn from it.

You can find useful open source bioinformatics code for your analysis work. Use the options in the left bar to filter or narrow down your search results. This page can be a useful resource for beginner bioinformaticians, as it contains several basic bioinformatics scripts that are commonly used by biological programmers and biologists.

Stand on the slumped, dandruff-covered shoulders of millions of computer nerds. _/\_

Enjoy the code and research work.

http://code.ohloh.net/search?s=bioinformatics

Address of the bookmark: http://code.ohloh.net/search?s=bioinformatics

Programming language to build synthetic DNA

Jit — Mon, 30 Sep 2013 16:37:24 -0500

A team led by Georg Seelig (http://homes.cs.washington.edu/~seelig/index.html) at the University of Washington has developed a programming language for chemistry that it hopes will streamline efforts to design a network that can guide the behavior of chemical-reaction mixtures in the same way that embedded electronic controllers guide cars, robots and other devices. In medicine, such networks could serve as “smart” drug deliverers or disease detectors at the cellular level.

Reference & More @

http://www.nature.com/nnano/journal/vaop/ncurrent/full/nnano.2013.189.html

http://www.washington.edu/news/2013/09/30/uw-engineers-invent-programming-language-to-build-synthetic-dna/

Image source: washington.edu

Research assistant in computational biology

Wed, 24 Jun 2015 07:55:16 -0500

http://www.au.dk/en/about/vacant-positions/scientific-positions/stillinger/Vacancy/show/743161/5283/

Qualifications:
MSc degree in computer science, engineering, genetics or similar field with a strong emphasis on computational methods.

Deadline
01.08.2015

Public Databases for Bioinformatics !

Jit — Tue, 23 Mar 2021 05:32:15 -0500

https://www.nature.com/articles/s41467-020-17155-y

Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail, or you delete something and notice 
       within 24 hours, we can recover it. Otherwise not (unlike home, which is 
       properly backed up).
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*
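The layout above can be captured in a small lookup table so that scripts do not hard-code paths. This is only a sketch: the short keys (`home`, `scratch` and so on) and the function name `storage_path` are hypothetical labels, while the path templates and notes are copied from the list above.

```python
from pathlib import PurePosixPath

# Path templates and notes copied from the filesystem list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
STORAGE = {
    "home": ("/users/{user}", "full backups nightly"),
    "shared": ("/mnt/data", "globally accessible data"),
    "lab": ("/mnt/lab_data/{lab}", "lab accessible data"),
    "scratch": ("/srv/scratch/{user}", "not backed up"),
    "persistent": ("/srv/persistent/{user}", "synced nightly, not backed up"),
    "www": ("/srv/www/{lab}", "not backed up"),
}


def storage_path(kind, user="", lab=""):
    """Return the directory for one kind of data, filled in for a user or lab."""
    template, _note = STORAGE[kind]
    return PurePosixPath(template.format(user=user, lab=lab))
```

For example, `storage_path("scratch", user="alice")` gives `/srv/scratch/alice`, the place where most analysis should run according to the notes above.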

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
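
The same idea can be sketched in Python when the compression step is part of a larger script. This is a minimal illustration, not a replacement for the command above: `gzip_file` and `gzip_all` are names invented here, a thread pool is used for simplicity, and unlike the gzip command the originals are left in place.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gzip_file(path):
    """Compress path to path.gz and return the new path (the original is kept)."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def gzip_all(paths, workers=4):
    """Compress several files concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, paths))
```

A call such as `gzip_all(sorted(Path(".").glob("*.txt")))` compresses every matching file in the current directory; the glob pattern is only an example.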

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    # wraps a writeable object so that writes from several processes do not interleave
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
    
    def write( self, data ):
        # hold the lock across write and flush so each record lands intact
        with self.lock:
            self.writeable_obj.write( data )
            self.writeable_obj.flush()
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this: forked children inherit the parent's random state
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        # flush explicitly, because os._exit() below skips the normal buffer flush
        print(i, x, flush=True)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: process items, then exit without running the parent's cleanup
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# parent: wait for every child to finish
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")

For use case 1 we obtained the following ENCODE and ROADMAP datasets https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz, https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam, https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam. Blacklisted regions were obtained from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz. The human genome version hg38 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz.

For use case 2 we used the set of narrowPeak files summarized in https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt (archived version v1.0.1). The human genome version hg19 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz.

For use case 3 we used the ENCODE datasets https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam, https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig, https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam as well as the GENCODE annotation v29 from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz.
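
When scripting these downloads, it can help to derive the local file name from each URL first. The sketch below builds such a manifest with the standard library only. The helper names `local_name` and `manifest` are hypothetical, and the two URLs are copied from the use cases above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_name(url):
    """Return the last path component of a download URL."""
    return PurePosixPath(urlparse(url).path).name


def manifest(urls):
    """Map each local file name to the URL it should be fetched from."""
    return {local_name(u): u for u in urls}


# Two of the URLs listed above, used here as example input.
EXAMPLE_URLS = [
    "https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz",
    "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz",
]
```

Calling `manifest(EXAMPLE_URLS)` yields the keys `ENCFF446WOD.bed.gz` and `hg38.fa.gz`, which can then be passed to whatever download tool you prefer.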

Address of the bookmark: http://mitra.stanford.edu/

Ribbon: Visualizing complex genome alignments and structural variation

Jit — Wed, 29 Nov 2017 07:40:22 -0600

Ribbon can be used for long reads, short reads, paired-end reads, and assembly/genome alignments. Instructions for each data format are available by clicking on "instructions" in each tab on the right.

Local installation:

You can install Ribbon locally from GitHub by following the instructions here: https://github.com/MariaNattestad/Ribbon

Address of the bookmark: http://genomeribbon.com/

jobTree based python wrapper to run the genome simulation tool suite Evolver

Jit — Fri, 08 Dec 2017 16:26:32 -0600

evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to simply running evolver, eSC also automatically creates statistical summaries of the simulation as it runs including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; extract pairwise multiple alignment files (.maf) from leaf and branch nodes from a completed simulation and with the help of mafJoin, join them together into a single maf covering the entire simulation.
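
To make the phylogeny input concrete, here is a small sketch that pulls the leaf names out of a Newick string, since the leaves are the nodes from which sequences are extracted. It is not part of eSC: the function name `leaf_names` and the example tree are made up, and the regular expression assumes simple unquoted labels.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels follow ")".
# Quoted labels and comments are not handled in this sketch.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick):
    """Return the leaf labels of a Newick tree in left-to-right order."""
    return _LEAF.findall(newick)


# Hypothetical three-leaf tree with branch lengths and internal node names.
EXAMPLE_TREE = "((human:0.1,chimp:0.1)anc1:0.2,mouse:0.5)root;"
```

For the example tree, `leaf_names(EXAMPLE_TREE)` returns `['human', 'chimp', 'mouse']`; the internal labels `anc1` and `root` are skipped because they follow a closing parenthesis.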

Address of the bookmark: https://github.com/dentearl/evolverSimControl

Mash: fast genome and metagenome distance estimation using MinHash

Jit — Tue, 12 Dec 2017 17:30:12 -0600

Mash is normally distributed as a dependency-free binary for Linux or OSX (see https://github.com/marbl/Mash/releases). This source distribution is intended for other operating systems or for development. Mash requires C++11 to build, which is available in GCC >= 4.8 and on OSX >= 10.7.

See http://mash.readthedocs.org for more information.
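
For readers new to MinHash, the following toy sketch shows the idea behind the distance estimate: keep the smallest hashes of each sequence's k-mers, estimate the Jaccard index from the two sketches, and convert it to a distance. It is not Mash's implementation. SHA-1 stands in for the hash function Mash uses, canonical (strand-independent) k-mers are ignored, and the default `k` and `s` values are arbitrary small numbers for illustration.

```python
import hashlib
import math


def sketch(seq, k=5, s=50):
    """Bottom-s sketch: the s smallest hash values over all k-mers of seq."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers
    }
    return sorted(hashes)[:s]


def jaccard_estimate(sketch_a, sketch_b, s=50):
    """Estimate the Jaccard index from the s smallest hashes of the union."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k=5):
    """Convert a Jaccard estimate j into a distance: -(1/k) * ln(2j / (1 + j))."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Identical sequences give a Jaccard estimate of 1 and therefore a distance of 0, while sequences with no shared k-mers give a distance of 1. Real genomes need much larger `k` and `s` than the toy defaults here.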

Address of the bookmark: https://github.com/marbl/Mash/releases