BOL: Related items

MimicrEE2: Genome-wide forward simulations of Evolve and Resequencing studies

Jit — Fri, 28 Sep 2018 09:21:14 -0500

MimicrEE2 is a multi-threaded Java program for genome-wide forward simulations of evolving populations. It enables the convenient use of available genomic resources, supports the biological particulars of model organisms frequently used in E&R studies, and offers a wide range of adaptive models (selective sweeps, polygenic adaptation, epistasis). MimicrEE2 runs on any computer with Java installed. It is distributed under the GPLv3 license at https://sourceforge.net/projects/mimicree2/.
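
The idea behind a forward simulation can be sketched in a few lines. This is not MimicrEE2's code or its model (MimicrEE2 is a Java program that works genome-wide): it is a minimal single-locus, haploid Wright-Fisher simulation with selection, and the function name and parameters (`wright_fisher`, `pop_size`, `p0`, `s`) are hypothetical.

```python
import random

def wright_fisher(pop_size, p0, s, generations, seed=1):
    """Return the allele-frequency trajectory of one haploid locus.

    pop_size: number of individuals per generation (assumed constant)
    p0: starting frequency of the favoured allele
    s: selection coefficient (relative fitness of the allele is 1 + s)
    """
    rng = random.Random(seed)
    freq = p0
    trajectory = [freq]
    for _ in range(generations):
        # selection: expected frequency after weighting by relative fitness
        expected = freq * (1.0 + s) / (1.0 + s * freq)
        # drift: binomial sampling of pop_size offspring
        carriers = sum(rng.random() < expected for _ in range(pop_size))
        freq = carriers / pop_size
        trajectory.append(freq)
    return trajectory

# one simulated sweep starting from 10% frequency
trajectory = wright_fisher(pop_size=200, p0=0.1, s=0.1, generations=50)
```

Running the function with different seeds gives replicate populations, which is the basic shape of a simulated E&R experiment.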

Address of the bookmark: https://sourceforge.net/projects/mimicree2/

Lecturer in Evolutionary Biology (Bioinformatics) at DEPARTMENT of ZOOLOGY | TE TARI MĀTAI KARAREHE DIVISION of SCIENCES | TE ROHE A AHIKAROA

Tue, 23 Feb 2021 02:05:15 -0600

DEPARTMENT of ZOOLOGY | TE TARI MĀTAI KARAREHE
DIVISION of SCIENCES | TE ROHE A AHIKAROA

Applications are invited for the position of Lecturer in Evolutionary Biology (Bioinformatics).

We are seeking a person with a relevant doctorate, and demonstrated potential to develop as an outstanding researcher and teacher in evolutionary bioinformatics in the Department of Zoology. The position affords an exciting opportunity for an emerging scholar to research and teach in a vibrant and diverse Department. The successful candidate will develop a transformative and collaborative research program, supporting the university's commitment to excellence in research.

Your skills and experience

A PhD with a background in analysis of high-throughput sequencing data and evolutionary biology.
Knowledge of and familiarity with a range of bioinformatics skills, concepts, and practices as they relate to the biology of animals, including genomic, transcriptomic and metabarcoding data analyses.
A strong interest, and experience, in research and teaching of bioinformatics and evolutionary genomics.
An ability to contribute to teaching and learning environments that support engagement of students and staff with bioinformatics and genomics.
Be committed to, and/or have established connections or a track record of, working with national and local bioinformaticians.
Be committed to being a productive collaborator with a track record of working collegially.

Further details

This is a confirmation-path (tenure track) position at the level of Lecturer. The successful candidate is expected to take up duties by 1 July 2021.

To see a full job description and to apply online go to: https://otago.taleo.net/careersection/2/jobdetail.ftl?job=2100342

Chromosome breakpoint - a breakup to remember

BioStar — Tue, 07 Mar 2023 13:31:54 -0600

Chromosome breakpoint refers to the physical location where a chromosome is broken and rearranged. Chromosome breakage can occur spontaneously or be induced by environmental factors such as radiation, chemicals, or viruses. The rearrangement of genetic material resulting from a chromosome breakpoint can have important consequences, including the development of genetic diseases, chromosomal abnormalities, or cancer.

Chromosome breakpoints can occur in two ways: interstitial or terminal. Interstitial breakpoints occur within the chromosome, while terminal breakpoints occur at the end of the chromosome. Terminal breakpoints can lead to the loss of genetic material, whereas interstitial breakpoints can result in the duplication or deletion of genetic material.

Chromosome breakpoints can be detected using a variety of techniques, including cytogenetic analysis, fluorescence in situ hybridization (FISH), and molecular methods such as polymerase chain reaction (PCR) and next-generation sequencing (NGS). These techniques can also help identify the exact location of the breakpoint and the nature of the rearrangement, such as translocations, inversions, deletions, or duplications.

Translocations are one of the most common types of chromosome rearrangements caused by breakpoints. In a translocation, genetic material is exchanged between two different chromosomes, resulting in a balanced or unbalanced distribution of genetic material. Unbalanced translocations can cause genetic diseases or developmental abnormalities, while balanced translocations can be inherited without any apparent phenotypic effects.

Inversions occur when a chromosome segment is inverted, resulting in a change in the order of genetic material. Inversions can be pericentric, involving the centromere, or paracentric, not involving the centromere. Inversions can cause genetic diseases or phenotypic effects if they disrupt the function of essential genes or regulatory elements.
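
The pericentric/paracentric distinction reduces to a coordinate check. The sketch below is a simplification that treats the centromere as a single position on the chromosome; the function name `classify_inversion` and its arguments are hypothetical.

```python
def classify_inversion(start, end, centromere):
    """Classify an inversion by whether its span contains the centromere.

    start, end: breakpoint coordinates on one chromosome (any order)
    centromere: centromere position, treated as a single point (assumption)
    """
    if start > end:
        start, end = end, start
    # pericentric: the breakpoints lie on opposite sides of the centromere
    if start < centromere < end:
        return "pericentric"
    # paracentric: both breakpoints lie on the same chromosome arm
    return "paracentric"

kind = classify_inversion(10_000_000, 50_000_000, centromere=30_000_000)
```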

Deletions and duplications are caused by interstitial breakpoints that result in the loss or gain of genetic material. Deletions can cause genetic diseases or developmental abnormalities if they involve essential genes or regulatory elements. Duplications can also have phenotypic effects, depending on the location and size of the duplicated segment.

Chromosome breakpoints can also be involved in the formation of complex chromosomal rearrangements, such as ring chromosomes or dicentric chromosomes. These complex rearrangements can have important clinical implications, as they can cause genetic diseases or cancer.

In conclusion, chromosome breakpoints are important genetic events that can lead to the rearrangement of genetic material and have important clinical implications. The detection and characterization of chromosome breakpoints using cytogenetic, molecular, and genomic methods are essential for the diagnosis, prognosis, and treatment of genetic diseases and cancer. Further research is needed to understand the molecular mechanisms underlying chromosome breakage and to develop new therapies targeting these events.

Scalpel

Shruti Paniwala — Wed, 20 Aug 2014 02:07:58 -0500

A team from Cold Spring Harbor Laboratory has released an algorithm, called Scalpel, for finding insertions and deletions in next-generation sequencing data sets. Scalpel, which is open source and available for download on SourceForge, outperformed the popular tools GATK HaplotypeCaller and SOAPindel in test runs on both simulated and real whole human exomes.

Like other indel callers, Scalpel works by performing de novo assembly of regions of interest, so that misalignment to the reference genome cannot obscure the presence of an insertion or deletion. Scalpel's innovation is to repeatedly check its assembly before comparing it to the reference genome, to account for simple sequence repeats that are a regular source of error in indel calling. When Scalpel assembles an exon, it collects reads that map to that exon (including partial matches), splits them into k-mers, and creates a de Bruijn graph to span the exon; however, if it detects repeats in the graph, it iteratively increases the size of the k-mers by one base until the repeats are eliminated. This ensures that the final assembly of the exon is highly accurate while minimizing compute time.
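
The k-mer loop described above can be sketched as follows. This is not Scalpel's implementation: as a simplifying assumption, a "repeat" is taken to mean a cycle in the de Bruijn graph, and all names (`build_de_bruijn`, `has_cycle`, `assemble_region`, `k_start`, `k_max`) are hypothetical.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def has_cycle(graph):
    """Iterative depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    for start in list(graph):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if colour[nxt] == GREY:
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # every successor explored: node is finished
                colour[node] = BLACK
                stack.pop()
    return False

def assemble_region(reads, k_start=3, k_max=31):
    """Increase k by one until the graph has no cycle; return (k, graph)."""
    for k in range(k_start, k_max + 1):
        graph = build_de_bruijn(reads, k)
        if not has_cycle(graph):
            return k, graph
    return None, None

# toy region containing the tandem repeat ACGACGACG
region = "TTACGACGACGGA"
reads = [region[0:11], region[2:13]]
k, graph = assemble_region(reads)
```

Starting small and growing k only when needed keeps the graph compact for repeat-free regions, which is in the spirit of the compute-time point made above.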

The Cold Spring Harbor team's validation of Scalpel, published over the weekend in Nature Methods, compares Scalpel's performance on a live whole exome against HaplotypeCaller and SOAPindel. The donor is an individual with serious neurological disorders, which may be linked to a high incidence of indels. One thousand indels from this individual's exome, called by one or more of the informatics pipelines, were selected for focused resequencing. This resequencing revealed a 77% true positive rate for Scalpel calls, dramatically better than the rates for either of the competing tools; Scalpel performed especially well with indels longer than five base pairs, a traditional weak point for indel callers.

Finally, the authors demonstrate Scalpel's use on a large set of genetic data from nearly 600 families who donated samples to the Simons Simplex Collection, a project of the Simons Foundation Autism Research Initiative. Scalpel found a very high enrichment for indels in children affected by autism, compared with their unaffected siblings, a pattern that persisted even after excluding common variants.

Quick tour of Genetic Algorithms !

Jit — Thu, 17 Jan 2019 03:42:48 -0600

The R package GA provides a collection of general purpose functions for optimization using genetic algorithms. The package includes a flexible set of tools for implementing genetic algorithms search in both the continuous and discrete case, whether constrained or not. Users can easily define their own objective function depending on the problem at hand.
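
For readers new to the technique, here is a minimal genetic algorithm for a one-dimensional continuous problem. It is written in Python and does not use or mirror the API of the R package GA; the function `genetic_algorithm`, its parameters, and the toy objective are all hypothetical choices for illustration.

```python
import random

def genetic_algorithm(fitness, lower, upper, pop_size=40, generations=60,
                      mutation_sd=0.3, seed=0):
    """Maximise `fitness` over the interval [lower, upper]."""
    rng = random.Random(seed)
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]

    def tournament():
        # selection: the fitter of two randomly chosen individuals
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(population, key=fitness)       # elitism: keep the best
        children = [elite]
        while len(children) < pop_size:
            child = 0.5 * (tournament() + tournament())      # arithmetic crossover
            child += rng.gauss(0.0, mutation_sd)             # Gaussian mutation
            children.append(min(upper, max(lower, child)))   # stay within bounds
        population = children
    return max(population, key=fitness)

# user-defined objective: a parabola with its maximum at x = 2
best = genetic_algorithm(lambda x: -(x - 2.0) ** 2, lower=-10.0, upper=10.0)
```

Only the objective function changes from problem to problem; selection, crossover and mutation stay the same.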

https://cran.r-project.org/web/packages/GA/vignettes/GA.html

Address of the bookmark: https://cran.r-project.org/web/packages/GA/vignettes/GA.html

kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity

Rahul Nayak — Tue, 29 May 2018 08:37:53 -0500

The k-mer Weighted Inner Product.

This software implements a de novo, alignment free measure of sample genetic dissimilarity which operates upon raw sequencing reads. It is able to calculate the genetic dissimilarity between samples without any reference genome, and without assembling one.
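
A toy version of a weighted k-mer inner product is sketched below. It is not kWIP's code: it counts exact k-mers in memory and ignores reverse complements. The entropy weighting and the kernel normalisation are assumptions based on a general understanding of the method, not on this description, and every function name is hypothetical.

```python
import math
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer in a sample's reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def entropy_weights(samples):
    """Weight each k-mer by the Shannon entropy of its presence frequency (assumption)."""
    n = len(samples)
    presence = Counter()
    for counts in samples:
        presence.update(counts.keys())
    weights = {}
    for kmer, seen in presence.items():
        f = seen / n
        if 0.0 < f < 1.0:
            weights[kmer] = -(f * math.log2(f) + (1 - f) * math.log2(1 - f))
        else:
            weights[kmer] = 0.0   # k-mers present in every sample carry no signal
    return weights

def weighted_inner_product(a, b, weights):
    return sum(weights[kmer] * count * b[kmer]
               for kmer, count in a.items() if kmer in b)

def distance_matrix(read_sets, k=3):
    """Pairwise distances from the normalised weighted inner product."""
    samples = [kmer_counts(reads, k) for reads in read_sets]
    weights = entropy_weights(samples)
    n = len(samples)
    kernel = [[weighted_inner_product(samples[i], samples[j], weights)
               for j in range(n)] for i in range(n)]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            norm = math.sqrt(kernel[i][i] * kernel[j][j])
            similarity = kernel[i][j] / norm if norm > 0 else 0.0
            dist[i][j] = math.sqrt(max(0.0, 2.0 - 2.0 * similarity))
    return dist

# three toy "samples": the first two are identical, the third shares no 3-mers
d = distance_matrix([["ACGTACGT"], ["ACGTACGT"], ["TTTTGGGG"]], k=3)
```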

De novo estimates of genetic relatedness from next-gen sequencing data https://kwip.readthedocs.org

Address of the bookmark: https://github.com/kdmurray91/kwip

GEnView: A phylogeny based comparative genomics software to analyze the genetic environment of genes

Abhi — Tue, 28 Dec 2021 01:49:03 -0600

A phylogeny based comparative genomics software to analyze the genetic environment of genes. The user can select one or several taxa and provide one or several reference protein(s). Genomes and plasmids (based on user choice) will be downloaded from the NCBI Assembly/NR database and searched for the respective gene. Alternatively, custom genomes can be provided. User-selected stretches (20 kbp by default) of the gene's genetic environment are extracted, annotated and aligned between all genomes. The sequences are then visualized, enabling comparison of synteny and gene content.

More at https://pubmed.ncbi.nlm.nih.gov/34951622/

Address of the bookmark: https://github.com/EbmeyerSt/GEnView

Public Databases for Bioinformatics !

Jit — Tue, 23 Mar 2021 05:32:15 -0500

https://www.nature.com/articles/s41467-020-17155-y

Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access at: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 Intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 Intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota, you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail, or you delete something and notice 
       within 24 hours, we can recover it; otherwise not (vs. home, which is 
       properly backed up)
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP
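
A rough Python counterpart to the one-liner above, for cases where the file list is built inside a script. It uses a thread pool rather than processes, on the assumption that CPython's zlib releases the GIL while compressing. The helper names `gzip_file` and `gzip_all` are hypothetical, and like the gzip command, each input file is removed once it has been compressed.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def gzip_file(path):
    """Compress one file to <name>.gz and remove the original."""
    path = Path(path)
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target

def gzip_all(paths, workers=4):
    """Compress many files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, paths))
```

For example, `gzip_all(Path(".").glob("*.fastq"))` would compress every matching file in the current directory (the `.fastq` pattern is only an illustration).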

# fork example in python (Python 3; os.fork is Unix-only)
# for more detailed examples look at
#  https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream(object):
    """Wrap a writable file object so that forked workers can share it."""
    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock across write + flush so lines never interleave
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Try without this: every child then inherits the same random state
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        # flush explicitly: os._exit() below skips buffered-output cleanup
        print(i, x, flush=True)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, then add one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: work until a sentinel arrives, then exit immediately
        worker(todo, ofp)
        os._exit(0)
    else:
        # parent: remember the child so we can wait for it
        pids.append(pid)

# wait for every child before closing the shared output file
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")

For use case 1 we obtained the following ENCODE and ROADMAP datasets https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz, https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam, https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam. Blacklisted regions were obtained from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz. The human genome version hg38 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz.

For use case 2 we used the set of narrowPeak files summarized in https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt (archived version v1.0.1). The human genome version hg19 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz

For use case 3 we used the ENCODE datasets https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam, https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig, https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam as well as the GENCODE annotation v29 from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz.

Address of the bookmark: http://mitra.stanford.edu/

Ribbon: Visualizing complex genome alignments and structural variation:

Jit — Wed, 29 Nov 2017 07:40:22 -0600

Ribbon can be used for long reads, short reads, paired-end reads, and assembly/genome alignments. Instructions for each data format are available by clicking on "instructions" in each tab on the right.

Local installation:

You can install Ribbon locally from GitHub by following the instructions here: https://github.com/MariaNattestad/Ribbon

Address of the bookmark: http://genomeribbon.com/

jobTree based python wrapper to run the genome simulation tool suite Evolver

Jit — Fri, 08 Dec 2017 16:26:32 -0600

evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to running evolver, eSC automatically creates statistical summaries of the simulation as it runs, including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract fasta sequence files from the leaf nodes of a completed simulation; and extract pairwise multiple alignment files (.maf) from the leaf and branch nodes of a completed simulation and, with the help of mafJoin, join them into a single maf covering the entire simulation.

Address of the bookmark: https://github.com/dentearl/evolverSimControl