BOL: Related items

CABOG: Celera Assembler with Best Overlap Graph

Abhimanyu Singh — Mon, 15 May 2017 05:04:39 -0500

CABOG (Celera Assembler with Best Overlap Graph) is scientific software for DNA research. CABOG has been a critical component of many genome sequencing projects. It operates on small genomes, such as those of bacteria, as well as large genomes, such as those of mammals. CABOG is an extension of the Celera Assembler software that was originally developed at Celera for the 2001 publication of the first draft human genome sequence. The software was released to the public domain in 2004. Its open source repository on SourceForge is an internet resource for scientists around the world.

CABOG is one of many software programs called genome assemblers. These programs exist to overcome the fundamental limitation of all sequencing machines, namely, that they read out very few DNA letters at a time. These programs reconstruct genomes that are billions of letters long from the hundreds of letters per read that modern sequencers provide. What these programs do is often described as a scaled up version of a family solving a jigsaw puzzle.

The CABOG software was the first to accomplish many scientific goals. It was the first to assemble the genome of a multicellular organism (Drosophila melanogaster, 2000). It was the first to assemble both parental haplotypes of one human genome (J. Craig Venter, 2007). It was the first to assemble environmental sequence from the oceans (Sargasso Sea in 2004 and Global Ocean Sampling in 2007). It was the first to combine reads from first-generation Sanger sequencing machines and second-generation pyrosequencing machines (Marine microbes, 2006). Today, CABOG is one of the leading assembly programs for data sets that include paired-end data from the Roche 454 line of sequencing machines.

Address of the bookmark: http://www.jcvi.org/cms/research/projects/cabog/overview/

Vvek's Lab

Thu, 26 Sep 2013 11:11:39 -0500

Broad Area of Research: RNA biology (microRNA, lncRNA), Stem cells, Functional genomics, Epigenomics and Cancer

RNAs, especially non-coding RNAs (such as microRNAs and long ncRNAs), have recently been found to be very abundant in mammalian organisms. They play key roles in gene expression regulation and gene silencing, and are also implicated in disease progression, stem cell pluripotency, etc. Current research activities of our lab include analysis of the expression patterns of ncRNAs using microarray and next-gen sequencing data, understanding the role of miRNAs and other regulatory RNAs in various diseases, especially cancer, and validation by reporter assays (renilla/luciferase) and other experimental tools.

More @ http://vvekslab.in/index.html

Useful Bioinformatics Analysis Tools !

Neel — Thu, 23 Dec 2021 23:10:02 -0600

CoMeta

Classifier of reads from metagenomic sequencing experiments.

• Kawulok, J., Deorowicz, S., CoMeta: Classification of Metagenomes Using k-mers, PLOS ONE, 2015; 10(4):1–23,

CoMSA

Compressor of multiple sequence alignments of proteins.

• Deorowicz, S., Walczyszyn, J., Debudaj-Grabysz, A., CoMSA: compression of protein multiple sequence alignment files, Bioinformatics, 2019; 35(2):227–234,

DSRC

Compressor of sequencing reads.

• Roguski, L., Deorowicz, S., DSRC 2: Industry-oriented compression of FASTQ files, Bioinformatics, 2014; 30(15):2213–2215,
• Deorowicz, S., Grabowski, Sz., Compression of DNA sequences in FASTQ format, Bioinformatics, 2011; 27(6):860–862,

FAMSA

Multiple sequence alignment designed for huge families of proteins (even containing hundreds of thousands of sequences).

• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., FAMSA: Fast and accurate multiple sequence alignment of huge protein families, Scientific Reports, 2016; 6(33964):

FaStore

Compressor of FASTQ files.

• Roguski, L., Ochoa, I., Hernaez, M., Deorowicz, S., FaStore - a space-saving solution for raw sequencing data, Bioinformatics, 2018; 34(16):2748–2756,

FQSqueezer

Experimental high-end compressor of FASTQ files.

• Deorowicz, S., FQSqueezer: k-mer-based compression of sequencing data, Scientific Reports, 2020; 10(578):

GDC

Compressor of collections of genome sequences.

• Deorowicz, S., Danek, A., Niemiec, M., GDC 2: Compression of large collections of genomes, Scientific Reports, 2015; 5(11565):1–12,
• Deorowicz, S., Grabowski, Sz., Robust relative compression of genomes with random access, Bioinformatics, 2011; 27(21):2979–2986,

GTC

Genotype databases compressor with support for fast queries.

• Danek, A., Deorowicz, S., GTC: how to maintain huge genotype collections in a compressed form, Bioinformatics, 2018; 34(11):1834–1840,

GTShark

Genotypes compressor.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, Bioinformatics, 2019; 35(22):4791–4793,

KMC

Memory frugal k-mer counter.

•  Kokot, M., Długosz, M., Deorowicz, S., KMC 3: counting and manipulating k -mer statistics, Bioinformatics, 2017; 33(17):2759–2761,
•  Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A., KMC 2: Fast and resource-frugal k-mer counting, Bioinformatics, 2015; 31(10):1569–1576,
•  Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz., Disk-based k-mer counting on a PC, BMC Bioinformatics, 2013; 14():Article no. 160,

Kmer-db

Tool for estimation of evolutionary distances in a collection of genomes.

• Deorowicz, S., Gudys, A., Dlugosz, M., Kokot, M., Danek, A., Kmer-db: instant evolutionary distance estimation, Bioinformatics, 2019; 35(1):133–136,

MuGI

Index allowing queries for a collection of multiple genome sequences.

• Danek, A., Deorowicz, S., Grabowski, Sz., Indexes of Large Genome Collections on a PC, PLOS ONE, 2014; 9(10):e109384,

ORCOM

Experimental compressor of sequencing reads.

• Grabowski, Sz., Deorowicz, S., Roguski, L., Disk-based compression of data from genome sequencing, Bioinformatics, 2014; 31(9):1389–1395,

PgSA

Index allowing queries for a collection of sequencing reads.

• Kowalski, T., Grabowski, Sz., Deorowicz, S., Indexing arbitrary-length k-mers in sequencing reads, PLOS ONE, 2015; 10(7):1–16,

QuickProbs

Multiple sequence alignment designed especially for GPU.

• Gudys, A., Deorowicz, S., QuickProbs 2: towards rapid construction of high-quality alignments of large protein families, Scientific Reports, 2017; 7(41553):
• Gudys, A., Deorowicz, S., QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors, PLOS ONE, 2014; 9(2):e88901,

RECKONER

Read error corrector.

• Długosz, M., Deorowicz, S., RECKONER: read error corrector based on KMC, Bioinformatics, 2017; 33(7):1086–1089,

TGC

Compressor of collections of genomes given in Variant Call Format (VCF) files.

• Deorowicz, S., Danek, A., Grabowski, Sz., Genome compression: a novel approach for large collections, Bioinformatics, 2013; 29(20):2572–2578,

VCFShark

Compressor of VCF files.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, biorxiv.org, 2020; ():

Whisper

Experimental mapper of whole genome sequencing data.

•  Deorowicz, S., Gudys, A., Whisper 2: indel-sensitive short read mapping, bioRxiv.org, 2019; :
•  Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Whisper: read sorting allows robust mapping of DNA sequencing data, Bioinformatics, 2019; 35(12):2043–2050,
•  Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Robust mapping of whole genome sequencing data, Poster at The Biology of Genomes Conference, 2017;

The Graveley Lab

Tue, 19 Nov 2013 18:02:48 -0600

Research in the Graveley lab is primarily focused on the regulation of alternative splicing and small RNA mediated gene regulation. These are fascinating and extraordinarily important mechanisms by which genes can be regulated. Our long-term goals are to understand how these processes are regulated at a mechanistic level and to understand the logic of these processes in significant biological settings. To achieve these goals, we strive to think outside the box to creatively attack the problems being addressed using a wide variety of approaches that include biochemistry, genetics, imaging, deep sequencing, large-scale RNAi screening and bioinformatics.

Lab page @ http://graveleylab.cam.uchc.edu/Graveley/index.html

Post-doctoral Research Assistant in Genetics

Thu, 05 Jun 2014 16:01:39 -0500

Camden, North London
£31.1K per annum inclusive of London Weighting

This is a fixed term post for 36 months.

We wish to recruit a highly motivated postdoctoral scientist to carry out a BBSRC funded project in the laboratory of Dr. Denis Larkin. The project is focused on developing and applying new algorithms to study genome and chromosome evolution in birds, mammals and other vertebrate species using whole-genome sequences and existing algorithms. The post holder will use cutting edge computational and laboratory approaches to generate chromosomal assemblies for sequenced genomes, and to study chromosomal structures and genome differences between bird and other vertebrate species in an attempt to identify species- and clade-specific genome signatures.

Applicants must have a Ph.D. and a track record of success, as indicated by first-author publications in international journals. They must possess excellent organisation skills and be capable of individual initiative and of interacting as part of a team. Applicants with extensive practical experience in bioinformatics or computer science, programming, visualization, handling of large data sets and high-performance computing are encouraged to apply. The post will involve collaboration with a wide range of academic partners within the UK, the EU and worldwide. In addition to leading their own project, the post holder will have opportunities to contribute to multiple international genome initiatives.

Experience in programming, bioinformatics and comparative genome analysis is essential. Applicants should have a minimum of a degree and preferably a higher degree in a relevant subject.

The Royal Veterinary College has the largest range of veterinary, para-veterinary and animal science undergraduate and postgraduate courses of any veterinary school in the world and is one of the largest veterinary schools in Europe.

Prospective applicants are encouraged to contact Dr. Denis Larkin, Comparative Biomedical Sciences Department on +442071211906 or email: dlarkin@rvc.ac.uk

We offer a generous reward package.

For further information and to apply on-line please visit our website: www.rvc.ac.uk
Job reference CBS-0025-14A

Closing date: 4 July 2014
Interviews are likely to be held in July 2014

We promote equality of opportunity and diversity within the workplace and welcome applications from all sections of the community.

Roth Lab

Tue, 11 Mar 2014 17:43:45 -0500

The Roth Lab seeks insight into biological systems through genome- and proteome-scale experimentation and analysis.

Current computational interests:

Systematic analysis of genetic epistasis to identify redundant or compensatory systems and to reveal order of action in genetic pathways.
Using knockout, knockdown, or overexpression, or other perturbation experiments in combinations of genes in S. cerevisiae, C. elegans or mouse.
Using genome-scale genotyping of natural polymorphisms in S. cerevisiae and human populations.
Alternative splicing and its relationship to protein interaction networks.
Integrating large-scale studies including phenotype, genetic epistasis, protein-protein and transcription-regulatory interactions and sequence patterns to quantitatively assign function to genes and guide experimentation.

More at http://llama.mshri.on.ca/index.html

Amity University Bioinformatics Summer Program - Kolkata

eliabrodsky — Tue, 11 Jun 2019 21:27:10 -0500

Registrations are now open for the 2019 Summer Bioinformatics Training program at Amity University, Kolkata. The program will focus on introductory topics for life science students. We will review important history, topics and challenges bioinformatics can help address in the context of basic research, discovery and industry.

Read more: https://edu.t-bio.info/amity-university-summer-bioinformatics-program-registrations-are-open/

Perl one-liner for bioinformatician !!!

Abhimanyu Singh — Fri, 30 May 2014 05:49:07 -0500

With the emergence of NGS technologies and the sequencing data they produce, most bioinformaticians munge and wrangle massive amounts of genomics text. Although there are several "standardized" file formats (FASTQ, SAM, VCF, etc.) and tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times when knowing a few Perl one-liners is extremely helpful.

Perl one-liners are small and awesome Perl programs that fit in a single line of code and do one thing really well. These things include changing line spacing, numbering lines, doing calculations, converting and substituting text, deleting and printing certain lines, parsing logs, editing files in-place, doing statistics, carrying out system administration tasks, updating a bunch of files at once, and many more. Perl one-liners will make you a shell warrior. Anything that took you minutes to solve will now take you seconds!

perl -pe '$\="\n"'
#double space a file

perl -pe '$_ .= "\n" unless /^$/'
#double space a file except blank lines

perl -pe '$_.="\n"x7'
#add 7 blank lines after every line

perl -ne 'print unless /^$/'
#remove all blank lines

perl -lne 'print if length($_) < 20'
#print all lines with length less than 20.

perl -00 -pe ''
#If there are several consecutive blank lines, delete all but one (make the file single spaced).

perl -00 -pe '$_.="\n"x4'
#Compress or expand every run of blank lines into 5 consecutive blank lines (the one kept by paragraph mode plus the 4 added)

perl -pe '$_ = "$. $_"'
#Number all lines in a file

perl -pe '$_ = ++$a." $_" if /./'
#Number only non-empty lines in a file

perl -ne 'print ++$a." $_" if /./'
#Number and print only non-empty lines in a file

perl -pe '$_ = ++$a." $_" if /regex/'
#Number only lines that match a pattern

perl -ne 'print ++$a." $_" if /regex/'
#Number and print only lines that match a pattern

perl -ne 'printf "%-5d %s", $., $_ if /regex/'
#Print lines that match a pattern, each prefixed with its line number left-aligned in a 5-character field (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)

perl -le 'print scalar(grep{/./}<>)'
#prints the total number of non-empty lines in a file

perl -lne '$a++ if /regex/; END {print $a+0}'
#print the total number of lines that match the pattern

perl -alne 'print scalar @F'
#print the number of fields (words) in each line.

perl -alne '$t += @F; END { print $t}'
#Find total number of words in the file

perl -alne 'map { /regex/ && $t++ } @F; END { print $t }'
#find total number of fields that match the pattern

perl -lne '/regex/ && $t++; END { print $t }'
#Find total number of lines that match a pattern

perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m'
#will calculate the GCD of two numbers.

perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m'
#will calculate the LCM (least common multiple) of 20 and 35.

perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n'
#Generates 10 random integers from 5 to 14 (the upper bound 15 is excluded).

perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8'
#Generates an 8 character password from the letters a to z and the digits 0 to 9.

perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20'
#Generates a random DNA sequence 20 nucleotides long.

perl -le 'print "a"x50'
#generate a string of 50 'a' characters

perl -le 'print join ", ", map { ord } split //, "hello world"'
#Will print the ASCII code of each character in the string hello world.

perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)'
#converts ascii values into character strings.

perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"'
#Generates an array of odd numbers.

perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"'
#Generate an array of even numbers

perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file
#ROT13-encode the entire file (shift every letter by 13 positions)

perl -nle 'print uc'
#Convert all text to uppercase:

perl -nle 'print lc'
#Convert text to lowercase:

perl -nle 'print ucfirst lc'
#Convert the first letter of each line to uppercase and the rest of the line to lowercase

perl -ple 'y/A-Za-z/a-zA-Z/'
#Convert upper case to lower case and vice versa

perl -ple 's/(\w+)/\u$1/g'
#Capitalize the first letter of every word

perl -pe 's|\n|\r\n|'
#Convert unix new lines into DOS new lines:

perl -pe 's|\r\n|\n|'
#Convert DOS newlines into unix new line

perl -pe 's|\n|\r|'
#Convert unix newlines into MAC newlines:

perl -pe '/regexp/ && s/foo/bar/'
#Substitute foo with bar, but only on lines that match regexp.

Reference/Sources:

http://genomics-array.blogspot.in/2010/11/some-unixperl-oneliners-for.html

http://genomespot.blogspot.com/2013/08/a-selection-of-useful-bash-one-liners.html

http://biowize.wordpress.com/2012/06/15/command-line-magic-for-your-gene-annotations/

http://genomics-array.blogspot.com/2010/11/some-unixperl-oneliners-for.html

http://bioexpressblog.wordpress.com/2013/04/05/split-multi-fasta-sequence-file/

JBrowse: Embeddable genome browser built completely with JavaScript and HTML5

Jit — Fri, 29 Jun 2018 09:19:56 -0500

JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl.

Headline Features:

• Fast, smooth scrolling and zooming. Explore your genome with unparalleled speed.
• Scales easily to multi-gigabase genomes and deep-coverage sequencing.
• Quickly open and view data files on your computer without uploading them to any server.
• Supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, VCF (with either .tbi or .idx index), REST, and more. BAM, BigBed, BigWig, and VCF data are displayed directly from chunks of the compressed binary files, no conversion needed.
• Includes an optional “faceted” track selector suitable for large installations with thousands of tracks.
• Very light server resource requirements. In fact, JBrowse has no back-end server code, just tools for formatting data files to be read directly over HTTP. Serve huge datasets from a single low-cost cloud instance.
• Can run as a stand-alone app on OSX and Windows using the Electron platform.
• Highly extensible plugin architecture, with a large plugin registry of existing examples at https://gmod.github.io/jbrowse-registry

https://jbrowse.org/

Address of the bookmark: https://github.com/GMOD/jbrowse

The 8,000-year-old Tibetan gene mutation !!!

Neel — Wed, 20 Aug 2014 21:57:44 -0500

A new study has provided insight into how a gene mutation around 8,000 years ago helped Tibetans survive in the thin air of the Tibetan Plateau, which has an average elevation of 14,800 feet.

A study led by University of Utah scientists is the first to find a genetic cause for the adaptation, a single DNA base pair change that dates back 8,000 years, and to demonstrate how it contributes to the Tibetans' ability to live in low oxygen conditions.

About 8,000 years ago, the gene EGLN1 changed by a single DNA base pair. Today, a relatively short time later on the scale of human history, 88 percent of Tibetans have the genetic variation, and it is virtually absent from closely related lowland Asians. The findings indicate that the genetic variation endows its carriers with an advantage.

In those without the adaptation, low oxygen causes the blood to become thick with oxygen-carrying red blood cells, an attempt to feed starved tissues, which can cause long-term complications such as heart failure. The researchers found that the newly identified genetic variation protects Tibetans by decreasing the over-response to low oxygen.

Reference: http://www.nature.com/nature/journal/v512/n7513/abs/nature13408.html