BOL: Related items

WuXi has acquired NextCODE Health

Pranjali Yadav — Mon, 19 Jan 2015 08:17:35 -0600

Shanghai, China-headquartered pharmatech company WuXi (NYSE: WX) has acquired NextCODE Health, a genomic analysis and bioinformatics company based in the USA.

The acquisition was made for $65 million in cash, and WuXi plans to merge its genome center with NextCODE Health to form a new company, WuXi NextCODE Genomics. The business will be headquartered in Shanghai and have operations in Cambridge, Massachusetts, and Reykjavik, Iceland.

With the huge unmet medical needs in diseases with a genetic component and the rapid advances in genomics and bioinformatics, now is the right time for WuXi to make a strategic investment in this field, and NextCODE is the right partner. This new venture of WuXi NextCODE Genomics will create important new genomic and bioinformatic products and services to help make personalized treatment and medicine a reality. It will also enable doctors to provide better treatments to patients.

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part, and it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1, 2, 3, 4, 5)

same as

x <- c(1:5)

creates the vector x with the whole numbers from 1 to 5.

You can see what is in an object at any time by typing its name;

x

will produce the output ‘[1] 1 2 3 4 5’

Note that names need to be quoted

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')
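As a minimal sketch of the points above (the variable y is introduced here only for comparison and is not part of the original guide), the two ways of building the vector can be checked against each other:

```r
# Two equivalent ways to build a vector holding the numbers 1 to 5
x <- c(1, 2, 3, 4, 5)   # combine individual values with c()
y <- c(1:5)             # the colon operator generates the sequence

# c(1, 2, 3, 4, 5) is stored as double and 1:5 as integer,
# so compare the values rather than the storage type
all(x == y)

# Character data must be quoted, otherwise R looks for objects with those names
daysofweek <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
length(daysofweek)
daysofweek[2]           # the second element
```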

Usually however you want to input from a file. We have touched on the ‘read.table’ function already.

# 'filename' stands for the path to your own file
mydata <- read.table('filename')

Now mydata is a data frame with multiple vectors

each vector can be identified by the default syntax

#if any of these are typed it will print to screen

mydata$V1
mydata$V2
mydata$V3

By default the function assumes certain things from the file

The file is a plain text file (there are functions to read Excel files: not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows** [I’ll explain].
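These defaults can be tried out without any existing file. The sketch below writes a small temporary file first; the data values and the name infile are made up for illustration:

```r
# Write a small whitespace-separated file with no header (hypothetical data)
infile <- tempfile(fileext = ".txt")
writeLines(c("1 10 0.5",
             "2 20 1.5",
             "3 30 2.5"), infile)

# With no extra arguments, read.table assumes whitespace separators,
# no header row and no row names
mydata <- read.table(infile)

names(mydata)   # default column names V1, V2, V3
mydata$V2       # the second column as a vector
dim(mydata)     # number of rows and columns
```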

If any of these are false, we need to tell that to the function

If it has a header row

mydata <- read.table('filename', header=TRUE) # header=T also works

Note that there is a comma between the different parts of the function's arguments

If there is one less column in the header row, then R assumes that the first column of data after the header contains the row names

Now the vectors (columns) are identified by their name

#if any of these are typed it will print to screen

mydata$A
mydata$B
mydata$C
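A small self-contained sketch of the header and row-name behaviour, again using a temporary file with made-up contents (the gene1, gene2 and gene3 labels are hypothetical):

```r
# The header row has three labels but each data row has four fields,
# so R uses the first field of each data row as the row name
infile <- tempfile(fileext = ".txt")
writeLines(c("A B C",
             "gene1 1 10 0.5",
             "gene2 2 20 1.5",
             "gene3 3 30 2.5"), infile)

mydata <- read.table(infile, header = TRUE)

names(mydata)      # column names taken from the header row
rownames(mydata)   # row names taken from the first column of data
mydata$B           # columns are now addressed by name
```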

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)

We can shortcut having to type the data frame each time by attaching it

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)

Two other important options for read.table

If it is separated only by tabs and has a header

mydata <- read.table('filename', header=TRUE, sep='\t')

This is really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns. However, if the columns are of an uneven length it will tell you.

If you know that the file has uneven columns

mydata <- read.table('filename', header=TRUE, sep='\t', fill=TRUE)

This causes R to fill empty spaces in a column with ‘NA’.

The last two examples will still work with our file and give the same result as with only header=T
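The effect of the two options can be seen on a small tab-separated file. This is only a sketch; the file contents and the column name sample are invented for the example:

```r
# A tab-separated file whose first column contains spaces
# and whose last row is missing its B value
infile <- tempfile(fileext = ".txt")
writeLines(c("sample\tA\tB",
             "liver tissue\t1\t10",
             "brain tissue\t2"), infile)

# sep = "\t" splits on tabs only; fill = TRUE pads short rows with NA
mydata <- read.table(infile, header = TRUE, sep = "\t", fill = TRUE)

mydata$sample   # the spaces inside the values did not split the column
mydata$B        # the missing value has been filled with NA
```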

Graphs

To get an idea of what R is capable of, type

demo(graphics)

This steps through the examples, and the code is printed to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options to a function, type ‘?function’

Histogram of A

hist(mydata$A)

If there was more data we could increase the number of bars with the option breaks=50 (or another relevant number).
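To see what the breaks option does, here is a sketch on simulated data (the vector values is hypothetical and stands in for a larger column such as mydata$A):

```r
set.seed(1)                                  # make the simulated data reproducible
values <- rnorm(1000, mean = 50, sd = 10)    # hypothetical measurements

hist(values)                 # default number of bars
hist(values, breaks = 50)    # finer bars; R treats a single number as a suggestion

# The binning can also be inspected without drawing anything
h <- hist(values, breaks = 50, plot = FALSE)
length(h$counts)             # number of bars actually used
sum(h$counts)                # every value falls into exactly one bar
```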

boxplot(mydata)

We can get rid of the need to type the data frame each time by using the attach function

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui) can right-click on the image and select the format they want.

The following instructions work for everyone.

You need to create a new device of the type of file you need, then send the data to that device

To save as a png file (easy to load into the likes of PowerPoint, also great for web applications)

png('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen, as the output is going to the file.
Also, it may not be saved immediately, but it will be once the device is turned off or R is quit.

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc.) device, which forces the data to be saved
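Putting the steps together, a minimal sketch of the whole open, plot, close cycle looks like this. The vectors A and B and the output file name are made up for the example, and pdf() is used because it is available in every R installation; png() is used in the same way:

```r
# Hypothetical data standing in for the attached columns A and B
A <- c(3, 5, 7, 9, 11)
B <- c(2, 4, 4, 8, 16)

# Write into the session's temporary directory so nothing is overwritten
outfile <- file.path(tempdir(), "boxplot_example.pdf")

pdf(outfile)                 # open the file device; nothing appears on screen
boxplot(A, B, names = c("Value A", "Value B"), ylab = "Count of Something")
dev.off()                    # close the device so the file is written

file.exists(outfile)         # check that the file was written
```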

Narcis Fernandez-Fuentes Lab

Mon, 25 May 2015 07:30:00 -0500

Welcome to our web-site compiling all the research-related activities of the group. Our research interests relate to a number of areas within Bioinformatics. We have a long-standing interest in protein structure prediction and structure-to-function relationships. We work in the study of biomolecular interactions, modeling of protein complexes, the study and characterization of protein-protein interactions, peptide design, modeling of genetic variation, structure-based protein design and different aspects of Plant Bioinformatics. Take a look at our databases and servers and the list of publications for more information.

More at http://www.bioinsilico.org/

OpenCPU

Rahul Nayak — Sun, 05 Jul 2015 18:34:46 -0500

OpenCPU is a system for embedded scientific computing and reproducible research. The OpenCPU server provides a reliable and interoperable HTTP API for data analysis based on R.

The OpenCPU JavaScript client library provides the most seamless integration of R and JavaScript available today.

OpenCPU uses standard R packaging to develop, ship and deploy web applications. Several open source example apps are available from Github.

Installing your own OpenCPU server is super easy and only takes a few minutes.

More at https://www.opencpu.org/

CrossMap

Jitendra Narayan — Mon, 08 Feb 2016 15:47:00 -0600

CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as Human hg18 (NCBI36) <> hg19 (GRCh37), Mouse mm9 (MGSCv37) <> mm10 (GRCm38)).

It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF.

CrossMap is designed to lift over genome coordinates between assemblies. It is not a program for aligning sequences to a reference genome.

We do not recommend using CrossMap to convert genome coordinates between species.

More at http://crossmap.sourceforge.net/

Address of the bookmark: http://crossmap.sourceforge.net/

The Mills lab

Wed, 24 Feb 2016 16:18:38 -0600

The laboratory is focused on the discovery and analysis of structural variation (SVs) from genomic sequence data. As part of the 1000 Genomes Project and other endeavors, we have helped produce initial fine-scale maps using a variety of SV discovery approaches including: (i) paired-end mapping (or read pair analysis) based on abnormally mapped pairs of clone ends; (ii) read-depth analysis, which detects deletions and duplications through analysis of the read depth-of-coverage; (iii) split read analysis, which detects SVs by evaluating gapped sequence alignments; and (iv) sequence assembly, which enables the discovery of novel (non-reference) sequence insertions.

http://millslab.org/research.html

SATSUMA : Highly sensitive whole-genome synteny alignments.

Jit — Fri, 13 May 2016 05:25:26 -0500

Satsuma is a whole-genome synteny alignment program. It takes two genomes, computes alignments, and then keeps only the parts that are orthologous, i.e. following the conserved order and orientation of features, such as protein coding genes, non-coding genes, or neutral sequences. Satsuma does not require any pre-processing, such as repeat masking, since it will automatically detect ambiguous mappings.

Satsuma has parallelization built-in and is designed to run on multi-core architectures. The run-time for aligning two bird-size genomes (~1.2 Gb) is around two days on 24 CPUs.

You can find the manual here.
Download the latest source code from here.
Stable versions can also be downloaded from the Broad Institute's web site.

An incomplete list of questions and answers (yes, these have really been asked by our users! Please feel free to add your own by e-mailing us) is here.

If you use Satsuma in your research, please cite:
Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alföldi, J., Di Palma, F., & Lindblad-Toh, K. (2010). Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics, 26(9), 1145-51.

Tutorial at http://evomics.org/learning/genomics/satsuma/

Address of the bookmark: http://satsuma.sourceforge.net/

The Kingsley Lab

Fri, 03 Jun 2016 09:55:10 -0500

The Molecular Basis of Vertebrate Evolution. Naturally occurring species show spectacular differences in morphology, physiology, behavior, disease susceptibility, and life span. Although the genomes of many organisms have now been completely sequenced, relatively little is still known about the specific DNA sequence changes that underlie interesting species-specific traits. The Kingsley laboratory is using a combination of genetic and genomic approaches to identify the detailed molecular mechanisms that control evolutionary change in vertebrates.

Samtools Primer !!

Jit — Thu, 23 Jun 2016 07:18:17 -0500

SAMtools: Primer / Tutorial by Ethan Cerami, Ph.D.

keywords: samtools, next-gen, next-generation, sequencing, bowtie, sam, bam, primer, tutorial, how-to, introduction

Revisions

    1.0: May 30, 2013: First public release on biobits.org.
    1.1: July 24, 2013: Updated with Disqus Comments / Feedback section.
    1.2: December 19, 2014: Multiple updates, including:
        Updated to use samtools 1.1 and bcftools 1.2.
        Updated usage for bcftools.

About

SAMtools is a popular open-source tool used in next-generation sequence analysis. This primer provides an introduction to SAMtools, and is geared towards those new to next-generation sequence analysis. The primer is also designed to be self-contained and hands-on, meaning that you only need to install SAMtools, and no other tools, and sample data sets are provided. Terms in bold are also explained in the glossary at the end of the document.

Address of the bookmark: http://biobits.org/samtools_primer.html

Gene Finding and Predictions

Poonam Mahapatra — Fri, 26 Aug 2016 07:26:27 -0500

In this exercise, a previously annotated gene will be used to measure the accuracy of different gene finding approaches. GRAIL, GENSCAN, geneid, FGENESH, GenomeScan, GrailEXP and GENEWISE will be used to annotate the sequence. Search-by-signal, search-by-content and homology-based (protein and cDNA sequences) methods will all be employed in order to improve the ab initio results. Weak conservation of start codons will lead to wrong prediction of initial exons in most cases.

http://genome.crg.es/courses/Bioinformatics2003_genefinding/

Address of the bookmark: http://genome.crg.es/courses/Bioinformatics2003_genefinding/