BOL: Related items

Pacman

Rahul Nayak — Mon, 16 Feb 2015 12:15:17 -0600

The pacman package is an R package management tool that combines the functionality of base library-related functions into intuitively named functions. It is ideally loaded from .Rprofile, where it speeds up your workflow by reducing the time spent recalling obscurely named functions, reducing code, and integrating the functionality of base functions so that several actions are performed at once.

Function names in the pacman package follow the format p_xxx, where ‘xxx’ is the task the function performs. For instance, the p_load function loads one or more packages as a more general substitute for the library or require functions, and if a package isn’t available locally it installs it for you.
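The "load it, installing first if it is missing" behaviour can be sketched in base R. The helper name load_or_install below is hypothetical and used only for illustration; it is not pacman's implementation, just a minimal approximation of the idea.

```r
# Hypothetical helper (not part of pacman): load each package,
# installing it from the configured repository first if it is missing.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# 'stats' and 'utils' ship with R, so nothing is installed here.
load_or_install(c("stats", "utils"))
```

With pacman installed, the corresponding call would be pacman::p_load(stats, utils).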

Installation

To download the development version of pacman:

Download the zip ball or tar ball, decompress and run R CMD INSTALL on it, or use the devtools package to install the development version:

## Make sure your current packages are up to date
update.packages()
## devtools is required
devtools::install_github("trinker/pacman")

Note: Windows users need Rtools and devtools to install this way.

More at https://github.com/trinker/pacman

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part; it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1, 2, 3, 4, 5)

same as

x <- c(1:5)

creates the vector x with the numbers from 1 to 5.

You can see what is in an object at any time by typing its name:

x

will produce the output ‘[1] 1 2 3 4 5’

Note that text values (such as names) need to be quoted

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')
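A short sketch of building vectors with c(). The colon form 1:5 is shown as one common shorthand for the same run of numbers; the variable name y is only for illustration.

```r
# Numeric vector built element by element with c()
x <- c(1, 2, 3, 4, 5)

# The colon operator builds the same run of numbers
y <- 1:5

# The values match, although c(1, 2, ...) stores doubles and 1:5 stores integers
all(x == y)        # TRUE
typeof(x)          # "double"
typeof(y)          # "integer"

# Text values must be quoted
daysofweek <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
length(daysofweek) # 5
```

For everyday use the two forms are interchangeable; the type difference only matters when you compare them with identical().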

Usually, however, you want to input from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table('filename') # 'filename' stands for the path to your file

Now mydata is a data frame with multiple vectors (columns).

Each vector can be identified by its default name

# typing any of these prints that column to the screen

mydata$V1
mydata$V2
mydata$V3

By default the function assumes certain things about the file:

the file is a plain text file (there are functions to read Excel files; not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows (explained below).
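The defaults listed above can be seen on a small file. The file contents here are invented for illustration, and a temporary file stands in for your own data file.

```r
# Write a small whitespace-separated file with no header row
path <- tempfile(fileext = ".txt")
writeLines(c("1 10 100",
             "2 20 200",
             "3 30 300"), path)

# With the defaults, columns are split on any whitespace and named V1, V2, ...
mydata <- read.table(path)

names(mydata)   # "V1" "V2" "V3"
mydata$V2       # 10 20 30
```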

If any of these are false, we need to tell the function.

If it has a header row

mydata <- read.table('filename', header=TRUE) # header=T also works

Note that there is a comma between the different arguments of the function.

If there is one less column in the header row, then R assumes that the first column of data after the header contains the row names.

Now the vectors (columns) are identified by their names

# typing any of these prints that column to the screen

mydata$A
mydata$B
mydata$C

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)
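To illustrate the header and row-name behaviour, here is a small sketch on an invented file. The header row has three labels while each data row has four fields, so the first field of each row is used as its row name.

```r
# Header has one fewer field than the data rows
path <- tempfile(fileext = ".txt")
writeLines(c("A B C",
             "row1 1 4 7",
             "row2 2 5 8",
             "row3 3 6 9"), path)

mydata <- read.table(path, header = TRUE)

names(mydata)      # "A" "B" "C"
rownames(mydata)   # "row1" "row2" "row3"
mydata$B           # 4 5 6

# Summary information of column A
summary(mydata$A)
```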

We can shortcut having to type the data frame each time by attaching it

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)

Two other important options for read.table

If it is separated only by tabs and has a header

mydata <- read.table('filename', header=TRUE, sep='\t')

This is really useful if you have spaces in the contents of some columns, because R will not mistake those spaces for column separators. However, if the columns are of uneven length, R will stop with an error.

If you know that the file has uneven columns

mydata <- read.table('filename', header=TRUE, sep='\t', fill=TRUE)

This causes R to fill empty spaces in a column with ‘NA’.

The last two examples will still work with our file and give the same result as with only header=T
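The effect of sep and fill can be checked on a small tab-separated file. The contents are invented for illustration: the Name column contains spaces and the last row is one value short.

```r
path <- tempfile(fileext = ".tsv")
writeLines(c("Name\tA\tB",
             "sample one\t1\t2",
             "sample two\t3"), path)

# Without fill=TRUE the short last row is reported as an error,
# which is captured here instead of stopping the script.
strict <- tryCatch(read.table(path, header = TRUE, sep = "\t"),
                   error = function(e) e)
inherits(strict, "error")

# With fill=TRUE the missing value becomes NA
filled <- read.table(path, header = TRUE, sep = "\t", fill = TRUE)
filled$Name
filled$B
```

Because the separator is a tab only, "sample one" stays in a single column rather than being split at the space.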

Graphs

To get an idea of what R is capable of, type

demo(graphics)

This steps through the examples, and the code is printed to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options to a function, type ‘?function’ (for example, ?hist)

Histogram of A

hist(mydata$A)

If there were more data, we could increase the number of bars with the option breaks=50 (or another relevant number).
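A rough sketch of the breaks option on simulated data (the values are generated here only for illustration). plot = FALSE is used so that the bar counts can be compared without opening a graphics window; drop it to draw the histograms.

```r
set.seed(1)
values <- rnorm(1000)   # simulated data, for illustration only

# Default number of bars
coarse <- hist(values, plot = FALSE)

# breaks is treated as a suggestion; R rounds to "pretty" cut points
fine <- hist(values, breaks = 50, plot = FALSE)

length(coarse$counts)   # number of bars with the default
length(fine$counts)     # number of bars with breaks = 50
```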

Boxplot of every column in the data frame

boxplot(mydata)

We can get rid of the need to type the data frame each time by using the attach function

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui): RIGHT click on the image and select the format you want.

The following instructions work for everyone.

You need to create a new device of the type of file you need, then send the plot to that device

To save as a png file (easy to load into the likes of PowerPoint, and also great for web applications):

png('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen; the output is going to the file.
Also, it may not be saved immediately, but it will be once the device is turned off or R is quit.

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc.) device, which forces the data to be saved
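Putting the steps together, here is a minimal sketch of the open device, plot, close device sequence. The data are simulated and the output goes to a temporary file, both only for illustration; pdf() is used because it is available on every R installation, and png() is used the same way.

```r
set.seed(1)
A <- rnorm(20, mean = 10)   # simulated values, for illustration only
B <- rnorm(20, mean = 12)

outfile <- tempfile(fileext = ".pdf")

pdf(outfile)     # open the file device; nothing appears on screen
boxplot(A, B, names = c("Value A", "Value B"), ylab = "Count of Something")
dev.off()        # close the device so the file is written

file.exists(outfile)
```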

OpenCPU

Rahul Nayak — Sun, 05 Jul 2015 18:34:46 -0500

OpenCPU is a system for embedded scientific computing and reproducible research. The OpenCPU server provides a reliable and interoperable HTTP API for data analysis based on R.

The OpenCPU JavaScript client library provides the most seamless integration of R and JavaScript available today.

OpenCPU uses standard R packaging to develop, ship and deploy web applications. Several open source example apps are available from Github.

Installing your own OpenCPU server is super easy and only takes a few minutes.

More at https://www.opencpu.org/

RATT

Jitendra Narayan — Sun, 07 Feb 2016 16:09:40 -0600

RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.

It was first developed to transfer annotations between different genome assembly versions. However, it can also transfer annotations between strains and even different species, like Plasmodium chabaudi onto P. berghei, between different Leishmania species or Salmonella enterica onto other Salmonella serotypes. RATT is able to transfer any entries present on a reference sequence, such as the systematic id or an annotator's notes; such information would be lost in a de novo annotation.

More at http://ratt.sourceforge.net/

Address of the bookmark: http://ratt.sourceforge.net/

Pilon

Rahul Nayak — Mon, 08 Feb 2016 15:56:18 -0600

Pilon is a software tool which can be used to:

Automatically improve draft assemblies
Find variation among strains, including large event detection

Pilon requires as input a FASTA file of the genome along with one or more BAM files of reads aligned to the input FASTA file. Pilon uses read alignment analysis to identify inconsistencies between the input genome and the evidence in the reads. It then attempts to make improvements to the input genome, including:

Single base differences
Small indels
Larger indel or block substitution events
Gap filling
Identification of local misassemblies, including optional opening of new gaps

More at https://github.com/broadinstitute/pilon/wiki

Address of the bookmark: https://github.com/broadinstitute/pilon/wiki

RCircos: an R package for Circos 2D track plots

Jit — Fri, 20 May 2016 11:01:13 -0500

RCircos package provides a simple and flexible way to make Circos 2D track plots with R and could be easily integrated into other R data processing and graphic manipulation pipelines for presenting large-scale multi-sample genomic research data. It can also serve as a base tool to generate complex Circos images.

More at https://bitbucket.org/henryhzhang/rcircos/src

Address of the bookmark: https://bitbucket.org/henryhzhang/rcircos/src

vcfR

Archana Malhotra — Fri, 19 Aug 2016 07:38:24 -0500

Most variant calling pipelines result in files containing large quantities of variant information. The variant call format (VCF) is an increasingly popular format for these data. The format of these files and their content is discussed in the vignette ‘vcf data.’ These files are typically intended to be post-processed (i.e., filtered) in an attempt to remove false positives or otherwise problematic sites. The R package vcfR provides tools to facilitate this filtering as well as to visualize the effects of choices made during this process.

Address of the bookmark: https://cran.r-project.org/web/packages/vcfR/vignettes/visualization_1.html

Jvarkit : Java utilities for Bioinformatics

Jit — Fri, 08 Jun 2018 09:31:55 -0500

Jvarkit is a collection of Java toolkits for bioinformatics work.

Address of the bookmark: http://lindenb.github.io/jvarkit/

GABi

Fri, 06 Dec 2013 16:43:01 -0600

GABi Research
The major research fields within the scope of GABi are:
Sequence Analysis
Protein Structure Prediction
Comparative Genomics
Functional Analysis of Residues on Protein Families
Gene/Protein Networks
Genome structure & base composition
High-throughput data analysis from NGS

Lab Page http://gabi.cidbio.org/index/

How to sequence the human genome - Mark J. Kiel

Fri, 30 May 2014 13:24:11 -0500

View full lesson: http://ed.ted.com/lessons/how-to-sequence-the-human-genome-mark-j-kiel

Your genome, every human's genome, consists of a unique DNA sequence of A's, T's, C's and G's that tell your cells how to operate. Thanks to technological advances, scientists are now able to know the sequence of letters that makes up an individual genome relatively quickly and inexpensively. Mark J. Kiel takes an in-depth look at the science behind the sequence. Lesson by Mark J. Kiel, animation by Marc Christoforidis.