A guide for complete R beginners :- Installing R packages

Archana Malhotra — Tue, 24 Feb 2015 20:23:34 -0600

Part of the reason R has become so popular is the vast array of packages available from the CRAN and Bioconductor repositories. In the last few years, the number of packages has grown exponentially!

This is a short post giving steps on how to actually install R packages. Let's suppose you want to install the ggplot2 package. Well, nothing could be easier. We just fire up an R shell and type:
> install.packages("ggplot2")

In theory the package should just install; however:

- if you are using Linux and don't have root access, this command may not work, because the default package library is not writable by you;
- you will be asked to select your local mirror, i.e. which server to download the package from.

Installing packages without root access

First, you need to designate a directory where you will store the downloaded packages. On my machine, I use the directory /data/Rpackages/. After creating a package directory, we install and load a package with:
> install.packages("ggplot2", lib="/data/Rpackages/")
> library(ggplot2, lib.loc="/data/Rpackages/")

It's a bit of a pain having to type /data/Rpackages/ all the time. To avoid this, we create a file .Renviron in our home directory and add the line R_LIBS=/data/Rpackages/ to it. This means that whenever you start R, the directory /data/Rpackages/ is added to the list of places R looks for packages, and so:

> install.packages("ggplot2")
> library(ggplot2)

just works!
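If you cannot (or prefer not to) edit .Renviron, a library directory can also be added for the current session only, using the base function .libPaths(). This is a minimal sketch; the directory under tempdir() is a hypothetical stand-in for /data/Rpackages/:

```r
# Hypothetical personal library; replace with e.g. "/data/Rpackages/"
my_lib <- file.path(tempdir(), "Rpackages")
dir.create(my_lib, showWarnings = FALSE, recursive = TRUE)

# Prepend it to the library search path for this session only
# (.libPaths() silently ignores directories that do not exist)
.libPaths(c(my_lib, .libPaths()))

# The first entry is where install.packages() installs when 'lib' is not given
print(.libPaths())

# Value of R_LIBS if it was set via .Renviron (empty string otherwise)
print(Sys.getenv("R_LIBS"))
```

Unlike R_LIBS in .Renviron, this setting is lost when R exits, which makes it handy for trying out a library location before making it permanent.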

Setting the repository

Every time you install an R package, you are asked which repository R should use. To set the repository once and avoid being asked at every package install, simply:

- Create a file .Rprofile in your home directory.
- Add the following piece of code to it:

cat(".Rprofile: Setting UK repository\n")
r = getOption("repos") # hard code the UK repo for CRAN
r["CRAN"] = "http://cran.uk.r-project.org"
options(repos = r)
rm(r)

I found this tip in a Stack Overflow answer.

A guide for complete R beginners!

Archana Malhotra — Fri, 20 Feb 2015 23:36:46 -0600

This tutorial is intended to quickly introduce users to the basics of R, focusing on a few common tasks that biologists need for basic analysis: loading a table, plotting some graphs and performing some basic statistics. More extensive tutorials can be found on the project website and via Bioconductor (not covered here).

R-language: http://www.r-project.org

BioConductor: http://www.bioconductor.org

Advantages of R

Free!
Powerful: many libraries have been created to perform application-specific tasks, e.g. analysis of microarray experiments and next-generation sequencing (Bioconductor, including the Bioseq group).
Presentation-quality graphics
- Save as PNG, PDF or SVG
History
- What you do can be saved for the next time you use R.
- It can be turned into an automated script to run again and again on different data.

Disadvantages

Lack of a comprehensive graphical user interface, although some do exist: R Commander (http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/) and limmaGUI for microarrays (http://bioinf.wehi.edu.au/limmaGUI/).

Preparation

- (Optional) Download and save the tutorial data set from http://bioinformatics.knowledgeblog.org/wp-content/uploads/bioinf/kerr/data.tsv
- Start R (type R in a Linux or Mac terminal, or use the R shortcut on a PC)

Getting More Help

Project Home page
- http://www.r-project.org/
- Check out 'An Introduction to R', a much more in-depth guide.
- R also has a built-in help system (see later).

Working directory

This is the directory where R looks for files and saves results by default. It is convenient if this is also the directory where your input data is stored.

Mac/Linux: this is the directory from which you started R
PC: change it using the 'Change dir...' option in the File menu
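The working directory can also be checked and changed from the R prompt on any platform with the base functions getwd() and setwd(). A short sketch; tempdir() is only a stand-in for your own project folder:

```r
# Show the current working directory
print(getwd())

# Files R can see there (e.g. data.tsv, if you saved it here)
print(list.files())

# Change the working directory; setwd() invisibly returns the previous one,
# so it is easy to switch back afterwards
old_dir <- setwd(tempdir())   # stand-in for e.g. setwd("~/myproject")
print(getwd())

# Go back to where we started
setwd(old_dir)
```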

OpenCPU

Rahul Nayak — Sun, 05 Jul 2015 18:34:46 -0500

OpenCPU is a system for embedded scientific computing and reproducible research. The OpenCPU server provides a reliable and interoperable HTTP API for data analysis based on R.

The OpenCPU JavaScript client library provides the most seamless integration of R and JavaScript available today.

OpenCPU uses standard R packaging to develop, ship and deploy web applications. Several open source example apps are available from GitHub.

Installing your own OpenCPU server is super easy and only takes a few minutes.

More at https://www.opencpu.org/

R for Microsoft Excel

Jitendra Narayan — Wed, 18 Feb 2015 00:43:27 -0600

If you currently use a spreadsheet like Microsoft Excel for data analysis, you might be interested in taking a look at this tutorial on how to transition from Excel to R by Tony Ojeda. The tutorial explains how to use R functions in place of Excel formulas, including tools like =AVERAGE and =VLOOKUP. For the most part, it uses modern R packages to keep the R code clear and concise.

You'll likely still be using Excel as a data source, though, so you'll also want to check out this guide to importing data from Excel to R from MilanoR.

Reference http://www.r-bloggers.com/an-r-tutorial-for-microsoft-excel-users/
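The linked tutorial mostly uses add-on packages; purely as an illustration (not the tutorial's own code), here is a base-R sketch of the two formulas mentioned: mean() in place of =AVERAGE, and match() for an exact-match =VLOOKUP. The prices and orders tables are made-up stand-ins for worksheet ranges:

```r
# Hypothetical data standing in for two worksheet ranges
prices <- data.frame(item  = c("apple", "banana", "cherry"),
                     price = c(0.50, 0.25, 3.00))
orders <- data.frame(item = c("banana", "cherry", "apple", "banana"),
                     qty  = c(4, 1, 2, 6))

# Equivalent of =AVERAGE over the qty column
avg_qty <- mean(orders$qty)
print(avg_qty)   # 3.25

# Equivalent of =VLOOKUP(item, prices, 2, FALSE):
# match() finds each item's row in 'prices', then we take that row's price
orders$price <- prices$price[match(orders$item, prices$item)]
print(orders)
```

Items missing from the lookup table get NA, much as an exact-match VLOOKUP returns #N/A.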

A guide for complete R beginners :- R Syntax

Archana Malhotra — Fri, 20 Feb 2015 23:41:03 -0600

R is a function-based language: the inputs to a function, including options, go in parentheses. Note that all data and options are separated by commas:

Function(data, options)

Even quitting R is a function call: q()

So is help:

help(read.table)

Provides the help page for the FUNCTION ‘read.table’

help.search("t test")

Searches for help pages that might relate to the phrase ‘t test’

NOTE: quotes are needed for search strings; they are not needed when referring to data objects or function names.

There is a shortcut for help:

? shows the help page on a function name, same as help(function)

?read.table

?? searches the help pages, same as help.search("phrase")

??"t test"

Information is usually returned from a function; by default it is printed to the screen:

read.table('data.tsv')

The result can also be stored; what it is stored in is called an 'object':

mydata <- read.table('data.tsv')

Here mydata is an object of type data frame.

Reminder:

Vector: a list of numbers (or other values), equivalent to a column in a table
Data frame: a collection of vectors, equivalent to a table
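To make the reminder concrete, here is a small sketch that builds two vectors and combines them into a data frame with the base function data.frame(); the gene names and counts are made up:

```r
# Two vectors of equal length (made-up values)
gene  <- c("geneA", "geneB", "geneC")
count <- c(10, 25, 7)

# A data frame is a table whose columns are vectors
df <- data.frame(gene = gene, count = count)

print(df)
print(class(df))   # "data.frame"

# Each column can be pulled back out as a vector
print(df$count)    # 10 25 7
```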

Hint:

Up/Down arrow keys can be used to cycle through previous commands

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part; it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1, 2, 3, 4, 5)

same as

x <- 1:5

creates the vector x containing the numbers 1 to 5.

You can see what is in an object at any time by typing its name:

x

will produce the output: [1] 1 2 3 4 5

Note that text values (strings) need to be quoted:

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')

Usually however you want to input from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table('data.tsv')

Now mydata is a data frame with multiple vectors

Each vector (column) can be accessed with the $ operator. Without a header, R names the columns V1, V2, V3 and so on:

#if any of these are typed it will print to screen

mydata$V1
mydata$V2
mydata$V3
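If you don't have data.tsv at hand, the default column names can be checked with read.table()'s text= argument, which reads the same whitespace-separated format from a string. A small sketch with made-up numbers:

```r
# Made-up whitespace-separated data with no header row
mydata <- read.table(text = "1 2 3
4 5 6
7 8 9")

print(names(mydata))   # "V1" "V2" "V3"
print(mydata$V2)       # 2 5 8
```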

By default the function assumes certain things from the file

The file is a plain text file (there are functions to read Excel files: not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows (explained below).

If any of these are false, we need to tell the function.

If it has a header row:

mydata <- read.table('data.tsv', header=TRUE)   # header=T also works

Note that there is a comma between the different arguments of the function.

If the header row has one fewer column than the data, R assumes that the first column of data holds the row names.

Now the vectors (columns) are identified by their names:

#if any of these are typed it will print to screen

mydata$A
mydata$B
mydata$C
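Both header behaviours can be tried without the tutorial file by passing made-up text to read.table(). In this sketch the header has three names but each data row has four fields, so the first field of each row becomes its row name:

```r
# Header: 3 names; data rows: 4 fields, so column 1 is used as row names
mydata <- read.table(text = "A B C
sample1 1 2 3
sample2 4 5 6", header = TRUE)

print(rownames(mydata))   # "sample1" "sample2"
print(names(mydata))      # "A" "B" "C"
print(mydata$B)           # 2 5
```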

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)

We can avoid typing the data frame name each time by attaching it:

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)
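One caveat: attach() can mask other objects that have the same names as the columns. A different technique from base R, with(), evaluates an expression using a data frame's columns without attaching anything. A short sketch with a made-up data frame:

```r
# Made-up data frame with columns A, B and C
mydata <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))

# Same as summary(mydata$B), without attaching
print(with(mydata, summary(B)))

# Columns can be combined inside the expression
total <- with(mydata, A + B)
print(total)   # 5 7 9

# If you did use attach(mydata), detach it when finished
if ("mydata" %in% search()) detach(mydata)
```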

Two other important options for read.table:

If the file is separated only by tabs and has a header:

mydata <- read.table('data.tsv', header=TRUE, sep='\t')

This is really useful if some columns contain spaces within their values, so that R does not split them into extra columns. However, if the columns are of uneven length, R will report an error.

If you know that the file has uneven columns

mydata <- read.table('data.tsv', header=TRUE, fill=TRUE)

This causes R to pad short rows with empty fields, which appear as NA in numeric columns.

The last two examples will still work with our file and give the same result as with header=T alone.
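To see what sep='\t' and fill=TRUE actually change, here is a sketch on two tiny made-up inputs passed via read.table()'s text= argument (the "\t" and "\n" inside the string are a real tab and newline):

```r
# Tab-separated, with a space inside the 'name' values
tabbed <- read.table(text = "name\tscore\nJohn Smith\t5\nJane Doe\t7",
                     header = TRUE, sep = "\t")
print(tabbed)   # 2 rows, 2 columns; the names stay in one column

# The second data row is one field short: without fill = TRUE
# this would stop with an error; with it, the gap becomes NA
uneven <- read.table(text = "A B C
1 2 3
4 5", header = TRUE, fill = TRUE)
print(uneven)
```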

Graphs

To get an idea of what R is capable of, type:

demo(graphics)

This steps through the examples, printing the code to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember: to get more information about the options of a function, type ?function

Histogram of A

hist(mydata$A)

If there were more data, we could increase the number of bars (bins) with the option breaks=50 (or another suitable number).
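The effect of breaks can be inspected without drawing anything: hist() with plot = FALSE returns the binning it would use. A sketch on random made-up data; note that R treats breaks as a suggestion and picks "pretty" cut points near that number:

```r
set.seed(1)                  # reproducible made-up data
values <- rnorm(500)

# plot = FALSE returns the bins and counts without drawing
h10 <- hist(values, breaks = 10, plot = FALSE)
h50 <- hist(values, breaks = 50, plot = FALSE)

# Number of bars actually used in each case
print(length(h10$counts))
print(length(h50$counts))

# Every value falls into exactly one bar
print(sum(h50$counts))   # 500

# To draw it: hist(values, breaks = 50)
```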

boxplot(mydata)

We can avoid typing the data frame name each time by using the attach function:

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, name=c(“Value A”, “Value B”) , ylab=“Count of Something”)

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui): right-click on the image and select the format you want.

These instructions work for everyone.

You need to open a new graphics device for the file type you need, then send the plot to that device.

To save as a PNG file (easy to load into the likes of PowerPoint, and great for web applications):

png('filename.png')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a PDF:

pdf('filename.pdf')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen; the output is going to the file.
Also, the file may not be written immediately, but it will be once the device is closed (or R quits).

To quit R, type:

q() # If you save your workspace when asked, your data will be preloaded the next time you start R.

Or, if you want to remain in R:

dev.off() # turns off the png (or pdf etc.) device, which forces the file to be saved
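Putting the steps together, here is a self-contained sketch of the open device, draw, close sequence. The values for A and B are made up and the output file name is hypothetical; pdf() is used here, and png() works the same way:

```r
# Made-up data standing in for columns A and B
A <- c(3, 5, 4, 6, 8)
B <- c(7, 9, 6, 10, 12)

out_file <- file.path(tempdir(), "boxplot.pdf")   # hypothetical file name

pdf(out_file)                                     # 1. open a file device
boxplot(A, B, names = c("Value A", "Value B"),    # 2. draw into the file
        ylab = "Count of Something")
dev.off()                                         # 3. close it so the file is written

print(file.exists(out_file))
```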

BioScripts

Rahul Nayak — Sun, 28 Jun 2015 07:46:14 -0500

Please bookmark this collection of bioinformatics tools, scripts and code that can be pieced together in an easy and flexible manner to perform both simple and complex bioinformatics tasks.

Next-generation sequencing includes whole-genome sequencing (WGS), transcriptome sequencing (whole cDNA sequencing, RNA-seq), digital gene expression sequencing (Tag-Seq), ChIP-Seq and so on. There are also many sequencing platforms for generating sequences, well-known ones being Sanger/ABI (the first generation), Solexa/Illumina, SOLiD/ABI and 454/Roche. Their sequence formats differ, and so do their typical error types. High-quality data is very important for further analysis or data mining. There are many pipelines for raw sequence quality analysis and control, with processes for reporting read-quality statistics, trimming, filtering and error correction. Please bookmark them for the benefit of the bioinformatics community.

https://code.google.com/p/biowiki/

https://code.google.com/p/ngs-pipeline/source/browse/#svn%2Ftrunk

NGS and Perl scripts https://code.google.com/hosting/search?q=NGS+perl&projectsearch=Search+projects

NGS and Python scripts https://code.google.com/hosting/search?q=NGS+Python&projectsearch=Search+projects

Address of the bookmark: https://code.google.com/hosting/search?q=bioinformatics&sa=Search

Venn Diagrams in RStudio

Jitendra Prajapati — Mon, 25 Apr 2016 16:22:28 -0500

First step: install and load the “VennDiagram” package.

# install.packages('VennDiagram')
library(VennDiagram)

Second step: Load data

Add the file path if “catdoge.csv” is not in the working directory.

d <- read.csv("catdoge.csv")
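The columns of catdoge.csv are not shown here, so as an assumption this sketch uses two hypothetical vectors of names (cat people and dog people) in their place. The set sizes come from base R's length() and intersect(), and the two-set diagram is drawn with VennDiagram's draw.pairwise.venn() only if the package is installed:

```r
# Hypothetical sets standing in for two columns of catdoge.csv
cat_people <- c("Ann", "Bob", "Cy", "Dee", "Eve")
dog_people <- c("Cy", "Dee", "Fay", "Gus")

n_cat  <- length(cat_people)
n_dog  <- length(dog_people)
n_both <- length(intersect(cat_people, dog_people))
print(c(cat = n_cat, dog = n_dog, both = n_both))   # cat 5, dog 4, both 2

# Draw the diagram only if VennDiagram is installed
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  grid::grid.newpage()
  VennDiagram::draw.pairwise.venn(area1 = n_cat, area2 = n_dog,
                                  cross.area = n_both,
                                  category = c("Cats", "Dogs"))
}
```

draw.pairwise.venn() takes the set sizes and the overlap rather than the raw lists, which is why the counts are computed first.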

Address of the bookmark: http://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html

Affy has acquired Eureka Genomics for $15M

Martin Jones — Wed, 20 May 2015 15:11:20 -0500

Affymetrix Acquires Assets Of Eureka Genomics Corporation To Provide High Throughput And Economical Crop And Animal Genotyping

http://www.thestreet.com/story/13151062/1/affymetrix-acquires-assets-of-eureka-genomics-corporation-to-provide-high-throughput-and-economical-crop-and-animal-genotyping.html

Pattern Matching Problem Solution with Perl

Jit — Tue, 09 Jun 2015 23:58:45 -0500

Problem at http://rosalind.info/problems/1c/

#Find all occurrences of a pattern in a string.
#Given: Strings Pattern and Genome.
#Return: All starting positions in Genome where Pattern appears as a substring. Use 0-based indexing.

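use strict;
use warnings;

my $string = "GATATATGCATATACTT";
my $subStr = "ATAT";
my $kmer   = length($subStr);

# Print all 0-based start positions, space-separated, as Rosalind expects
my @positions = kmerMatch($string, $subStr, $kmer);
print join(" ", @positions), "\n";    # prints: 1 3 9

# Check every window of length $kmer for an exact match (sliding window)
# and return the 0-based start positions of all matches
sub kmerMatch {
    my ($string, $myStr, $kmer) = @_;
    my @hits;
    for (my $aa = 0; $aa <= length($string) - $kmer; $aa++) {
        my $myWin = substr($string, $aa, $kmer);
        push @hits, $aa if $myWin eq $myStr;
    }
    return @hits;
}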