BOL: A guide for complete R beginners :- Getting data into R: Revision

Last updated 3715 days ago by Archana Malhotra

For a beginner this can be the hardest part, and it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1,2,3,4,5);

same as

x <- c(1:5);

Both create the vector x containing the whole numbers from 1 to 5.

You can see what is in an object at any time by typing its name. Typing

x

will produce the output ‘[1] 1 2 3 4 5’

Note that text values (such as names) need to be quoted

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday');
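As a quick check of the vector examples above, here is a minimal sketch that builds both kinds of vector and inspects them. The extra object name y is only for illustration, and everything used is base R.

```r
# Two ways of building the numbers 1 to 5
x <- c(1, 2, 3, 4, 5)   # stored as numeric
y <- 1:5                # stored as integer

print(x)
all(x == y)             # the values are equal element by element
class(x)
class(y)

# Text values must be quoted
daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')
length(daysofweek)      # number of elements in the vector
daysofweek[2]           # pick out the second element
```

The two numeric vectors print identically, although R stores them as slightly different types, which rarely matters for a beginner.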

Usually however you want to input from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table('data.tsv')

Now mydata is a data frame made up of several vectors, one per column.

Each vector can be referred to by its default name

mydata$V1
mydata$V2
mydata$V3 # typing any one of these prints that column to the screen
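To try the default behaviour without needing a real data file, the sketch below first writes a tiny file to a temporary location and then reads it back. The file contents and the name datafile are made up for this illustration.

```r
# Write a small whitespace-separated file to a temporary location
# (the path and the numbers are invented for this example)
datafile <- tempfile(fileext = '.tsv')
writeLines(c('1 10 100',
             '2 20 200',
             '3 30 300'), datafile)

mydata <- read.table(datafile)

names(mydata)   # the default column names
mydata$V2       # the second column as a vector
str(mydata)     # compact overview of the whole data frame
```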

By default the function assumes certain things about the file

The file is a plain text file (there are functions to read Excel files, not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows (explained below).

If any of these are false, we need to tell the function

If the file has a header row

mydata <- read.table('data.tsv', header=TRUE) # header=T also works

Note that the arguments of a function are separated by commas

If the header row has one fewer column than the data rows, R assumes that the first column of data after the header contains the row names
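The row name rule is easier to see with a concrete file. In this sketch the header has three labels but each data row has four fields, so the first field of each row becomes its row name. The labels gene1 and gene2 and the numbers are invented for the example.

```r
# Header row has 3 labels, data rows have 4 fields
# (all values are made up for this illustration)
datafile <- tempfile(fileext = '.tsv')
writeLines(c('A B C',
             'gene1 1 10 100',
             'gene2 2 20 200'), datafile)

mydata <- read.table(datafile, header = TRUE)

colnames(mydata)        # taken from the header row
rownames(mydata)        # taken from the first column of data
mydata['gene2', 'B']    # look up one value by row name and column name
```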

Now the vectors (columns) are identified by their names

mydata$A
mydata$B
mydata$C # typing any one of these prints that column to the screen

summary(mydata) # Summary about the whole data frame

summary(mydata$A) # Summary information of column A

We can avoid typing the name of the data frame each time by attaching it

attach(mydata)

summary(B) # summary of column B, found because 'mydata' is attached
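A small self-contained sketch of attaching is shown below. It reads a made-up table directly from a text string (the text argument of read.table) so that no file is needed, and it calls detach afterwards, which is one way to undo the attach when you are finished. Be aware that if you already have an object called B in your workspace, R will find that one first.

```r
# A tiny made-up table read straight from a string instead of a file
mydata <- read.table(text = 'A B C
1 10 100
2 20 200
3 30 300', header = TRUE)

summary(mydata$B)   # the full name always works

attach(mydata)      # make the columns visible by name
mean_b <- mean(B)   # B is found inside the attached data frame
summary(B)
detach(mydata)      # undo the attach once finished
```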

Two other important options for read.table

If the file is separated only by tabs and has a header

mydata <- read.table('data.tsv', header=T, sep='\t')

This is really useful if some columns contain spaces, as R will then not split those values into separate columns. However, if the columns are of uneven length, R will stop and tell you.

If you know that the file has uneven columns

mydata <- read.table('data.tsv', header=T, sep='\t', fill=TRUE)

This causes R to fill the empty spaces in a column with ‘NA’.

The last two examples will still work with our file and give the same result as with header=T alone
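To see both options at work, the sketch below writes a small tab-separated file in which one column contains spaces and the last row is missing a value. The file contents are invented for the illustration.

```r
# Tab-separated file: the first column contains spaces,
# and the last row has no value for column B (made-up data)
datafile <- tempfile(fileext = '.tsv')
writeLines(c('name\tA\tB',
             'wild type\t1\t10',
             'mutant one\t2'), datafile)

mydata <- read.table(datafile, header = TRUE, sep = '\t', fill = TRUE)

mydata            # 'wild type' stays together in one column
is.na(mydata$B)   # the missing value in B has been filled with NA
```

Without fill=TRUE the same call would stop with an error about the short row.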

Graphs

To get an idea of what R is capable of, type

demo(graphics)

Pressing <return> steps through the examples, and the code is printed to the screen

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options to a function, type ‘?function’ (for example ?hist)

Histogram of A

hist(mydata$A)

If there were more data we could increase the number of bars with the option breaks=50 (or another relevant number). R treats this number as a suggestion and may choose a slightly different number of bars.
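The effect of breaks is easiest to see with more data than our small file holds. This sketch uses randomly generated numbers as a stand-in for mydata$A, so the values themselves mean nothing.

```r
set.seed(42)                                # make the random numbers reproducible
values <- rnorm(1000, mean = 50, sd = 10)   # made-up data standing in for mydata$A

hist(values)                       # default number of bars
h <- hist(values, breaks = 50)     # ask for roughly 50 bars

length(h$counts)                   # number of bars R actually drew
sum(h$counts)                      # every data point falls in exactly one bar
```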

Boxplot

boxplot(mydata)

As before, we can get rid of the need to type the data frame each time by using the attach function

attach(mydata) # if not already done so

boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")
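Here is a runnable sketch of a labelled boxplot using a small made-up table. Note that the labels under the boxes are set with the names argument of boxplot. The function also returns the numbers behind the plot, which the sketch stores in b.

```r
# A tiny made-up table read straight from a string
mydata <- read.table(text = 'A B
1 12
3 15
4 11
6 18
8 14', header = TRUE)

b <- boxplot(mydata$A, mydata$B,
             names = c('Value A', 'Value B'),
             ylab = 'Count of Something')

b$names   # the labels drawn under the boxes
b$stats   # the five summary numbers behind each box, one column per box
```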

Scatter plot

attach(mydata) # if not already done so

plot(A,B) # or plot(mydata$A, mydata$B)

Saving an image

Windows users (Rgui) can right-click on the image and select the format they want.

The following instructions work for everyone.

You need to create a new device for the type of file you need, then send the plot to that device

To save as a png file (easy to load into the likes of PowerPoint, and also great for web applications)

png('filename.png')

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf('filename.pdf')

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen, because the output is going to the file.
Also, the file may not be saved immediately, but it will be once the device is turned off (or you quit R).

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc) device, which forces the data to be saved
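Putting the steps together, the sketch below opens a pdf device, draws a plot into it and closes the device again. The file name and the numbers are made up for the example, and the file is written to R's temporary directory so nothing in your working directory is touched. The same pattern should work with png() in place of pdf().

```r
# Hypothetical output file in R's temporary directory
outfile <- file.path(tempdir(), 'boxplot_example.pdf')

pdf(outfile)                       # open the device; plots now go to the file
boxplot(c(1, 3, 4, 6, 8), c(12, 15, 11, 18, 14),
        names = c('Value A', 'Value B'),
        ylab = 'Count of Something')
dev.off()                          # close the device so the file is written

file.exists(outfile)               # check that the file was created
```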
