For a beginner this can be the hardest part, but it is also the most important to get right.
It is possible to create a vector by typing data directly into R using the combine function ‘c’:
x <- c(1, 2, 3, 4, 5)
which gives the same values as
x <- 1:5
Either line creates the vector x containing the whole numbers from 1 to 5.
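A quick way to check that the two lines agree is to compare them element by element. This is a small sketch; the names x_combined and x_range are only for illustration. It also reveals a subtle difference: c(1,2,3,4,5) stores decimal (double) numbers while 1:5 stores integers, so the values match even though the objects are not strictly identical.

```r
# Two ways of building the numbers 1 to 5 (illustrative variable names)
x_combined <- c(1, 2, 3, 4, 5)  # typed out with the combine function
x_range    <- 1:5               # the colon operator builds the sequence

all(x_combined == x_range)      # TRUE: every element matches
identical(x_combined, x_range)  # FALSE: "double" vs "integer" storage
class(x_combined)               # "numeric"
class(x_range)                  # "integer"
length(x_range)                 # 5
```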
You can see what is in an object at any time by typing its name:
x
will produce the output [1] 1 2 3 4 5
Note that text values (character strings) need to be quoted:
daysofweek <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
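To see why the quotes matter, here is a small sketch (the variable result is just an illustrative name): with quotes R stores text, and without them it looks for an object of that name and stops with an error. The exact wording of the error message may vary with your R version and language settings.

```r
daysofweek <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
class(daysofweek)   # "character": the quotes make these text values
length(daysofweek)  # 5

# Without quotes, R looks for an object called Monday instead of the text.
# tryCatch() captures the error message rather than stopping the script.
result <- tryCatch(c(Monday), error = function(e) conditionMessage(e))
print(result)       # an error message such as: object 'Monday' not found
```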
Usually, however, you will want to read your data in from a file. We have already touched on the ‘read.table’ function.
mydata <- read.table('data.tsv')
Now mydata is a data frame: a table made up of several vectors, one per column.
By default R names the columns V1, V2, V3 and so on, and each column can be picked out with the $ operator:
mydata$V1 # typing any of these prints that column to the screen
mydata$V2
mydata$V3
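Because data.tsv itself is not available here, the sketch below writes a tiny made-up file to a temporary location with tempfile() and writeLines(), then reads it back. The file contents are invented purely to show how the default V1, V2, V3 column names appear.

```r
# Write a small hypothetical white-space separated file with no header
tmp <- tempfile(fileext = ".tsv")
writeLines(c("1 10 100",
             "2 20 200",
             "3 30 300"), tmp)

mydata <- read.table(tmp)  # no header, so columns are named V1, V2, V3
names(mydata)              # "V1" "V2" "V3"
mydata$V2                  # 10 20 30
```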
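By default, read.table makes certain assumptions about the file, for example that there is no header row and that the columns are separated by white space.
If any of these assumptions are wrong, we need to tell the function.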
If the file has a header row:
mydata <- read.table('data.tsv', header=TRUE) # header=T also works
Note that the arguments to a function are separated by commas.
If the header row has one fewer field than the data rows, R assumes that the first column of data holds the row names.
Now the vectors (columns) are identified by the names in the header, here A, B and C:
mydata$A # typing any of these prints that column to the screen
mydata$B
mydata$C
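The sketch below uses the same temporary-file approach with invented contents to show both header cases: a header with one name per column, and a header with one fewer name, where the first column turns into row names. The file contents and the gene1/gene2 labels are hypothetical.

```r
tmp <- tempfile(fileext = ".tsv")

# Case 1: the header has as many fields as each data row
writeLines(c("A B C",
             "1 4 7",
             "2 5 8",
             "3 6 9"), tmp)
mydata <- read.table(tmp, header = TRUE)
names(mydata)  # "A" "B" "C"
mydata$B       # 4 5 6

# Case 2: the header has one fewer field, so the first column becomes row names
writeLines(c("A B C",
             "gene1 1 4 7",
             "gene2 2 5 8"), tmp)
withnames <- read.table(tmp, header = TRUE)
rownames(withnames)  # "gene1" "gene2"
summary(withnames$A)
```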
summary(mydata) # summary of the whole data frame
summary(mydata$A) # summary of column A only
We can avoid typing the data frame name each time by attaching it:
attach(mydata)
summary(B) # summary of column B as ‘mydata’ is attached
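One thing worth knowing: attach() puts a copy of the data frame's columns on R's search path, so later changes to mydata are not seen through the attached names. A minimal sketch, with a small invented data frame standing in for the file, shows the usual pattern of attaching, working, and then detaching with detach().

```r
# A small hypothetical data frame standing in for the file contents
mydata <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))

attach(mydata)           # columns A and B can now be used by name
total_B <- sum(B)        # same as sum(mydata$B); gives 15
detach(mydata)           # remove it from the search path when finished

"mydata" %in% search()   # FALSE once detached
```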
Two other important options for read.table:
If the file is separated only by tabs and has a header:
mydata <- read.table('data.tsv', header=T, sep='\t')
This is really useful if some columns contain spaces, because it stops R from splitting those entries into extra columns. However, if the rows have an uneven number of fields, R will stop with an error.
If you know that the file has rows of uneven length:
mydata <- read.table('data.tsv', header=T, sep='\t', fill=TRUE)
This causes R to pad the short rows with blank fields, which become NA in numeric columns.
The last two examples will still work with our file and give the same result as using only header=T.
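To see both options in action, the sketch below writes a hypothetical tab-separated file in which the species names contain spaces and the last row is missing a field. It reads the file once without fill=TRUE, catching the error, and once with it. The file contents are invented for illustration.

```r
tmp <- tempfile(fileext = ".tsv")

# Hypothetical tab-separated file: names contain spaces, last row is short
writeLines(c("species\tcount\tweight",
             "Mus musculus\t12\t20.5",
             "Rattus norvegicus\t7\t310.2",
             "Homo sapiens\t3"), tmp)

# Without fill = TRUE the short row makes read.table stop with an error
strict <- tryCatch(read.table(tmp, header = TRUE, sep = "\t"),
                   error = function(e) "error: uneven rows")

# sep = "\t" keeps "Mus musculus" in one column; fill = TRUE pads the short row
mydata <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE)
mydata$species  # full names, spaces intact
mydata$weight   # 20.5 310.2 NA
```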
Graphs
To get an idea of what R is capable of, type:
demo(graphics)
Pressing <Return> steps through the examples, and the code for each one is printed to the screen.
We will work with simpler examples that have immediate use to biologists.
Remember: to get more information about the options to a function, type ‘?function’ (for example, ?hist).
Histogram of A
hist(mydata$A)
If there were more data, we could increase the number of bars with the option breaks=50 (or another suitable number).
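The effect of breaks can be tried on simulated data. In this sketch, rnorm() generates 500 hypothetical measurements and set.seed() just makes the numbers repeatable. Note that R treats a single number for breaks as a suggestion only, and may choose a slightly different number of bars so that the bar edges fall on round values. Calling hist() with plot=FALSE returns the counts without drawing.

```r
set.seed(1)                         # make the random example reproducible
A <- rnorm(500, mean = 10, sd = 2)  # hypothetical measurements

hist(A)                             # R picks the number of bars itself
hist(A, breaks = 50)                # suggest roughly 50 bars instead

# plot = FALSE returns the binned counts without drawing anything
h <- hist(A, breaks = 50, plot = FALSE)
length(h$counts)                    # number of bars actually used
sum(h$counts)                       # 500: every value falls in exactly one bar
```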
Boxplot
boxplot(mydata) # one box for each column
As before, we can avoid typing the data frame name each time by attaching it:
attach(mydata) # if not already done so
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")
same as
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")
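A minimal sketch with a small invented data frame. The argument that labels the boxes is names (plural). With plot=FALSE, boxplot() returns the statistics it would draw, which is a handy way to check the labels and medians.

```r
# Hypothetical data frame with two columns
mydata <- data.frame(A = c(3, 5, 4, 6, 8), B = c(7, 9, 6, 10, 12))

# names (plural) labels each box; ylab labels the y axis
boxplot(mydata$A, mydata$B,
        names = c("Value A", "Value B"),
        ylab = "Count of Something")

# With plot = FALSE, boxplot returns its statistics instead of drawing
stats <- boxplot(mydata$A, mydata$B,
                 names = c("Value A", "Value B"), plot = FALSE)
stats$names       # "Value A" "Value B"
stats$stats[3, ]  # medians: 5 9
```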
Scatter plot
attach(mydata) # if not already done so
plot(A,B) # or plot(mydata$A, mydata$B)
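plot() also accepts axis labels and a title. This sketch uses a small made-up data frame; xlab, ylab, main and pch are standard plotting arguments, and the label text is only illustrative.

```r
# Hypothetical data frame with two related columns
mydata <- data.frame(A = c(1, 2, 3, 4, 5),
                     B = c(2.1, 3.9, 6.2, 7.8, 10.1))

plot(mydata$A, mydata$B,
     xlab = "Value A", ylab = "Value B",  # axis labels
     main = "B against A",                # plot title
     pch = 19)                            # solid round points
```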
SAVING an image
Windows users (Rgui) can right-click on the image and select the format they want.
The following instructions work for everyone.
You need to open a new graphics device for the type of file you want, send the plot to that device, and then close the device so the file is written.
To save as a PNG file (easy to load into the likes of PowerPoint, and also great for web pages):
png('filename.png')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")
dev.off() # close the device so the file is written
Or, to save as a PDF:
pdf('filename.pdf')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")
dev.off() # close the device so the file is written
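Note
The image file is only written once the device is closed. To quit R (which closes all devices), type:
q() # if you save your workspace, your data will be reloaded the next time you start R in the same directory
Or, if you want to remain in R:
dev.off() # turns off the png (or pdf, etc.) device, which forces the file to be written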
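Putting the saving steps together, here is a sketch that writes the boxplot to a PDF in R's temporary directory (the file name boxplot_example.pdf is hypothetical) and checks that the file exists. PNG output depends on how R was built, so that part is guarded with capabilities("png").

```r
# Hypothetical data frame standing in for the file contents
mydata <- data.frame(A = c(3, 5, 4, 6, 8), B = c(7, 9, 6, 10, 12))

outfile <- file.path(tempdir(), "boxplot_example.pdf")  # hypothetical file name

pdf(outfile)                          # 1. open a PDF device: plots now go to the file
boxplot(mydata$A, mydata$B,
        names = c("Value A", "Value B"),
        ylab = "Count of Something")  # 2. draw the plot on that device
dev.off()                             # 3. close the device so the file is written

file.exists(outfile)                  # TRUE

# The same pattern works for PNG where the system supports it
if (capabilities("png")) {
  png(file.path(tempdir(), "boxplot_example.png"))
  boxplot(mydata$A, mydata$B, names = c("Value A", "Value B"))
  dev.off()
}
```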