<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: A guide for complete R beginners :- Getting data into R]]></title>
	<link>https://bioinformaticsonline.com/pages/view/21443/a-guide-for-complete-r-beginners-getting-data-into-r?</link>
	<atom:link href="https://bioinformaticsonline.com/pages/view/21443/a-guide-for-complete-r-beginners-getting-data-into-r?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/21443/a-guide-for-complete-r-beginners-getting-data-into-r</guid>
	<pubDate>Tue, 24 Feb 2015 20:15:08 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/21443/a-guide-for-complete-r-beginners-getting-data-into-r</link>
	<title><![CDATA[A guide for complete R beginners :- Getting data into R]]></title>
	<description><![CDATA[<p>For a beginner this can be is the hardest part, it is also the most important to get right.</p><p>It is possible to create a vector by typing data directly into R using the combine function &lsquo;c&rsquo;</p><blockquote><p><strong>x </strong></p></blockquote><p>same as</p><blockquote><p><strong>x </strong></p></blockquote><p>creates the vector x with the numbers between 1 and 5.</p><p>You can see what is in an object at any time by typing its name;</p><blockquote><p><strong>x</strong></p></blockquote><p>will produce the output<strong> &lsquo;[1] 1 2 3 4 5&prime;</strong></p><p>Note that names need to be quoted</p><blockquote><p><strong>daysofweek </strong><strong>&larr; c(&lsquo;Monday&rsquo;, &lsquo;Tuesday&rsquo;, &lsquo;Wednesday&rsquo;, &lsquo;Thursday&rsquo;, &lsquo;Friday&rsquo;);</strong></p></blockquote><p>Usually however you want to input from a file. We have touched on the &lsquo;read.table&rsquo; function already.</p><blockquote><p><strong>mydata </strong></p></blockquote><p>Now <strong>mydata</strong> is a data frame with multiple vectors</p><p>each vector can be identified by the default syntax</p><p>#if any of these are typed it will print to screen</p><blockquote><p><strong>mydata$V1 mydata$V2 mydata$V3 </strong></p></blockquote><p>By default the function assumes certain things from the file</p><ul>
<li>The file is a plain text file (there are function to read excel files: <em>not covered here</em>)</li>
<li>columns are separated by any number of tabs or spaces</li>
<li>there is the same number of data points in each column</li>
<li>there is no header row (labels for the columns)</li>
<li>there is no column with names for the rows** [I&rsquo;ll explain].</li>
</ul><p><span style="text-decoration: underline;">If any of these are false, we need to tell that to the function</span></p><p>If it has a header column</p><blockquote><p><strong>mydata <em>header=T also works</em></strong></p></blockquote><p>Note that there is a comma between different parts of the functions arguments</p><p>If there is one less column in the header row, then R assumes that the 1<sup>st</sup> column of data after the header are the row names</p><p>Now the vectors (columns) are identified by their name</p><p>#if any of these are typed it will print to screen</p><blockquote><p><strong>mydata$A mydata$B mydata$C </strong></p></blockquote><p># Summary about the whole data frame</p><blockquote><p><strong>summary(mydata)</strong></p></blockquote><p># Summary information of column A</p><blockquote><p><strong>summary(mydata$A) </strong></p></blockquote><p>We can shortcut having to type the data frame each time by attaching it</p><blockquote><p><strong>attach(mydata)</strong></p></blockquote><p># summary of column B as &lsquo;mydata&rsquo; is attached</p><blockquote><p><strong>summary(B)</strong></p></blockquote><p><span style="text-decoration: underline;">Two other important options for </span><em><span style="text-decoration: underline;">read.table</span></em></p><p>If is is separated only by tabs and has a header</p><blockquote><p><strong>mydata </strong></p></blockquote><p>Really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns . However if the columns or of an uneven length it will tell you.</p><p>If you know that the file has uneven columns</p><blockquote><p><strong>mydata </strong></p></blockquote><p>This causes R to fill empty spaces in a columns with &lsquo;NA&rsquo; .</p><p>The last two examples will still work with our file and give the same result as with only headers=T</p><p><span style="text-decoration: underline;">Graphs</span></p><p>to get an idea of what R is capable of type</p><blockquote><p><strong>demo(graphics)</strong></p></blockquote><p>steps through the examples, and the code is printed to the screen</p><p>We will work with simpler examples that have immediate use to biologists.</p><p>Remember to get more information about the options to a function type &lsquo;?function&rsquo;</p><p><span style="text-decoration: underline;">Histogram of A</span><span style="text-decoration: underline;"></span></p><blockquote><p><strong>hist(mydata$A)</strong></p></blockquote><p>If there was more data we could increase the number of vertical columns with the option, breaks=50 (or another relevant number).</p><blockquote><p><strong>boxplot(mydata)</strong></p></blockquote><p>We can get rid of the need to type the data frame each time by using the <strong>attach</strong> function</p><p># if not already done so</p><blockquote><p><strong>attach(mydata) </strong></p><p><strong>boxplot(mydata$A, mydata$B, name=c(&ldquo;Value A&rdquo;, &ldquo;Value B&rdquo;) , ylab=&ldquo;Count of Something&rdquo;)</strong></p></blockquote><p>same as</p><blockquote><p><strong>boxplot(A, B, name=c(&ldquo;Value A&rdquo;, &ldquo;Value B&rdquo;) , ylab=&ldquo;Count of Something&rdquo;)</strong></p></blockquote><p><span style="text-decoration: underline;">Scatter plot</span></p><p># if not already done so</p><blockquote><p><strong>attach(mydata) </strong></p><p><strong>plot(A,B) # or plot(mydata$A, mydata$B)</strong></p></blockquote><p><strong><span style="text-decoration: underline;">SAVING an image</span></strong></p><p>Windows users (Rgui) RIGHT click on image and select which you want.</p><p><span style="text-decoration: underline;">These instructions work for everyone.</span></p><p>You need to create a new device of the type of file you need, then send the data to that device</p><p>to save as a png file (easy to load into the likes of powerpoint, also great for web applications.</p><blockquote><p><strong>png(&lsquo;filename&rsquo;) </strong></p><p><strong>boxplot(A, B, name=c(&ldquo;Value A&rdquo;, &ldquo;Value B&rdquo;) , ylab=&ldquo;Count of Something&rdquo;)</strong></p></blockquote><p>or to save as a pdf</p><blockquote><p><strong>pdf(&lsquo;filename&rsquo;) </strong></p><p><strong>boxplot(A, B, name=c(&ldquo;Value A&rdquo;, &ldquo;Value B&rdquo;) , ylab=&ldquo;Count of Something&rdquo;)</strong></p></blockquote><p><span style="text-decoration: underline;">Note</span></p><ul>
<li>Nothing will appear on screen, the output is going to the file</li>
<li>Also it may not be saved immediately but will once the device (or R) is turned quit.</li>
</ul><p>To quit R type</p><p><strong>q() # </strong>If you save your session, next time you start R, you will have your data preloaded.</p><p>Or if you want to remain in R</p><blockquote><pre><strong>dev.off() #</strong>turns of the png (or pdf etc) device, thus forces the data to save</pre></blockquote>]]></description>
	<dc:creator>Archana Malhotra</dc:creator>
</item>

</channel>
</rss>