BOL: Related items

Trelliscope: flexibly visualize large, complex data in great detail from within the R statistical programming environment.

Jit — Tue, 21 Jan 2020 04:22:49 -0600

Trelliscope provides a way to flexibly visualize large, complex data in great detail from within the R statistical programming environment. Trelliscope is a component in the DeltaRho environment.

For those familiar with Trellis Display, faceting in ggplot, or the notion of small multiples, Trelliscope provides a scalable way to break a set of data into pieces, apply a plot method to each piece, and then arrange those plots in a grid and interactively sort, filter, and query panels of the display based on metrics of interest. With Trelliscope, we are able to create multipanel displays on data with a very large number of subsets and view them in an interactive and meaningful way.

Address of the bookmark: http://deltarho.org/docs-trelliscope/#introduction

Powerful books for learning data analysis with R

LEGE — Tue, 28 May 2024 07:42:56 -0500

R is a powerful tool for data analysis, visualization, and machine learning. And it costs $0 to use! Here are six FREE books you can use to learn R today:

https://csgillespie.github.io/efficientR/

https://r-graphics.org/

https://rstudio-education.github.io/hopr/

https://r-pkgs.org/

https://r4ds.had.co.nz/

Address of the bookmark: https://r-graphics.org/

Import R Data

Abhimanyu Singh — Wed, 12 Jul 2017 08:30:46 -0500

It is often necessary to import sample textbook data into R before you start working on your homework.

Excel File

Quite frequently, the sample data is in Excel format and needs to be imported into R prior to use. For this, we can use the function read.xls from the gdata package. It reads from an Excel spreadsheet and returns a data frame. The following shows how to load an Excel spreadsheet named "mydata.xls". This method requires a Perl runtime to be installed on the system.

> library(gdata)                   # load gdata package
> help(read.xls)                   # documentation
> mydata = read.xls("mydata.xls")  # read from first sheet

Alternatively, we can use the function loadWorkbook from the XLConnect package to read the entire workbook, and then load the worksheets with readWorksheet. The XLConnect package requires Java to be pre-installed.

> library(XLConnect) # load XLConnect package
> wk = loadWorkbook("mydata.xls")
> df = readWorksheet(wk, sheet="Sheet1")

Minitab File

If the data file is in Minitab Portable Worksheet format, it can be opened with the function read.mtp from the foreign package. It returns a list of components in the Minitab worksheet.

> library(foreign)                 # load the foreign package
> help(read.mtp)                   # documentation
> mydata = read.mtp("mydata.mtp")  # read from .mtp file

SPSS File

Data files in SPSS format can be opened with the function read.spss, also from the foreign package. There is a "to.data.frame" option for choosing whether a data frame is to be returned. By default, it returns a list of components instead.

> library(foreign) # load the foreign package
> help(read.spss) # documentation
> mydata = read.spss("myfile", to.data.frame=TRUE)

Table File

A data table can reside in a text file. The cells inside the table are separated by blank characters. Here is an example of a table with 4 rows and 3 columns.

100   a1   b1
200   a2   b2
300   a3   b3
400   a4   b4

Now copy and paste the table above into a file named "mydata.txt" with a text editor. Then load the data into the workspace with the function read.table.

> mydata = read.table("mydata.txt")  # read text file
> mydata                             # print data frame
   V1 V2 V3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3
4 400 a4 b4

For further details on the function read.table, please consult the R documentation.

> help(read.table)

CSV File

The sample data can also be in comma-separated values (CSV) format. Each cell inside such a data file is separated by a special character, which is usually a comma, although other characters can be used as well.

The first row of the data file should contain the column names instead of the actual data. Here is a sample of the expected format.

Col1,Col2,Col3
100,a1,b1
200,a2,b2
300,a3,b3

After we copy and paste the data above in a file named "mydata.csv" with a text editor, we can read the data with the function read.csv.

> mydata = read.csv("mydata.csv")  # read csv file
> mydata
  Col1 Col2 Col3
1  100   a1   b1
2  200   a2   b2
3  300   a3   b3

In various European locales, where the comma character serves as the decimal point and fields are separated by semicolons, the function read.csv2 should be used instead. For further details on the read.csv and read.csv2 functions, please consult the R documentation.

> help(read.csv)

Working Directory

Finally, the code samples above assume the data files are located in the R working directory, which can be found with the function getwd.

> getwd() # get current working directory

You can select a different working directory with the function setwd(), and thus avoid entering the full path of the data files.

> setwd("") # set working directory

Note that the forward slash should be used as the path separator even on the Windows platform.

> setwd("C:/MyDoc")

Which of the following programming languages is best for a bioinformatics beginner?

Manisha Mishra — Thu, 04 Sep 2014 07:51:16 -0500

I will be doing NGS in the course of my research work and I would like to learn a programming language that is compatible with most bioinformatics tools and software. I basically want to do de novo assembly, read mapping, read alignment, and expression analysis. Recommendations are welcome. Which languages would you recommend to a student wishing to enter the world of bioinformatics?

The Raku Programming Language

Jit — Tue, 28 Jan 2020 05:37:17 -0600

Raku is a member of the Perl family of programming languages. Formerly known as Perl 6, it was renamed in October 2019. Raku introduces elements of many modern and historical languages. Compatibility with Perl was not a goal, though a compatibility mode is part of the specification.

More at https://www.raku.org/

Address of the bookmark: https://www.raku.org/

Type Hinting

Pranjali Yadav — Fri, 09 Jan 2015 22:26:13 -0600

Python creator Guido van Rossum’s proposal for static type-checking annotations is inching closer to reality, and the feature has taken on a new name: type hinting.

Back in August, van Rossum published a proposal on the Python mailing list recommending type-checking annotations as a valuable feature for the next version of Python to improve the performance of editors and IDEs, linter capabilities, standard notation, and refactoring. Van Rossum’s latest proposal, posted late last month, outlined plans to publish a Python Enhancement Proposal (PEP) in early January to put the feature now known as type hinting on track for inclusion in Python 3.5, slated for release this September.
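As a rough illustration of what such annotations look like, here is a minimal sketch using the annotation syntax and the typing module that Python 3.5 and later accept. The function name mean_length and its sample data are hypothetical, chosen only for this example. The hints are not enforced when the program runs; they are there for editors, linters, and type checkers to read.

```python
from typing import List


def mean_length(reads: List[str]) -> float:
    """Return the average length of the given sequences (0.0 if empty)."""
    if not reads:
        return 0.0
    return sum(len(read) for read in reads) / len(reads)


# The annotations are stored on the function and can be inspected by tools.
print(mean_length.__annotations__["return"].__name__)  # prints float
print(mean_length(["ACGT", "AC"]))                      # prints 3.0
```

A checker reading these hints can flag a call such as mean_length(42) before the code is ever run, which is the kind of editor and linter support the proposal describes.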

Reference

https://quip.com/r69HA9GhGa7J

Learning Python Programming - a bioinformatician's perspective !

Rahul Nayak — Mon, 14 May 2018 16:33:03 -0500

Python is a general-purpose programming language that is open source, flexible, powerful, and easy to use. One of its most important features is its rich set of utilities and libraries for data processing and analytics tasks. In the current era of big biological data, Python and Biopython are gaining popularity thanks to easy-to-use features that support big data processing.

In this tutorial series, I will explore features and packages of Python that are widely used in big data, NGS, and bioinformatics. I will also walk through a real biological example that shows NGS data processing with the help of Python packages and programming.

Python has a couple of points to recommend it to biologists and scientists specifically:

It's widely used in the scientific community
It has a couple of very well designed libraries for doing complex scientific computing (although we won't encounter them in this book)
It lends itself well to being integrated with other, existing tools
It has features which make it easy to manipulate strings of characters (for example, strings of DNA bases and protein amino acid residues, which we as biologists are particularly fond of)
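To make the last point concrete, here is a small sketch of string handling applied to a DNA sequence, using only built-in string methods. The sequence and the helper names reverse_complement and gc_fraction are made up for illustration and are not taken from any particular library.

```python
# Hypothetical DNA fragment used only for this illustration.
dna = "ATGCGTTAA"

# Translation table that maps each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the resulting string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


print(reverse_complement(dna))  # prints TTAACGCAT
print(gc_fraction(dna))
```

Slicing, counting, and translating characters are all available on plain strings, so no extra package is needed for simple sequence work like this.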

In general, the following are some of the important features of Python that make it a good fit for rapid application development.

Python is an interpreted language, so a program does not need to be compiled separately before it runs. The interpreter parses the program code and executes it.
Python is dynamically typed, so the type of a variable is determined automatically at run time from the value assigned to it.
Python is strongly typed, so values are not silently converted between unrelated types. Developers need to convert them explicitly.
Programs usually need less code to accomplish a task, which makes the language easier to adopt.
Python is portable, extendable and scalable.
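The difference between dynamic and strong typing in the list above can be seen in a short sketch. The variable names are arbitrary and exist only for this example.

```python
# Dynamic typing: the same name can refer to values of different types.
value = 42
kind_before = type(value).__name__   # "int"
value = "forty-two"
kind_after = type(value).__name__    # "str"

# Strong typing: a str and an int are not combined implicitly.
try:
    result = "3" + 4
except TypeError:
    # The conversion has to be written out explicitly.
    result = int("3") + 4

print(kind_before, kind_after, result)  # prints: int str 7
```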

There are two major Python versions, Python 2 and Python 3, and they are quite different. This tutorial uses Python 3, because it is more semantically correct and supports newer features.

I will post a tutorial on this page every day. Check the sub-pages on the right side.

Hello Python World !

Rahul Nayak — Mon, 14 May 2018 16:41:01 -0500

As I mentioned earlier, I will keep on posting one Python script per day to introduce you to Python programming. Whether you are an experienced programmer or not, this tutorial is intended for everyone who wishes to learn the Python programming language.

Python is a very simple language, and has a very straightforward syntax. The simplest way to produce output in Python 3 is the built-in "print" function, which prints out a line (and appends a newline).

Create a file Hello.py

print("Hello, Python World !.")

Run

python3 Hello.py

Venn Diagrams on R Studio

Jitendra Prajapati — Mon, 25 Apr 2016 16:22:28 -0500

First step: Install & load “VennDiagram” package.

# install.packages('VennDiagram')
library(VennDiagram)

Second step: Load data

Add filepath if “catdoge.csv” is not in working-directory.

d <- read.csv("catdoge.csv")

Address of the bookmark: http://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part, and it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1, 2, 3, 4, 5)

same as

x <- c(1:5)

creates the vector x with the numbers between 1 and 5.

You can see what is in an object at any time by typing its name;

x

will produce the output '[1] 1 2 3 4 5'

Note that names need to be quoted

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')

Usually however you want to input from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table('filename')

Now mydata is a data frame with multiple vectors

each vector can be identified by the default syntax

# typing any of these prints that column to the screen

mydata$V1
mydata$V2
mydata$V3

By default the function assumes certain things from the file

The file is a plain text file (there are functions to read Excel files: not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows** [I’ll explain].

If any of these are false, we need to tell that to the function

If it has a header row

mydata <- read.table('filename', header=TRUE) # header=T also works

Note that there is a comma between the different parts of the function's arguments

If there is one less column in the header row than in the data, then R assumes that the first column of data after the header contains the row names

Now the vectors (columns) are identified by their name

# typing any of these prints that column to the screen

mydata$A
mydata$B
mydata$C

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)

We can shortcut having to type the data frame each time by attaching it

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)

Two other important options for read.table

If it is separated only by tabs and has a header

mydata <- read.table('filename', header=TRUE, sep='\t')

This is really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns. However, if the columns are of uneven length, R will tell you.

If you know that the file has uneven columns

mydata <- read.table('filename', header=TRUE, sep='\t', fill=TRUE)

This causes R to fill empty spaces in a column with 'NA'.

The last two examples will still work with our file and give the same result as with only header=T

Graphs

To get an idea of what R is capable of, type

demo(graphics)

This steps through the examples, and the code is printed to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options of a function, type '?function'

Histogram of A

hist(mydata$A)

If there were more data, we could increase the number of bars with the option breaks=50 (or another relevant number).

boxplot(mydata)

We can get rid of the need to type the data frame each time by using the attach function

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui) can right-click on the image and select the format they want.

These instructions work for everyone.

You need to create a new device for the type of file you need, then send the plot to that device.

To save as a png file (easy to load into the likes of PowerPoint, and also great for web applications):

png('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen, because the output is going to the file.
Also, the file may not be saved immediately, but it will be once the device is closed (or R quits).

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc) device, which forces the data to be saved