BOL: Related items

LoFreq*: A sequence-quality aware, ultra-sensitive variant caller for NGS data

BioStar — Tue, 18 Feb 2020 03:24:22 -0600

LoFreq* (i.e. LoFreq version 2) is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or base/indel alignment uncertainty), which are usually ignored by other methods or only used for filtering.

https://github.com/CSB5/lofreq

http://csb5.github.io/lofreq/installation/

https://github.com/CSB5/lofreq/tree/master/dist

Address of the bookmark: http://csb5.github.io/lofreq/

Submit your SARS-CoV-2 sequence data to GenBank

Neel — Thu, 09 Apr 2020 18:28:25 -0500

Submit your SARS-CoV-2 sequence data to GenBank and SRA with our new submission landing page. Submission is simple and streamlined *and* there’s a rapid turnaround. https://submit.ncbi.nlm.nih.gov/sarscov2/

Quickly and easily add your SARS-CoV-2 sequence data to the growing public archive with new, special features and support from NCBI. Our new SARS-CoV-2 sequence submission landing page will help you get started. GenBank submissions are accessioned and released in approximately 1-2 working days, and Sequence Read Archive (SRA) submissions are typically processed and released within hours. Submission is simple!

More information is available on NCBI Insights. https://ncbiinsights.ncbi.nlm.nih.gov/2020/04/09/sars-cov2-data-streamlined-submission-rapid-turnaround/

GraphUnzip: Phases an assembly graph using Hi-C data and/or long reads.

Jit — Fri, 05 Feb 2021 21:22:24 -0600

GraphUnzip is a fast, memory-efficient and accurate tool to unzip assembly graphs into their constituent haplotypes using long reads and/or Hi-C data. As GraphUnzip only connects sequences in the assembly graph that already had a potential link based on overlaps, it yields high-quality gap-less supercontigs. To demonstrate the efficiency of GraphUnzip, we tested it on a simulated diploid Escherichia coli genome, and on two real datasets for the genomes of the rotifer Adineta vaga and the potato Solanum tuberosum. In all cases, GraphUnzip yielded highly continuous phased assemblies.

https://www.biorxiv.org/content/biorxiv/early/2021/02/01/2021.01.29.428779.full.pdf

Address of the bookmark: https://github.com/nadegeguiglielmoni/GraphUnzip

Corona Virus Genome and Data Download!

Abhi — Sun, 12 Dec 2021 23:34:54 -0600

Genes and their related metadata can be found at https://www.ncbi.nlm.nih.gov/datasets/coronavirus/genomes/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/datasets/coronavirus/genomes/

Orange: Data mining

BioStar — Mon, 13 Mar 2023 12:42:29 -0500

Open source machine learning and data visualization.

Build data analysis workflows visually, with a large, diverse toolbox.

Address of the bookmark: https://orangedatamining.com/

A guide for complete R beginners :- Getting data into R

Archana Malhotra — Tue, 24 Feb 2015 20:15:08 -0600

For a beginner this can be the hardest part; it is also the most important to get right.

It is possible to create a vector by typing data directly into R using the combine function ‘c’

x <- c(1, 2, 3, 4, 5)

same as

x <- c(1:5)

Either line creates the vector x with the numbers from 1 to 5.

You can see what is in an object at any time by typing its name;

x

will produce the output ‘[1] 1 2 3 4 5’

Note that text values, such as names, need to be quoted

daysofweek <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')

Usually however you want to input from a file. We have touched on the ‘read.table’ function already.

mydata <- read.table('filename')

Now mydata is a data frame made up of several vectors (its columns).

Each vector can be identified by its default name

# if any of these are typed it will print to screen

mydata$V1
mydata$V2
mydata$V3

By default the function assumes certain things about the file

The file is a plain text file (there are functions to read Excel files: not covered here)
columns are separated by any number of tabs or spaces
there is the same number of data points in each column
there is no header row (labels for the columns)
there is no column with names for the rows
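The defaults listed above can be tried out on a small throwaway file. The sketch below is only an illustration: the file contents and the temporary file are made up for the example, not taken from the guide.

```r
# Hypothetical example data: three columns, no header,
# separated by a mix of tabs and spaces
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 10\t100",
             "2   20 200",
             "3\t30  300"), tmp)

# All defaults: no header row, any whitespace separates columns
mydata <- read.table(tmp)

names(mydata)   # the default column names
mydata$V2       # the second column as a vector
```

Because there is no header row, R names the columns itself, which is where V1, V2 and V3 come from.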

If any of these are false, we need to tell that to the function

If it has a header row

mydata <- read.table('filename', header=TRUE) # header=T also works

Note that there is a comma between the different parts of the function's arguments

If there is one less column in the header row, then R assumes that the 1st column of data after the header contains the row names
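A minimal sketch of the row-name behaviour described above, assuming a made-up file whose header row has three labels while each data row has four fields (the gene names and values are hypothetical):

```r
# Hypothetical file: header has 3 labels, data rows have 4 fields
tmp <- tempfile(fileext = ".txt")
writeLines(c("A B C",
             "gene1 1 10 100",
             "gene2 2 20 200"), tmp)

mydata <- read.table(tmp, header = TRUE)

rownames(mydata)   # the first field of each data row
colnames(mydata)   # the labels from the header row
mydata$A           # column A, addressed by its name
```

The first field of each data row is used as the row name, so the data frame ends up with three columns rather than four.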

Now the vectors (columns) are identified by their name

# if any of these are typed it will print to screen

mydata$A
mydata$B
mydata$C

# Summary about the whole data frame

summary(mydata)

# Summary information of column A

summary(mydata$A)

We can shortcut having to type the data frame each time by attaching it

attach(mydata)

# summary of column B as ‘mydata’ is attached

summary(B)

Two other important options for read.table

If it is separated only by tabs and has a header

mydata <- read.table('filename', header=TRUE, sep='\t')

This is really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns. However, if the columns are of an uneven length it will tell you.

If you know that the file has uneven columns

mydata <- read.table('filename', header=TRUE, fill=TRUE)

This causes R to fill empty spaces in a column with ‘NA’.

The last two examples will still work with our file and give the same result as with only header=T
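To see the two options working together, here is a small sketch on a made-up tab-separated file. One column contains spaces and the last row is one field short; the contents are assumptions for illustration only.

```r
# Hypothetical tab-separated file; the last row lacks a value for C
tmp <- tempfile(fileext = ".txt")
writeLines(c("A\tB\tC",
             "1\tred apple\t100",
             "2\tgreen pear"), tmp)

# sep = "\t" keeps "red apple" together as one field;
# fill = TRUE pads the short row instead of stopping with an error
mydata <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE)

mydata$B   # text with spaces stays in one column
mydata$C   # the missing value is shown as NA
```

Without fill = TRUE, read.table would stop with an error about the short row.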

Graphs

To get an idea of what R is capable of, type

demo(graphics)

This steps through the examples, and the code is printed to the screen.

We will work with simpler examples that have immediate use to biologists.

Remember, to get more information about the options to a function, type ‘?function’

Histogram of A

hist(mydata$A)

If there was more data we could increase the number of vertical columns with the option, breaks=50 (or another relevant number).
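As a rough sketch of what the breaks option does, the example below uses randomly generated stand-in values (not the guide's data) and compares the number of bars with and without breaks=50. R treats the number as a suggestion, so the exact count may differ from 50.

```r
# Hypothetical data: 1000 random values stand in for a larger data set
set.seed(1)
values <- rnorm(1000)

# plot = FALSE returns the histogram details without drawing anything
h_default <- hist(values, plot = FALSE)
h_fine    <- hist(values, breaks = 50, plot = FALSE)

length(h_default$counts)   # number of bars with the default setting
length(h_fine$counts)      # more bars when breaks = 50 is requested
```

Dropping plot = FALSE draws the histogram on screen as usual.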

boxplot(mydata)

We can get rid of the need to type the data frame each time by using the attach function

# if not already done so

attach(mydata)
boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")

same as

boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Scatter plot

# if not already done so

attach(mydata)
plot(A,B) # or plot(mydata$A, mydata$B)

SAVING an image

Windows users (Rgui) can RIGHT click on the image and select the format they want.

The following instructions work for everyone.

You need to create a new device of the type of file you need, then send the data to that device

To save as a png file (easy to load into the likes of PowerPoint, also great for web applications)

png('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

or to save as a pdf

pdf('filename')
boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")

Note

Nothing will appear on screen, as the output is going to the file.
Also, it may not be saved immediately, but it will be once the device is turned off or R is quit.

To quit R type

q() # If you save your session, next time you start R, you will have your data preloaded.

Or if you want to remain in R

dev.off() # turns off the png (or pdf etc.) device, which forces the data to be saved
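Putting the steps together, a minimal sketch of the whole open, plot, close cycle might look like this. The vectors A and B and the output file name are stand-ins invented for the example, and a pdf device is used because it is available in every R installation.

```r
# Hypothetical stand-in data and output location
A <- c(1, 2, 3, 4, 5)
B <- c(2, 4, 6, 8, 10)
out <- file.path(tempdir(), "boxplot_example.pdf")

pdf(out)                                  # open the file device
boxplot(A, B, names = c("Value A", "Value B"),
        ylab = "Count of Something")      # the plot goes to the file
dev.off()                                 # close the device so the file is written

file.exists(out)                          # check that the file was created
```

Swapping pdf(out) for png(out), with a .png file name, works the same way.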

Genome in a Bottle (GIAB) Consortium

Jit — Sat, 25 Jan 2020 13:50:52 -0600

The Genome in a Bottle (GIAB) Consortium is a public-private-academic consortium hosted by NIST to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice.

https://www.nist.gov/news-events/news/2016/09/nist-releases-new-family-standardized-genomes

Address of the bookmark: https://jimb.stanford.edu/giab/

AutoGluon: AutoML for Text, Image, and Tabular Data

Jit — Thu, 07 Jan 2021 05:33:17 -0600

AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on text, image, and tabular data.

Address of the bookmark: https://github.com/awslabs/autogluon

NASA Open Science Data Repository

Abhi — Wed, 18 Dec 2024 11:54:47 -0600

The NASA Open Science Data Repository (OSDR) enables access to space-related data from experiments and missions that investigate biological and health responses of terrestrial life to spaceflight. The goal of OSDR is to enable multi-modal and multi-hierarchical fundamental space life science data to be reused toward basic science, applied science, and operational outcomes for space exploration and knowledge discovery. These data include ‘omics, phenotypic, physiological, behavioral, hardware, environmental telemetry; raw, processed; tabular, text, code, bioimaging, and video.

https://www.nasa.gov/reference/osdr-data-processing/

Address of the bookmark: https://www.nasa.gov/osdr/

gbtools: Interactive Visualization of Metagenome Bins in R

Jit — Sun, 26 Mar 2017 15:41:31 -0500

We have developed gbtools, a software package that allows users to visualize metagenomic assemblies by plotting coverage (sequencing depth) and GC values of contigs, and also to annotate the plots with taxonomic information. Different sets of annotations, including taxonomic assignments from conserved marker genes or SSU rRNA genes, can be imported simultaneously; users can choose which annotations to plot. Bins can be manually defined from plots, or be imported from third-party binning tools and overlaid onto plots, such that results from different methods can be compared side-by-side. gbtools reports summary statistics of bins including marker gene completeness, and allows the user to add or subtract bins with each other.

Tool at https://github.com/kbseah/genome-bin-tools

Address of the bookmark: http://journal.frontiersin.org/article/10.3389/fmicb.2015.01451/full