BOL: Related items

R Graphs !!

Jit — Fri, 04 Nov 2016 10:48:00 -0500

The blog is a collection of script examples with example data and output plots. R produces excellent-quality graphs for data analysis, science and business presentations, publications and other purposes. Self-help code and examples are provided. Enjoy nice graphs !!

Address of the bookmark: http://rgraphgallery.blogspot.be/

Krona

Jit — Wed, 22 Mar 2017 04:47:35 -0500

Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an Excel template or KronaTools, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser (see Browser support).

Address of the bookmark: https://github.com/marbl/Krona/wiki

karyoploteR: plot whole genomes with arbitrary data

Abhimanyu Singh — Fri, 02 Feb 2018 03:24:28 -0600

karyoploteR is an R package to create karyoplots, that is, representations of whole genomes with arbitrary data plotted on them. It is inspired by the R base graphics system and does not depend on other graphics packages. The aim of karyoploteR is to offer the user an easy way to plot data along the genome, giving a broad genome-wide view that facilitates the identification of genome-wide relations and distributions.

Address of the bookmark: https://bernatgel.github.io/karyoploter_tutorial/

dotPlotly: Generate an interactive dot plot from mummer or minimap alignments

Rahul Nayak — Thu, 21 Feb 2019 10:22:17 -0600

Create an interactive dot plot from mummer output OR PAF format

R script that makes a plotly interactive and/or static (png/pdf) dot plot.

Shiny app available for testing

Address of the bookmark: https://github.com/tpoorten/dotPlotly

k-mers tutorial - classification and taxonomy

Neel — Thu, 26 Aug 2021 10:28:43 -0500

DNA k-mers underlie much of our assembly work, and we (along with many others!) have spent a lot of time thinking about how to store k-mer graphs efficiently, discard redundant data, and count them efficiently.

More recently, we've been enthused about using k-mer based similarity measures and computing and searching k-mer-based sketch search databases for all the things.
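As a rough illustration of what a k-mer based similarity measure can look like, the sketch below computes the Jaccard similarity between the k-mer sets of two sequences. The function names and the default k=4 are assumptions made for this example; it is not the method of any particular tool, and real sketching approaches work on a subsample of the k-mers rather than the full sets.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=4):
    """Jaccard similarity between the k-mer sets of sequences a and b."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        # Both sequences are shorter than k: no k-mers to compare.
        return 0.0
    return len(ka & kb) / len(ka | kb)
```

Identical sequences score 1.0 and sequences with no shared k-mers score 0.0; everything else falls in between.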

But I haven't spent too much time talking about using k-mers for taxonomy, although that has become an ahem area of interest recently, if you read into our papers a bit.

In this blog post I'm going to fix this by doing a little bit of a literature review and waxing enthusiastic about other people's work. Then in a future blog post I'll talk about how we're building off of this work in fun! and interesting? ways!

Address of the bookmark: http://ivory.idyll.org/blog/2017-something-about-kmers.html

Tools for Geospatial data analysis !

BioStar — Wed, 22 Mar 2023 02:10:28 -0500

Geospatial data is becoming increasingly important in many fields, including urban planning, environmental science, public health, and more. These tools can help you work with data from a variety of sources, including satellite imagery, GPS data, and other forms of spatial data. They can help you visualize data, perform complex analysis, and even create maps and other visualizations.

The list includes some of the most popular and widely used geospatial tools available in Python. These tools can help you work with data from a variety of sources and in a variety of formats. Some of the tools are focused on visualization, such as Cartopy, Folium, and Contextily, which allow you to create interactive maps and other visualizations. Other tools are more focused on data manipulation and analysis, such as Fiona, GeoPandas, and Rasterio, which allow you to manipulate and analyze spatial data in a variety of ways.

The list also includes some tools for working with specific types of geospatial data. For example, the H3 library is designed specifically for working with hexagonal grids, while PySAL is focused on spatial econometrics and spatial analysis. Whether you are a data scientist, GIS specialist, or geospatial enthusiast, these tools are sure to enhance your work and help you achieve your goals.

In summary, this list is an excellent resource for anyone working with geospatial data in Python. It contains a wide range of tools for working with different types of data, and can help you visualize data, perform complex analysis, and create maps and other visualizations. If you're looking to enhance your skills in geospatial analysis, this list is definitely worth checking out.

These tools are:

ArcGIS - https://lnkd.in/dgC6sKJH
Cartopy - https://lnkd.in/dc8ijXRg
Contextily - https://lnkd.in/dTdQsmKX
Descartes - https://lnkd.in/dCJykxwW
Fiona - https://lnkd.in/d8sJ3Q5a
Folium - https://lnkd.in/dfSsE-MB
GDAL - https://lnkd.in/dYBJBaAY
Geohash - https://lnkd.in/d_NxJ4_M
GeoJSON - https://lnkd.in/daGs2WYq
GeoPandas - https://lnkd.in/dBTFKKV3
Geopy - https://lnkd.in/dfAzR8Xa
Gevent - http://www.gevent.org
H3 - https://h3geo.org/docs/
OSMnx - https://lnkd.in/dm3pHgUS
PyQGIS - https://lnkd.in/dShWyWVr
PySAL - https://pysal.org
Pydeck - https://lnkd.in/dGBFu-iw
Pyproj - https://lnkd.in/dNG9fdkm
RTree - https://lnkd.in/dURMiYpU
Rasterio - https://lnkd.in/dEMC6ve6
Scikit-mobility - https://lnkd.in/dpHhaX2J
Shapely - https://lnkd.in/d568datK

These tools offer a wide range of capabilities for working with geospatial data, from visualizing and manipulating data to performing complex analysis and modeling.

Scalpel

Shruti Paniwala — Wed, 20 Aug 2014 02:07:58 -0500

A team from Cold Spring Harbor Laboratory has released an algorithm, called Scalpel, for finding insertions and deletions in next generation sequencing data sets. Scalpel, which is open source and available for download on SourceForge, outperformed the popular tools GATK HaplotypeCaller and SOAPindel in test runs on both simulated and real whole human exomes.

Like other indel callers, Scalpel works by performing de novo assembly of regions of interest, so that misalignment to the reference genome cannot obscure the presence of an insertion or deletion. Scalpel's innovation is to repeatedly check its assembly before comparing to the reference genome, to account for simple sequence repeats that are a regular source of error in indel calling. When Scalpel assembles an exon, it collects reads that map to that exon (including partial matches), splits them into k-mers, and creates a de Bruijn graph to span the exon; however, if it detects repeats in the map, it iteratively increases the size of the k-mers by one base until the repeats are eliminated. This ensures that the final assembly of the exon is highly accurate while minimizing compute time.
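The idea of raising k until repeats disappear can be sketched in a few lines. This is a simplified illustration and not Scalpel's implementation: it assumes that a "repeat" shows up as a cycle in the de Bruijn graph, uses k-mers as nodes, and the function names and the k_start/k_max parameters are invented for the example.

```python
def de_bruijn(reads, k):
    """Build a graph whose nodes are k-mers; edges join k-mers adjacent in a read."""
    graph = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            node = read[i:i + k]
            graph.setdefault(node, set())
            if i + k < len(read):
                # The next k-mer overlaps this one by k-1 bases.
                graph[node].add(read[i + 1:i + k + 1])
    return graph


def has_cycle(graph):
    """Return True if the directed graph contains a cycle (iterative DFS)."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {node: WHITE for node in graph}
    for start in graph:
        if state[start] != WHITE:
            continue
        state[start] = GREY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if state[nxt] == GREY:
                    return True  # back edge: a k-mer is revisited on this path
                if state[nxt] == WHITE:
                    state[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                state[node] = BLACK
                stack.pop()
    return False


def choose_k(reads, k_start, k_max):
    """Increase k one base at a time until the graph has no cycle; None if k_max is exceeded."""
    for k in range(k_start, k_max + 1):
        if not has_cycle(de_bruijn(reads, k)):
            return k
    return None
```

For a read such as "ACGTACGTTT", the tandem copy of ACGT produces a cycle at k=3 and k=4, and the graph becomes cycle-free at k=5.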

The Cold Spring Harbor team's validation of Scalpel, published over the weekend in Nature Methods, compares Scalpel's performance on a live whole exome against HaplotypeCaller and SOAPindel. The donor is an individual with serious neurological disorders, which may be linked to a high incidence of indels. One thousand indels from this individual's exome, called by one or more of the informatics pipelines, were selected for focused resequencing. This resequencing revealed a 77% true positive rate for Scalpel calls, dramatically better than the rates for either of the competing tools; Scalpel performed especially well with indels longer than five base pairs, a traditional weak point for indel callers.

Finally, the authors demonstrate Scalpel's use on a large set of genetic data from nearly 600 families who donated samples to the Simons Simplex Collection, a project of the Simons Foundation Autism Research Initiative. Scalpel found a very high enrichment for indels in children affected by autism, compared with their unaffected siblings, a pattern that persisted even after excluding common variants.

Picard

Neel — Fri, 29 Apr 2016 08:21:54 -0500

Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF specification.

Note that the information on this page is targeted at end-users. For developers, the source code, building instructions and implementation/development resources are available on GitHub.

The Picard toolkit is open-source under the MIT license and free for all uses.

Enjoy!

Address of the bookmark: http://broadinstitute.github.io/picard/

ORFfinder with smart BLAST

Jit — Tue, 17 May 2016 01:43:15 -0500

ORF Finder

ORFfinder is a graphical analysis tool for finding open reading frames (ORFs). We’ve been working on a few updates, and we’d like to find out what you think about them. Read on to find out what you can do with the new ORFfinder.
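For readers new to the term, an open reading frame can be illustrated with a minimal sketch: scan each of the three forward reading frames for an ATG start codon followed in-frame by a stop codon. This is only an illustration of the concept, not how ORFfinder works; the function name, the min_codons parameter and the restriction to the forward strand and the standard stop codons are assumptions of the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the stop
    codon, and min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```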

Smart BLAST (https://ncbiinsights.ncbi.nlm.nih.gov/2015/07/29/smartblast/)

Select one or a group of ORFs and BLAST several databases at once, and use the newly developed SmartBLAST to verify protein names. Looking for the traditional results from BLAST? They’re there too.

BBMap/BBTools package: Multipurpose tool designed for converting reads or other nucleotide data between different formats.

Jit — Mon, 13 Jun 2016 05:47:21 -0500

Reformat is a member of the BBMap/BBTools package. It is a multipurpose tool designed for converting reads or other nucleotide data between different formats. It supports, and can inter-convert:

fastq
fasta
fasta+qual
sam
scarf (an old Illumina format)
bam (if samtools is installed)
gzip
zip
ascii-33 (sanger)
ascii-64 (old Illumina)
paired files
interleaved files

It is multithreaded and can process data at over 500 megabytes per second, and can accept streams from standard in and write to standard out, allowing it to be easily dropped into the middle of a pipeline for format conversion. Reformat autodetects formats based on file extensions and content, making it very easy to use; and the autodetection can be overridden, allowing flexibility for people who don't like to follow naming conventions, or out-of-spec fastq files with quality values like -17 or 120.

The program has been gradually expanded, and can now perform various other functions. None of these will break pairing, if the input is paired.

Quality trimming (either or both ends)
Quality filtering
Fixed-length trimming
Generation of histograms (base composition, quality, etc)
Subsampling (to a fraction of input reads, or an exact number of reads or bases)
Changing fasta line-wrapping length
Reverse-complementing (all reads or only read 2)
Adding /1 and /2 suffix to read names
GC-content filtering
Length-filtering
Testing for corrupted interleaved files

Reformat is compatible with any platform that supports Java 1.7 or higher. It also has a bash shellscript for simpler invocation. Typical usage examples:

Reformat fastq into fasta:
reformat.sh in=x.fq out=y.fa

Interleave paired reads:
reformat.sh in1=x1.fq in2=x2.fq out=y.fq

Note - you can actually use a shortcut if paired read files have the same name with a 1 and a 2. This is equivalent to the above command:
reformat.sh in=x#.fq out=y.fq

De-interleave reads:
reformat.sh in=x.fq out1=y1.fq out2=y2.fq
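What interleaving and de-interleaving mean for the order of reads can be shown with a small sketch. It works on in-memory lists of records rather than files, and the function names are made up for the example; it says nothing about how Reformat itself is implemented.

```python
def interleave(reads1, reads2):
    """Alternate records from two paired lists: r1[0], r2[0], r1[1], r2[1], ..."""
    if len(reads1) != len(reads2):
        raise ValueError("paired inputs must contain the same number of reads")
    out = []
    for r1, r2 in zip(reads1, reads2):
        out.extend((r1, r2))
    return out


def deinterleave(reads):
    """Split an interleaved list back into (read 1 list, read 2 list)."""
    if len(reads) % 2:
        raise ValueError("interleaved input must contain an even number of reads")
    return reads[0::2], reads[1::2]
```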

Verify that interleaving appears correct, assuming Illumina naming conventions:
reformat.sh in=x.fq vint

Convert ASCII-33 to ASCII-64:
reformat.sh in=x.fq out=y.fq qin=33 qout=64
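The offset conversion itself is simple arithmetic on character codes, as the following sketch shows. The function name is hypothetical, and the qin/qout parameter names are borrowed from the command above purely for readability.

```python
def convert_quality(qual, qin=33, qout=64):
    """Re-encode a FASTQ quality string from ASCII offset qin to offset qout."""
    # Each character encodes (Phred score + offset); shift it to the new offset.
    return "".join(chr(ord(c) - qin + qout) for c in qual)
```

With the defaults, a Phred score of 0 changes from "!" to "@" and a score of 40 from "I" to "h".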

Quality-trim paired reads to Q10 on the left and right ends and discard reads shorter than 50bp after trimming:
reformat.sh in1=x1.fq in2=x2.fq out1=y1.fq out2=y2.fq outsingle=singletons.fq qtrim=rl trimq=10 minlength=50

Subsample 10% of the first 20000 pairs in an interleaved file:
reformat.sh in=x.fq out=y.fq reads=20000 samplerate=0.1 int=t
(in this case "int=t" overrides interleaving autodetection, to ensure reads are treated as pairs)
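One way to picture this kind of subsampling is sketched below: look only at the first N pairs and keep each one with a fixed probability. Whether Reformat samples in exactly this way is an assumption; the function name and the seed parameter are invented for the example, while reads and samplerate simply echo the flags above.

```python
import random


def subsample_pairs(pairs, reads=20000, samplerate=0.1, seed=0):
    """Keep each of the first `reads` pairs with probability `samplerate`."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same subset
    return [pair for pair in pairs[:reads] if rng.random() < samplerate]
```

Because each pair is kept or dropped as a unit, pairing is preserved in the output.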

Pipe in a gzipped sam file and pipe out fasta:
reformat.sh in=stdin.sam.gz out=stdout.fa

Reverse-complement reads:
reformat.sh in=x.fq out=y.fq rcomp
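For reference, reverse-complementing a sequence amounts to the following. This is a minimal sketch with a hypothetical function name that handles only A, C, G, T and N; for fastq data the quality string would also need to be reversed.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]
```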

For reformatting a file with very long sequences, Reformat will need more memory; just add the additional flag "-Xmx2g". For example, to change the line-wrapping length on the human genome (which has individual sequences over 200Mbp long) to 70 characters:
reformat.sh -Xmx2g in=HG19.fa.gz out=HG19_wrapped.fa.gz fastawrap=70
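Changing the line-wrapping length just means cutting each sequence into fixed-width lines, as in this small sketch (the function name is an assumption; the default width of 70 matches the example above).

```python
def wrap_sequence(seq, width=70):
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]
```

A 150-base sequence wrapped at 70 gives lines of 70, 70 and 10 characters.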

For additional functions, please run the shellscript with no arguments, or just read it with a text editor. If you have any questions, please post them in this thread.

For people using a non-bash terminal, you may need to type "bash reformat.sh" instead of just "reformat.sh".
For users of Windows or other platforms that do not support bash shellscripts, replace "reformat.sh" with "java -ea -Xmx200m -cp /path/to/bbmap/current/ jgi.ReformatReads"
for example,
java -ea -Xmx200m -cp C:\bbmap\current\ jgi.ReformatReads in=x.fq out=y.fa

Reformat can be downloaded with BBTools here:
https://sourceforge.net/projects/bbmap/