BOL: Related items

Bioinformatics Services / CRO Services

RASA Life Sciences — Wed, 06 Nov 2019 00:33:11 -0600

RASA is set to provide premium technical and scientific services in a form of solutions, product development and training. .We are also very proficient in providing the high quality Research & Development services in life science informatics field like Next Generation Sequencing (NGS) Data Analysis,Computational Drug Discovery, Bioinformatics, Chemo-informatics and BIO-IT.

RASA offers faster, better and cost effective cutting edge technology solutions to chemical and life science research and industry. We provide our customers with A seamless model of wide expertise and comprehensive platforms. Our Value is to take our customers

Useful Bioinformatics Analysis Tools !

Neel — Thu, 23 Dec 2021 23:10:02 -0600

CoMeta

Classificier of reads from metagenomic sequencing experiments.

• Kawulok, J., Deorowicz, S., CoMeta: Classification of Metagenomes Using k-mers, PLOS ONE, 2015; 10(4):1–23,

CoMSA

Compressor of multiple sequence alignments of proteins.

• Deorowicz, S., Walczyszyn, J., Debudaj-Grabysz, A., CoMSA: compression of protein multiple sequence alignment files, Bioinformatics, 2019; 35(2):22–234,

DSRC

Compressor of sequencing reads.

• Roguski, L., Deorowicz, S., DSRC 2: Industry-oriented compression of FASTQ files, Bioinformatics, 2014; 30(15):2213–2215,
• Deorowicz, S., Grabowski, Sz., Compression of DNA sequences in FASTQ format, Bioinformatics, 2011; 27(6):860–862,

FAMSA

Multiple sequence alignment designed for huge families of proteins (even containing hundreds of thousands of sequences).

• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., FAMSA: Fast and accurate multiple sequence alignment of huge protein families, Scientific Reports, 2016; 6(33964):

FaStore

Compressor of FASTQ files.

• Roguski, L., Ochoa, I., Hernaez, M., Deorowicz, S., FaStore - a space-saving solution for raw sequencing data, Bioinformatics, 2018; 34(16):2748–2756,

FQSqueezer

Experimental high-end compressor of FASTQ files.

• Deorowicz, S., FQSqueezer: k-mer-based compression of sequencing data, Scientific Reports, 2020; 10(578):

GDC

Compressor of collections of genome sequences.

• Deorowicz, S., Danek, A., Niemiec, M., GDC 2: Compression of large collections of genomes, Scientific Reports, 2015; 5(11565):1–12,
• Deorowicz, S., Grabowski, Sz., Robust relative compression of genomes with random access, Bioinformatics, 2011; 27(21):2979–2986,

GTC

Genotype databases compressor with support for fast queries.

• Danek, A., Deorowicz, S., GTC: how to maintain huge genotype collections in a compressed form, Bioinformatics, 2018; 34(11):1834–1840,

GTShark

Genotypes compressor.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, Bioinformatics, 2019; 35(22):4791–4793,

KMC

Memory frugal k-mer counter.

•  Kokot, M., Długosz, M., Deorowicz, S., KMC 3: counting and manipulating k -mer statistics, Bioinformatics, 2017; 33(17):2759–2761,
•  Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A., KMC 2: Fast and resource-frugal k-mer counting, Bioinformatics, 2015; 31(10):1569–1576,
•  Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz., Disk-based k-mer counting on a PC, BMC Bioinformatics, 2013; 14():Article no. 160,

Kmer-db

Tool for estimation of evolutionary distances in a collection of genomes.

• Deorowicz, S., Gudys, A., Dlugosz, M., Kokot, M., Danek, A., Kmer-db: instant evolutionary distance estimation, Bioinformatics, 2019; 35(1):133–136,

MuGI

Index allowing queries for a collection of multiple genome sequences.

• Danek, A., Deorowicz, S., Grabowski, Sz., Indexes of Large Genome Collections on a PC, PLOS ONE, 2014; 9(10):e109384,

ORCOM

Experimental compressor of sequencing reads.

• Grabowski, Sz., Deorowicz, S., Roguski, L., Disk-based compression of data from genome sequencing, Bioinformatics, 2014; 31(9):1389–1395,

PgSA

Index allowing queries for a collection of sequencing reads.

• Kowalski, T., Grabowski, Sz., Deorowicz, S., Indexing arbitrary-length k-mers in sequencing reads, PLOS ONE, 2015; 10(7):1–16,

QuickProbs

Multiple sequence alignment designed especially for GPU.

• Gudys, A., Deorowicz, S., QuickProbs 2: towards rapid construction of high-quality alignments of large protein families, Scientific Reports, 2017; 7(41553):
• Gudys, A., Deorowicz, S., QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors, PLOS ONE, 2014; 9(2):e88901,

RECKONER

Read error corrector.

• Maciej Długosz, M., Deorowicz, S., RECKONER: read error corrector based on KMC, Bioinformatics, 2017; 33(7):1086–1089,

TGC

Compressor of collections of genomes given in Variant Call Format (VCF) files.

• Deorowicz, S., Danek, A., Grabowski, Sz., Genome compression: a novel approach for large collections, Bioinformatics, 2013; 29(20):2572–2578,

VCFShark

Compressor of VCF files.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, biorxiv.org, 2020; ():

Whisper

Experimental mapper of whole genome sequencing data.

•  Deorowicz, S., Gudys, A., Whisper 2: indel-sensitive short read mapping, bioRxiv.org, 2019; :
•  Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Whisper: read sorting allows robust robust mapping of DNA sequencing data, Bioinformatics, 2019; 35(12):2043–2050,
•  Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Robust mapping of whole genome sequencing data, Poster at The Biology of Genomes Conference, 2017;

The genome factory !!!

Madhvan Reddy — Thu, 16 Jan 2014 02:09:31 -0600

Illumina, Inc. announced Tuesday that its new HiSeq X Ten Sequencing System has broken the “sound barrier” of human genomics by enabling the $1,000 genome. “This platform includes dramatic technology breakthroughs that enable researchers to undertake studies of unprecedented scale by providing the throughput to sequence tens of thousands of human whole genomes in a single year in a single lab,” Illumina stated.

Initial customers for the HiSeq X Ten System, which will ship in Q1 2014, include Macrogen, based in Seoul, South Korea and its CLIA laboratory in Rockville, Maryland, the Broad Institute in Cambridge, Massachusetts, and the Garvan Institute of Medical Research in Sydney, Australia.

“For the first time, it looks like it will be possible to deliver the $1,000 genome, which is tremendously exciting,” said Eric Lander, founding director of the Broad Institute and a professor of biology at MIT. “The HiSeq X Ten should give us the ability to analyze complete genomic information from huge sample populations. Over the next few years, we have an opportunity to learn as much about the genetics of human disease as we have learned in the history of medicine.”

“The HiSeq X Ten is an ideal platform for scientists and institutions focused on the discovery of genotypic variation to enable a deeper understanding of human biology and genetic disease,” Illumina stated. “It can sequence tens of thousands of samples annually with high-quality, high-coverage sequencing, delivering a comprehensive catalog of human variation within and outside coding regions.”

HiSeq X Ten utilizes a number of advanced design features to generate massive throughput. Patterned flow cells, which contain billions of nanowells at fixed locations, combined with a new clustering chemistry deliver a significant increase in data density (6 billion clusters per run). Using state-of-the art optics and faster chemistry, HiSeq X Ten can process sequencing flow cells more quickly than ever before — generating a 10x increase in daily throughput when compared to current HiSeq 2500 performance.

The HiSeq X Ten is sold as a set of 10 or more ultra-high throughput sequencing systems, each generating up to 1.8 terabases (Tb) of sequencing data in less than three days or up to 600 gigabases (Gb) per day, per system, providing the throughput to sequence tens of thousands of high-quality, high-coverage genomes per year. Illumina says the $1,000 includes typical instrument depreciation, DNA extraction, library preparation, and estimated labor.

Introduction to Bioinformatics

eliabrodsky — Wed, 05 Jun 2019 14:58:11 -0500

Introduction to bioinformatics is a course for biologists and clinicians that would like to learn more about the way bioinformatics is used in healthcare, biotech and pharmaceuitcal industry as well as basic research. The course covers many of the topics transformed by the emergence of big data and computational technologies. To learn more about the course, visit: https://edu.t-bio.info/course/introduction-bioinformatics/

Liu Lab

Tue, 04 Feb 2020 06:27:02 -0600

Shirley is a computational biologist with expertise in cancer epigenetics. Her research focuses on algorithm development and integrative mining from big data generated on microarrays, massively parallel sequencing, and other high throughput techniques to model the specificity and function of transcription factors, chromatin regulators and lncRNAs in tumor development, progression, drug response and resistance.

https://liulab-dfci.github.io/software/

bioinformatics workbook

biogeek — Tue, 05 Jan 2021 22:42:32 -0600

This books assumes that the reader has some knowledge of biology and basic understanding of the Unix command line. However, for the beginner, the appendix contains introductory material and tips/tricks for common bioinformatic problems, that is referred to for more information throughout the book.

https://bioinformaticsworkbook.org/

Address of the bookmark: https://bioinformaticsworkbook.org/

Bioinformatics tools for genome assembly !

BioStar — Mon, 24 Jul 2023 07:04:26 -0500

There are numerous genome assembly tools available, each with its strengths and weaknesses. Here is a list of some widely used genome assembly tools as of my last update in September 2021:

SPAdes: An assembler specifically designed for single-cell and multi-cell bacterial genomes, as well as small eukaryotic genomes.
ABySS: A parallelized assembler for large genomes that uses de Bruijn graphs.
Velvet: Another de Bruijn graph-based assembler optimized for short-read sequencing data.
SOAPdenovo: A de Bruijn graph-based assembler designed for short reads, widely used for assembling large and complex genomes.
MaSuRCA: A hybrid assembler that combines data from multiple sequencing technologies, such as Illumina and PacBio.
Canu: A long-read assembler optimized for PacBio and Oxford Nanopore sequencing data.
Flye: A long-read assembler suitable for bacterial and small eukaryotic genomes.
SMARTdenovo: An assembler designed for long reads, particularly suited for PacBio data.
SPAdes Long Read (SPAdesLR): An extension of SPAdes for long-read data, such as those from PacBio or Nanopore.
Minia: An assembler optimized for low memory consumption, suitable for small and medium-sized genomes.
Unicycler: A hybrid assembler that combines short and long reads for circular bacterial genome assembly.
wtdbg2: A de Bruijn graph assembler for long reads, efficient for very large genomes.
Shasta: A long-read assembler that uses the Overlap-Layout-Consensus approach, suitable for PacBio and Nanopore data.
Sparc: An assembler designed to handle noisy long reads from Nanopore sequencing.
CANA: An assembler for metagenomic data, particularly for complex and diverse microbial communities.
Ra Assembler: A metagenome assembler for long reads, designed for highly complex metagenomic samples.

Please note that the field of bioinformatics is constantly evolving, and new assembly tools may have emerged since my last update. Additionally, the performance of these tools can vary depending on the characteristics of the sequencing data and the genome being assembled. When selecting an assembly tool, consider the specific requirements of your project, the available data types, and the computational resources at your disposal. Always refer to the respective tool's documentation and publications for the most up-to-date information and recommendations.

A Brief Bioinformatics Tutorial

Jit — Wed, 21 May 2014 12:50:09 -0500

This is about how to use a computer to find what is known about a gene of interest and also how to get new insights about it.

The tutorial is divided in three main parts:

In the Sequence part, you will see how to look efficiently for a particular protein sequence, how to blast it against the database of your choice to find homologues, how to perform a multiple alignment of the homologues you've selected and how to edit this alignment.
The Structure part is about molecular visualization, homology modeling and structural domain prediction.
In the Function part, you will be introduced to you 3 useful servers to investigate the function of a protein. i.e. finding interactors, co-expressed genes, see a phylogenetic profile, easily access papers citing your gene etc ...

During all the three parts, we will use the S. cerevisiae VPS36 protein as an example.

Address of the bookmark: http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html

List of visualization tools for genome alignments

Rahul Nayak — Fri, 02 Feb 2018 13:25:33 -0600

Genome browsers are useful not only for showing final results but also for improving analysis protocols, testing data quality, and generating result drafts. Its integration in analysis pipelines allows the optimization of parameters, which leads to better results. But sometime, we need publication ready figure of genomes. Following are the list of genome alignment visualization tools, which could be useful for analysis and interpretation of results:

ABySS Explorer

Interactive Java application that uses a novel graph-based representation to display a sequence assembly and associated metadata

http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

BamView

Genome browser and annotation tool that allows visualization of sequence features, next-generation sequencing (NGS) data and the results of analyses within the context of the sequence, and also its six-frame translation

http://www.sanger.ac.uk/resources/software/artemis/

DNannotator

Annotation web toolkit for regional genomic sequences

http://bioapp.psych.uic.edu/DNannotator.htm

JVM

Java Visual Mapping tool for NGS reads

http://www.springer.com/cda/content/document/cda_downloaddocument/9789401792448-c2.pdf?SGWID=0-0-45-1487072-p176815501

LookSeq

Web-based visualization of sequences derived from multiple sequencing technologies. Low- or high-depth read pileups and easy visualization of putative single nucleotide and structural variation

http://lookseq.sourceforge.net

MagicViewer

Visualization of short read alignment, identification of genetic variation and association with annotation information of a reference genome

http://bioinformatics.zj.cn/magicviewer/

MapView

Alignments of huge-scale single-end and pair-end short reads

http://omictools.com/mapview-s1367.html

MultiPipMaker

Computes alignments of similar regions in two DNA sequences. The resulting alignments are summarized with a ‘percent identity plot’ (pip)

http://pipmaker.bx.psu.edu/pipmaker/

PileLineGUI

Handling genome position files in NGS studies

http://sing.ei.uvigo.es/pileline/pilelinegui.html

SAMtools tview

Simple and fast text alignment viewer; NGS compatible

http://www.htslib.org/

SEWAL

Uses a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run

http://www.sourceforge.net/projects/sewal

STAR

A web-based integrated solution to management and visualization of sequencing data

http://wanglab.ucsd.edu/star/browser

SVA

Software for annotating and visualizing sequenced human genomes

http://www.svaproject.org

Viewer (IGV)

Visualization of large heterogeneous datasets, providing a smooth and intuitive user experience at all levels of genome resolution

https://www.broadinstitute.org/igv/

ZOOM Lite

NGS data mapping and visualization software

http://bioinfor.com/zoom/lite/

Next generation sequencing in R or bioconductor environment

John Parker — Mon, 02 Jun 2014 18:03:09 -0500

There are many R software and bioconductor packages for NGS data analysis, some of them are as follows

Biostrings

The Biostrings package from Bioconductor provides an advanced environment for efficient sequence management and analysis in R. It contains many speed and memory effective string containers, string matching algorithms, and other utilities, for fast manipulation of large sets of biological sequences. The objects and functions provided by Biostrings form the basis for many other sequence analysis packages. Documentation

IRanges Overview

IRanges provides the low-level infrastructure and containers for handling sets of integer ranges within Bioconductor's BioC-Seq domain. Its classes and methods provide support for many more high-level packages like GenomicRanges, ShortRead, Rsamtools, etc. Documentation

GenomicRanges Overview

The GenomicRanges package serves as the foundation for representing genomic locations within the Bioconductor project. It is built upon the IRanges infrastructure and defines three major data containers - GRanges, GRangesList and GappedAlignments - which are supporting other important BioC-Seq packages including ShortRead, Rsamtools, rtracklayer, GenomicFeatures and BSgenome. Compared to the IRanges container, the GRanges/GRangesList classes are more flexible and extensible to store additional information about sequence ranges, such as chromosome identifiers (sequence space), strand information and annotation data. Documentation

Motif Discovery

cosmo

The cosmo package allows to search a set of unaligned DNA sequences for a shared motif that may function as transcription factor binding site. The algorithm extends the popular motif discovery tool MEME (Bailey and Elkan, 1995) in that it allows the search to be supervised by specifying a set of constraints that the motif to be discovered must satisfy. Documentation

BCRANK

BCRANK is a method that takes a ranked list of genomic regions as input and outputs short DNA sequences that are overrepresented in some part of the list. The algorithm was developed for detecting transcription factor (TF) binding sites in a large number of enriched regions from high-throughput ChIP-chip or ChIP-seq experiments, but it can be applied to any ranked list of DNA sequences. Documentation

rGADEM: Documentation

MotIV: Documentation

ShortRead

The ShortRead package provides input, quality control, filtering, parsing, and manipulation functionality for short read sequences produced by high throughput sequencing technologies. While support is provided for many sequencing technologies, this package is primairly focused on Solexa/Illumina reads. Documentation

Rsamtools

Rsamtools provides functions for parsing and inspecting samtools BAM formatted binary alignment data. SAM/BAM is quickly becoming a universal standard alignment format, and is now supported by a wide variety of alignment tools. Documentation

Samtools Website
BWA (Burrows-Wheeler Alignment) Website

Additional tools for SNP analysis:

snpMatrix

BSgenome

BSgenome provides an object oriented infrastructure for interacting with a Biostring based genome sequence. BSgenome packages exist for many common genomes, and can be created to represent custom genomes. See the "How to forge a BSgenome data package" Vignette for instructions to create a new BSgenome package if a prebuilt package does not exist for your organism. Documentation

rtracklayer

rtracklayer provides an interface for exporting annotation feature data to various genome browsers and file formats (such as GFF). See the Small RNA Profiling exercise for an example of using rtracklayer to visualize alignment coverage. Documentation

biomaRt

The biomaRt package, provides an interface to a growing collection of databases implementing the BioMart software suite (http:// www.biomart.org). The package enables online retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas. This data is retrieved automatically via the Internet, so it's recommended that you cache the data locally, or check versions if your code will be adversely affected by updates to these data. Documentation

ChIP-Seq Analysis Packages

Bioconductor provides various packages for analyzing and visualizing ChIP-Seq data. Only a small selection of these packages is introduced here. Additional useful introductions to this topic are: BioC ChIP-seq Case Study and BioC ChIP-Seq.

chipseq

The chipseq package combines a variety of HT-Seq packages to a pipeline for ChIP-Seq data analysis. Documentation

BayesPeak

BayesPeak is a peak calling package for identifying DNA binding sites of proteins in ChIP-Seq experiments. Its algorithm uses hidden Markov models (HMM) and Bayesian statistical methods. The following sample code introduces the identification of peaks with the BayesPeak package as well as the incorporation of read coverage information obtained by the chipseq package. Documentation [ Publication ]

PICS

The PICS package applies probabilistic inference to aligned-read ChIP-Seq data in order to identify regions bound by transcription factors. PICS identifies enriched regions by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. The following sample code uses the test data set from the above BayesPeak package in order to compare the results from both methods by identifying their consensus peak set. Documentation [ Publication ]

ChIPpeakAnno

The ChIPpeakAnno package provides. batch annotation of the peaks identified from either ChIP-seq or ChIP-chip experiments. It includes functions to retrieve the sequences around peaks, obtain enriched Gene Ontology (GO) terms, find the nearest gene, exon, miRNA or custom features such as most conserved elements and other transcription factor binding sites supplied by users. The package leverages the biomaRt, IRanges, Biostrings, BSgenome, GO.db, multtest and stat packages. Documentation

Additional ChIP-Seq Packages

DiffBind: Documentation

MOSAICS: Documentation

iSeq: Documentation

ChIPseqR: Documentation

ChiPsim: Documentation

CSAR: Documentation

ChIP-Seq Pipeline: PICS, rGADEM and MotIV (developer web site)

SPP: ChIP-seq processing pipeline

SPP Tutorial

MACS

SIPeS

RNA-Seq Analysis

Counting Reads that Overlap with Annotation Ranges

The GenomicRanges package provides support for importing into R short read alignment data in BAM format (via Rsamtools) and associating them with genomic feature ranges, such as exons or genes. This way one can quantify the number of reads aligning to annotated genomic regions. The package defines general purpose containers for storing genomic intervals as well as more specialized containers for storing alignments against a reference genome. The two main functions for read counting provided by this infrastructure are countOverlaps and summarizeOverlaps. For their proper usage, it is important to read the corresponding PDF manual. Documentation

Differential Gene Expression Analysis with DESeq

The DESeq package contains functions to call differentially expressed genes (DEGs) in count tables based on a model using the negative binomial distribution. It expects as input a data frame with the raw read counts per region/gene of interest (rows) for each test sample (columns). Such a count table can be imported into R or generated from BAM alignment files using the countOverlaps function as introduced above. Documentation

Differential Gene Expression Analysis with edgeR

The edgeR package uses empirical Bayes estimation and exact tests based on the negative binomial distribution to call differentially expressed genes (DEGs) in count data.

Documentation

A variety of additional R packages are available for normalizing RNA-Seq read count data and identifying differentially expressed genes (DEG):

easyRNASeq (simplifies read counting per genome feature)

DEXSeq (Inference of differential exon usage); parathyroidSE explains how to generate exon read counts in R

DEGseq

baySeq (also see: segmentSeq)

Genominator (Bullard et al. 2010)

Detection of Alternative Splice Junctions

Another utility of RNA-Seq experiments is the analysis of splice junctions. The following software suggestions provide this utility:

ERANGE
TopHat

SpliceMap

SplitSeek

DNA-Methylation Data Analysis

methylPipe
bsseq
BiSeq
Much more under BiocViews

HT-Seq Data Visualization

ggbio: ggplot2 extension for genomics data (online manual) Gviz: Plotting data and annotation information along genomic coordinates HilbertVis: Hilbert genome plots

GenomeGraphs: Plotting genomic information from Ensembl

TileQC: Flow Cell Quality Visualization

rtracklayer: R interface to genome browsers

genoPlotR: Plotting maps of genes and genomes

Genominator: Tools for storing, accessing, analyzing and visualizing genomic data.

To install all packages

source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("ShortRead", "Biostrings", "IRanges", "BSgenome", "rtracklayer", "biomaRt", "chipseq", "ChIPpeakAnno", "Rsamtools", "BayesPeak", "PICS", "GenomicRanges", "DESeq", "edgeR", "leeBamViews", "GenomicFeatures", "BSgenome.Celegans.UCSC.ce2"))