BOL: Related items

MinION_QC: An R script to do some QC on MinION data

Radha Agarkar — Sun, 03 Dec 2017 15:19:18 -0600

Other tools focus on getting data out of the fastq or fast5 files, which is slow and computationally intensive. The benefit of this approach is that it works on a single, small .txt summary file, so it is a lot quicker than most alternatives: it takes about a minute to analyse a 4 GB flowcell on my laptop.
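To illustrate why working from the summary file is cheap, here is a minimal Python sketch (not the MinION_QC R script itself) that computes a few QC statistics from a tab-separated summary table. The column names `sequence_length_template` and `mean_qscore_template`, the Q7 cutoff and the tiny in-memory example are assumptions made for illustration; check the header of your own summary file before reusing them.

```python
import csv
import io


def read_lengths_and_quals(handle,
                           length_col="sequence_length_template",  # assumed column name
                           qual_col="mean_qscore_template"):       # assumed column name
    """Read per-read length and mean quality from a tab-separated summary."""
    reader = csv.DictReader(handle, delimiter="\t")
    lengths, quals = [], []
    for row in reader:
        lengths.append(int(row[length_col]))
        quals.append(float(row[qual_col]))
    return lengths, quals


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarise(handle, min_q=7.0):  # min_q is an illustrative cutoff, not a MinION_QC default
    lengths, quals = read_lengths_and_quals(handle)
    passing = [length for length, q in zip(lengths, quals) if q >= min_q]
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "reads_passing_q": len(passing),
    }


# Tiny made-up table standing in for a real summary file.
EXAMPLE = (
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "r1\t1000\t9.5\n"
    "r2\t2000\t6.0\n"
    "r3\t3000\t10.1\n"
    "r4\t4000\t8.2\n"
)

result = summarise(io.StringIO(EXAMPLE))
print(result)
```

For a real run, replace `io.StringIO(EXAMPLE)` with an open file handle to the summary file. No read sequences or signal data are touched, which is where the speed comes from.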


Address of the bookmark: https://github.com/roblanf/minion_qc

MIX: Combining multiple assemblies from NGS data

Rahul Nayak — Tue, 08 May 2018 04:58:05 -0500

Mix is a tool that combines two or more draft assemblies without relying on a reference genome. Its goal is to reduce contig fragmentation and thus speed up genome finishing. The proposed algorithm builds an extension graph in which vertices represent the extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The output assembly corresponds to a path in the extension graph that maximizes the cumulative contig length.

The Mix algorithm, approach and results were published in BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/14/S15/S16.
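The extension-graph idea can be sketched in a few lines of Python. This is a toy simplification under stated assumptions, not the Mix implementation: each contig is collapsed to a single node, an edge (left, right) means "the end of left aligns with the start of right", overlap lengths are ignored, and the best path is found by exhaustive search, which is only feasible for very small graphs. The contig names and lengths are made up.

```python
def build_extension_graph(overlaps):
    """Map each contig to the contigs that can extend it on its right side."""
    graph = {}
    for left, right in overlaps:
        graph.setdefault(left, []).append(right)
    return graph


def best_path(lengths, graph):
    """Exhaustively search simple paths for the largest cumulative contig length."""
    best = (0, [])

    def extend(node, path, total, seen):
        nonlocal best
        if total > best[0]:
            best = (total, path[:])
        for nxt in graph.get(node, []):
            if nxt not in seen:  # never reuse a contig within one path
                seen.add(nxt)
                path.append(nxt)
                extend(nxt, path, total + lengths[nxt], seen)
                path.pop()
                seen.remove(nxt)

    for start in lengths:
        extend(start, [start], lengths[start], {start})
    return best


# Hypothetical contigs from two draft assemblies, "a" and "b".
lengths = {"a1": 500, "a2": 300, "b1": 400, "b2": 600}
overlaps = [("a1", "b1"), ("b1", "a2"), ("a1", "b2")]

total, path = best_path(lengths, build_extension_graph(overlaps))
print(total, path)
```

Here the chain a1, b1, a2 (1200 bases) beats a1, b2 (1100 bases), so contigs from both assemblies end up in one longer path.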

Address of the bookmark: https://github.com/cbib/MIX

ALLHiC: Phasing and scaffolding polyploid genomes based on Hi-C data

BioStar — Thu, 20 Dec 2018 12:03:32 -0600

The major problem in scaffolding polyploid genomes is that Hi-C signals are frequently detected between allelic haplotypes, and existing state-of-the-art Hi-C scaffolding programs link the allelic haplotypes together. To solve this problem, we developed a new Hi-C scaffolding pipeline, called ALLHiC, specifically tailored to polyploid genomes. The ALLHiC pipeline consists of five steps: prune, partition, rescue, optimize and build.

Address of the bookmark: https://github.com/tangerzhang/ALLHiC/wiki

kallisto: a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data

Jit — Mon, 07 Jan 2019 10:35:14 -0600

kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools. kallisto is described in detail in:

Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519
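The pseudoalignment idea can be illustrated with a toy Python sketch. It is a simplification under stated assumptions, not kallisto's implementation (which, per the paper cited above, works on a transcriptome de Bruijn graph and follows pseudoalignment with a quantification step). Here the index is a plain dictionary from k-mer to the set of transcripts containing it, a read's compatible set is the intersection over its k-mers, and k-mers absent from the index are skipped. The sequences and k = 5 are made up.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every indexed k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer in no transcript, e.g. it contains a sequencing error
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Two hypothetical transcripts sharing a common prefix.
transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACCTT"}
K = 5
index = build_kmer_index(transcripts, K)

shared = pseudoalign("ACGTAC", index, K)     # falls in the shared prefix
unique = pseudoalign("GTACGGA", index, K)    # only t1 contains these k-mers
unknown = pseudoalign("TTTTTTT", index, K)   # no k-mer is in the index
print(shared, unique, unknown)
```

No base-level alignment is computed at any point. The read is only assigned a set of compatible targets, which is the information the quantification step needs.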

Address of the bookmark: https://pachterlab.github.io/kallisto/about

heatmaply: popular graphical method for visualizing high-dimensional data

Neel — Sat, 11 Jan 2020 07:34:14 -0600

This work is based on ggplot2 and the plotly.js engine. It produces heatmaps similar to those of d3heatmap, with the advantages of speed (plotly.js can handle larger matrices) and the ability to zoom from the dendrogram.

heatmaply also provides an interface based around the plotly R package. This interface can be used by choosing plot_method = "plotly" instead of the default plot_method = "ggplot". In many cases it produces smaller objects and renders to disk faster, while offering almost identical features.

Documentation for this package is also available as a pkgdown site: http://talgalili.github.io/heatmaply/

Address of the bookmark: http://talgalili.github.io/heatmaply/articles/heatmaply.html

Useful links to therapy, disease, drug and drug-target network data:

Jit — Mon, 01 Jun 2020 11:47:51 -0500

DrugBank:

a bioinformatics and cheminformatics resource combining detailed drug data with comprehensive drug target information, with >4900 drugs (~3500 experimental) and >1500 non-redundant protein entries http://www.drugbank.ca/

Drug-Target Network:

network data of 890 drugs and 394 target human proteins http://www.nature.com/nbt/journal/v25/n10/suppinfo/nbt1338_S1.html

Drug-Therapy Network:

three layers of drug-therapy networks according to the ATC classification http://www.biomedcentral.com/1471-2210/8/5/additional/

FDA Orange Book:

approved drug products with therapeutic equivalence evaluations http://www.fda.gov/cder/ob/

IDdb:

Thomson Investigational drugs database including information on 107000 patents, 25000 investigational drugs and 80000 chemical structures http://scientific.thomson.com/products/iddb/

OMIM:

a knowledgebase of human genes and genetic disorders http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim

PDTD:

3D drug target structure database with a target identification option http://www.dddc.ac.cn/pdtd/

Predicted drug targets:

a set of 1383 predicted drug targets http://www.biomedcentral.com/1471-2105/8/353/additional/ [25]

Protein ligand network:

a network of 4208 ligands and ~15000 binding sites http://pbil.kaist.ac.kr/~parkkw/Lnet/

TDR Targets Database:

identification and ranking of targets against neglected tropical diseases http://tdrtargets.org/

Therapeutic Target Database:

lists >1500 therapeutic targets, disease conditions and corresponding drugs http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp

MAGIC: A tool for predicting transcription factors and cofactors driving gene sets using ENCODE data

BioStar — Thu, 26 Nov 2020 11:05:04 -0600

The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting TFs and cofactors that drive gene changes in 4 settings:

1) A cell line expressing or lacking a single TF

2) Breast tumors divided along PAM50 designations

3) Whole brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype

4) Single-cell RNA-seq analysis of neurons divided by Immediate Early Gene expression levels

In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.

More at https://uwmadison.app.box.com/s/8j90e5h2rjrsz3bacaxnq8kor2o64vyg

Address of the bookmark: https://github.com/asroopra/MAGIC

LoReTTA, a user-friendly tool for assembling viral genomes from PacBio sequence data

Neel — Wed, 23 Jun 2021 07:54:53 -0500

LoReTTA (Long Read Template-Targeted Assembler) is a tool designed for de novo assembly of long reads generated from viral genomes on the PacBio platform. LoReTTA exploits a reference genome to guide the assembly process, an approach that has been successful with short reads.

Address of the bookmark: https://academic.oup.com/ve/article/7/1/veab042/6248116

OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes

Abhi — Tue, 02 May 2023 00:48:28 -0500

OrthoVenn3 is a powerful tool for comparative genomics analysis, used as a web server for full genome comparisons, annotation, and evolutionary analysis of orthologous clusters across multiple species. It has already been used by thousands of users from over 60 countries.

Address of the bookmark: https://orthovenn3.bioinfotoolkits.net/

BioKit: a set of tools dedicated to bioinformatics, data visualisation

Neel — Tue, 18 Jun 2024 02:04:39 -0500

BioKit is a set of tools dedicated to bioinformatics, data visualisation (biokit.viz) and access to online biological data (e.g. UniProt and NCBI, thanks to bioservices). It also contains more advanced tools related to data analysis (e.g. biokit.stats). Since R is quite common in bioinformatics, we also provide a convenient module to run R inside your Python scripts or shell (the biokit.rtools module).

Address of the bookmark: https://biokit.readthedocs.io/en/latest/index.html