BOL: Related items

Bioinformatics tools to detect horizontal gene transfer (HGT) in genomes

Jit — Fri, 02 Mar 2018 04:56:23 -0600

Horizontal gene transfer (HGT), the “non-sexual movement of genetic material between two organisms”, is relatively common in prokaryotes and single-celled eukaryotes, but a number of factors combine to make it far rarer in multicellular eukaryotes. In order for a eukaryotic species to gain a gene by HGT, foreign DNA must enter the host nucleus, integrate into the genome, and in more complex organisms it must enter the sequestered germline in order to be transmitted to offspring. Once there, it must not experience strong negative selection, despite potential for genetic incompatibility with the host genome and mismatch between the niche of the donor and the host. Over the longer term, foreign DNA may become “domesticated” in the recipient genome and provide novel function.

The following are popular tools for detecting HGT in genomes:

Tool / version	Function / installation	Reference ID
T-REX / 3.22	HGT detection / download & compile	20525630
RANGER-DTL / 2.0	HGT detection / download binary	22689773
PhyloNet / 3.6.1	HGT detection / download binary	18662388
Jane / 4.01	HGT detection / download binary (!license!)	20181081
TREE-PUZZLE / 5.3.rc16	HGT detection / download & compile	11934758
CONSEL / 0.20	HGT detection / download	11751242
DarkHorse / 1.5 rev170	HGT detection / download & install	17274820
HGTector / 0.2.1	HGT detection / git clone	25159222
EGID / 1.0	HGT detection / download	22355228
GeneMarkS / 4.30	HGT detection / download binary (!license!)	9461475

Protein-Protein Interaction Sites Predictions !

Poonam Mahapatra — Wed, 25 Apr 2018 04:53:20 -0500

The study of Protein–Protein Interactions (PPIs) has a crucial role in biology, medicine and the pharmaceutical industry. PPIs can be investigated from two aspects: the interaction partners of a specific protein and the amino acid residues participating in a given PPI. Information about a protein’s interaction partners allows scientists to construct protein interaction networks, such as signaling pathways, which in turn facilitate the understanding of many biological and clinical observations.

The following tools are commonly used for PPI site prediction:

Protein-Protein Interaction Sites

PPISP

A consensus neural network method for predicting protein-protein interaction sites

HOMCOS

A server to predict interacting protein pairs and interacting sites by homology modeling of complex structures

HotPOINT

Prediction of protein interfaces using an empirical model

ISIS

Prediction of interaction hotspots from sequence

KFC server

Automated decision-tree approach to predicting protein-protein interaction hot spots

meta-PPISP

A meta server for predicting protein-protein interaction sites. meta-PPISP is built on three individual web servers: cons-PPISP, PINUP, and Promate

ODA

Identification of optimal surface patches with the lowest docking desolvation energy values

PINUP

Protein binding site prediction with an empirical scoring function

Other Sites (DNA, RNA, Metals)

CHED

Web server for predicting soft metal binding sites in proteins

DBD-Hunter

A knowledge-based method for the prediction of DNA-protein interactions

DISPLAR

Given the structure of a protein known to bind DNA, the method predicts residues that contact DNA using neural network method

iDBPs

Predicts DNA binding proteins for proteins with known 3D structure.

PFplus

A tool for extracting and displaying positive electrostatic patches on protein surfaces which can be indicative of nucleic acid binding interfaces.

MITObim - mitochondrial baiting and iterative mapping

Rahul Nayak — Tue, 08 May 2018 04:15:25 -0500

This document contains instructions on how to use the MITObim pipeline described in Hahn et al. 2013. The full article can be found here. Kindly cite the article if you are using MITObim in your work. The pipeline was originally developed for Illumina data, but thanks to the versatility of the MIRA assembler, MITObim in principle also supports data from the Ion Torrent, 454 and PacBio sequencing platforms.

Below you can find a few basic tutorials for how to run MITObim, and I encourage you to give them a try with the test data that comes with this repo, just to make sure everything is running smoothly on your system. It'll only take a few minutes and will potentially save you a lot of time down the line.

I provide further examples here as Jupyter notebooks. Get in touch if you feel like sharing your particular MITObim solution and I'd be happy to put it up here, too!

Address of the bookmark: https://github.com/chrishah/MITObim

vcfR: a package to manipulate and visualize VCF data in R

Jit — Thu, 25 Oct 2018 09:05:59 -0500

vcfR is an R package intended to allow easy manipulation and visualization of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices from the VCF data for use with typical R functions. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete, data may be written to a VCF file or converted into other popular R objects (e.g., genlight, DNAbin). vcfR provides a link between VCF data and the R environment, connecting familiar software with genomic data.

Address of the bookmark: https://github.com/knausb/vcfR
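
To make the "extract matrices from the VCF data" step concrete, here is a minimal sketch in Python. It is not vcfR code (vcfR is an R package); the function name genotype_matrix is made up for illustration, and it assumes a plain-text, tab-separated VCF with a FORMAT column that contains a GT field.

```python
def genotype_matrix(vcf_text):
    """Return (samples, variant_ids, matrix) with one row of GT strings per variant.

    Hypothetical helper for illustration only; it does not reproduce vcfR's API.
    """
    samples, variants, matrix = [], [], []
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos = fields[0], fields[1]
        gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
        variants.append(f"{chrom}_{pos}")
        matrix.append([call.split(":")[gt_index] for call in fields[9:]])
    return samples, variants, matrix
```

The result is a plain list-of-lists with variants as rows and samples as columns, which is the kind of matrix that can then be handed to ordinary filtering or plotting code.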

Tools for RNA classification

Abhi — Tue, 08 Nov 2022 03:39:11 -0600

barrnap - https://github.com/tseemann/barrnap

CPAT - https://github.com/liguowang/cpat, http://lilab.research.bcm.edu/ (web server)

CPC2 - https://github.com/gao-lab/CPC2_standalone, http://cpc2.gao-lab.org/ (web server)

Infernal - http://eddylab.org/infernal/, https://github.com/EddyRivasLab/infernal

NCBI RefSeq - https://www.ncbi.nlm.nih.gov/refseq/

Rfam - http://rfam.xfam.org/, https://docs.rfam.org/en/latest/index.html

SILVA - https://www.arb-silva.de/

RNAmmer - http://www.cbs.dtu.dk/services/RNAmmer/ (web server, standalone download link)

Bioinformatics Tools for Phylogeny !

BioStar — Mon, 06 Nov 2023 03:09:59 -0600

Direct access to the individual tools available on this server.

Multiple Alignment:	Phylogeny:	Tree viewers:	Utilities:
MUSCLE	PhyML	TreeDyn	Gblocks
T-Coffee / 3DCoffee	TNT	Drawgram	Jalview
ClustalW	BioNJ	Drawtree	Readseq
ProbCons	MrBayes	ATV (A Tree Viewer)	Built-in converter

Exploring RNA Sequence Analysis: Tools for Every Bioinformatician

Neel — Fri, 13 Dec 2024 04:03:04 -0600

RNA sequence analysis has become an essential part of modern biological research. From RNA-seq pipelines to specialized tools for specific RNA types, here's a comprehensive guide to tools you can use to make sense of RNA data.

1. RNA-Seq Analysis Pipelines

RNA-seq is one of the most popular techniques for studying RNA. These tools streamline processing raw sequence data:

FastQC: For quality control of raw RNA-seq reads.
Trimmomatic: For trimming and filtering RNA-seq reads.
HISAT2/STAR: High-performance aligners for RNA-seq reads.
featureCounts: For quantifying gene expression.
DESeq2/edgeR: For differential expression analysis.

2. Transcriptome Assembly and Annotation

For analyzing transcriptomes from non-model organisms or assembling novel transcripts:

Trinity: For de novo transcriptome assembly.
StringTie: For transcript assembly and quantification from RNA-seq alignments.
TransDecoder: To predict coding regions within assembled transcripts.
TAU: Tools for annotating non-coding and coding RNAs.

3. Exploring Non-Coding RNA (ncRNA)

Non-coding RNAs play critical regulatory roles. Dedicated tools for studying them include:

Infernal: For identifying ncRNA sequences based on covariance models.
Rfam: Database and tools for ncRNA families.
miRDeep: For identifying microRNAs in RNA-seq datasets.

4. RNA Structure and Motif Analysis

Structural biology of RNA helps in understanding its function:

RNAfold (ViennaRNA): Predicts secondary structures from RNA sequences.
RNAstructure: Tools for RNA secondary structure prediction and analysis.
MEME Suite: For identifying motifs in RNA sequences.
IntaRNA: For RNA-RNA interaction prediction.

5. RNA Editing and Modifications

Epitranscriptomics is a growing field focusing on RNA modifications:

REDItools: For RNA editing analysis.
m6Aboost: For identifying m6A modifications in RNA.

6. Long-Read RNA Sequencing Analysis

Long-read technologies like Nanopore and PacBio are transforming RNA research:

FLAIR: For isoform-level analysis of long-read RNA-seq data.
NanoMod: For detecting modifications in RNA from Nanopore sequencing.

7. RNA-Protein Interactions

To study RNA-protein interactions and complexes:

RBPmap: For identifying RNA-binding protein motifs.
PARalyzer: For analyzing PAR-CLIP data.

8. Functional Enrichment Analysis

Understanding biological functions and pathways from RNA-seq data:

getENRICH: A tool designed for pathway enrichment analysis of non-model organisms (hypergeometric P-value calculation with FDR correction).
clusterProfiler: For GO and KEGG pathway enrichment analysis.
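
The "hypergeometric P-value calculation with FDR correction" mentioned above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code of getENRICH or clusterProfiler; the function names (hypergeom_sf, benjamini_hochberg, enrich) are hypothetical, and Benjamini-Hochberg is assumed as the FDR procedure.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(hits, background, pathways):
    """Map each pathway name to (p-value, adjusted p-value) for over-representation."""
    background = set(background)
    hits = set(hits) & background
    names = sorted(pathways)
    pvals = []
    for name in names:
        members = set(pathways[name]) & background
        overlap = len(hits & members)
        pvals.append(hypergeom_sf(overlap, len(background), len(members), len(hits)))
    qvals = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, qvals)}
```

For a non-model organism, the background would be the set of annotated genes in your own assembly rather than a reference gene list, which is the main reason to compute the test yourself.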

9. Visualization and Data Sharing

Presenting and sharing RNA sequence analysis results effectively:

IGV: Genome browser for visualizing RNA-seq alignments.
Circos: Circular visualization of RNA-seq data.
Dash Bio: A Python library for creating bioinformatics visualizations.

Conclusion

The bioinformatics landscape for RNA sequence analysis is vast, with tools catering to specific needs. Whether you’re studying coding RNAs, non-coding RNAs, or exploring RNA-protein interactions, the right tools can transform your data into biological insights.

List of bioinformatics workflow management tools !

Rahul Nayak — Sat, 20 Mar 2021 00:15:25 -0500

Here is a list of workflow managers:

BigDataScript – A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [ paper-2014 | web ]
Bpipe – A small language for defining pipeline stages and linking them together to make pipelines. [ web ]
Common Workflow Language – a specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [ web ]
Cromwell – A Workflow Management System geared towards scientific workflows. [ web ]
Galaxy – a popular open-source, web-based platform for data intensive biomedical research. Has several features, from data analysis to workflow management to visualization tools. [ paper-2018 | web ]
Nextflow (recommended) – A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [ paper-2018 | web ]
Ruffus – Computation pipeline library for Python, widely used in science and bioinformatics. [ paper-2010 | web ]
SeqWare – Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [ paper-2010 | web ]
Snakemake – A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [ paper-2018 | web ]
Workflow Description Language (WDL) – Workflow standard developed by the Broad Institute. [ web ]
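
What all of these tools have in common is that stages are declared together with their dependencies and then run in an order that respects them. The sketch below shows just that core idea with Python's standard library; it is not how any of the listed tools is implemented, and run_pipeline, stages and dependencies are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(stages, dependencies):
    """Run each stage after the stages it depends on.

    stages: maps a stage name to a callable.
    dependencies: maps a stage name to the list of stages whose outputs it needs.
    Stages that have no entry in `dependencies` must at least appear as someone
    else's dependency, otherwise they are never scheduled in this sketch.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Pass the outputs of the prerequisite stages as positional arguments.
        inputs = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = stages[name](*inputs)
    return order, results
```

Real workflow managers add the parts that matter in practice on top of this: caching of finished stages, parallel execution, and submission to clusters or the cloud.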

Bioinformatics tools developed for Oxford Nanopore data analysis !

biogeek — Wed, 27 Dec 2017 20:47:30 -0600

MinION is the only portable real-time device for DNA and RNA sequencing. Each consumable flow cell can now generate 10–20 Gb of DNA sequence data. Ultra-long read lengths are possible (hundreds of kb) as you can choose your fragment length. One of the technical advantages of ONT data is the read length, which offers great prospects for genome assembly. Generally, assemblers are based on several different types of algorithms, such as greedy, overlap-layout-consensus (OLC), de Bruijn graph (DBG), and string graph.
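
Of the algorithm families named above, the de Bruijn graph is the easiest to show in a few lines. The following is a toy sketch, assuming error-free reads on a single strand; the function name de_bruijn is illustrative, and real assemblers additionally handle sequencing errors, reverse complements and k-mer counts.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}
```

An assembler then looks for paths through this graph. With long, error-prone nanopore reads, exact k-mers are often broken by errors, which is one reason overlap-based (OLC) approaches are common for this kind of data.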

List of analysis tools developed for Oxford Nanopore data

BWA
Fast nanopore data tuned alignment tool
https://github.com/lh3/bwa

GraphMap
Mapper for long and error-prone reads
https://github.com/isovic/graphmap

LAST
Nanopore tuned alignment tool
http://last.cbrc.jp/

LINKS
Software tool for long read scaffolding
https://github.com/warrenlr/LINKS/

marginAlign
Tools to align nanopore reads to a reference
https://github.com/benedictpaten/marginAlign

minoTour
Real time analysis tools
http://minotour.nottingham.ac.uk/

nanoCORR
Error-correction tool for nanopore sequence data
https://github.com/jgurtowski/nanocorr

NanoOK
Software for nanopore data, quality and error profiles
https://documentation.tgac.ac.uk/display/NANOOK/NanoOK

Nanopolish
Nanopore analysis and genome assembly software
https://github.com/jts/nanopolish

nanopore
Variant-detection tool for nanopore sequence data
https://github.com/mitenjain/nanopore

Nanocorrect
Error-correction tool for nanopore sequence data
https://github.com/jts/nanocorrect/

npReader
Real-time conversion and analysis of nanopore reads
https://github.com/mdcao/npReader

poRe
Tool for analyzing and visualizing nanopore data
https://sourceforge.net/p/rpore/wiki/Home/

PoreSeq
Error-correction and variant-calling software
https://github.com/tszalay/poreseq

Poretools
Nanopore sequence analysis and visualization software
https://github.com/arq5x/poretools

SSPACE-LongRead
Genome scaffolding tool
http://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE-longread

SMIS
Genome scaffolding tool
https://sourceforge.net/projects/phusion2/files/smis/

List of assemblers for Oxford Nanopore MinION long reads

LQS
DALIGNER, Celera OLC
Nanocorrect, Nanopolish corrector
https://github.com/jts/nanopolish

PBcR
HGAP or BLASR, Celera OLC
PBcR corrector
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR

Canu
MHAP, Celera OLC
Canu corrector
https://github.com/marbl/canu

Falcon
String graph, Celera OLC
Falcon corrector
https://github.com/PacificBiosciences/falcon

Miniasm
OLC
https://github.com/lh3/miniasm

ra-integrate
OLC
https://github.com/mariokostelac/ra-integrate/

ALLPATHS-LG
de Bruijn graph
ALLPATHS-LG corrector
https://www.broadinstitute.org/software/allpaths-lg/blog/?page_id=12

SPAdes
de Bruijn graph
SPAdes corrector
http://bioinf.spbau.ru/spades

Gap filling or Contigs extensions tools !

Rahul Nayak — Fri, 01 Jun 2018 08:07:32 -0500

There are many tools to perform gap filling using Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also tools such as GAPresolution that can help to perform local re-assemblies using 454 reads. We used GAPresolution, but it is not very good software; it is useful only in some specific situations.

Take a look at the PRICE software from the DeRisi lab. It's meant to do something very similar. http://derisilab.ucsf.edu/index.php?page=software

You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).

See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/

In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41

SOAPdenovo also has a function for scaffolding. Not sure about ABySS.

Here is a useful explanation of several tools:

https://bioinformaticsonline.com/search?q=scaffolding&entity_type=object&entity_subtype=bookmarks&offset=0&search_type=entities

I could be wrong, but the above answers to your hypothetical scenario appear to miss the point that you aren't interested in assembling the full genome, just the 100 kb part you're interested in. I suggest the following algorithm:

1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.

2. Repeat:
a. Identify paired-end reads (not in C) for which one or both ends align within, or extending, contigs in C.
b. Identify unpaired reads that align extending these new paired-end reads.
c. Construct a new assembly C' from C and the new reads identified in (a) and (b).
d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.
e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, set S = S'.

3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.

4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.
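
The loop in step 2 can be sketched as follows. This is a toy version under loud assumptions: "aligning to a contig" is replaced by sharing an exact k-mer with the reads recruited so far, reverse complements are ignored, and the re-assembly and 100 kb trimming of steps 2c and 2d are left out. The name recruit_reads and its parameters are hypothetical; in practice you would call a real aligner and assembler at these points.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(seed_reads, read_pairs, k=8, max_rounds=50):
    """Iteratively pull in read pairs that share a k-mer with the current read set.

    seed_reads: sequences already assigned to the region (the set S).
    read_pairs: list of (mate1, mate2) sequences not yet assigned.
    Returns the indices of recruited pairs in the order they were added.
    """
    index = set()
    for seq in seed_reads:
        index |= kmers(seq, k)
    recruited = []
    remaining = set(range(len(read_pairs)))
    for _ in range(max_rounds):
        # Step 2a: a pair qualifies if one or both mates touch the current set.
        new = [i for i in sorted(remaining)
               if any(kmers(mate, k) & index for mate in read_pairs[i])]
        if not new:
            break  # step 2e: nothing new was added, so the read set has converged
        for i in new:
            remaining.discard(i)
            recruited.append(i)
            for mate in read_pairs[i]:
                index |= kmers(mate, k)  # both mates can now bait further reads
    return recruited
```

Each round only extends the region by roughly one insert size, so many rounds may be needed; max_rounds is there as a safety stop, not as a tuning parameter.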